All of lore.kernel.org
 help / color / mirror / Atom feed
From: Christian Brauner <brauner@kernel.org>
To: Mike Christie <michael.christie@oracle.com>,
	Linus Torvalds <torvalds@linux-foundation.org>
Cc: oleg@redhat.com, linux@leemhuis.info, nicolas.dichtel@6wind.com,
	axboe@kernel.dk, ebiederm@xmission.com,
	linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org, mst@redhat.com,
	sgarzare@redhat.com, jasowang@redhat.com, stefanha@redhat.com,
	Linux kernel regressions list <regressions@lists.linux.dev>,
	hch@infradead.org, konrad.wilk@oracle.com
Subject: Re: [RFC PATCH 0/8] vhost_tasks: Use CLONE_THREAD/SIGHAND
Date: Fri, 19 May 2023 14:15:02 +0200	[thread overview]
Message-ID: <20230519-vormittag-dschungel-83607e9d2255@brauner> (raw)
In-Reply-To: <20230518-appetit-aufsicht-238e950b97d6@brauner> <20230516-weltmeere-backofen-27f12ae2c9e0@brauner>

On Thu, May 18, 2023 at 10:25:11AM +0200, Christian Brauner wrote:
> On Wed, May 17, 2023 at 07:09:12PM -0500, Mike Christie wrote:
> > This patch allows the vhost and vhost_task code to use CLONE_THREAD,
> > CLONE_SIGHAND and CLONE_FILES. It's a RFC because I didn't do all the
> > normal testing, haven't coverted vsock and vdpa, and I know you guys
> > will not like the first patch. However, I think it better shows what
> 
> Just to summarize the core idea behind my proposal is that no signal
> handling changes are needed unless there's a bug in the current way
> io_uring workers already work. All that should be needed is
> s/PF_IO_WORKER/PF_USER_WORKER/ in signal.c.
> 
> If you follow my proposal than vhost and io_uring workers should almost
> collapse into the same concept. Specifically, io_uring workers and vhost
> workers should behave the same when it comes ot handling signals.
> 
> See 
> https://lore.kernel.org/lkml/20230518-kontakt-geduckt-25bab595f503@brauner
> 
> 
> > we need from the signal code and how we can support signals in the
> > vhost_task layer.
> > 
> > Note that I took the super simple route and kicked off some work to
> > the system workqueue. We can do more invassive approaches:
> > 1. Modify the vhost drivers so they can check for IO completions using
> > a non-blocking interface. We then don't need to run from the system
> > workqueue and can run from the vhost_task.
> > 
> > 2. We could drop patch 1 and just say we are doing a polling type
> > of approach. We then modify the vhost layer similar to #1 where we
> > can check for completions using a non-blocking interface and use
> > the vhost_task task.
> 
> My preference would be to do whatever is the minimal thing now and has
> the least bug potential and is the easiest to review for us non-vhost
> experts. Then you can take all the time to rework and improve the vhost
> infra based on the possibilities that using user workers offers. Plus,
> that can easily happen in the next kernel cycle.
> 
> Remember, that we're trying to fix a regression here. A regression on an
> unreleased kernel but still.

On Tue, May 16, 2023 at 10:40:01AM +0200, Christian Brauner wrote:
> On Mon, May 15, 2023 at 05:23:12PM -0500, Mike Christie wrote:
> > On 5/15/23 10:44 AM, Linus Torvalds wrote:
> > > On Mon, May 15, 2023 at 7:23 AM Christian Brauner <brauner@kernel.org> wrote:
> > >>
> > >> So I think we will be able to address (1) and (2) by making vhost tasks
> > >> proper threads and blocking every signal except for SIGKILL and SIGSTOP
> > >> and then having vhost handle get_signal() - as you mentioned - the same
> > >> way io uring already does. We should also remove the ingore_signals
> > >> thing completely imho. I don't think we ever want to do this with user
> > >> workers.
> > > 
> > > Right. That's what IO_URING does:
> > > 
> > >         if (args->io_thread) {
> > >                 /*
> > >                  * Mark us an IO worker, and block any signal that isn't
> > >                  * fatal or STOP
> > >                  */
> > >                 p->flags |= PF_IO_WORKER;
> > >                 siginitsetinv(&p->blocked, sigmask(SIGKILL)|sigmask(SIGSTOP));
> > >         }
> > > 
> > > and I really think that vhost should basically do exactly what io_uring does.
> > > 
> > > Not because io_uring fundamentally got this right - but simply because
> > > io_uring had almost all the same bugs (and then some), and what the
> > > io_uring worker threads ended up doing was to basically zoom in on
> > > "this works".
> > > 
> > > And it zoomed in on it largely by just going for "make it look as much
> > > as possible as a real user thread", because every time the kernel
> > > thread did something different, it just caused problems.
> > > 
> > > So I think the patch should just look something like the attached.
> > > Mike, can you test this on whatever vhost test-suite?
> > 
> > I tried that approach already and it doesn't work because io_uring and vhost
> > differ in that vhost drivers implement a device where each device has a vhost_task
> > and the drivers have a file_operations for the device. When the vhost_task's
> > parent gets signal like SIGKILL, then it will exit and call into the vhost
> > driver's file_operations->release function. At this time, we need to do cleanup
> 
> But that's no reason why the vhost worker couldn't just be allowed to
> exit on SIGKILL cleanly similar to io_uring. That's just describing the
> current architecture which isn't a necessity afaict. And the helper
> thread could e.g., crash.
> 
> > like flush the device which uses the vhost_task. There is also the case where if
> > the vhost_task gets a SIGKILL, we can just exit from under the vhost layer.
> 
> In a way I really don't like the patch below. Because this should be
> solvable by adapting vhost workers. Right now, vhost is coming from a
> kthread model and we ported it to a user worker model and the whole
> point of this excercise has been that the workers behave more like
> regular userspace processes. So my tendency is to not massage kernel
> signal handling to now also include a special case for user workers in
> addition to kthreads. That's just the wrong way around and then vhost
> could've just stuck with kthreads in the first place.
> 
> So I'm fine with skipping over the freezing case for now but SIGKILL
> should be handled imho. Only init and kthreads should get the luxury of
> ignoring SIGKILL.
> 
> So, I'm afraid I'm asking some work here of you but how feasible would a
> model be where vhost_worker() similar to io_wq_worker() gracefully
> handles SIGKILL. Yes, I see there's
> 
> net.c:   .release = vhost_net_release
> scsi.c:  .release = vhost_scsi_release
> test.c:  .release = vhost_test_release
> vdpa.c:  .release = vhost_vdpa_release
> vsock.c: .release = virtio_transport_release
> vsock.c: .release = vhost_vsock_dev_release
> 
> but that means you have all the basic logic in place and all of those
> drivers also support the VHOST_RESET_OWNER ioctl which also stops the
> vhost worker. I'm confident that a lof this can be leveraged to just
> cleanup on SIGKILL.
> 
> So it feels like this should be achievable by adding a callback to
> struct vhost_worker that get's called when vhost_worker() gets SIGKILL
> and that all the users of vhost workers are forced to implement.
> 
> Yes, it is more work but I think that's the right thing to do and not to
> complicate our signal handling.
> 
> Worst case if this can't be done fast enough we'll have to revert the
> vhost parts. I think the user worker parts are mostly sane and are

As mentioned, if we can't settle this cleanly before -rc4 we should
revert the vhost parts unless Linus wants to have it earlier.

  parent reply	other threads:[~2023-05-19 12:15 UTC|newest]

Thread overview: 176+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-02 23:25 [PATCH v11 0/8] Use copy_process in vhost layer Mike Christie
2023-02-02 23:25 ` Mike Christie
2023-02-02 23:25 ` [PATCH v11 1/8] fork: Make IO worker options flag based Mike Christie
2023-02-02 23:25   ` Mike Christie
2023-02-03  0:14   ` Linus Torvalds
2023-02-03  0:14     ` Linus Torvalds
2023-02-02 23:25 ` [PATCH v11 2/8] fork/vm: Move common PF_IO_WORKER behavior to new flag Mike Christie
2023-02-02 23:25   ` Mike Christie
2023-02-02 23:25 ` [PATCH v11 3/8] fork: add USER_WORKER flag to not dup/clone files Mike Christie
2023-02-02 23:25   ` Mike Christie
2023-02-03  0:16   ` Linus Torvalds
2023-02-03  0:16     ` Linus Torvalds
2023-02-02 23:25 ` [PATCH v11 4/8] fork: Add USER_WORKER flag to ignore signals Mike Christie
2023-02-02 23:25   ` Mike Christie
2023-02-03  0:19   ` Linus Torvalds
2023-02-03  0:19     ` Linus Torvalds
2023-02-05 16:06     ` Mike Christie
2023-02-05 16:06       ` Mike Christie
2023-02-02 23:25 ` [PATCH v11 5/8] fork: allow kernel code to call copy_process Mike Christie
2023-02-02 23:25   ` Mike Christie
2023-02-02 23:25 ` [PATCH v11 6/8] vhost_task: Allow vhost layer to use copy_process Mike Christie
2023-02-02 23:25   ` Mike Christie
2023-02-03  0:43   ` Linus Torvalds
2023-02-03  0:43     ` Linus Torvalds
2023-02-02 23:25 ` [PATCH v11 7/8] vhost: move worker thread fields to new struct Mike Christie
2023-02-02 23:25   ` Mike Christie
2023-02-02 23:25 ` [PATCH v11 8/8] vhost: use vhost_tasks for worker threads Mike Christie
2023-02-02 23:25   ` Mike Christie
2023-05-05 13:40   ` Nicolas Dichtel
2023-05-05 18:22     ` Linus Torvalds
2023-05-05 18:22       ` Linus Torvalds
2023-05-05 22:37       ` Mike Christie
2023-05-05 22:37         ` Mike Christie
2023-05-06  1:53         ` Linus Torvalds
2023-05-06  1:53           ` Linus Torvalds
2023-05-08 17:13         ` Christian Brauner
2023-05-09  8:09         ` Nicolas Dichtel
2023-05-09  8:17           ` Nicolas Dichtel
2023-05-13 12:39         ` Thorsten Leemhuis
2023-05-13 12:39           ` Thorsten Leemhuis
2023-05-13 15:08           ` Linus Torvalds
2023-05-13 15:08             ` Linus Torvalds
2023-05-15 14:23             ` Christian Brauner
2023-05-15 15:44               ` Linus Torvalds
2023-05-15 15:44                 ` Linus Torvalds
2023-05-15 15:52                 ` Jens Axboe
2023-05-15 15:52                   ` Jens Axboe
2023-05-15 15:54                   ` Linus Torvalds
2023-05-15 15:54                     ` Linus Torvalds
2023-05-15 17:23                     ` Linus Torvalds
2023-05-15 17:23                       ` Linus Torvalds
2023-05-15 15:56                   ` Linus Torvalds
2023-05-15 15:56                     ` Linus Torvalds
2023-05-15 22:23                 ` Mike Christie
2023-05-15 22:23                   ` Mike Christie
2023-05-15 22:54                   ` Linus Torvalds
2023-05-15 22:54                     ` Linus Torvalds
2023-05-16  3:53                     ` Mike Christie
2023-05-16  3:53                       ` Mike Christie
2023-05-16 13:18                       ` Oleg Nesterov
2023-05-16 13:18                         ` Oleg Nesterov
2023-05-16 13:40                       ` Oleg Nesterov
2023-05-16 13:40                         ` Oleg Nesterov
2023-05-16 15:56                     ` Eric W. Biederman
2023-05-16 15:56                       ` Eric W. Biederman
2023-05-16 18:37                       ` Oleg Nesterov
2023-05-16 18:37                         ` Oleg Nesterov
2023-05-16 20:12                         ` Eric W. Biederman
2023-05-16 20:12                           ` Eric W. Biederman
2023-05-17 17:09                           ` Oleg Nesterov
2023-05-17 17:09                             ` Oleg Nesterov
2023-05-17 18:22                             ` Mike Christie
2023-05-17 18:22                               ` Mike Christie
2023-05-16  8:39                   ` Christian Brauner
2023-05-16 16:24                     ` Mike Christie
2023-05-16 16:24                       ` Mike Christie
2023-05-16 16:44                       ` Christian Brauner
2023-05-19 12:15                     ` Christian Brauner [this message]
2023-06-01  7:58                       ` [RFC PATCH 0/8] vhost_tasks: Use CLONE_THREAD/SIGHAND Thorsten Leemhuis
2023-06-01  7:58                         ` Thorsten Leemhuis
2023-06-01 10:18                         ` Nicolas Dichtel
2023-06-01 10:47                         ` Christian Brauner
2023-06-01 11:29                           ` Thorsten Leemhuis
2023-06-01 11:29                             ` Thorsten Leemhuis
2023-06-01 12:26                           ` Linus Torvalds
2023-06-01 12:26                             ` Linus Torvalds
2023-06-01 16:10                           ` Mike Christie
2023-06-01 16:10                             ` Mike Christie
2023-05-16 14:06     ` [PATCH v11 8/8] vhost: use vhost_tasks for worker threads Linux regression tracking #adding (Thorsten Leemhuis)
2023-05-26  9:03       ` Linux regression tracking #update (Thorsten Leemhuis)
2023-06-02 11:38       ` Thorsten Leemhuis
2023-07-20 13:06   ` Michael S. Tsirkin
2023-07-20 13:06     ` Michael S. Tsirkin
2023-07-23  4:03     ` michael.christie
2023-07-23  4:03       ` michael.christie
2023-07-23  9:31       ` Michael S. Tsirkin
2023-07-23  9:31         ` Michael S. Tsirkin
2023-08-10 18:57       ` Michael S. Tsirkin
2023-08-10 18:57         ` Michael S. Tsirkin
2023-08-11 18:51         ` Mike Christie
2023-08-11 18:51           ` Mike Christie
2023-08-13 19:01           ` Michael S. Tsirkin
2023-08-13 19:01             ` Michael S. Tsirkin
2023-08-14  3:13             ` michael.christie
2023-08-14  3:13               ` michael.christie
2023-02-07  8:19 ` [PATCH v11 0/8] Use copy_process in vhost layer Christian Brauner
2023-05-18  0:09 [RFC PATCH 0/8] vhost_tasks: Use CLONE_THREAD/SIGHAND Mike Christie
2023-05-18  0:09 ` Mike Christie
2023-05-18  0:09 ` [RFC PATCH 1/8] signal: Dequeue SIGKILL even if SIGNAL_GROUP_EXIT/group_exec_task is set Mike Christie
2023-05-18  0:09   ` Mike Christie
2023-05-18  2:34   ` Eric W. Biederman
2023-05-18  2:34     ` Eric W. Biederman
2023-05-18  3:49   ` Eric W. Biederman
2023-05-18  3:49     ` Eric W. Biederman
2023-05-18 15:21     ` Mike Christie
2023-05-18 15:21       ` Mike Christie
2023-05-18 16:25       ` Oleg Nesterov
2023-05-18 16:25         ` Oleg Nesterov
2023-05-18 16:42         ` Mike Christie
2023-05-18 16:42           ` Mike Christie
2023-05-18 17:04           ` Oleg Nesterov
2023-05-18 17:04             ` Oleg Nesterov
2023-05-18 18:28             ` Eric W. Biederman
2023-05-18 18:28               ` Eric W. Biederman
2023-05-18 22:57               ` Mike Christie
2023-05-18 22:57                 ` Mike Christie
2023-05-19  4:16                 ` Eric W. Biederman
2023-05-19  4:16                   ` Eric W. Biederman
2023-05-19 23:24                   ` Mike Christie
2023-05-19 23:24                     ` Mike Christie
2023-05-22 13:30               ` Oleg Nesterov
2023-05-22 13:30                 ` Oleg Nesterov
2023-05-18  8:08   ` Christian Brauner
2023-05-18 15:27     ` Mike Christie
2023-05-18 15:27       ` Mike Christie
2023-05-18 17:07       ` Christian Brauner
2023-05-18 18:08         ` Oleg Nesterov
2023-05-18 18:08           ` Oleg Nesterov
2023-05-18 18:12           ` Christian Brauner
2023-05-18 18:23             ` Oleg Nesterov
2023-05-18 18:23               ` Oleg Nesterov
2023-05-18  0:09 ` [RFC PATCH 2/8] vhost/vhost_task: Hook vhost layer into signal handler Mike Christie
2023-05-18  0:09   ` Mike Christie
2023-05-18  0:16   ` Linus Torvalds
2023-05-18  0:16     ` Linus Torvalds
2023-05-18  1:01     ` Mike Christie
2023-05-18  1:01       ` Mike Christie
2023-05-18  8:16       ` Christian Brauner
2023-05-18  0:09 ` [RFC PATCH 3/8] fork/vhost_task: Switch to CLONE_THREAD and CLONE_SIGHAND Mike Christie
2023-05-18  0:09   ` Mike Christie
2023-05-18  8:18   ` Christian Brauner
2023-05-18  0:09 ` [RFC PATCH 4/8] vhost-net: Move vhost_net_open Mike Christie
2023-05-18  0:09   ` Mike Christie
2023-05-18  0:09 ` [RFC PATCH 5/8] vhost: Add callback that stops new work and waits on running ones Mike Christie
2023-05-18  0:09   ` Mike Christie
2023-05-18 14:18   ` Christian Brauner
2023-05-18 15:03     ` Mike Christie
2023-05-18 15:03       ` Mike Christie
2023-05-18 15:09       ` Christian Brauner
2023-05-18 18:38       ` Eric W. Biederman
2023-05-18 18:38         ` Eric W. Biederman
2023-05-18  0:09 ` [RFC PATCH 6/8] vhost-scsi: Add callback to stop and wait on works Mike Christie
2023-05-18  0:09   ` Mike Christie
2023-05-18  0:09 ` [RFC PATCH 7/8] vhost-net: " Mike Christie
2023-05-18  0:09   ` Mike Christie
2023-05-18  0:09 ` [RFC PATCH 8/8] fork/vhost_task: remove no_files Mike Christie
2023-05-18  0:09   ` Mike Christie
2023-05-18  1:04   ` Mike Christie
2023-05-18  1:04     ` Mike Christie
2023-05-18 12:31   ` kernel test robot
2023-05-18 15:30   ` kernel test robot
2023-05-18 23:14   ` kernel test robot
2023-05-19  7:26   ` kernel test robot
2023-05-18  8:25 ` [RFC PATCH 0/8] vhost_tasks: Use CLONE_THREAD/SIGHAND Christian Brauner
2023-05-18  8:40   ` Christian Brauner
2023-05-18 14:30   ` Christian Brauner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230519-vormittag-dschungel-83607e9d2255@brauner \
    --to=brauner@kernel.org \
    --cc=axboe@kernel.dk \
    --cc=ebiederm@xmission.com \
    --cc=hch@infradead.org \
    --cc=jasowang@redhat.com \
    --cc=konrad.wilk@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@leemhuis.info \
    --cc=michael.christie@oracle.com \
    --cc=mst@redhat.com \
    --cc=nicolas.dichtel@6wind.com \
    --cc=oleg@redhat.com \
    --cc=regressions@lists.linux.dev \
    --cc=sgarzare@redhat.com \
    --cc=stefanha@redhat.com \
    --cc=torvalds@linux-foundation.org \
    --cc=virtualization@lists.linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.