From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by smtp.subspace.kernel.org (Postfix) with ESMTPS id 694BC1C26
 for ; Tue, 16 May 2023 08:40:01 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 9B049C433D2;
 Tue, 16 May 2023 08:39:57 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
 s=k20201202; t=1684226401;
 bh=L//2mJqpJe73Cmk74amA5WtIVkNTK2EJMElKgY4Oifw=;
 h=Date:From:To:Cc:Subject:References:In-Reply-To:From;
 b=HllN8HjVjfHFg0iKJPiSZ+Ym2m9sd35ggzSOD6myB8PVaHbOVyxF9vNF6JCWqpRTd
 GSJ/qoF84U8qBkM6Ikskkgi5Wi7fwb6evjn/mrZFaEFSsjVf0hT6Mme9PThb3IbOc+
 5GgWXkL74/h9xy4Pn4WSCkjUysVIF4HnnjQgck9UN4x5uXA/qcAeZ/JeqYf+NHD241
 cuPjBVe8somRlu6CjRrMttWHt5XrMWYNQoz29dG21zk0PBGc1nbdF4r+wtNFSUo98C
 tVOxOBb40VoktDmsHMrLQksJZOBPtiZ3MKXTlG7Iq6IjIFJ2/nWFLZl7vxXmcIWa+z
 RXHLm6mijvu7g==
Date: Tue, 16 May 2023 10:39:53 +0200
From: Christian Brauner
To: Mike Christie , Linus Torvalds
Cc: Thorsten Leemhuis , nicolas.dichtel@6wind.com,
 Linux kernel regressions list , hch@infradead.org, stefanha@redhat.com,
 jasowang@redhat.com, mst@redhat.com, sgarzare@redhat.com,
 virtualization@lists.linux-foundation.org, ebiederm@xmission.com,
 konrad.wilk@oracle.com, linux-kernel@vger.kernel.org, Jens Axboe
Subject: Re: [PATCH v11 8/8] vhost: use vhost_tasks for worker threads
Message-ID: <20230516-weltmeere-backofen-27f12ae2c9e0@brauner>
References: <20230202232517.8695-1-michael.christie@oracle.com>
 <20230202232517.8695-9-michael.christie@oracle.com>
 <78c5e150-26cf-7724-74ee-4a0b16b944b1@oracle.com>
 <48842e92-835e-bc3f-7118-48b8f415f532@leemhuis.info>
 <20230515-vollrausch-liebgeworden-2765f3ca3540@brauner>
 <122b597e-a5fa-daf7-27bb-6f04fa98d496@oracle.com>
Precedence: bulk
X-Mailing-List: regressions@lists.linux.dev
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <122b597e-a5fa-daf7-27bb-6f04fa98d496@oracle.com>

On Mon, May 15, 2023 at 05:23:12PM -0500, Mike Christie wrote:
> On 5/15/23 10:44 AM, Linus Torvalds wrote:
> > On Mon, May 15, 2023 at 7:23 AM Christian Brauner wrote:
> >>
> >> So I think we will be able to address (1) and (2) by making vhost tasks
> >> proper threads and blocking every signal except for SIGKILL and SIGSTOP
> >> and then having vhost handle get_signal() - as you mentioned - the same
> >> way io uring already does. We should also remove the ingore_signals
> >> thing completely imho. I don't think we ever want to do this with user
> >> workers.
> >
> > Right. That's what IO_URING does:
> >
> >         if (args->io_thread) {
> >                 /*
> >                  * Mark us an IO worker, and block any signal that isn't
> >                  * fatal or STOP
> >                  */
> >                 p->flags |= PF_IO_WORKER;
> >                 siginitsetinv(&p->blocked, sigmask(SIGKILL)|sigmask(SIGSTOP));
> >         }
> >
> > and I really think that vhost should basically do exactly what io_uring does.
> >
> > Not because io_uring fundamentally got this right - but simply because
> > io_uring had almost all the same bugs (and then some), and what the
> > io_uring worker threads ended up doing was to basically zoom in on
> > "this works".
> >
> > And it zoomed in on it largely by just going for "make it look as much
> > as possible as a real user thread", because every time the kernel
> > thread did something different, it just caused problems.
> >
> > So I think the patch should just look something like the attached.
> > Mike, can you test this on whatever vhost test-suite?
>
> I tried that approach already and it doesn't work because io_uring and vhost
> differ in that vhost drivers implement a device where each device has a vhost_task
> and the drivers have a file_operations for the device.
> When the vhost_task's parent gets signal like SIGKILL, then it will exit and
> call into the vhost driver's file_operations->release function. At this time,
> we need to do cleanup

But that's no reason why the vhost worker couldn't just be allowed to exit
on SIGKILL cleanly, similar to io_uring. That's just describing the current
architecture, which isn't a necessity afaict. And the helper thread could,
e.g., crash.

> like flush the device which uses the vhost_task. There is also the case where if
> the vhost_task gets a SIGKILL, we can just exit from under the vhost layer.

In a way I really don't like the patch below, because this should be
solvable by adapting vhost workers. Right now, vhost is coming from a
kthread model and we ported it to a user worker model, and the whole point
of this exercise has been that the workers behave more like regular
userspace processes. So my tendency is to not massage kernel signal
handling to now also include a special case for user workers in addition
to kthreads. That's just the wrong way around, and then vhost could've just
stuck with kthreads in the first place.

So I'm fine with skipping over the freezing case for now, but SIGKILL
should be handled imho. Only init and kthreads should get the luxury of
ignoring SIGKILL.

So, I'm afraid I'm asking you for some work here, but how feasible would a
model be where vhost_worker(), similar to io_wq_worker(), gracefully
handles SIGKILL? Yes, I see there's

net.c:   .release = vhost_net_release
scsi.c:  .release = vhost_scsi_release
test.c:  .release = vhost_test_release
vdpa.c:  .release = vhost_vdpa_release
vsock.c: .release = virtio_transport_release
vsock.c: .release = vhost_vsock_dev_release

but that means you have all the basic logic in place, and all of those
drivers also support the VHOST_RESET_OWNER ioctl, which also stops the
vhost worker. I'm confident that a lot of this can be leveraged to just
clean up on SIGKILL.

So it feels like this should be achievable by adding a callback to struct
vhost_worker that gets called when vhost_worker() gets SIGKILL and that
all the users of vhost workers are forced to implement.

Yes, it is more work, but I think that's the right thing to do rather than
complicating our signal handling.

Worst case, if this can't be done fast enough, we'll have to revert the
vhost parts. I think the user worker parts are mostly sane and are useful.

Thoughts?
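
To make the callback idea a bit more concrete, here is a rough userspace
model in C. It is only a sketch under stated assumptions: struct
model_worker, struct model_dev, on_kill and model_worker_run() are made-up
names for illustration and are not the vhost API, and the kill_pending flag
merely stands in for the worker noticing a pending SIGKILL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vhost device; not the real struct vhost_dev. */
struct model_dev {
	int queued;	/* work items still pending */
	bool flushed;	/* set once cleanup has run */
};

/* Hypothetical stand-in for struct vhost_worker plus the proposed callback. */
struct model_worker {
	struct model_dev *dev;
	bool kill_pending;			/* models a pending SIGKILL */
	void (*on_kill)(struct model_worker *w);	/* every user must set this */
};

/* What a driver might do here: the same cleanup its release path does. */
static void model_dev_on_kill(struct model_worker *w)
{
	w->dev->queued = 0;
	w->dev->flushed = true;
}

/*
 * Models the worker loop: process work until the fatal signal is seen,
 * then let the driver clean up and exit instead of ignoring the signal.
 * kill_after is the number of items handled before the simulated SIGKILL.
 * Returns the number of items processed before the loop ended.
 */
static int model_worker_run(struct model_worker *w, int kill_after)
{
	int done = 0;

	assert(w->on_kill != NULL);	/* the callback is mandatory */

	while (w->dev->queued > 0) {
		if (done == kill_after)
			w->kill_pending = true;	/* simulate SIGKILL delivery */
		if (w->kill_pending) {
			w->on_kill(w);
			break;
		}
		w->dev->queued--;
		done++;
	}
	return done;
}
```

The point of the model is only that the worker itself reacts to the fatal
signal and hands cleanup to a driver-supplied hook, rather than the signal
code growing a special case for user workers.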