* [PATCH 0/9] vhost: Support SIGKILL by flushing and exiting
@ 2024-03-16  0:46 Mike Christie
  2024-03-16  0:46 ` [PATCH 1/9] vhost-scsi: Handle vhost_vq_work_queue failures for events Mike Christie
                   ` (10 more replies)
  0 siblings, 11 replies; 32+ messages in thread
From: Mike Christie @ 2024-03-16  0:46 UTC (permalink / raw)
  To: oleg, ebiederm, virtualization, mst, sgarzare, jasowang,
	stefanha, brauner

The following patches were made over Linus's tree and also apply over
mst's vhost branch. The patches add the ability for vhost_tasks to
handle SIGKILL by flushing queued works, stopping new works from being
queued, and preparing the task for an early exit.

This removes the need for the signal/coredump hacks added in:

Commit f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")

when the vhost_task patches were initially merged, and fixes the issue
reported in this thread:

https://lore.kernel.org/all/000000000000a41b82060e875721@google.com/

Long Background:

The original vhost worker code didn't support any signals. If the
userspace application that owned the worker got a SIGKILL, the
app/process would exit, dropping all references to the device, and then
the file operation's release function would be called. From there we
would wait on running IO and then clean up the device's memory.

When we switched to vhost_tasks being a thread in the owner's process we
added some hacks to the signal/coredump code so we could continue to
wait on running IO and process it from the vhost_task. The idea was that
we would eventually remove the hacks. We recently hit this bug:

https://lore.kernel.org/all/000000000000a41b82060e875721@google.com/

It turns out only vhost-scsi had an issue where it would send a command
to the block/LIO layer, wait for a response, and then process it in the
vhost_task. So patches 1-5 prepare vhost-scsi to handle the case where
the vhost_task is killed while we still have commands outstanding. The
next patches then prepare and convert the vhost and vhost_task layers to
handle SIGKILL by flushing running works, marking the vhost_task as dead
so there are no future uses, and then exiting.
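
To make the new flow concrete, here is a rough user-space model of the
pattern the series implements: a killed worker is marked dead under its
lock, new work submissions fail, and whatever was already queued is
flushed before the thread exits. The names and the pthread-based
structure below are illustrative only; the real code is in the vhost.c
and vhost_task.c changes later in the series and uses the vhost worker's
llist/mutex infrastructure.

/* Illustrative user-space sketch only; not the kernel code. */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct work {
	void (*fn)(void *arg);
	void *arg;
	struct work *next;
};

struct worker {
	pthread_mutex_t mutex;
	struct work *head;	/* already-queued works */
	bool killed;		/* set once on SIGKILL */
};

/* Models vhost_vq_work_queue: fails once the worker has been killed. */
static bool worker_queue(struct worker *w, struct work *work)
{
	bool queued = false;

	pthread_mutex_lock(&w->mutex);
	if (!w->killed) {
		work->next = w->head;
		w->head = work;
		queued = true;
	}
	pthread_mutex_unlock(&w->mutex);
	return queued;		/* caller must drop/free the work on failure */
}

/* Models the SIGKILL handling: stop new works, then flush existing ones. */
static void worker_kill(struct worker *w)
{
	struct work *work;

	pthread_mutex_lock(&w->mutex);
	w->killed = true;
	while ((work = w->head) != NULL) {
		w->head = work->next;
		work->fn(work->arg);
	}
	pthread_mutex_unlock(&w->mutex);
}

static void say(void *arg) { printf("%s\n", (const char *)arg); }

int main(void)
{
	struct worker w = { .mutex = PTHREAD_MUTEX_INITIALIZER };
	struct work a = { .fn = say, .arg = "flushed queued work" };
	struct work b = { .fn = say, .arg = "never runs" };

	worker_queue(&w, &a);	/* queued before the "signal" arrives */
	worker_kill(&w);	/* flush what is queued and mark dead */
	if (!worker_queue(&w, &b))
		printf("new work rejected after kill\n");
	return 0;
}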




^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 1/9] vhost-scsi: Handle vhost_vq_work_queue failures for events
  2024-03-16  0:46 [PATCH 0/9] vhost: Support SIGKILL by flushing and exiting Mike Christie
@ 2024-03-16  0:46 ` Mike Christie
  2024-03-16  0:47 ` [PATCH 2/9] vhost-scsi: Handle vhost_vq_work_queue failures for cmds Mike Christie
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Mike Christie @ 2024-03-16  0:46 UTC (permalink / raw)
  To: oleg, ebiederm, virtualization, mst, sgarzare, jasowang,
	stefanha, brauner
  Cc: Mike Christie

Currently, we can try to queue an event's work before the vhost_task is
created. When this happens we just drop it in vhost_scsi_do_plug before
even calling vhost_vq_work_queue. During a device shutdown we do the
same thing after vhost_scsi_clear_endpoint has cleared the backends.

In the next patches we will be able to kill the vhost_task before we
have cleared the endpoint. In that case, vhost_vq_work_queue can fail
and we will leak the event's memory. This patch handles the failure by
just freeing the event. This is safe to do because vhost_vq_work_queue
will only return failure for us when the vhost_task is killed, and in
that case userspace will not be able to handle events even if we sent
them.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/scsi.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 282aac45c690..f34f9895b898 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -497,10 +497,8 @@ vhost_scsi_do_evt_work(struct vhost_scsi *vs, struct vhost_scsi_evt *evt)
 		vq_err(vq, "Faulted on vhost_scsi_send_event\n");
 }
 
-static void vhost_scsi_evt_work(struct vhost_work *work)
+static void vhost_scsi_complete_events(struct vhost_scsi *vs, bool drop)
 {
-	struct vhost_scsi *vs = container_of(work, struct vhost_scsi,
-					vs_event_work);
 	struct vhost_virtqueue *vq = &vs->vqs[VHOST_SCSI_VQ_EVT].vq;
 	struct vhost_scsi_evt *evt, *t;
 	struct llist_node *llnode;
@@ -508,12 +506,20 @@ static void vhost_scsi_evt_work(struct vhost_work *work)
 	mutex_lock(&vq->mutex);
 	llnode = llist_del_all(&vs->vs_event_list);
 	llist_for_each_entry_safe(evt, t, llnode, list) {
-		vhost_scsi_do_evt_work(vs, evt);
+		if (!drop)
+			vhost_scsi_do_evt_work(vs, evt);
 		vhost_scsi_free_evt(vs, evt);
 	}
 	mutex_unlock(&vq->mutex);
 }
 
+static void vhost_scsi_evt_work(struct vhost_work *work)
+{
+	struct vhost_scsi *vs = container_of(work, struct vhost_scsi,
+					     vs_event_work);
+	vhost_scsi_complete_events(vs, false);
+}
+
 static int vhost_scsi_copy_sgl_to_iov(struct vhost_scsi_cmd *cmd)
 {
 	struct iov_iter *iter = &cmd->saved_iter;
@@ -1509,7 +1515,8 @@ vhost_scsi_send_evt(struct vhost_scsi *vs, struct vhost_virtqueue *vq,
 	}
 
 	llist_add(&evt->list, &vs->vs_event_list);
-	vhost_vq_work_queue(vq, &vs->vs_event_work);
+	if (!vhost_vq_work_queue(vq, &vs->vs_event_work))
+		vhost_scsi_complete_events(vs, true);
 }
 
 static void vhost_scsi_evt_handle_kick(struct vhost_work *work)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 2/9] vhost-scsi: Handle vhost_vq_work_queue failures for cmds
  2024-03-16  0:46 [PATCH 0/9] vhost: Support SIGKILL by flushing and exiting Mike Christie
  2024-03-16  0:46 ` [PATCH 1/9] vhost-scsi: Handle vhost_vq_work_queue failures for events Mike Christie
@ 2024-03-16  0:47 ` Mike Christie
  2024-03-16  0:47 ` [PATCH 3/9] vhost-scsi: Use system wq to flush dev for TMFs Mike Christie
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Mike Christie @ 2024-03-16  0:47 UTC (permalink / raw)
  To: oleg, ebiederm, virtualization, mst, sgarzare, jasowang,
	stefanha, brauner
  Cc: Mike Christie

In the next patches we will support the vhost_task being killed while in
use. The problem for vhost-scsi is that we can't free some structs until
we get responses for commands we have submitted to the target layer and
we currently process the responses from the vhost_task.

This patch has us just drop the responses and free the command's
resources. When all commands have completed, operations like flush will
be woken up and we can complete device release and endpoint cleanup.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/scsi.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index f34f9895b898..6ec1abe7364f 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -358,6 +358,16 @@ static void vhost_scsi_release_tmf_res(struct vhost_scsi_tmf *tmf)
 	vhost_scsi_put_inflight(inflight);
 }
 
+static void vhost_scsi_drop_cmds(struct vhost_scsi_virtqueue *svq)
+{
+	struct vhost_scsi_cmd *cmd, *t;
+	struct llist_node *llnode;
+
+	llnode = llist_del_all(&svq->completion_list);
+	llist_for_each_entry_safe(cmd, t, llnode, tvc_completion_list)
+		vhost_scsi_release_cmd_res(&cmd->tvc_se_cmd);
+}
+
 static void vhost_scsi_release_cmd(struct se_cmd *se_cmd)
 {
 	if (se_cmd->se_cmd_flags & SCF_SCSI_TMR_CDB) {
@@ -373,7 +383,8 @@ static void vhost_scsi_release_cmd(struct se_cmd *se_cmd)
 					struct vhost_scsi_virtqueue, vq);
 
 		llist_add(&cmd->tvc_completion_list, &svq->completion_list);
-		vhost_vq_work_queue(&svq->vq, &svq->completion_work);
+		if (!vhost_vq_work_queue(&svq->vq, &svq->completion_work))
+			vhost_scsi_drop_cmds(svq);
 	}
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 3/9] vhost-scsi: Use system wq to flush dev for TMFs
  2024-03-16  0:46 [PATCH 0/9] vhost: Support SIGKILL by flushing and exiting Mike Christie
  2024-03-16  0:46 ` [PATCH 1/9] vhost-scsi: Handle vhost_vq_work_queue failures for events Mike Christie
  2024-03-16  0:47 ` [PATCH 2/9] vhost-scsi: Handle vhost_vq_work_queue failures for cmds Mike Christie
@ 2024-03-16  0:47 ` Mike Christie
  2024-03-16  0:47 ` [PATCH 4/9] vhost: Remove vhost_vq_flush Mike Christie
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Mike Christie @ 2024-03-16  0:47 UTC (permalink / raw)
  To: oleg, ebiederm, virtualization, mst, sgarzare, jasowang,
	stefanha, brauner
  Cc: Mike Christie

We flush all the workers that are not also used by the ctl vq to make
sure that responses queued by LIO before the TMF response are sent
before the TMF response. This requires a special vhost_vq_flush
function which, in the next patches where we handle SIGKILL killing
workers while in use, will require extra locking/complexity. To avoid
that, this patch has us flush the entire device from the system work
queue, then queue up sending the response from there.

This is a little less optimal since we now flush all workers, but this
will be OK since the commands have already timed out and perf is not a
concern.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/scsi.c | 39 +++++++++++++++++++--------------------
 1 file changed, 19 insertions(+), 20 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 6ec1abe7364f..04e0d3f1bd77 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -210,6 +210,7 @@ struct vhost_scsi {
 
 struct vhost_scsi_tmf {
 	struct vhost_work vwork;
+	struct work_struct flush_work;
 	struct vhost_scsi *vhost;
 	struct vhost_scsi_virtqueue *svq;
 
@@ -373,9 +374,8 @@ static void vhost_scsi_release_cmd(struct se_cmd *se_cmd)
 	if (se_cmd->se_cmd_flags & SCF_SCSI_TMR_CDB) {
 		struct vhost_scsi_tmf *tmf = container_of(se_cmd,
 					struct vhost_scsi_tmf, se_cmd);
-		struct vhost_virtqueue *vq = &tmf->svq->vq;
 
-		vhost_vq_work_queue(vq, &tmf->vwork);
+		schedule_work(&tmf->flush_work);
 	} else {
 		struct vhost_scsi_cmd *cmd = container_of(se_cmd,
 					struct vhost_scsi_cmd, tvc_se_cmd);
@@ -1287,33 +1287,31 @@ static void vhost_scsi_tmf_resp_work(struct vhost_work *work)
 {
 	struct vhost_scsi_tmf *tmf = container_of(work, struct vhost_scsi_tmf,
 						  vwork);
-	struct vhost_virtqueue *ctl_vq, *vq;
-	int resp_code, i;
-
-	if (tmf->scsi_resp == TMR_FUNCTION_COMPLETE) {
-		/*
-		 * Flush IO vqs that don't share a worker with the ctl to make
-		 * sure they have sent their responses before us.
-		 */
-		ctl_vq = &tmf->vhost->vqs[VHOST_SCSI_VQ_CTL].vq;
-		for (i = VHOST_SCSI_VQ_IO; i < tmf->vhost->dev.nvqs; i++) {
-			vq = &tmf->vhost->vqs[i].vq;
-
-			if (vhost_vq_is_setup(vq) &&
-			    vq->worker != ctl_vq->worker)
-				vhost_vq_flush(vq);
-		}
+	int resp_code;
 
+	if (tmf->scsi_resp == TMR_FUNCTION_COMPLETE)
 		resp_code = VIRTIO_SCSI_S_FUNCTION_SUCCEEDED;
-	} else {
+	else
 		resp_code = VIRTIO_SCSI_S_FUNCTION_REJECTED;
-	}
 
 	vhost_scsi_send_tmf_resp(tmf->vhost, &tmf->svq->vq, tmf->in_iovs,
 				 tmf->vq_desc, &tmf->resp_iov, resp_code);
 	vhost_scsi_release_tmf_res(tmf);
 }
 
+static void vhost_scsi_tmf_flush_work(struct work_struct *work)
+{
+	struct vhost_scsi_tmf *tmf = container_of(work, struct vhost_scsi_tmf,
+						 flush_work);
+	struct vhost_virtqueue *vq = &tmf->svq->vq;
+	/*
+	 * Make sure we have sent responses for other commands before we
+	 * send our response.
+	 */
+	vhost_dev_flush(vq->dev);
+	vhost_vq_work_queue(vq, &tmf->vwork);
+}
+
 static void
 vhost_scsi_handle_tmf(struct vhost_scsi *vs, struct vhost_scsi_tpg *tpg,
 		      struct vhost_virtqueue *vq,
@@ -1337,6 +1335,7 @@ vhost_scsi_handle_tmf(struct vhost_scsi *vs, struct vhost_scsi_tpg *tpg,
 	if (!tmf)
 		goto send_reject;
 
+	INIT_WORK(&tmf->flush_work, vhost_scsi_tmf_flush_work);
 	vhost_work_init(&tmf->vwork, vhost_scsi_tmf_resp_work);
 	tmf->vhost = vs;
 	tmf->svq = svq;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 4/9] vhost: Remove vhost_vq_flush
  2024-03-16  0:46 [PATCH 0/9] vhost: Support SIGKILL by flushing and exiting Mike Christie
                   ` (2 preceding siblings ...)
  2024-03-16  0:47 ` [PATCH 3/9] vhost-scsi: Use system wq to flush dev for TMFs Mike Christie
@ 2024-03-16  0:47 ` Mike Christie
  2024-03-16  0:47 ` [PATCH 5/9] vhost_scsi: Handle vhost_vq_work_queue failures for TMFs Mike Christie
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Mike Christie @ 2024-03-16  0:47 UTC (permalink / raw)
  To: oleg, ebiederm, virtualization, mst, sgarzare, jasowang,
	stefanha, brauner
  Cc: Mike Christie

vhost_vq_flush is no longer used so remove it.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/vhost.c | 12 ------------
 drivers/vhost/vhost.h |  1 -
 2 files changed, 13 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 045f666b4f12..cd79075da294 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -263,18 +263,6 @@ bool vhost_vq_work_queue(struct vhost_virtqueue *vq, struct vhost_work *work)
 }
 EXPORT_SYMBOL_GPL(vhost_vq_work_queue);
 
-void vhost_vq_flush(struct vhost_virtqueue *vq)
-{
-	struct vhost_flush_struct flush;
-
-	init_completion(&flush.wait_event);
-	vhost_work_init(&flush.work, vhost_flush_work);
-
-	if (vhost_vq_work_queue(vq, &flush.work))
-		wait_for_completion(&flush.wait_event);
-}
-EXPORT_SYMBOL_GPL(vhost_vq_flush);
-
 /**
  * vhost_worker_flush - flush a worker
  * @worker: worker to flush
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 9e942fcda5c3..91ade037f08e 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -205,7 +205,6 @@ int vhost_get_vq_desc(struct vhost_virtqueue *,
 		      struct vhost_log *log, unsigned int *log_num);
 void vhost_discard_vq_desc(struct vhost_virtqueue *, int n);
 
-void vhost_vq_flush(struct vhost_virtqueue *vq);
 bool vhost_vq_work_queue(struct vhost_virtqueue *vq, struct vhost_work *work);
 bool vhost_vq_has_work(struct vhost_virtqueue *vq);
 bool vhost_vq_is_setup(struct vhost_virtqueue *vq);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 5/9] vhost_scsi: Handle vhost_vq_work_queue failures for TMFs
  2024-03-16  0:46 [PATCH 0/9] vhost: Support SIGKILL by flushing and exiting Mike Christie
                   ` (3 preceding siblings ...)
  2024-03-16  0:47 ` [PATCH 4/9] vhost: Remove vhost_vq_flush Mike Christie
@ 2024-03-16  0:47 ` Mike Christie
  2024-03-16  0:47 ` [PATCH 6/9] vhost: Use virtqueue mutex for swapping worker Mike Christie
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Mike Christie @ 2024-03-16  0:47 UTC (permalink / raw)
  To: oleg, ebiederm, virtualization, mst, sgarzare, jasowang,
	stefanha, brauner
  Cc: Mike Christie

vhost_vq_work_queue will never fail when queueing the TMF's response
handling, because a guest can only send us TMFs when the device is fully
set up, so there is always a worker at that time. In the next patches we
will modify the worker code so it handles SIGKILL by exiting before
outstanding commands/TMFs have sent their responses. In that case
vhost_vq_work_queue can fail when we try to send a response.

This has us just free the TMF's resources since at this time the guest
won't be able to get a response even if we could send it.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/scsi.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 04e0d3f1bd77..006ffacf1c56 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -1309,7 +1309,8 @@ static void vhost_scsi_tmf_flush_work(struct work_struct *work)
 	 * send our response.
 	 */
 	vhost_dev_flush(vq->dev);
-	vhost_vq_work_queue(vq, &tmf->vwork);
+	if (!vhost_vq_work_queue(vq, &tmf->vwork))
+		vhost_scsi_release_tmf_res(tmf);
 }
 
 static void
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 6/9] vhost: Use virtqueue mutex for swapping worker
  2024-03-16  0:46 [PATCH 0/9] vhost: Support SIGKILL by flushing and exiting Mike Christie
                   ` (4 preceding siblings ...)
  2024-03-16  0:47 ` [PATCH 5/9] vhost_scsi: Handle vhost_vq_work_queue failures for TMFs Mike Christie
@ 2024-03-16  0:47 ` Mike Christie
  2024-03-16  0:47 ` [PATCH 7/9] vhost: Release worker mutex during flushes Mike Christie
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Mike Christie @ 2024-03-16  0:47 UTC (permalink / raw)
  To: oleg, ebiederm, virtualization, mst, sgarzare, jasowang,
	stefanha, brauner
  Cc: Mike Christie

__vhost_vq_attach_worker uses the vhost_dev mutex to serialize the
swapping of a virtqueue's worker. This was done for simplicity because
we are already holding that mutex.

In the next patches where the worker can be killed while in use, we need
finer-grained locking because some drivers will hold the vhost_dev mutex
while flushing. However, in the SIGKILL handler we will need to be able
to swap workers (set the current one to NULL), kill queued works, and
stop new flushes while flushes are in progress.

To prepare for that, this patch uses the virtqueue mutex for swapping
workers instead of the vhost_dev one.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/vhost.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index cd79075da294..4252c3b827ca 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -652,16 +652,22 @@ static void __vhost_vq_attach_worker(struct vhost_virtqueue *vq,
 {
 	struct vhost_worker *old_worker;
 
-	old_worker = rcu_dereference_check(vq->worker,
-					   lockdep_is_held(&vq->dev->mutex));
-
 	mutex_lock(&worker->mutex);
-	worker->attachment_cnt++;
-	mutex_unlock(&worker->mutex);
+	mutex_lock(&vq->mutex);
+
+	old_worker = rcu_dereference_check(vq->worker,
+					   lockdep_is_held(&vq->mutex));
 	rcu_assign_pointer(vq->worker, worker);
+	worker->attachment_cnt++;
 
-	if (!old_worker)
+	if (!old_worker) {
+		mutex_unlock(&vq->mutex);
+		mutex_unlock(&worker->mutex);
 		return;
+	}
+	mutex_unlock(&vq->mutex);
+	mutex_unlock(&worker->mutex);
+
 	/*
 	 * Take the worker mutex to make sure we see the work queued from
 	 * device wide flushes which doesn't use RCU for execution.
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 7/9] vhost: Release worker mutex during flushes
  2024-03-16  0:46 [PATCH 0/9] vhost: Support SIGKILL by flushing and exiting Mike Christie
                   ` (5 preceding siblings ...)
  2024-03-16  0:47 ` [PATCH 6/9] vhost: Use virtqueue mutex for swapping worker Mike Christie
@ 2024-03-16  0:47 ` Mike Christie
  2024-03-16  0:47 ` [PATCH 8/9] vhost_task: Handle SIGKILL by flushing work and exiting Mike Christie
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Mike Christie @ 2024-03-16  0:47 UTC (permalink / raw)
  To: oleg, ebiederm, virtualization, mst, sgarzare, jasowang,
	stefanha, brauner
  Cc: Mike Christie

In the next patches where the worker can be killed while in use, we
need to be able to take the worker mutex and kill queued works for
new IO and flushes, and set some new flags to prevent new
__vhost_vq_attach_worker calls from swapping in/out killed workers.

If we are holding the worker mutex during a flush and the flush's work
is still in the queue, the worker code that will handle the SIGKILL
cleanup won't be able to take the mutex and perform its cleanup. So
this patch has us drop the worker mutex while waiting for the flush
to complete.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/vhost.c | 44 +++++++++++++++++++++++++++++--------------
 1 file changed, 30 insertions(+), 14 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 4252c3b827ca..b2012993e9fa 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -264,21 +264,36 @@ bool vhost_vq_work_queue(struct vhost_virtqueue *vq, struct vhost_work *work)
 EXPORT_SYMBOL_GPL(vhost_vq_work_queue);
 
 /**
- * vhost_worker_flush - flush a worker
+ * __vhost_worker_flush - flush a worker
  * @worker: worker to flush
  *
- * This does not use RCU to protect the worker, so the device or worker
- * mutex must be held.
+ * The worker's flush_mutex must be held.
  */
-static void vhost_worker_flush(struct vhost_worker *worker)
+static void __vhost_worker_flush(struct vhost_worker *worker)
 {
 	struct vhost_flush_struct flush;
 
+	if (!worker->attachment_cnt)
+		return;
+
 	init_completion(&flush.wait_event);
 	vhost_work_init(&flush.work, vhost_flush_work);
 
 	vhost_worker_queue(worker, &flush.work);
+	/*
+	 * Drop mutex in case our worker is killed and it needs to take the
+	 * mutex to force cleanup.
+	 */
+	mutex_unlock(&worker->mutex);
 	wait_for_completion(&flush.wait_event);
+	mutex_lock(&worker->mutex);
+}
+
+static void vhost_worker_flush(struct vhost_worker *worker)
+{
+	mutex_lock(&worker->mutex);
+	__vhost_worker_flush(worker);
+	mutex_unlock(&worker->mutex);
 }
 
 void vhost_dev_flush(struct vhost_dev *dev)
@@ -286,15 +301,8 @@ void vhost_dev_flush(struct vhost_dev *dev)
 	struct vhost_worker *worker;
 	unsigned long i;
 
-	xa_for_each(&dev->worker_xa, i, worker) {
-		mutex_lock(&worker->mutex);
-		if (!worker->attachment_cnt) {
-			mutex_unlock(&worker->mutex);
-			continue;
-		}
+	xa_for_each(&dev->worker_xa, i, worker)
 		vhost_worker_flush(worker);
-		mutex_unlock(&worker->mutex);
-	}
 }
 EXPORT_SYMBOL_GPL(vhost_dev_flush);
 
@@ -673,7 +681,6 @@ static void __vhost_vq_attach_worker(struct vhost_virtqueue *vq,
 	 * device wide flushes which doesn't use RCU for execution.
 	 */
 	mutex_lock(&old_worker->mutex);
-	old_worker->attachment_cnt--;
 	/*
 	 * We don't want to call synchronize_rcu for every vq during setup
 	 * because it will slow down VM startup. If we haven't done
@@ -684,6 +691,8 @@ static void __vhost_vq_attach_worker(struct vhost_virtqueue *vq,
 	mutex_lock(&vq->mutex);
 	if (!vhost_vq_get_backend(vq) && !vq->kick) {
 		mutex_unlock(&vq->mutex);
+
+		old_worker->attachment_cnt--;
 		mutex_unlock(&old_worker->mutex);
 		/*
 		 * vsock can queue anytime after VHOST_VSOCK_SET_GUEST_CID.
@@ -699,7 +708,8 @@ static void __vhost_vq_attach_worker(struct vhost_virtqueue *vq,
 	/* Make sure new vq queue/flush/poll calls see the new worker */
 	synchronize_rcu();
 	/* Make sure whatever was queued gets run */
-	vhost_worker_flush(old_worker);
+	__vhost_worker_flush(old_worker);
+	old_worker->attachment_cnt--;
 	mutex_unlock(&old_worker->mutex);
 }
 
@@ -752,6 +762,12 @@ static int vhost_free_worker(struct vhost_dev *dev,
 		mutex_unlock(&worker->mutex);
 		return -EBUSY;
 	}
+	/*
+	 * A flush might have raced and snuck in before attachment_cnt was set
+	 * to zero. Make sure flushes are flushed from the queue before
+	 * freeing.
+	 */
+	__vhost_worker_flush(worker);
 	mutex_unlock(&worker->mutex);
 
 	vhost_worker_destroy(dev, worker);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 8/9] vhost_task: Handle SIGKILL by flushing work and exiting
  2024-03-16  0:46 [PATCH 0/9] vhost: Support SIGKILL by flushing and exiting Mike Christie
                   ` (6 preceding siblings ...)
  2024-03-16  0:47 ` [PATCH 7/9] vhost: Release worker mutex during flushes Mike Christie
@ 2024-03-16  0:47 ` Mike Christie
  2024-03-16  0:47 ` [PATCH 9/9] kernel: Remove signal hacks for vhost_tasks Mike Christie
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Mike Christie @ 2024-03-16  0:47 UTC (permalink / raw)
  To: oleg, ebiederm, virtualization, mst, sgarzare, jasowang,
	stefanha, brauner
  Cc: Mike Christie

Instead of lingering until the device is closed, this has us handle
SIGKILL by:

1. marking the worker as killed so we no longer try to use it with
new virtqueues and new flush operations.
2. clearing the virtqueue-to-worker mapping so no new works are queued.
3. running all the existing works.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/vhost.c            | 54 +++++++++++++++++++++++++++++---
 drivers/vhost/vhost.h            |  2 ++
 include/linux/sched/vhost_task.h |  3 +-
 kernel/vhost_task.c              | 53 ++++++++++++++++++++-----------
 4 files changed, 88 insertions(+), 24 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index b2012993e9fa..d64f2526c6ac 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -273,7 +273,7 @@ static void __vhost_worker_flush(struct vhost_worker *worker)
 {
 	struct vhost_flush_struct flush;
 
-	if (!worker->attachment_cnt)
+	if (!worker->attachment_cnt || worker->killed)
 		return;
 
 	init_completion(&flush.wait_event);
@@ -388,7 +388,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
 	__vhost_vq_meta_reset(vq);
 }
 
-static bool vhost_worker(void *data)
+static bool vhost_run_work_list(void *data)
 {
 	struct vhost_worker *worker = data;
 	struct vhost_work *work, *work_next;
@@ -413,6 +413,40 @@ static bool vhost_worker(void *data)
 	return !!node;
 }
 
+static void vhost_worker_killed(void *data)
+{
+	struct vhost_worker *worker = data;
+	struct vhost_dev *dev = worker->dev;
+	struct vhost_virtqueue *vq;
+	int i, attach_cnt = 0;
+
+	mutex_lock(&worker->mutex);
+	worker->killed = true;
+
+	for (i = 0; i < dev->nvqs; i++) {
+		vq = dev->vqs[i];
+
+		mutex_lock(&vq->mutex);
+		if (worker ==
+		    rcu_dereference_check(vq->worker,
+					  lockdep_is_held(&vq->mutex))) {
+			rcu_assign_pointer(vq->worker, NULL);
+			attach_cnt++;
+		}
+		mutex_unlock(&vq->mutex);
+	}
+
+	worker->attachment_cnt -= attach_cnt;
+	if (attach_cnt)
+		synchronize_rcu();
+	/*
+	 * Finish vhost_worker_flush calls and any other works that snuck in
+	 * before the synchronize_rcu.
+	 */
+	vhost_run_work_list(worker);
+	mutex_unlock(&worker->mutex);
+}
+
 static void vhost_vq_free_iovecs(struct vhost_virtqueue *vq)
 {
 	kfree(vq->indirect);
@@ -627,9 +661,11 @@ static struct vhost_worker *vhost_worker_create(struct vhost_dev *dev)
 	if (!worker)
 		return NULL;
 
+	worker->dev = dev;
 	snprintf(name, sizeof(name), "vhost-%d", current->pid);
 
-	vtsk = vhost_task_create(vhost_worker, worker, name);
+	vtsk = vhost_task_create(vhost_run_work_list, vhost_worker_killed,
+				 worker, name);
 	if (!vtsk)
 		goto free_worker;
 
@@ -661,6 +697,11 @@ static void __vhost_vq_attach_worker(struct vhost_virtqueue *vq,
 	struct vhost_worker *old_worker;
 
 	mutex_lock(&worker->mutex);
+	if (worker->killed) {
+		mutex_unlock(&worker->mutex);
+		return;
+	}
+
 	mutex_lock(&vq->mutex);
 
 	old_worker = rcu_dereference_check(vq->worker,
@@ -681,6 +722,11 @@ static void __vhost_vq_attach_worker(struct vhost_virtqueue *vq,
 	 * device wide flushes which doesn't use RCU for execution.
 	 */
 	mutex_lock(&old_worker->mutex);
+	if (old_worker->killed) {
+		mutex_unlock(&old_worker->mutex);
+		return;
+	}
+
 	/*
 	 * We don't want to call synchronize_rcu for every vq during setup
 	 * because it will slow down VM startup. If we haven't done
@@ -758,7 +804,7 @@ static int vhost_free_worker(struct vhost_dev *dev,
 		return -ENODEV;
 
 	mutex_lock(&worker->mutex);
-	if (worker->attachment_cnt) {
+	if (worker->attachment_cnt || worker->killed) {
 		mutex_unlock(&worker->mutex);
 		return -EBUSY;
 	}
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 91ade037f08e..bb75a292d50c 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -28,12 +28,14 @@ struct vhost_work {
 
 struct vhost_worker {
 	struct vhost_task	*vtsk;
+	struct vhost_dev	*dev;
 	/* Used to serialize device wide flushing with worker swapping. */
 	struct mutex		mutex;
 	struct llist_head	work_list;
 	u64			kcov_handle;
 	u32			id;
 	int			attachment_cnt;
+	bool			killed;
 };
 
 /* Poll a file (eventfd or socket) */
diff --git a/include/linux/sched/vhost_task.h b/include/linux/sched/vhost_task.h
index bc60243d43b3..25446c5d3508 100644
--- a/include/linux/sched/vhost_task.h
+++ b/include/linux/sched/vhost_task.h
@@ -4,7 +4,8 @@
 
 struct vhost_task;
 
-struct vhost_task *vhost_task_create(bool (*fn)(void *), void *arg,
+struct vhost_task *vhost_task_create(bool (*fn)(void *),
+				     void (*handle_kill)(void *), void *arg,
 				     const char *name);
 void vhost_task_start(struct vhost_task *vtsk);
 void vhost_task_stop(struct vhost_task *vtsk);
diff --git a/kernel/vhost_task.c b/kernel/vhost_task.c
index da35e5b7f047..48c289947b99 100644
--- a/kernel/vhost_task.c
+++ b/kernel/vhost_task.c
@@ -10,38 +10,32 @@
 
 enum vhost_task_flags {
 	VHOST_TASK_FLAGS_STOP,
+	VHOST_TASK_FLAGS_KILLED,
 };
 
 struct vhost_task {
 	bool (*fn)(void *data);
+	void (*handle_sigkill)(void *data);
 	void *data;
 	struct completion exited;
 	unsigned long flags;
 	struct task_struct *task;
+	/* serialize SIGKILL and vhost_task_stop calls */
+	struct mutex exit_mutex;
 };
 
 static int vhost_task_fn(void *data)
 {
 	struct vhost_task *vtsk = data;
-	bool dead = false;
 
 	for (;;) {
 		bool did_work;
 
-		if (!dead && signal_pending(current)) {
+		if (signal_pending(current)) {
 			struct ksignal ksig;
-			/*
-			 * Calling get_signal will block in SIGSTOP,
-			 * or clear fatal_signal_pending, but remember
-			 * what was set.
-			 *
-			 * This thread won't actually exit until all
-			 * of the file descriptors are closed, and
-			 * the release function is called.
-			 */
-			dead = get_signal(&ksig);
-			if (dead)
-				clear_thread_flag(TIF_SIGPENDING);
+
+			if (get_signal(&ksig))
+				break;
 		}
 
 		/* mb paired w/ vhost_task_stop */
@@ -57,7 +51,19 @@ static int vhost_task_fn(void *data)
 			schedule();
 	}
 
+	mutex_lock(&vtsk->exit_mutex);
+	/*
+	 * If a vhost_task_stop and SIGKILL race, we can ignore the SIGKILL.
+	 * When the vhost layer has called vhost_task_stop it's already stopped
+	 * new work and flushed.
+	 */
+	if (!test_bit(VHOST_TASK_FLAGS_STOP, &vtsk->flags)) {
+		set_bit(VHOST_TASK_FLAGS_KILLED, &vtsk->flags);
+		vtsk->handle_sigkill(vtsk->data);
+	}
 	complete(&vtsk->exited);
+	mutex_unlock(&vtsk->exit_mutex);
+
 	do_exit(0);
 }
 
@@ -78,12 +84,17 @@ EXPORT_SYMBOL_GPL(vhost_task_wake);
  * @vtsk: vhost_task to stop
  *
  * vhost_task_fn ensures the worker thread exits after
- * VHOST_TASK_FLAGS_SOP becomes true.
+ * VHOST_TASK_FLAGS_STOP becomes true.
  */
 void vhost_task_stop(struct vhost_task *vtsk)
 {
-	set_bit(VHOST_TASK_FLAGS_STOP, &vtsk->flags);
-	vhost_task_wake(vtsk);
+	mutex_lock(&vtsk->exit_mutex);
+	if (!test_bit(VHOST_TASK_FLAGS_KILLED, &vtsk->flags)) {
+		set_bit(VHOST_TASK_FLAGS_STOP, &vtsk->flags);
+		vhost_task_wake(vtsk);
+	}
+	mutex_unlock(&vtsk->exit_mutex);
+
 	/*
 	 * Make sure vhost_task_fn is no longer accessing the vhost_task before
 	 * freeing it below.
@@ -96,14 +107,16 @@ EXPORT_SYMBOL_GPL(vhost_task_stop);
 /**
  * vhost_task_create - create a copy of a task to be used by the kernel
  * @fn: vhost worker function
- * @arg: data to be passed to fn
+ * @handle_sigkill: vhost function to handle when we are killed
+ * @arg: data to be passed to fn and handle_sigkill
  * @name: the thread's name
  *
  * This returns a specialized task for use by the vhost layer or NULL on
  * failure. The returned task is inactive, and the caller must fire it up
  * through vhost_task_start().
  */
-struct vhost_task *vhost_task_create(bool (*fn)(void *), void *arg,
+struct vhost_task *vhost_task_create(bool (*fn)(void *),
+				     void (*handle_sigkill)(void *), void *arg,
 				     const char *name)
 {
 	struct kernel_clone_args args = {
@@ -122,8 +135,10 @@ struct vhost_task *vhost_task_create(bool (*fn)(void *), void *arg,
 	if (!vtsk)
 		return NULL;
 	init_completion(&vtsk->exited);
+	mutex_init(&vtsk->exit_mutex);
 	vtsk->data = arg;
 	vtsk->fn = fn;
+	vtsk->handle_sigkill = handle_sigkill;
 
 	args.fn_arg = vtsk;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 9/9] kernel: Remove signal hacks for vhost_tasks
  2024-03-16  0:46 [PATCH 0/9] vhost: Support SIGKILL by flushing and exiting Mike Christie
                   ` (7 preceding siblings ...)
  2024-03-16  0:47 ` [PATCH 8/9] vhost_task: Handle SIGKILL by flushing work and exiting Mike Christie
@ 2024-03-16  0:47 ` Mike Christie
  2024-04-09  4:16 ` [PATCH 0/9] vhost: Support SIGKILL by flushing and exiting Jason Wang
  2024-04-11  8:39 ` Jason Wang
  10 siblings, 0 replies; 32+ messages in thread
From: Mike Christie @ 2024-03-16  0:47 UTC (permalink / raw)
  To: oleg, ebiederm, virtualization, mst, sgarzare, jasowang,
	stefanha, brauner
  Cc: Mike Christie

This removes the signal/coredump hacks added for vhost_tasks in:

Commit f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")

When that patch was added, vhost_tasks did not handle SIGKILL and would
try to ignore/clear the signal and continue on until the device's close
function was called. In the previous patches, vhost_tasks and the vhost
drivers were converted to support SIGKILL by cleaning themselves up and
exiting. The hacks are no longer needed, so this patch removes them.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 fs/coredump.c   | 4 +---
 kernel/exit.c   | 5 +----
 kernel/signal.c | 4 +---
 3 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index be6403b4b14b..8eae24afb3cb 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -371,9 +371,7 @@ static int zap_process(struct task_struct *start, int exit_code)
 		if (t != current && !(t->flags & PF_POSTCOREDUMP)) {
 			sigaddset(&t->pending.signal, SIGKILL);
 			signal_wake_up(t, 1);
-			/* The vhost_worker does not particpate in coredumps */
-			if ((t->flags & (PF_USER_WORKER | PF_IO_WORKER)) != PF_USER_WORKER)
-				nr++;
+			nr++;
 		}
 	}
 
diff --git a/kernel/exit.c b/kernel/exit.c
index dfb963d2f862..a8098b29a722 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -414,10 +414,7 @@ static void coredump_task_exit(struct task_struct *tsk)
 	tsk->flags |= PF_POSTCOREDUMP;
 	core_state = tsk->signal->core_state;
 	spin_unlock_irq(&tsk->sighand->siglock);
-
-	/* The vhost_worker does not particpate in coredumps */
-	if (core_state &&
-	    ((tsk->flags & (PF_IO_WORKER | PF_USER_WORKER)) != PF_USER_WORKER)) {
+	if (core_state) {
 		struct core_thread self;
 
 		self.task = current;
diff --git a/kernel/signal.c b/kernel/signal.c
index c9c57d053ce4..0fdca333207b 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1374,9 +1374,7 @@ int zap_other_threads(struct task_struct *p)
 
 	for_other_threads(p, t) {
 		task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK);
-		/* Don't require de_thread to wait for the vhost_worker */
-		if ((t->flags & (PF_IO_WORKER | PF_USER_WORKER)) != PF_USER_WORKER)
-			count++;
+		count++;
 
 		/* Don't bother with already dead threads */
 		if (t->exit_state)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/9] vhost: Support SIGKILL by flushing and exiting
  2024-03-16  0:46 [PATCH 0/9] vhost: Support SIGKILL by flushing and exiting Mike Christie
                   ` (8 preceding siblings ...)
  2024-03-16  0:47 ` [PATCH 9/9] kernel: Remove signal hacks for vhost_tasks Mike Christie
@ 2024-04-09  4:16 ` Jason Wang
  2024-04-09 14:57   ` Mike Christie
  2024-04-18  7:10   ` Michael S. Tsirkin
  2024-04-11  8:39 ` Jason Wang
  10 siblings, 2 replies; 32+ messages in thread
From: Jason Wang @ 2024-04-09  4:16 UTC (permalink / raw)
  To: Mike Christie
  Cc: oleg, ebiederm, virtualization, mst, sgarzare, stefanha, brauner,
	Laurent Vivier

On Sat, Mar 16, 2024 at 8:47 AM Mike Christie
<michael.christie@oracle.com> wrote:
>
> The following patches were made over Linus's tree and also apply over
> mst's vhost branch. The patches add the ability for vhost_tasks to
> handle SIGKILL by flushing queued works, stop new works from being
> queued, and prepare the task for an early exit.
>
> This removes the need for the signal/coredump hacks added in:
>
> Commit f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")
>
> when the vhost_task patches were initially merged and fix the issue
> in this thread:
>
> https://lore.kernel.org/all/000000000000a41b82060e875721@google.com/
>
> Long Background:
>
> The original vhost worker code didn't support any signals. If the
> userspace application that owned the worker got a SIGKILL, the app/
> process would exit dropping all references to the device and then the
> file operation's release function would be called. From there we would
> wait on running IO then cleanup the device's memory.
>
> When we switched to vhost_tasks being a thread in the owner's process we
> added some hacks to the signal/coredump code so we could continue to
> wait on running IO and process it from the vhost_task. The idea was that
> we would eventually remove the hacks. We recently hit this bug:
>
> https://lore.kernel.org/all/000000000000a41b82060e875721@google.com/
>
> It turns out only vhost-scsi had an issue where it would send a command
> to the block/LIO layer, wait for a response and then process in the vhost
> task.

Vhost-net TX zerocopy code did the same:

It sends zerocopy packets to the underlying layer and waits for it.
When the DMA is completed, vhost_zerocopy_callback will be called to
schedule vq work to update the used ring.

> So patches 1-5 prepares vhost-scsi to handle when the vhost_task
> is killed while we still have commands outstanding. The next patches then
> prepare and convert the vhost and vhost_task layers to handle SIGKILL
> by flushing running works, marking the vhost_task as dead so there's
> no future uses, then exiting.

Thanks

>
>
>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/9] vhost: Support SIGKILL by flushing and exiting
  2024-04-09  4:16 ` [PATCH 0/9] vhost: Support SIGKILL by flushing and exiting Jason Wang
@ 2024-04-09 14:57   ` Mike Christie
  2024-04-09 16:40     ` Michael S. Tsirkin
  2024-04-18  7:10   ` Michael S. Tsirkin
  1 sibling, 1 reply; 32+ messages in thread
From: Mike Christie @ 2024-04-09 14:57 UTC (permalink / raw)
  To: Jason Wang
  Cc: oleg, ebiederm, virtualization, mst, sgarzare, stefanha, brauner,
	Laurent Vivier

On 4/8/24 11:16 PM, Jason Wang wrote:
>> It turns out only vhost-scsi had an issue where it would send a command
>> to the block/LIO layer, wait for a response and then process in the vhost
>> task.
> Vhost-net TX zerocopy code did the same:
> 
> It sends zerocopy packets to the under layer and waits for the
> underlayer. When the DMA is completed, vhost_zerocopy_callback will be
> called to schedule vq work for used ring updating.

I think it might be slightly different and vhost-net is ok.

Above I meant, vhost-scsi would do the equivalent of:

vhost_zerocopy_callback ->  vhost_net_ubuf_put

from vhost_task/thread.

For vhost-net then, if we get an early exit via a signal, the patches
will flush queued works and prevent new ones from being queued.
vhost_net_release will run due to the process/thread exit code releasing
its refcounts on the device.

vhost_net_release will then do:

vhost_net_flush -> vhost_net_ubuf_put_and_wait

and wait for vhost_zerocopy_callback -> vhost_net_ubuf_put calls.

For vhost-scsi we would hang. We would do our equivalent of vhost_net_ubuf_put
from the vhost task/thread. So, when our release function, vhost_scsi_release,
is called we will also do a similar wait. However, because the task/thread is
killed, we will hang there since the "put" will never be done.
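
To spell out the difference: what matters is which context performs the
final "put" that the release function's wait depends on. Below is a toy
user-space model of the two cases; the names are purely illustrative and
do not correspond to real vhost symbols.

/* Toy model only; a sketch of the hang described above. */
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  done = PTHREAD_COND_INITIALIZER;
static int inflight = 1;	/* one outstanding command/buffer */

/* The final "put": drops the last reference and wakes the waiter. */
static void put_inflight(void)
{
	pthread_mutex_lock(&lock);
	if (--inflight == 0)
		pthread_cond_broadcast(&done);
	pthread_mutex_unlock(&lock);
}

/*
 * What release() does: wait for inflight to drain.
 *
 * vhost-net: put_inflight() is driven by the lower layer's completion
 * (vhost_zerocopy_callback context), so it still happens after the
 * vhost_task is gone and this wait finishes.
 *
 * vhost-scsi before this series: put_inflight() only ran from a work
 * executed by the vhost_task. If that task was killed, the work never
 * runs, inflight never hits zero, and this wait hangs forever.
 */
static void release_wait(void)
{
	pthread_mutex_lock(&lock);
	while (inflight)
		pthread_cond_wait(&done, &lock);
	pthread_mutex_unlock(&lock);
}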

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/9] vhost: Support SIGKILL by flushing and exiting
  2024-04-09 14:57   ` Mike Christie
@ 2024-04-09 16:40     ` Michael S. Tsirkin
  2024-04-09 21:55       ` michael.christie
  0 siblings, 1 reply; 32+ messages in thread
From: Michael S. Tsirkin @ 2024-04-09 16:40 UTC (permalink / raw)
  To: Mike Christie
  Cc: Jason Wang, oleg, ebiederm, virtualization, sgarzare, stefanha,
	brauner, Laurent Vivier

On Tue, Apr 09, 2024 at 09:57:00AM -0500, Mike Christie wrote:
> On 4/8/24 11:16 PM, Jason Wang wrote:
> >> It turns out only vhost-scsi had an issue where it would send a command
> >> to the block/LIO layer, wait for a response and then process in the vhost
> >> task.
> > Vhost-net TX zerocopy code did the same:
> > 
> > It sends zerocopy packets to the under layer and waits for the
> > underlayer. When the DMA is completed, vhost_zerocopy_callback will be
> > called to schedule vq work for used ring updating.
> 
> I think it might be slightly different and vhost-net is ok.
> 
> Above I meant, vhost-scsi would do the equivalent of:
> 
> vhost_zerocopy_callback ->  vhost_net_ubuf_put
> 
> from vhost_task/thread.
> 
> For vhost-net then, if we get an early exit via a signal the patches will flush
> queued works and prevent new ones from being queued. vhost_net_release will run
> due to the process/thread exit code releasing it's refcounts on the device.
> 
> vhost_net_release will then do:
> 
> vhost_net_flush -> vhost_net_ubuf_put_and_wait
> 
> and wait for vhost_zerocopy_callback -> vhost_net_ubuf_put calls.
> 
> For vhost-scsi we would hang. We would do our equivalent of vhost_net_ubuf_put
> from the vhost task/thread. So, when our release function, vhost_scsi_release,
> is called we will also do a similar wait. However, because the task/thread is
> killed, we will hang there since the "put" will never be done.

If you can find a way to cancel them instead of flushing out,
I think it would be better for net, too.

-- 
MST


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/9] vhost: Support SIGKILL by flushing and exiting
  2024-04-09 16:40     ` Michael S. Tsirkin
@ 2024-04-09 21:55       ` michael.christie
  2024-04-10  4:21         ` Michael S. Tsirkin
  0 siblings, 1 reply; 32+ messages in thread
From: michael.christie @ 2024-04-09 21:55 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, oleg, ebiederm, virtualization, sgarzare, stefanha,
	brauner, Laurent Vivier

On 4/9/24 11:40 AM, Michael S. Tsirkin wrote:
> On Tue, Apr 09, 2024 at 09:57:00AM -0500, Mike Christie wrote:
>> On 4/8/24 11:16 PM, Jason Wang wrote:
>>>> It turns out only vhost-scsi had an issue where it would send a command
>>>> to the block/LIO layer, wait for a response and then process in the vhost
>>>> task.
>>> Vhost-net TX zerocopy code did the same:
>>>
>>> It sends zerocopy packets to the under layer and waits for the
>>> underlayer. When the DMA is completed, vhost_zerocopy_callback will be
>>> called to schedule vq work for used ring updating.
>>
>> I think it might be slightly different and vhost-net is ok.
>>
>> Above I meant, vhost-scsi would do the equivalent of:
>>
>> vhost_zerocopy_callback ->  vhost_net_ubuf_put
>>
>> from vhost_task/thread.
>>
>> For vhost-net then, if we get an early exit via a signal the patches will flush
>> queued works and prevent new ones from being queued. vhost_net_release will run
>> due to the process/thread exit code releasing it's refcounts on the device.
>>
>> vhost_net_release will then do:
>>
>> vhost_net_flush -> vhost_net_ubuf_put_and_wait
>>
>> and wait for vhost_zerocopy_callback -> vhost_net_ubuf_put calls.
>>
>> For vhost-scsi we would hang. We would do our equivalent of vhost_net_ubuf_put
>> from the vhost task/thread. So, when our release function, vhost_scsi_release,
>> is called we will also do a similar wait. However, because the task/thread is
>> killed, we will hang there since the "put" will never be done.
> 
> If you can find a way to cancel them instead of flushing out,
> I think it would be better for net, too.

I don't think canceling block requests will be possible any time soon.
It's one of those things that people have wanted for decades, but it gets
messy when you are dealing with multiple layers (fs, md/dm, request
queue, device driver, etc) where some don't even use the same struct
(bios vs requests), and you don't want to hurt performance.

So for vhost-scsi we can't cancel running block layer IOs because there
is no interface for it. We have to sit around and wait for them to
complete or time out.

If you are saying cancel at just the vhost/vhost-net/vhost-scsi layers,
then I can do that. However, I think it might be messier. We would clean
up some of the vhost-related resources, then return from the
file_operations->release functions. Then we would have to have some
workqueue that just waits for driver-specific responses (block layer
responses for vhost-scsi). When it gets them, it does "puts" on
its structs and eventually they are freed.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/9] vhost: Support SIGKILL by flushing and exiting
  2024-04-09 21:55       ` michael.christie
@ 2024-04-10  4:21         ` Michael S. Tsirkin
  0 siblings, 0 replies; 32+ messages in thread
From: Michael S. Tsirkin @ 2024-04-10  4:21 UTC (permalink / raw)
  To: michael.christie
  Cc: Jason Wang, oleg, ebiederm, virtualization, sgarzare, stefanha,
	brauner, Laurent Vivier

On Tue, Apr 09, 2024 at 04:55:25PM -0500, michael.christie@oracle.com wrote:
> On 4/9/24 11:40 AM, Michael S. Tsirkin wrote:
> > On Tue, Apr 09, 2024 at 09:57:00AM -0500, Mike Christie wrote:
> >> On 4/8/24 11:16 PM, Jason Wang wrote:
> >>>> It turns out only vhost-scsi had an issue where it would send a command
> >>>> to the block/LIO layer, wait for a response and then process in the vhost
> >>>> task.
> >>> Vhost-net TX zerocopy code did the same:
> >>>
> >>> It sends zerocopy packets to the under layer and waits for the
> >>> underlayer. When the DMA is completed, vhost_zerocopy_callback will be
> >>> called to schedule vq work for used ring updating.
> >>
> >> I think it might be slightly different and vhost-net is ok.
> >>
> >> Above I meant, vhost-scsi would do the equivalent of:
> >>
> >> vhost_zerocopy_callback ->  vhost_net_ubuf_put
> >>
> >> from vhost_task/thread.
> >>
> >> For vhost-net then, if we get an early exit via a signal the patches will flush
> >> queued works and prevent new ones from being queued. vhost_net_release will run
> >> due to the process/thread exit code releasing it's refcounts on the device.
> >>
> >> vhost_net_release will then do:
> >>
> >> vhost_net_flush -> vhost_net_ubuf_put_and_wait
> >>
> >> and wait for vhost_zerocopy_callback -> vhost_net_ubuf_put calls.
> >>
> >> For vhost-scsi we would hang. We would do our equivalent of vhost_net_ubuf_put
> >> from the vhost task/thread. So, when our release function, vhost_scsi_release,
> >> is called we will also do a similar wait. However, because the task/thread is
> >> killed, we will hang there since the "put" will never be done.
> > 
> > If you can find a way to cancel them instead of flushing out,
> > I think it would be better for net, too.
> 
> I don't think canceling block requests will be possible any time soon.
> It's one of those things that people have wanted for decades but it gets
> messy when you are dealing multiple layers (fs, md/md, request queue,
> device driver, etc) that some don't even use the same struct sometimes
> (bios vs requests) and you don't want to hurt performance.
> 
> So for vhost-scsi we can't cancel running block layer IOs because there
> is no interface for it. We have to sit around and wait for them to
> complete or timeout.
> 
> If you are saying cancel at just the vhost/vhost-net/vhost-scsi layers
> then I can do that. However, I think it might be messier.

I think that'd be kind of pointless ...

> We would clean
> up some of the vhost related resource, then return from the
> file_operations->release functions. Then we would have to have some
> workqueue that just waits for driver specific responses (block layer
> responses for vhost-scsi). When it gets them, then it does "puts" on
> it's structs and eventually they are freed.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/9] vhost: Support SIGKILL by flushing and exiting
  2024-03-16  0:46 [PATCH 0/9] vhost: Support SIGKILL by flushing and exiting Mike Christie
                   ` (9 preceding siblings ...)
  2024-04-09  4:16 ` [PATCH 0/9] vhost: Support SIGKILL by flushing and exiting Jason Wang
@ 2024-04-11  8:39 ` Jason Wang
  2024-04-11 16:19   ` Mike Christie
  10 siblings, 1 reply; 32+ messages in thread
From: Jason Wang @ 2024-04-11  8:39 UTC (permalink / raw)
  To: Mike Christie
  Cc: oleg, ebiederm, virtualization, mst, sgarzare, stefanha, brauner

On Sat, Mar 16, 2024 at 8:47 AM Mike Christie
<michael.christie@oracle.com> wrote:
>
> The following patches were made over Linus's tree and also apply over
> mst's vhost branch. The patches add the ability for vhost_tasks to
> handle SIGKILL by flushing queued works, stop new works from being
> queued, and prepare the task for an early exit.
>
> This removes the need for the signal/coredump hacks added in:
>
> Commit f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")
>
> when the vhost_task patches were initially merged and fix the issue
> in this thread:
>
> https://lore.kernel.org/all/000000000000a41b82060e875721@google.com/
>
> Long Background:
>
> The original vhost worker code didn't support any signals. If the
> userspace application that owned the worker got a SIGKILL, the app/
> process would exit dropping all references to the device and then the
> file operation's release function would be called. From there we would
> wait on running IO then cleanup the device's memory.

A dumb question.

Is this a userspace-noticeable change? For example, with this series
a SIGKILL may shut down the datapath ...

Thanks

>
> When we switched to vhost_tasks being a thread in the owner's process we
> added some hacks to the signal/coredump code so we could continue to
> wait on running IO and process it from the vhost_task. The idea was that
> we would eventually remove the hacks. We recently hit this bug:
>
> https://lore.kernel.org/all/000000000000a41b82060e875721@google.com/
>
> It turns out only vhost-scsi had an issue where it would send a command
> to the block/LIO layer, wait for a response and then process in the vhost
> task. So patches 1-5 prepares vhost-scsi to handle when the vhost_task
> is killed while we still have commands outstanding. The next patches then
> prepare and convert the vhost and vhost_task layers to handle SIGKILL
> by flushing running works, marking the vhost_task as dead so there's
> no future uses, then exiting.
>
>
>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/9] vhost: Support SIGKILL by flushing and exiting
  2024-04-11  8:39 ` Jason Wang
@ 2024-04-11 16:19   ` Mike Christie
  2024-04-12  3:28     ` Jason Wang
  0 siblings, 1 reply; 32+ messages in thread
From: Mike Christie @ 2024-04-11 16:19 UTC (permalink / raw)
  To: Jason Wang
  Cc: oleg, ebiederm, virtualization, mst, sgarzare, stefanha, brauner

On 4/11/24 3:39 AM, Jason Wang wrote:
> On Sat, Mar 16, 2024 at 8:47 AM Mike Christie
> <michael.christie@oracle.com> wrote:
>>
>> The following patches were made over Linus's tree and also apply over
>> mst's vhost branch. The patches add the ability for vhost_tasks to
>> handle SIGKILL by flushing queued works, stop new works from being
>> queued, and prepare the task for an early exit.
>>
>> This removes the need for the signal/coredump hacks added in:
>>
>> Commit f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")
>>
>> when the vhost_task patches were initially merged and fix the issue
>> in this thread:
>>
>> https://lore.kernel.org/all/000000000000a41b82060e875721@google.com/
>>
>> Long Background:
>>
>> The original vhost worker code didn't support any signals. If the
>> userspace application that owned the worker got a SIGKILL, the app/
>> process would exit dropping all references to the device and then the
>> file operation's release function would be called. From there we would
>> wait on running IO then cleanup the device's memory.
> 
> A dumb question.
> 
> Is this a user space noticeable change? For example, with this series
> a SIGKILL may shutdown the datapath ...

It already changed in 6.4. We basically added a new interface to shut
down everything (userspace and vhost kernel parts). So we won't just
shut down the data path while userspace is still running. We will shut
down everything now if you send a SIGKILL to a vhost worker's thread.

Here are lots of details:

- Pre-6.4 kernel, when vhost workers used kthreads, if you sent any signal
to a vhost worker, we ignore it. Nothing happens. kthreads are special and
can ignore all signals.

You could think of it as the worker being a completely different process
than qemu/userspace, so they have completely different signal handlers.
The vhost worker signal handler ignores all signals, even SIGKILL.

If you send a SIGKILL to a qemu thread, then it just exits right away. We
don't get to do an explicit close() on the vhost device and we don't get
to do ioctls like VHOST_NET_SET_BACKEND to clear backends. The kernel exit
code runs and releases refcounts on the device/file, then the vhost device's
file_operations->release function is called. vhost_dev_cleanup then stops
the vhost worker.

- In 6.4 and newer kernels, vhost workers use vhost_tasks, so the worker
can be thought of as a thread within the userspace process. With that
change we have the same signal handler as the userspace process.

If you send a SIGKILL to a qemu thread then it works like above.

If you send a SIGKILL to a vhost worker, the vhost worker still sort of
ignores it (that is the hack that I mentioned at the beginning of this
thread). kernel/vhost_task.c:vhost_task_fn will see the signal and
then just continue to process works until file_operations->release is
called and stops the worker.

However, the change in behavior is that because the worker is just a
thread within qemu, qemu is going to exit since they share a signal
handler and userspace can't ignore SIGKILL. We then perform the
steps above like in the pre-6.4 kernel description as if you sent a
SIGKILL directly to a userspace thread.

- With the patches in this thread there is no major difference in behavior
with 6.4 and newer kernels. We might exit a little faster. Instead of the
vhost thread trying to do its hacked-up version of ignoring the signal
and waiting for userspace to exit and call file_operations->release, the
vhost worker thread will exit after it flushes works and stops new ones.
So now, you can have the vhost thread exiting in parallel with the
userspace thread.
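
(For concreteness, the case being discussed is a signal aimed at the
worker thread itself rather than at one of qemu's own threads, e.g.
something like the following sketch, where the pid/tid values are
hypothetical and would come from ps or /proc:)

/* Hypothetical helper: target only the vhost worker thread with SIGKILL. */
#include <signal.h>
#include <sys/syscall.h>
#include <unistd.h>

static int kill_vhost_worker(pid_t owner_tgid, pid_t worker_tid)
{
	/*
	 * Since 6.4 the worker is a thread in the owner's thread group, so
	 * this brings down the whole process (qemu included); with this
	 * series the worker itself also flushes its works and exits instead
	 * of lingering until release().
	 */
	return syscall(SYS_tgkill, owner_tgid, worker_tid, SIGKILL);
}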

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/9] vhost: Support SIGKILL by flushing and exiting
  2024-04-11 16:19   ` Mike Christie
@ 2024-04-12  3:28     ` Jason Wang
  2024-04-12 16:52       ` michael.christie
  0 siblings, 1 reply; 32+ messages in thread
From: Jason Wang @ 2024-04-12  3:28 UTC (permalink / raw)
  To: Mike Christie
  Cc: oleg, ebiederm, virtualization, mst, sgarzare, stefanha, brauner

On Fri, Apr 12, 2024 at 12:19 AM Mike Christie
<michael.christie@oracle.com> wrote:
>
> On 4/11/24 3:39 AM, Jason Wang wrote:
> > On Sat, Mar 16, 2024 at 8:47 AM Mike Christie
> > <michael.christie@oracle.com> wrote:
> >>
> >> The following patches were made over Linus's tree and also apply over
> >> mst's vhost branch. The patches add the ability for vhost_tasks to
> >> handle SIGKILL by flushing queued works, stop new works from being
> >> queued, and prepare the task for an early exit.
> >>
> >> This removes the need for the signal/coredump hacks added in:
> >>
> >> Commit f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")
> >>
> >> when the vhost_task patches were initially merged and fix the issue
> >> in this thread:
> >>
> >> https://lore.kernel.org/all/000000000000a41b82060e875721@google.com/
> >>
> >> Long Background:
> >>
> >> The original vhost worker code didn't support any signals. If the
> >> userspace application that owned the worker got a SIGKILL, the app/
> >> process would exit dropping all references to the device and then the
> >> file operation's release function would be called. From there we would
> >> wait on running IO then cleanup the device's memory.
> >
> > A dumb question.
> >
> > Is this a user space noticeable change? For example, with this series
> > a SIGKILL may shutdown the datapath ...
>
> It already changed in 6.4. We basically added a new interface to shutdown
> everything (userspace and vhost kernel parts). So we won't just shutdown
> the data path while userspace is still running. We will shutdown everything
> now if you send a SIGKILL to a vhost worker's thread.

If I understand correctly, Qemu, for example, can still live if SIGKILL
is just sent to the vhost thread.

If this is correct, guests may detect this (for example virtio-net has
a watchdog).

>
> Here are a lots of details:
>
> - Pre-6.4 kernel, when vhost workers used kthreads, if you sent any signal
> to a vhost worker, we ignore it. Nothing happens. kthreads are special and
> can ignore all signals.
>
> You could think of it as the worker is a completely different process than
> qemu/userspace so they have completely different signal handlers. The
> vhost worker signal handler ignores all signals even SIGKILL.

Yes.

>
> If you send a SIGKILL to a qemu thread, then it just exits right away. We
> don't get to do an explicit close() on the vhost device and we don't get
> to do ioctls like VHOST_NET_SET_BACKEND to clear backends. The kernel exit
> code runs and releases refcounts on the device/file, then the vhost device's
> file_operations->release function is called. vhost_dev_cleanup then stops
> the vhost worker.

Right.

>
> - In 6.4 and newer kernels, vhost workers use vhost_tasks, so the worker
> can be thought of as a thread within the userspace process. With that
> change we have the same signal handler as the userspace process.
>
> If you send a SIGKILL to a qemu thread then it works like above.
>
> If you send a SIGKILL to a vhost worker, the vhost worker still sort of
> ignores it (that is the hack that I mentioned at the beginning of this
> thread). kernel/vhost_task.c:vhost_task_fn will see the signal and
> then just continue to process works until file_operations->release
> calls

Yes, so this sticks to the behaviour before vhost_tasks.

>
> However, the change in behavior is that because the worker is just a
> thread within qemu, qemu is going to exit since they share a signal
> handler and userspace can't ignore SIGKILL.
> We then run perform the
> steps above like in the pre-6.4 kernel description as if you sent a
> SIGKILL directly to a userspace thread.
>
> - With the patches in this thread there is no major difference in behavior
> with 6.4 and newer kernels. We might exit a little faster. Instead of the
> vhost thread trying to do it's hacked up version of ignoring the signal
> and waiting for userspace to exit and call file_operations->release, the
> vhost worker thread will exit after it flushes works and stops new ones.
> So now, you can have the vhost thread exiting in parallel with the
> userspace thread.

Everything would be fine if Qemu wanted to quit, but I meant there
could be a case where SIGKILL was sent to the vhost thread but not to
Qemu.

Thanks

>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/9] vhost: Support SIGKILL by flushing and exiting
  2024-04-12  3:28     ` Jason Wang
@ 2024-04-12 16:52       ` michael.christie
  2024-04-15  8:52         ` Jason Wang
  0 siblings, 1 reply; 32+ messages in thread
From: michael.christie @ 2024-04-12 16:52 UTC (permalink / raw)
  To: Jason Wang
  Cc: oleg, ebiederm, virtualization, mst, sgarzare, stefanha, brauner

On 4/11/24 10:28 PM, Jason Wang wrote:
> On Fri, Apr 12, 2024 at 12:19 AM Mike Christie
> <michael.christie@oracle.com> wrote:
>>
>> On 4/11/24 3:39 AM, Jason Wang wrote:
>>> On Sat, Mar 16, 2024 at 8:47 AM Mike Christie
>>> <michael.christie@oracle.com> wrote:
>>>>
>>>> The following patches were made over Linus's tree and also apply over
>>>> mst's vhost branch. The patches add the ability for vhost_tasks to
>>>> handle SIGKILL by flushing queued works, stop new works from being
>>>> queued, and prepare the task for an early exit.
>>>>
>>>> This removes the need for the signal/coredump hacks added in:
>>>>
>>>> Commit f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")
>>>>
>>>> when the vhost_task patches were initially merged and fix the issue
>>>> in this thread:
>>>>
>>>> https://lore.kernel.org/all/000000000000a41b82060e875721@google.com/
>>>>
>>>> Long Background:
>>>>
>>>> The original vhost worker code didn't support any signals. If the
>>>> userspace application that owned the worker got a SIGKILL, the app/
>>>> process would exit dropping all references to the device and then the
>>>> file operation's release function would be called. From there we would
>>>> wait on running IO then cleanup the device's memory.
>>>
>>> A dumb question.
>>>
>>> Is this a user space noticeable change? For example, with this series
>>> a SIGKILL may shutdown the datapath ...
>>
>> It already changed in 6.4. We basically added a new interface to shutdown
>> everything (userspace and vhost kernel parts). So we won't just shutdown
>> the data path while userspace is still running. We will shutdown everything
>> now if you send a SIGKILL to a vhost worker's thread.
> 
> If I understand correctly, for example Qemu can still live is SIGKILL
> is just send to vhost thread.

Pre-6.4 qemu could still survive if only the vhost thread got a SIGKILL.
We used kthreads which are special and can ignore it like how userspace
can ignore SIGHUP.

6.4 and newer kernels cannot survive. Even if the vhost thread sort of
ignores it like I described below, the signal is still delivered
to the other qemu threads due to the shared signal handler. Userspace
can't ignore SIGKILL. It doesn't have any say in the matter, and the
kernel forces them to exit.

> 
> If this is correct, guests may detect this (for example virtio-net has
> a watchdog).
> 

What did you mean by that part? Do you mean if the vhost thread were to
exit, so drivers/vhost/net.c couldn't process IO, then the watchdog in
the guest (virtio-net driver in the guest kernel) would detect that? Or
are you saying the watchdog in the guest can detect signals that the
host gets?


>>
>> Here are a lots of details:
>>
>> - Pre-6.4 kernel, when vhost workers used kthreads, if you sent any signal
>> to a vhost worker, we ignore it. Nothing happens. kthreads are special and
>> can ignore all signals.
>>
>> You could think of it as the worker is a completely different process than
>> qemu/userspace so they have completely different signal handlers. The
>> vhost worker signal handler ignores all signals even SIGKILL.
> 
> Yes.
> 
>>
>> If you send a SIGKILL to a qemu thread, then it just exits right away. We
>> don't get to do an explicit close() on the vhost device and we don't get
>> to do ioctls like VHOST_NET_SET_BACKEND to clear backends. The kernel exit
>> code runs and releases refcounts on the device/file, then the vhost device's
>> file_operations->release function is called. vhost_dev_cleanup then stops
>> the vhost worker.
> 
> Right.
> 
>>
>> - In 6.4 and newer kernels, vhost workers use vhost_tasks, so the worker
>> can be thought of as a thread within the userspace process. With that
>> change we have the same signal handler as the userspace process.
>>
>> If you send a SIGKILL to a qemu thread then it works like above.
>>
>> If you send a SIGKILL to a vhost worker, the vhost worker still sort of
>> ignores it (that is the hack that I mentioned at the beginning of this
>> thread). kernel/vhost_task.c:vhost_task_fn will see the signal and
>> then just continue to process works until file_operations->release
>> calls
> 
> Yes, so this sticks to the behaviour before vhost_tasks.

Not exactly. The vhost_task stays alive temporarily.

The signal is still delivered to the userspace threads, and they will
also exit from the SIGKILL. SIGKILL goes to all of the threads in the
process, and all of the userspace threads exit as normal because the
vhost task and the regular userspace threads share a signal handler.
When userspace exits, the kernel force-drops the refcounts on the vhost
devices, which runs the release function, so the vhost_task will then exit.

So what I'm trying to say is that in 6.4 we already changed the behavior.

> 
>>
>> However, the change in behavior is that because the worker is just a
>> thread within qemu, qemu is going to exit since they share a signal
>> handler and userspace can't ignore SIGKILL.
>> We then run perform the
>> steps above like in the pre-6.4 kernel description as if you sent a
>> SIGKILL directly to a userspace thread.
>>
>> - With the patches in this thread there is no major difference in behavior
>> with 6.4 and newer kernels. We might exit a little faster. Instead of the
>> vhost thread trying to do it's hacked up version of ignoring the signal
>> and waiting for userspace to exit and call file_operations->release, the
>> vhost worker thread will exit after it flushes works and stops new ones.
>> So now, you can have the vhost thread exiting in parallel with the
>> userspace thread.
> 
> Everything would be fine if Qemu wanted to quit but I meant there
> could be a case where SIGKILL was sent to the vhost thread but not
> Qemu.


Yeah, in the last mail, I was trying to say that behavior already
changed in 6.4. In 6.4 we basically added a new interface to the vhost
layer that uses signals/SIGKILL to tell the vhost layer and userspace to
shut down completely.

This patchset is just removing the hacks Eric and Oleg allowed for us in
6.4. That code, where we do our hacked-up version of ignoring signals
(clear the signal and just go on), combined with the signals/coredump
related hacks done in this patch:

commit f9010dbdce911ee1f1af1398a24b1f9f992e0080
Author: Mike Christie <michael.christie@oracle.com>
Date:   Thu Jun 1 13:32:32 2023 -0500

    fork, vhost: Use CLONE_THREAD to fix freezer/ps regression


is causing this bug:

https://lore.kernel.org/all/000000000000a41b82060e875721@google.com/

Basically, as Eric mentions in that thread, the signal/kernel developers
considered the signal/coredump hacks to be temporary, and we were
supposed to convert to something that works more like io_uring's
io_worker_exit. That's
what I'm doing in this patchset. Patch 9/9 in this set then removes the
hacks from the core kernel/signal code.
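
For reference, the hack this series removes is roughly the pattern below
in the worker loop (paraphrased, not an exact quote of the 6.4 code): we
consume the fatal signal and clear TIF_SIGPENDING so the worker can keep
running works until userspace has exited and release is called.

	if (!dead && signal_pending(current)) {
		struct ksignal ksig;

		/*
		 * Consume the signal so it doesn't keep interrupting us, but
		 * keep processing works; the task only goes away once all the
		 * fds are closed and file_operations->release runs.
		 */
		dead = get_signal(&ksig);
		if (dead)
			clear_thread_flag(TIF_SIGPENDING);
	}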

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/9] vhost: Support SIGKILL by flushing and exiting
  2024-04-12 16:52       ` michael.christie
@ 2024-04-15  8:52         ` Jason Wang
  2024-04-17  3:50           ` Jason Wang
  0 siblings, 1 reply; 32+ messages in thread
From: Jason Wang @ 2024-04-15  8:52 UTC (permalink / raw)
  To: michael.christie
  Cc: oleg, ebiederm, virtualization, mst, sgarzare, stefanha, brauner

On Sat, Apr 13, 2024 at 12:53 AM <michael.christie@oracle.com> wrote:
>
> On 4/11/24 10:28 PM, Jason Wang wrote:
> > On Fri, Apr 12, 2024 at 12:19 AM Mike Christie
> > <michael.christie@oracle.com> wrote:
> >>
> >> On 4/11/24 3:39 AM, Jason Wang wrote:
> >>> On Sat, Mar 16, 2024 at 8:47 AM Mike Christie
> >>> <michael.christie@oracle.com> wrote:
> >>>>
> >>>> The following patches were made over Linus's tree and also apply over
> >>>> mst's vhost branch. The patches add the ability for vhost_tasks to
> >>>> handle SIGKILL by flushing queued works, stop new works from being
> >>>> queued, and prepare the task for an early exit.
> >>>>
> >>>> This removes the need for the signal/coredump hacks added in:
> >>>>
> >>>> Commit f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")
> >>>>
> >>>> when the vhost_task patches were initially merged and fix the issue
> >>>> in this thread:
> >>>>
> >>>> https://lore.kernel.org/all/000000000000a41b82060e875721@google.com/
> >>>>
> >>>> Long Background:
> >>>>
> >>>> The original vhost worker code didn't support any signals. If the
> >>>> userspace application that owned the worker got a SIGKILL, the app/
> >>>> process would exit dropping all references to the device and then the
> >>>> file operation's release function would be called. From there we would
> >>>> wait on running IO then cleanup the device's memory.
> >>>
> >>> A dumb question.
> >>>
> >>> Is this a user space noticeable change? For example, with this series
> >>> a SIGKILL may shutdown the datapath ...
> >>
> >> It already changed in 6.4. We basically added a new interface to shutdown
> >> everything (userspace and vhost kernel parts). So we won't just shutdown
> >> the data path while userspace is still running. We will shutdown everything
> >> now if you send a SIGKILL to a vhost worker's thread.
> >
> > If I understand correctly, for example Qemu can still live is SIGKILL
> > is just send to vhost thread.
>
> Pre-6.4 qemu could still survive if only the vhost thread got a SIGKILL.
> We used kthreads which are special and can ignore it like how userspace
> can ignore SIGHUP.
>
> 6.4 and newer kernels cannot survive. Even if the vhost thread sort of
> ignores it like I described below where, the signal is still delivered
> to the other qemu threads due to the shared signal handler. Userspace
> can't ignore SIGKILL. It doesn't have any say in the matter, and the
> kernel forces them to exit.

Ok, I see, so the reason is that vhost belongs to the same thread
group as the owner now.

>
> >
> > If this is correct, guests may detect this (for example virtio-net has
> > a watchdog).
> >
>
> What did you mean by that part? Do you mean if the vhost thread were to
> exit, so drivers/vhost/net.c couldn't process IO, then the watchdog in
> the guest (virtio-net driver in the guest kernel) would detect that?

I meant this one. But since we are using CLONE_THREAD, we won't see these.

> Or
> are you saying the watchdog in the guest can detect signals that the
> host gets?
>
>
> >>
> >> Here are a lots of details:
> >>
> >> - Pre-6.4 kernel, when vhost workers used kthreads, if you sent any signal
> >> to a vhost worker, we ignore it. Nothing happens. kthreads are special and
> >> can ignore all signals.
> >>
> >> You could think of it as the worker is a completely different process than
> >> qemu/userspace so they have completely different signal handlers. The
> >> vhost worker signal handler ignores all signals even SIGKILL.
> >
> > Yes.
> >
> >>
> >> If you send a SIGKILL to a qemu thread, then it just exits right away. We
> >> don't get to do an explicit close() on the vhost device and we don't get
> >> to do ioctls like VHOST_NET_SET_BACKEND to clear backends. The kernel exit
> >> code runs and releases refcounts on the device/file, then the vhost device's
> >> file_operations->release function is called. vhost_dev_cleanup then stops
> >> the vhost worker.
> >
> > Right.
> >
> >>
> >> - In 6.4 and newer kernels, vhost workers use vhost_tasks, so the worker
> >> can be thought of as a thread within the userspace process. With that
> >> change we have the same signal handler as the userspace process.
> >>
> >> If you send a SIGKILL to a qemu thread then it works like above.
> >>
> >> If you send a SIGKILL to a vhost worker, the vhost worker still sort of
> >> ignores it (that is the hack that I mentioned at the beginning of this
> >> thread). kernel/vhost_task.c:vhost_task_fn will see the signal and
> >> then just continue to process works until file_operations->release
> >> calls
> >
> > Yes, so this sticks to the behaviour before vhost_tasks.
>
> Not exactly. The vhost_task stays alive temporarily.
>
> The signal is still delivered to the userspace threads and they will
> exit due to getting the SIGKILL also. SIGKILL goes to all the threads in
> the process and all userspace threads exit like normal because the vhost
> task and normal old userspace threads share a signal handler. When
> userspace exits, the kernel force drops the refcounts on the vhost
> devices and that runs the release function so the vhost_task will then exit.
>
> So what I'm trying to say is that in 6.4 we already changed the behavior.

Yes. To tell the truth, it looks even worse, but it might be too late to fix.

>
> >
> >>
> >> However, the change in behavior is that because the worker is just a
> >> thread within qemu, qemu is going to exit since they share a signal
> >> handler and userspace can't ignore SIGKILL.
> >> We then run perform the
> >> steps above like in the pre-6.4 kernel description as if you sent a
> >> SIGKILL directly to a userspace thread.
> >>
> >> - With the patches in this thread there is no major difference in behavior
> >> with 6.4 and newer kernels. We might exit a little faster. Instead of the
> >> vhost thread trying to do it's hacked up version of ignoring the signal
> >> and waiting for userspace to exit and call file_operations->release, the
> >> vhost worker thread will exit after it flushes works and stops new ones.
> >> So now, you can have the vhost thread exiting in parallel with the
> >> userspace thread.
> >
> > Everything would be fine if Qemu wanted to quit but I meant there
> > could be a case where SIGKILL was sent to the vhost thread but not
> > Qemu.
>
>
> Yeah, in the last mail, I was trying to say that behavior already
> changed in 6.4. In 6.4 we basically added a new interface to the vhost
> layer that uses signals/SIGKILL to tell the vhost layer and userspace to
> completely shutdown.
>
> This patchset is just removing the hacks Eric and Oleg allowed for us in
> 6.4. That code where we do our hacked up version of ignoring signals
> (clear it and just go on) combined with the signals/coredump related
> hacks done in this patch:
>
> commit f9010dbdce911ee1f1af1398a24b1f9f992e0080
> Author: Mike Christie <michael.christie@oracle.com>
> Date:   Thu Jun 1 13:32:32 2023 -0500
>
>     fork, vhost: Use CLONE_THREAD to fix freezer/ps regression
>
>
> is causing this bug:
>
> https://lore.kernel.org/all/000000000000a41b82060e875721@google.com/
>
> Basically, as Eric mentions in that thread the signal/kernel developers
> considered the signal/coredump hacks were meant to be temp, and we were
> supposed to convert to work more like io_uring's io_worker_exit. That's
> what I'm doing in this patchset. Patch 9/9 in this set then removes the
> hacks from the core kernel/signal code.
>

Ok.

Thanks


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/9] vhost: Support SIGKILL by flushing and exiting
  2024-04-15  8:52         ` Jason Wang
@ 2024-04-17  3:50           ` Jason Wang
  2024-04-17 16:03             ` Mike Christie
  0 siblings, 1 reply; 32+ messages in thread
From: Jason Wang @ 2024-04-17  3:50 UTC (permalink / raw)
  To: michael.christie
  Cc: oleg, ebiederm, virtualization, mst, sgarzare, stefanha, brauner,
	Andreas Karis, Laurent Vivier

On Mon, Apr 15, 2024 at 4:52 PM Jason Wang <jasowang@redhat.com> wrote:
>
> On Sat, Apr 13, 2024 at 12:53 AM <michael.christie@oracle.com> wrote:
> >
> > On 4/11/24 10:28 PM, Jason Wang wrote:
> > > On Fri, Apr 12, 2024 at 12:19 AM Mike Christie
> > > <michael.christie@oracle.com> wrote:
> > >>
> > >> On 4/11/24 3:39 AM, Jason Wang wrote:
> > >>> On Sat, Mar 16, 2024 at 8:47 AM Mike Christie
> > >>> <michael.christie@oracle.com> wrote:
> > >>>>
> > >>>> The following patches were made over Linus's tree and also apply over
> > >>>> mst's vhost branch. The patches add the ability for vhost_tasks to
> > >>>> handle SIGKILL by flushing queued works, stop new works from being
> > >>>> queued, and prepare the task for an early exit.
> > >>>>
> > >>>> This removes the need for the signal/coredump hacks added in:
> > >>>>
> > >>>> Commit f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")
> > >>>>
> > >>>> when the vhost_task patches were initially merged and fix the issue
> > >>>> in this thread:
> > >>>>
> > >>>> https://lore.kernel.org/all/000000000000a41b82060e875721@google.com/
> > >>>>
> > >>>> Long Background:
> > >>>>
> > >>>> The original vhost worker code didn't support any signals. If the
> > >>>> userspace application that owned the worker got a SIGKILL, the app/
> > >>>> process would exit dropping all references to the device and then the
> > >>>> file operation's release function would be called. From there we would
> > >>>> wait on running IO then cleanup the device's memory.
> > >>>
> > >>> A dumb question.
> > >>>
> > >>> Is this a user space noticeable change? For example, with this series
> > >>> a SIGKILL may shutdown the datapath ...
> > >>
> > >> It already changed in 6.4. We basically added a new interface to shutdown
> > >> everything (userspace and vhost kernel parts). So we won't just shutdown
> > >> the data path while userspace is still running. We will shutdown everything
> > >> now if you send a SIGKILL to a vhost worker's thread.
> > >
> > > If I understand correctly, for example Qemu can still live is SIGKILL
> > > is just send to vhost thread.
> >
> > Pre-6.4 qemu could still survive if only the vhost thread got a SIGKILL.
> > We used kthreads which are special and can ignore it like how userspace
> > can ignore SIGHUP.
> >
> > 6.4 and newer kernels cannot survive. Even if the vhost thread sort of
> > ignores it like I described below where, the signal is still delivered
> > to the other qemu threads due to the shared signal handler. Userspace
> > can't ignore SIGKILL. It doesn't have any say in the matter, and the
> > kernel forces them to exit.
>
> Ok, I see, so the reason is that vhost belongs to the same thread
> group as the owner now.
>
> >
> > >
> > > If this is correct, guests may detect this (for example virtio-net has
> > > a watchdog).
> > >
> >
> > What did you mean by that part? Do you mean if the vhost thread were to
> > exit, so drivers/vhost/net.c couldn't process IO, then the watchdog in
> > the guest (virtio-net driver in the guest kernel) would detect that?
>
> I meant this one. But since we are using CLONE_THREAD, we won't see these.
>
> > Or
> > are you saying the watchdog in the guest can detect signals that the
> > host gets?
> >
> >
> > >>
> > >> Here are a lots of details:
> > >>
> > >> - Pre-6.4 kernel, when vhost workers used kthreads, if you sent any signal
> > >> to a vhost worker, we ignore it. Nothing happens. kthreads are special and
> > >> can ignore all signals.
> > >>
> > >> You could think of it as the worker is a completely different process than
> > >> qemu/userspace so they have completely different signal handlers. The
> > >> vhost worker signal handler ignores all signals even SIGKILL.
> > >
> > > Yes.
> > >
> > >>
> > >> If you send a SIGKILL to a qemu thread, then it just exits right away. We
> > >> don't get to do an explicit close() on the vhost device and we don't get
> > >> to do ioctls like VHOST_NET_SET_BACKEND to clear backends. The kernel exit
> > >> code runs and releases refcounts on the device/file, then the vhost device's
> > >> file_operations->release function is called. vhost_dev_cleanup then stops
> > >> the vhost worker.
> > >
> > > Right.
> > >
> > >>
> > >> - In 6.4 and newer kernels, vhost workers use vhost_tasks, so the worker
> > >> can be thought of as a thread within the userspace process. With that
> > >> change we have the same signal handler as the userspace process.
> > >>
> > >> If you send a SIGKILL to a qemu thread then it works like above.
> > >>
> > >> If you send a SIGKILL to a vhost worker, the vhost worker still sort of
> > >> ignores it (that is the hack that I mentioned at the beginning of this
> > >> thread). kernel/vhost_task.c:vhost_task_fn will see the signal and
> > >> then just continue to process works until file_operations->release
> > >> calls
> > >
> > > Yes, so this sticks to the behaviour before vhost_tasks.
> >
> > Not exactly. The vhost_task stays alive temporarily.
> >
> > The signal is still delivered to the userspace threads and they will
> > exit due to getting the SIGKILL also. SIGKILL goes to all the threads in
> > the process and all userspace threads exit like normal because the vhost
> > task and normal old userspace threads share a signal handler. When
> > userspace exits, the kernel force drops the refcounts on the vhost
> > devices and that runs the release function so the vhost_task will then exit.
> >
> > So what I'm trying to say is that in 6.4 we already changed the behavior.
>
> Yes. To say the truth, it looks even worse but it might be too late to fix.

Andreas (cced) has identified two other possible changes:

1) the worker doesn't run in the global PID namespace but in the namespace of the owner
2) the worker doesn't inherit kthreadd's scheduling attributes but the owner's

Though such changes make more sense for some use cases, they may break others.

I wonder if we need to introduce a new flag and bring back the old
kthread code when the flag is not set? Then we would not end up trying
to align the behaviour.

Thanks


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/9] vhost: Support SIGKILL by flushing and exiting
  2024-04-17  3:50           ` Jason Wang
@ 2024-04-17 16:03             ` Mike Christie
  2024-04-18  4:08               ` Jason Wang
  2024-04-18  7:01               ` Michael S. Tsirkin
  0 siblings, 2 replies; 32+ messages in thread
From: Mike Christie @ 2024-04-17 16:03 UTC (permalink / raw)
  To: Jason Wang
  Cc: oleg, ebiederm, virtualization, mst, sgarzare, stefanha, brauner,
	Andreas Karis, Laurent Vivier

On 4/16/24 10:50 PM, Jason Wang wrote:
> On Mon, Apr 15, 2024 at 4:52 PM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On Sat, Apr 13, 2024 at 12:53 AM <michael.christie@oracle.com> wrote:
>>>
>>> On 4/11/24 10:28 PM, Jason Wang wrote:
>>>> On Fri, Apr 12, 2024 at 12:19 AM Mike Christie
>>>> <michael.christie@oracle.com> wrote:
>>>>>
>>>>> On 4/11/24 3:39 AM, Jason Wang wrote:
>>>>>> On Sat, Mar 16, 2024 at 8:47 AM Mike Christie
>>>>>> <michael.christie@oracle.com> wrote:
>>>>>>>
>>>>>>> The following patches were made over Linus's tree and also apply over
>>>>>>> mst's vhost branch. The patches add the ability for vhost_tasks to
>>>>>>> handle SIGKILL by flushing queued works, stop new works from being
>>>>>>> queued, and prepare the task for an early exit.
>>>>>>>
>>>>>>> This removes the need for the signal/coredump hacks added in:
>>>>>>>
>>>>>>> Commit f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")
>>>>>>>
>>>>>>> when the vhost_task patches were initially merged and fix the issue
>>>>>>> in this thread:
>>>>>>>
>>>>>>> https://lore.kernel.org/all/000000000000a41b82060e875721@google.com/
>>>>>>>
>>>>>>> Long Background:
>>>>>>>
>>>>>>> The original vhost worker code didn't support any signals. If the
>>>>>>> userspace application that owned the worker got a SIGKILL, the app/
>>>>>>> process would exit dropping all references to the device and then the
>>>>>>> file operation's release function would be called. From there we would
>>>>>>> wait on running IO then cleanup the device's memory.
>>>>>>
>>>>>> A dumb question.
>>>>>>
>>>>>> Is this a user space noticeable change? For example, with this series
>>>>>> a SIGKILL may shutdown the datapath ...
>>>>>
>>>>> It already changed in 6.4. We basically added a new interface to shutdown
>>>>> everything (userspace and vhost kernel parts). So we won't just shutdown
>>>>> the data path while userspace is still running. We will shutdown everything
>>>>> now if you send a SIGKILL to a vhost worker's thread.
>>>>
>>>> If I understand correctly, for example Qemu can still live is SIGKILL
>>>> is just send to vhost thread.
>>>
>>> Pre-6.4 qemu could still survive if only the vhost thread got a SIGKILL.
>>> We used kthreads which are special and can ignore it like how userspace
>>> can ignore SIGHUP.
>>>
>>> 6.4 and newer kernels cannot survive. Even if the vhost thread sort of
>>> ignores it like I described below where, the signal is still delivered
>>> to the other qemu threads due to the shared signal handler. Userspace
>>> can't ignore SIGKILL. It doesn't have any say in the matter, and the
>>> kernel forces them to exit.
>>
>> Ok, I see, so the reason is that vhost belongs to the same thread
>> group as the owner now.
>>
>>>
>>>>
>>>> If this is correct, guests may detect this (for example virtio-net has
>>>> a watchdog).
>>>>
>>>
>>> What did you mean by that part? Do you mean if the vhost thread were to
>>> exit, so drivers/vhost/net.c couldn't process IO, then the watchdog in
>>> the guest (virtio-net driver in the guest kernel) would detect that?
>>
>> I meant this one. But since we are using CLONE_THREAD, we won't see these.
>>
>>> Or
>>> are you saying the watchdog in the guest can detect signals that the
>>> host gets?
>>>
>>>
>>>>>
>>>>> Here are a lots of details:
>>>>>
>>>>> - Pre-6.4 kernel, when vhost workers used kthreads, if you sent any signal
>>>>> to a vhost worker, we ignore it. Nothing happens. kthreads are special and
>>>>> can ignore all signals.
>>>>>
>>>>> You could think of it as the worker is a completely different process than
>>>>> qemu/userspace so they have completely different signal handlers. The
>>>>> vhost worker signal handler ignores all signals even SIGKILL.
>>>>
>>>> Yes.
>>>>
>>>>>
>>>>> If you send a SIGKILL to a qemu thread, then it just exits right away. We
>>>>> don't get to do an explicit close() on the vhost device and we don't get
>>>>> to do ioctls like VHOST_NET_SET_BACKEND to clear backends. The kernel exit
>>>>> code runs and releases refcounts on the device/file, then the vhost device's
>>>>> file_operations->release function is called. vhost_dev_cleanup then stops
>>>>> the vhost worker.
>>>>
>>>> Right.
>>>>
>>>>>
>>>>> - In 6.4 and newer kernels, vhost workers use vhost_tasks, so the worker
>>>>> can be thought of as a thread within the userspace process. With that
>>>>> change we have the same signal handler as the userspace process.
>>>>>
>>>>> If you send a SIGKILL to a qemu thread then it works like above.
>>>>>
>>>>> If you send a SIGKILL to a vhost worker, the vhost worker still sort of
>>>>> ignores it (that is the hack that I mentioned at the beginning of this
>>>>> thread). kernel/vhost_task.c:vhost_task_fn will see the signal and
>>>>> then just continue to process works until file_operations->release
>>>>> calls
>>>>
>>>> Yes, so this sticks to the behaviour before vhost_tasks.
>>>
>>> Not exactly. The vhost_task stays alive temporarily.
>>>
>>> The signal is still delivered to the userspace threads and they will
>>> exit due to getting the SIGKILL also. SIGKILL goes to all the threads in
>>> the process and all userspace threads exit like normal because the vhost
>>> task and normal old userspace threads share a signal handler. When
>>> userspace exits, the kernel force drops the refcounts on the vhost
>>> devices and that runs the release function so the vhost_task will then exit.
>>>
>>> So what I'm trying to say is that in 6.4 we already changed the behavior.
>>
>> Yes. To say the truth, it looks even worse but it might be too late to fix.
> 
> Andres (cced) has identified two other possible changes:
> 
> 1) doesn't run in the global PID namespace but run in the namespace of owner

Yeah, I mentioned that one in vhost.h like it's a feature, and when posting
the patches I mentioned it as a possible fix. I mean, I thought we wanted it
to work like qemu and its iothreads, where the iothread would inherit all
those values automatically.

At the time, I thought we didn't inherit the namespace, like we did the cgroup,
because there was no kernel function for it (like how we didn't inherit v2
cgroups until recently when someone added some code for that).

I don't know if it's allowed to have something like qemu in namespace N but then
have its children (the vhost thread in this case) in the global namespace. I'll
look into it.

> 2) doesn't inherit kthreadd's scheduling attributes but the owner

Same as above for this one. I thought I was fixing a bug: before, we
had to manually tune the vhost thread's values, but for iothreads they
automatically got set up.

Just to clarify this one: when we used kthreads, kthread() reset the
scheduler priority for the kthread that was created, so we got the default
values instead of inheriting kthreadd's values. So we would want:

+	sched_setscheduler_nocheck(current, SCHED_NORMAL, &param);

in vhost_task_fn() instead of inheriting kthreadd's values.
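
i.e. something like this at the top of vhost_task_fn() (just a sketch of
the placement; param is the usual zeroed struct sched_param):

static int vhost_task_fn(void *data)
{
	struct sched_param param = { .sched_priority = 0 };
	struct vhost_task *vtsk = data;

	/* Match the old kthread() behavior: reset to the default policy. */
	sched_setscheduler_nocheck(current, SCHED_NORMAL, &param);
	...
}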

> 
> Though such a change makes more sense for some use cases, it may break others.
> 
> I wonder if we need to introduce a new flag and bring the old kthread

Do you mean something like a module param?

> codes if the flag is not set? Then we would not end up trying to align
> the behaviour?
>

Let me know what you guys prefer. The sched part is easy. The namespace
part might be more difficult, but I will look into it if you want it.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/9] vhost: Support SIGKILL by flushing and exiting
  2024-04-17 16:03             ` Mike Christie
@ 2024-04-18  4:08               ` Jason Wang
  2024-04-18  7:07                 ` Michael S. Tsirkin
  2024-04-18  7:01               ` Michael S. Tsirkin
  1 sibling, 1 reply; 32+ messages in thread
From: Jason Wang @ 2024-04-18  4:08 UTC (permalink / raw)
  To: Mike Christie
  Cc: oleg, ebiederm, virtualization, mst, sgarzare, stefanha, brauner,
	Andreas Karis, Laurent Vivier

On Thu, Apr 18, 2024 at 12:10 AM Mike Christie
<michael.christie@oracle.com> wrote:
>
> On 4/16/24 10:50 PM, Jason Wang wrote:
> > On Mon, Apr 15, 2024 at 4:52 PM Jason Wang <jasowang@redhat.com> wrote:
> >>
> >> On Sat, Apr 13, 2024 at 12:53 AM <michael.christie@oracle.com> wrote:
> >>>
> >>> On 4/11/24 10:28 PM, Jason Wang wrote:
> >>>> On Fri, Apr 12, 2024 at 12:19 AM Mike Christie
> >>>> <michael.christie@oracle.com> wrote:
> >>>>>
> >>>>> On 4/11/24 3:39 AM, Jason Wang wrote:
> >>>>>> On Sat, Mar 16, 2024 at 8:47 AM Mike Christie
> >>>>>> <michael.christie@oracle.com> wrote:
> >>>>>>>
> >>>>>>> The following patches were made over Linus's tree and also apply over
> >>>>>>> mst's vhost branch. The patches add the ability for vhost_tasks to
> >>>>>>> handle SIGKILL by flushing queued works, stop new works from being
> >>>>>>> queued, and prepare the task for an early exit.
> >>>>>>>
> >>>>>>> This removes the need for the signal/coredump hacks added in:
> >>>>>>>
> >>>>>>> Commit f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")
> >>>>>>>
> >>>>>>> when the vhost_task patches were initially merged and fix the issue
> >>>>>>> in this thread:
> >>>>>>>
> >>>>>>> https://lore.kernel.org/all/000000000000a41b82060e875721@google.com/
> >>>>>>>
> >>>>>>> Long Background:
> >>>>>>>
> >>>>>>> The original vhost worker code didn't support any signals. If the
> >>>>>>> userspace application that owned the worker got a SIGKILL, the app/
> >>>>>>> process would exit dropping all references to the device and then the
> >>>>>>> file operation's release function would be called. From there we would
> >>>>>>> wait on running IO then cleanup the device's memory.
> >>>>>>
> >>>>>> A dumb question.
> >>>>>>
> >>>>>> Is this a user space noticeable change? For example, with this series
> >>>>>> a SIGKILL may shutdown the datapath ...
> >>>>>
> >>>>> It already changed in 6.4. We basically added a new interface to shutdown
> >>>>> everything (userspace and vhost kernel parts). So we won't just shutdown
> >>>>> the data path while userspace is still running. We will shutdown everything
> >>>>> now if you send a SIGKILL to a vhost worker's thread.
> >>>>
> >>>> If I understand correctly, for example Qemu can still live is SIGKILL
> >>>> is just send to vhost thread.
> >>>
> >>> Pre-6.4 qemu could still survive if only the vhost thread got a SIGKILL.
> >>> We used kthreads which are special and can ignore it like how userspace
> >>> can ignore SIGHUP.
> >>>
> >>> 6.4 and newer kernels cannot survive. Even if the vhost thread sort of
> >>> ignores it like I described below where, the signal is still delivered
> >>> to the other qemu threads due to the shared signal handler. Userspace
> >>> can't ignore SIGKILL. It doesn't have any say in the matter, and the
> >>> kernel forces them to exit.
> >>
> >> Ok, I see, so the reason is that vhost belongs to the same thread
> >> group as the owner now.
> >>
> >>>
> >>>>
> >>>> If this is correct, guests may detect this (for example virtio-net has
> >>>> a watchdog).
> >>>>
> >>>
> >>> What did you mean by that part? Do you mean if the vhost thread were to
> >>> exit, so drivers/vhost/net.c couldn't process IO, then the watchdog in
> >>> the guest (virtio-net driver in the guest kernel) would detect that?
> >>
> >> I meant this one. But since we are using CLONE_THREAD, we won't see these.
> >>
> >>> Or
> >>> are you saying the watchdog in the guest can detect signals that the
> >>> host gets?
> >>>
> >>>
> >>>>>
> >>>>> Here are a lots of details:
> >>>>>
> >>>>> - Pre-6.4 kernel, when vhost workers used kthreads, if you sent any signal
> >>>>> to a vhost worker, we ignore it. Nothing happens. kthreads are special and
> >>>>> can ignore all signals.
> >>>>>
> >>>>> You could think of it as the worker is a completely different process than
> >>>>> qemu/userspace so they have completely different signal handlers. The
> >>>>> vhost worker signal handler ignores all signals even SIGKILL.
> >>>>
> >>>> Yes.
> >>>>
> >>>>>
> >>>>> If you send a SIGKILL to a qemu thread, then it just exits right away. We
> >>>>> don't get to do an explicit close() on the vhost device and we don't get
> >>>>> to do ioctls like VHOST_NET_SET_BACKEND to clear backends. The kernel exit
> >>>>> code runs and releases refcounts on the device/file, then the vhost device's
> >>>>> file_operations->release function is called. vhost_dev_cleanup then stops
> >>>>> the vhost worker.
> >>>>
> >>>> Right.
> >>>>
> >>>>>
> >>>>> - In 6.4 and newer kernels, vhost workers use vhost_tasks, so the worker
> >>>>> can be thought of as a thread within the userspace process. With that
> >>>>> change we have the same signal handler as the userspace process.
> >>>>>
> >>>>> If you send a SIGKILL to a qemu thread then it works like above.
> >>>>>
> >>>>> If you send a SIGKILL to a vhost worker, the vhost worker still sort of
> >>>>> ignores it (that is the hack that I mentioned at the beginning of this
> >>>>> thread). kernel/vhost_task.c:vhost_task_fn will see the signal and
> >>>>> then just continue to process works until file_operations->release
> >>>>> calls
> >>>>
> >>>> Yes, so this sticks to the behaviour before vhost_tasks.
> >>>
> >>> Not exactly. The vhost_task stays alive temporarily.
> >>>
> >>> The signal is still delivered to the userspace threads and they will
> >>> exit due to getting the SIGKILL also. SIGKILL goes to all the threads in
> >>> the process and all userspace threads exit like normal because the vhost
> >>> task and normal old userspace threads share a signal handler. When
> >>> userspace exits, the kernel force drops the refcounts on the vhost
> >>> devices and that runs the release function so the vhost_task will then exit.
> >>>
> >>> So what I'm trying to say is that in 6.4 we already changed the behavior.
> >>
> >> Yes. To say the truth, it looks even worse but it might be too late to fix.
> >
> > Andres (cced) has identified two other possible changes:
> >
> > 1) doesn't run in the global PID namespace but run in the namespace of owner
>
> Yeah, I mentioned that one in vhost.h like it's a feature and when posting
> the patches I mentioned it as a possible fix. I mean I thought we wanted it
> to work like qemu and iothreads where the iothread would inherit all those
> values automatically.

Right, but it could be noticed by userspace, especially by users that
try to tweak performance.

The root cause is that we now do copy_process() in the context of Qemu
instead of kthreadd, which results in differences in namespaces (I think
the PID namespace is not the only one where we see a difference) and
other attributes for the vhost task.

>
> At the time, I thought we didn't inherit the namespace, like we did the cgroup,
> because there was no kernel function for it (like how we didn't inherit v2
> cgroups until recently when someone added some code for that).
>
> I don't know if it's allowed to have something like qemu in namespace N but then
> have it's children (vhost thread in this case) in the global namespace.
> I'll
> look into it.

Instead of moving the vhost thread between different namespaces, I wonder
if the following is simpler:

if (new_flag)
    vhost_task_create()
else
    kthread_create()

With the new flag, the worker inherits the attributes of Qemu (namespaces,
rlimit, cgroup, scheduling attributes ...), which is what we want. Without
the new flag, we stick exactly to the past behaviour to unbreak existing
userspace.
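
In vhost_worker_create() that could look something like the sketch below
('inherit_owner' and 'vhost_worker_kthread_fn' are just placeholder names
and the signatures are simplified):

	if (inherit_owner) {
		/*
		 * New behaviour: a thread of the owner, so it inherits the
		 * owner's namespaces, cgroups, rlimits and scheduler settings.
		 */
		vtsk = vhost_task_create(vhost_worker, worker, name);
	} else {
		/* Old behaviour: a kthread parented to kthreadd. */
		task = kthread_create(vhost_worker_kthread_fn, worker, "%s", name);
	}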

>
> > 2) doesn't inherit kthreadd's scheduling attributes but the owner
>
> Same as above for this one. I thought I was fixing a bug where before
> we had to manually tune the vhost thread's values but for iothreads they
> automatically got setup.
>
> Just to clarify this one. When we used kthreads, kthread() will reset the
> scheduler priority for the kthread that's created, so we got the default
> values instead of inheriting kthreadd's values.  So we would want:
>
> +       sched_setscheduler_nocheck(current, SCHED_NORMAL, &param);
>
> in vhost_task_fn() instead of inheriting kthreadd's values.
>
> >
> > Though such a change makes more sense for some use cases, it may break others.
> >
> > I wonder if we need to introduce a new flag and bring the old kthread
>
> Do you mean something like a module param?

This requires the management layer to know whether it has a new userspace
or not, which is hard. A better place would be to introduce a backend feature.
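
For example (VHOST_BACKEND_F_OWNER_TASK is just a made-up bit for
illustration), userspace that understands the new worker model could ack
it through the existing backend-features ioctls, roughly:

	__u64 features;

	/* Only ack the made-up bit if the kernel advertises it. */
	if (ioctl(vhost_fd, VHOST_GET_BACKEND_FEATURES, &features) == 0 &&
	    (features & (1ULL << VHOST_BACKEND_F_OWNER_TASK))) {
		features = 1ULL << VHOST_BACKEND_F_OWNER_TASK;
		ioctl(vhost_fd, VHOST_SET_BACKEND_FEATURES, &features);
	}

and the kernel would only pick the new vhost_task path once the bit has
been acked.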

>
> > codes if the flag is not set? Then we would not end up trying to align
> > the behaviour?
> >
>
> Let me know what you guys prefer. The sched part is easy. The namespace
> part might be more difficult, but I will look into it if you want it.

Thanks a lot. I think it would be better to have the namespace part
(as well as the other namespaces); then we don't need to answer hard
questions like whether it can break userspace or not.

>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/9] vhost: Support SIGKILL by flushing and exiting
  2024-04-17 16:03             ` Mike Christie
  2024-04-18  4:08               ` Jason Wang
@ 2024-04-18  7:01               ` Michael S. Tsirkin
  1 sibling, 0 replies; 32+ messages in thread
From: Michael S. Tsirkin @ 2024-04-18  7:01 UTC (permalink / raw)
  To: Mike Christie
  Cc: Jason Wang, oleg, ebiederm, virtualization, sgarzare, stefanha,
	brauner, Andreas Karis, Laurent Vivier

On Wed, Apr 17, 2024 at 11:03:07AM -0500, Mike Christie wrote:
> On 4/16/24 10:50 PM, Jason Wang wrote:
> > On Mon, Apr 15, 2024 at 4:52 PM Jason Wang <jasowang@redhat.com> wrote:
> >>
> >> On Sat, Apr 13, 2024 at 12:53 AM <michael.christie@oracle.com> wrote:
> >>>
> >>> On 4/11/24 10:28 PM, Jason Wang wrote:
> >>>> On Fri, Apr 12, 2024 at 12:19 AM Mike Christie
> >>>> <michael.christie@oracle.com> wrote:
> >>>>>
> >>>>> On 4/11/24 3:39 AM, Jason Wang wrote:
> >>>>>> On Sat, Mar 16, 2024 at 8:47 AM Mike Christie
> >>>>>> <michael.christie@oracle.com> wrote:
> >>>>>>>
> >>>>>>> The following patches were made over Linus's tree and also apply over
> >>>>>>> mst's vhost branch. The patches add the ability for vhost_tasks to
> >>>>>>> handle SIGKILL by flushing queued works, stop new works from being
> >>>>>>> queued, and prepare the task for an early exit.
> >>>>>>>
> >>>>>>> This removes the need for the signal/coredump hacks added in:
> >>>>>>>
> >>>>>>> Commit f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")
> >>>>>>>
> >>>>>>> when the vhost_task patches were initially merged and fix the issue
> >>>>>>> in this thread:
> >>>>>>>
> >>>>>>> https://lore.kernel.org/all/000000000000a41b82060e875721@google.com/
> >>>>>>>
> >>>>>>> Long Background:
> >>>>>>>
> >>>>>>> The original vhost worker code didn't support any signals. If the
> >>>>>>> userspace application that owned the worker got a SIGKILL, the app/
> >>>>>>> process would exit dropping all references to the device and then the
> >>>>>>> file operation's release function would be called. From there we would
> >>>>>>> wait on running IO then cleanup the device's memory.
> >>>>>>
> >>>>>> A dumb question.
> >>>>>>
> >>>>>> Is this a user space noticeable change? For example, with this series
> >>>>>> a SIGKILL may shutdown the datapath ...
> >>>>>
> >>>>> It already changed in 6.4. We basically added a new interface to shutdown
> >>>>> everything (userspace and vhost kernel parts). So we won't just shutdown
> >>>>> the data path while userspace is still running. We will shutdown everything
> >>>>> now if you send a SIGKILL to a vhost worker's thread.
> >>>>
> >>>> If I understand correctly, for example Qemu can still live is SIGKILL
> >>>> is just send to vhost thread.
> >>>
> >>> Pre-6.4 qemu could still survive if only the vhost thread got a SIGKILL.
> >>> We used kthreads which are special and can ignore it like how userspace
> >>> can ignore SIGHUP.
> >>>
> >>> 6.4 and newer kernels cannot survive. Even if the vhost thread sort of
> >>> ignores it like I described below where, the signal is still delivered
> >>> to the other qemu threads due to the shared signal handler. Userspace
> >>> can't ignore SIGKILL. It doesn't have any say in the matter, and the
> >>> kernel forces them to exit.
> >>
> >> Ok, I see, so the reason is that vhost belongs to the same thread
> >> group as the owner now.
> >>
> >>>
> >>>>
> >>>> If this is correct, guests may detect this (for example virtio-net has
> >>>> a watchdog).
> >>>>
> >>>
> >>> What did you mean by that part? Do you mean if the vhost thread were to
> >>> exit, so drivers/vhost/net.c couldn't process IO, then the watchdog in
> >>> the guest (virtio-net driver in the guest kernel) would detect that?
> >>
> >> I meant this one. But since we are using CLONE_THREAD, we won't see these.
> >>
> >>> Or
> >>> are you saying the watchdog in the guest can detect signals that the
> >>> host gets?
> >>>
> >>>
> >>>>>
> >>>>> Here are a lots of details:
> >>>>>
> >>>>> - Pre-6.4 kernel, when vhost workers used kthreads, if you sent any signal
> >>>>> to a vhost worker, we ignore it. Nothing happens. kthreads are special and
> >>>>> can ignore all signals.
> >>>>>
> >>>>> You could think of it as the worker is a completely different process than
> >>>>> qemu/userspace so they have completely different signal handlers. The
> >>>>> vhost worker signal handler ignores all signals even SIGKILL.
> >>>>
> >>>> Yes.
> >>>>
> >>>>>
> >>>>> If you send a SIGKILL to a qemu thread, then it just exits right away. We
> >>>>> don't get to do an explicit close() on the vhost device and we don't get
> >>>>> to do ioctls like VHOST_NET_SET_BACKEND to clear backends. The kernel exit
> >>>>> code runs and releases refcounts on the device/file, then the vhost device's
> >>>>> file_operations->release function is called. vhost_dev_cleanup then stops
> >>>>> the vhost worker.
> >>>>
> >>>> Right.
> >>>>
> >>>>>
> >>>>> - In 6.4 and newer kernels, vhost workers use vhost_tasks, so the worker
> >>>>> can be thought of as a thread within the userspace process. With that
> >>>>> change we have the same signal handler as the userspace process.
> >>>>>
> >>>>> If you send a SIGKILL to a qemu thread then it works like above.
> >>>>>
> >>>>> If you send a SIGKILL to a vhost worker, the vhost worker still sort of
> >>>>> ignores it (that is the hack that I mentioned at the beginning of this
> >>>>> thread). kernel/vhost_task.c:vhost_task_fn will see the signal and
> >>>>> then just continue to process works until file_operations->release
> >>>>> calls
> >>>>
> >>>> Yes, so this sticks to the behaviour before vhost_tasks.
> >>>
> >>> Not exactly. The vhost_task stays alive temporarily.
> >>>
> >>> The signal is still delivered to the userspace threads and they will
> >>> exit due to getting the SIGKILL also. SIGKILL goes to all the threads in
> >>> the process and all userspace threads exit like normal because the vhost
> >>> task and normal old userspace threads share a signal handler. When
> >>> userspace exits, the kernel force drops the refcounts on the vhost
> >>> devices and that runs the release function so the vhost_task will then exit.
> >>>
> >>> So what I'm trying to say is that in 6.4 we already changed the behavior.
> >>
> >> Yes. To say the truth, it looks even worse but it might be too late to fix.
> > 
> > Andres (cced) has identified two other possible changes:
> > 
> > 1) doesn't run in the global PID namespace but run in the namespace of owner
> 
> Yeah, I mentioned that one in vhost.h like it's a feature and when posting
> the patches I mentioned it as a possible fix. I mean I thought we wanted it
> to work like qemu and iothreads where the iothread would inherit all those
> values automatically.
> 
> At the time, I thought we didn't inherit the namespace, like we did the cgroup,
> because there was no kernel function for it (like how we didn't inherit v2
> cgroups until recently when someone added some code for that).
> 
> I don't know if it's allowed to have something like qemu in namespace N but then
> have it's children (vhost thread in this case) in the global namespace. I'll
> look into it.

Yea a big if.

> > 2) doesn't inherit kthreadd's scheduling attributes but the owner
> 
> Same as above for this one. I thought I was fixing a bug where before
> we had to manually tune the vhost thread's values but for iothreads they
> automatically got setup.
> 
> Just to clarify this one. When we used kthreads, kthread() will reset the
> scheduler priority for the kthread that's created, so we got the default
> values instead of inheriting kthreadd's values.  So we would want:
> 
> +	sched_setscheduler_nocheck(current, SCHED_NORMAL, &param);
> 
> in vhost_task_fn() instead of inheriting kthreadd's values.
> 
> > 
> > Though such a change makes more sense for some use cases, it may break others.
> > 
> > I wonder if we need to introduce a new flag and bring the old kthread
> 
> Do you mean something like a module param?
> 
> > codes if the flag is not set? Then we would not end up trying to align
> > the behaviour?
> >
> 
> Let me know what you guys prefer. The sched part is easy. The namespace
> part might be more difficult, but I will look into it if you want it.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/9] vhost: Support SIGKILL by flushing and exiting
  2024-04-18  4:08               ` Jason Wang
@ 2024-04-18  7:07                 ` Michael S. Tsirkin
  2024-04-18  9:25                   ` Andreas Karis
  2024-04-19  0:33                   ` Jason Wang
  0 siblings, 2 replies; 32+ messages in thread
From: Michael S. Tsirkin @ 2024-04-18  7:07 UTC (permalink / raw)
  To: Jason Wang
  Cc: Mike Christie, oleg, ebiederm, virtualization, sgarzare,
	stefanha, brauner, Andreas Karis, Laurent Vivier

On Thu, Apr 18, 2024 at 12:08:52PM +0800, Jason Wang wrote:
> On Thu, Apr 18, 2024 at 12:10 AM Mike Christie
> <michael.christie@oracle.com> wrote:
> >
> > On 4/16/24 10:50 PM, Jason Wang wrote:
> > > On Mon, Apr 15, 2024 at 4:52 PM Jason Wang <jasowang@redhat.com> wrote:
> > >>
> > >> On Sat, Apr 13, 2024 at 12:53 AM <michael.christie@oracle.com> wrote:
> > >>>
> > >>> On 4/11/24 10:28 PM, Jason Wang wrote:
> > >>>> On Fri, Apr 12, 2024 at 12:19 AM Mike Christie
> > >>>> <michael.christie@oracle.com> wrote:
> > >>>>>
> > >>>>> On 4/11/24 3:39 AM, Jason Wang wrote:
> > >>>>>> On Sat, Mar 16, 2024 at 8:47 AM Mike Christie
> > >>>>>> <michael.christie@oracle.com> wrote:
> > >>>>>>>
> > >>>>>>> The following patches were made over Linus's tree and also apply over
> > >>>>>>> mst's vhost branch. The patches add the ability for vhost_tasks to
> > >>>>>>> handle SIGKILL by flushing queued works, stop new works from being
> > >>>>>>> queued, and prepare the task for an early exit.
> > >>>>>>>
> > >>>>>>> This removes the need for the signal/coredump hacks added in:
> > >>>>>>>
> > >>>>>>> Commit f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")
> > >>>>>>>
> > >>>>>>> when the vhost_task patches were initially merged and fix the issue
> > >>>>>>> in this thread:
> > >>>>>>>
> > >>>>>>> https://lore.kernel.org/all/000000000000a41b82060e875721@google.com/
> > >>>>>>>
> > >>>>>>> Long Background:
> > >>>>>>>
> > >>>>>>> The original vhost worker code didn't support any signals. If the
> > >>>>>>> userspace application that owned the worker got a SIGKILL, the app/
> > >>>>>>> process would exit dropping all references to the device and then the
> > >>>>>>> file operation's release function would be called. From there we would
> > >>>>>>> wait on running IO then cleanup the device's memory.
> > >>>>>>
> > >>>>>> A dumb question.
> > >>>>>>
> > >>>>>> Is this a user space noticeable change? For example, with this series
> > >>>>>> a SIGKILL may shutdown the datapath ...
> > >>>>>
> > >>>>> It already changed in 6.4. We basically added a new interface to shutdown
> > >>>>> everything (userspace and vhost kernel parts). So we won't just shutdown
> > >>>>> the data path while userspace is still running. We will shutdown everything
> > >>>>> now if you send a SIGKILL to a vhost worker's thread.
> > >>>>
> > >>>> If I understand correctly, for example Qemu can still live is SIGKILL
> > >>>> is just send to vhost thread.
> > >>>
> > >>> Pre-6.4 qemu could still survive if only the vhost thread got a SIGKILL.
> > >>> We used kthreads which are special and can ignore it like how userspace
> > >>> can ignore SIGHUP.
> > >>>
> > >>> 6.4 and newer kernels cannot survive. Even if the vhost thread sort of
> > >>> ignores it like I described below where, the signal is still delivered
> > >>> to the other qemu threads due to the shared signal handler. Userspace
> > >>> can't ignore SIGKILL. It doesn't have any say in the matter, and the
> > >>> kernel forces them to exit.
> > >>
> > >> Ok, I see, so the reason is that vhost belongs to the same thread
> > >> group as the owner now.
> > >>
> > >>>
> > >>>>
> > >>>> If this is correct, guests may detect this (for example virtio-net has
> > >>>> a watchdog).
> > >>>>
> > >>>
> > >>> What did you mean by that part? Do you mean if the vhost thread were to
> > >>> exit, so drivers/vhost/net.c couldn't process IO, then the watchdog in
> > >>> the guest (virtio-net driver in the guest kernel) would detect that?
> > >>
> > >> I meant this one. But since we are using CLONE_THREAD, we won't see these.
> > >>
> > >>> Or
> > >>> are you saying the watchdog in the guest can detect signals that the
> > >>> host gets?
> > >>>
> > >>>
> > >>>>>
> > >>>>> Here are a lots of details:
> > >>>>>
> > >>>>> - Pre-6.4 kernel, when vhost workers used kthreads, if you sent any signal
> > >>>>> to a vhost worker, we ignore it. Nothing happens. kthreads are special and
> > >>>>> can ignore all signals.
> > >>>>>
> > >>>>> You could think of it as the worker is a completely different process than
> > >>>>> qemu/userspace so they have completely different signal handlers. The
> > >>>>> vhost worker signal handler ignores all signals even SIGKILL.
> > >>>>
> > >>>> Yes.
> > >>>>
> > >>>>>
> > >>>>> If you send a SIGKILL to a qemu thread, then it just exits right away. We
> > >>>>> don't get to do an explicit close() on the vhost device and we don't get
> > >>>>> to do ioctls like VHOST_NET_SET_BACKEND to clear backends. The kernel exit
> > >>>>> code runs and releases refcounts on the device/file, then the vhost device's
> > >>>>> file_operations->release function is called. vhost_dev_cleanup then stops
> > >>>>> the vhost worker.
> > >>>>
> > >>>> Right.
> > >>>>
> > >>>>>
> > >>>>> - In 6.4 and newer kernels, vhost workers use vhost_tasks, so the worker
> > >>>>> can be thought of as a thread within the userspace process. With that
> > >>>>> change we have the same signal handler as the userspace process.
> > >>>>>
> > >>>>> If you send a SIGKILL to a qemu thread then it works like above.
> > >>>>>
> > >>>>> If you send a SIGKILL to a vhost worker, the vhost worker still sort of
> > >>>>> ignores it (that is the hack that I mentioned at the beginning of this
> > >>>>> thread). kernel/vhost_task.c:vhost_task_fn will see the signal and
> > >>>>> then just continue to process works until file_operations->release
> > >>>>> calls
> > >>>>
> > >>>> Yes, so this sticks to the behaviour before vhost_tasks.
> > >>>
> > >>> Not exactly. The vhost_task stays alive temporarily.
> > >>>
> > >>> The signal is still delivered to the userspace threads and they will
> > >>> exit due to getting the SIGKILL also. SIGKILL goes to all the threads in
> > >>> the process and all userspace threads exit like normal because the vhost
> > >>> task and normal old userspace threads share a signal handler. When
> > >>> userspace exits, the kernel force drops the refcounts on the vhost
> > >>> devices and that runs the release function so the vhost_task will then exit.
> > >>>
> > >>> So what I'm trying to say is that in 6.4 we already changed the behavior.
> > >>
> > >> Yes. To say the truth, it looks even worse but it might be too late to fix.
> > >
> > > Andres (cced) has identified two other possible changes:
> > >
> > > 1) doesn't run in the global PID namespace but run in the namespace of owner
> >
> > Yeah, I mentioned that one in vhost.h like it's a feature and when posting
> > the patches I mentioned it as a possible fix. I mean I thought we wanted it
> > to work like qemu and iothreads where the iothread would inherit all those
> > values automatically.
> 
> Right, but it could be noticed by userspace, especially by a user
> that tries to tweak performance.
> 
> The root cause is that we now do copy_process() in the process of
> Qemu instead of in kthreadd, which results in differences in the
> namespaces (I think the PID namespace is not the only one where we
> see a difference) and other attributes for the vhost task.

Leaking things out of a namespace looks more like a bug.
If you really have to be pedantic, the thing to add would
be a namespace flag not a qemu flag. Userspace running inside
a namespace really must have no say about whether to leak
info out of it.

> >
> > At the time, I thought we didn't inherit the namespace, like we did the cgroup,
> > because there was no kernel function for it (like how we didn't inherit v2
> > cgroups until recently when someone added some code for that).
> >
> > I don't know if it's allowed to have something like qemu in namespace N but then
> > have its children (the vhost thread in this case) in the global namespace. I'll
> > look into it.
> 
> Instead of moving the vhost thread between different namespaces, I wonder
> if the following is simpler:
> 
> if (new_flag)
>     vhost_task_create()
> else
>     kthread_create()
> 
> The new flag inherits the attributes of Qemu (namespaces, rlimit, cgroup,
> scheduling attributes ...), which is what we want. Without the new
> flag, we stick exactly to the past behaviour to avoid breaking
> existing userspace.
> 
> >
> > > 2) doesn't inherit kthreadd's scheduling attributes but the owner
> >
> > Same as above for this one. I thought I was fixing a bug where before
> > we had to manually tune the vhost thread's values, but for iothreads they
> > automatically got set up.
> >
> > Just to clarify this one: when we used kthreads, kthread() would reset the
> > scheduler priority for the kthread that's created, so we got the default
> > values instead of inheriting kthreadd's values. So we would want:
> >
> > +       sched_setscheduler_nocheck(current, SCHED_NORMAL, &param);
> >
> > in vhost_task_fn() instead of inheriting kthreadd's values.
> >
> > >
> > > Though such a change makes more sense for some use cases, it may break others.
> > >
> > > I wonder if we need to introduce a new flag and bring the old kthread
> >
> > Do you mean something like a module param?
> 
> This requires the management layer to know whether it has a new user space
> or not, which is hard. A better place is to introduce backend features.
> 
> >
> > > codes if the flag is not set? Then we would not end up trying to align
> > > the behaviour?
> > >
> >
> > Let me know what you guys prefer. The sched part is easy. The namespace
> > part might be more difficult, but I will look into it if you want it.
> 
> Thanks a lot. I think it would be better to have the namespace part
> (as well as the other namespaces); then we don't need to answer hard
> questions like whether it can break user space or not.
> 
> >


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/9] vhost: Support SIGKILL by flushing and exiting
  2024-04-09  4:16 ` [PATCH 0/9] vhost: Support SIGKILL by flushing and exiting Jason Wang
  2024-04-09 14:57   ` Mike Christie
@ 2024-04-18  7:10   ` Michael S. Tsirkin
  1 sibling, 0 replies; 32+ messages in thread
From: Michael S. Tsirkin @ 2024-04-18  7:10 UTC (permalink / raw)
  To: Jason Wang
  Cc: Mike Christie, oleg, ebiederm, virtualization, sgarzare,
	stefanha, brauner, Laurent Vivier

On Tue, Apr 09, 2024 at 12:16:36PM +0800, Jason Wang wrote:
> On Sat, Mar 16, 2024 at 8:47 AM Mike Christie
> <michael.christie@oracle.com> wrote:
> >
> > The following patches were made over Linus's tree and also apply over
> > mst's vhost branch. The patches add the ability for vhost_tasks to
> > handle SIGKILL by flushing queued works, stop new works from being
> > queued, and prepare the task for an early exit.
> >
> > This removes the need for the signal/coredump hacks added in:
> >
> > Commit f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")
> >
> > when the vhost_task patches were initially merged and fix the issue
> > in this thread:
> >
> > https://lore.kernel.org/all/000000000000a41b82060e875721@google.com/
> >
> > Long Background:
> >
> > The original vhost worker code didn't support any signals. If the
> > userspace application that owned the worker got a SIGKILL, the app/
> > process would exit dropping all references to the device and then the
> > file operation's release function would be called. From there we would
> > wait on running IO then cleanup the device's memory.
> >
> > When we switched to vhost_tasks being a thread in the owner's process we
> > added some hacks to the signal/coredump code so we could continue to
> > wait on running IO and process it from the vhost_task. The idea was that
> > we would eventually remove the hacks. We recently hit this bug:
> >
> > https://lore.kernel.org/all/000000000000a41b82060e875721@google.com/
> >
> > It turns out only vhost-scsi had an issue where it would send a command
> > to the block/LIO layer, wait for a response and then process in the vhost
> > task.
> 
> The vhost-net TX zerocopy code did the same:
> 
> It sends zerocopy packets to the lower layer and waits for it to
> complete them. When the DMA is completed, vhost_zerocopy_callback is
> called to schedule vq work to update the used ring.

Yeah. It's still experimental though, so I'm not sure how much
to stress about it. I guess we can ignore it for now -
but yes, it was one of the big issues with tx zerocopy,
and this patchset opens the path to productizing it.

> > So patches 1-5 prepares vhost-scsi to handle when the vhost_task
> > is killed while we still have commands outstanding. The next patches then
> > prepare and convert the vhost and vhost_task layers to handle SIGKILL
> > by flushing running works, marking the vhost_task as dead so there's
> > no future uses, then exiting.
> 
> Thanks
> 
> >
> >
> >


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/9] vhost: Support SIGKILL by flushing and exiting
  2024-04-18  7:07                 ` Michael S. Tsirkin
@ 2024-04-18  9:25                   ` Andreas Karis
  2024-04-19  0:37                     ` Jason Wang
  2024-04-19  0:33                   ` Jason Wang
  1 sibling, 1 reply; 32+ messages in thread
From: Andreas Karis @ 2024-04-18  9:25 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, Mike Christie, oleg, ebiederm, virtualization,
	sgarzare, stefanha, brauner, Laurent Vivier

Not a kernel developer, so sorry if this is pointless spam.

But qemu is not the only thing consuming /dev/vhost-net:
https://doc.dpdk.org/guides/howto/virtio_user_as_exception_path.html

Before the series of patches around
https://github.com/torvalds/linux/commit/e297cd54b3f81d652456ae6cb93941fc6b5c6683,
you would run a DPDK application inside namespaces of container
"blue", but then the vhost-... threads were spawned as children of
kthreadd and ran in the global namespaces. That seemed counterintuitive,
and what's worse, the DPDK application running inside the
"blue" container namespaces cannot see and thus cannot modify
scheduling attributes and affinity of its own vhost-... threads. In
scenarios with pretty strict isolation/pinning requirements, the
vhost-... threads could easily be scheduled on the same CPUs as the
DPDK poll mode drivers (because they inherit the calling process' CPU
affinity), and there's no way for the DPDK process itself to move the
vhost-... threads to other CPUs. Also, if the DPDK process ran as
SCHED_RR, the vhost-... thread would still be SCHED_NORMAL.

After the patch series, the fact that the vhost-... threads run as
tasks of the process that requests them seems more natural and gives
us as users the kind of control that we'd want from within the
container to modify the vhost-... threads' scheduling and affinity.
The vhost-... thread as a child of the DPDK application inherits the
same scheduling class, CPU set, etc., and the DPDK process can easily
change those attributes.

However, if another user was used to the old behavior, and their
entire tooling was built around it (imagine someone wrote a bunch of
scripts/services to force affinity from the global PID namespace for
those vhost-... threads), then this change would break their tooling.
So because _I_ might fancy the new behavior while user _B_ might be
all set up for the old kthreadd behavior, shouldn't there be a flag or
configuration for the user to:

if (new_flag)
    vhost_task_create()
else
    kthread_create()

I don't know what kind of flag or knob I'd expect here, but it could
be granular for the calling process (a new syscall?), or a kernel
flag, etc. But something that lets the admin choose how the kernel
spawns the vhost-... threads?
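
A minimal sketch of the kind of control described above (illustration
only, not code from this thread): it assumes the worker's tid has
already been found under /proc/self/task, and the CPU and priority
values are just placeholders.

    #define _GNU_SOURCE
    #include <sched.h>
    #include <sys/types.h>

    /* Pin an in-process vhost worker to a housekeeping CPU and give it
     * an RT class, something that could not be done from inside the
     * container with the pre-6.4 kthread model. */
    static int retune_vhost_worker(pid_t vhost_tid, int housekeeping_cpu)
    {
            cpu_set_t set;
            struct sched_param sp = { .sched_priority = 1 };

            CPU_ZERO(&set);
            CPU_SET(housekeeping_cpu, &set);
            if (sched_setaffinity(vhost_tid, sizeof(set), &set))
                    return -1;
            return sched_setscheduler(vhost_tid, SCHED_RR, &sp);
    }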

On Thu, Apr 18, 2024 at 9:07 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Thu, Apr 18, 2024 at 12:08:52PM +0800, Jason Wang wrote:
> > On Thu, Apr 18, 2024 at 12:10 AM Mike Christie
> > <michael.christie@oracle.com> wrote:
> > >
> > > On 4/16/24 10:50 PM, Jason Wang wrote:
> > > > On Mon, Apr 15, 2024 at 4:52 PM Jason Wang <jasowang@redhat.com> wrote:
> > > >>
> > > >> On Sat, Apr 13, 2024 at 12:53 AM <michael.christie@oracle.com> wrote:
> > > >>>
> > > >>> On 4/11/24 10:28 PM, Jason Wang wrote:
> > > >>>> On Fri, Apr 12, 2024 at 12:19 AM Mike Christie
> > > >>>> <michael.christie@oracle.com> wrote:
> > > >>>>>
> > > >>>>> On 4/11/24 3:39 AM, Jason Wang wrote:
> > > >>>>>> On Sat, Mar 16, 2024 at 8:47 AM Mike Christie
> > > >>>>>> <michael.christie@oracle.com> wrote:
> > > >>>>>>>
> > > >>>>>>> The following patches were made over Linus's tree and also apply over
> > > >>>>>>> mst's vhost branch. The patches add the ability for vhost_tasks to
> > > >>>>>>> handle SIGKILL by flushing queued works, stop new works from being
> > > >>>>>>> queued, and prepare the task for an early exit.
> > > >>>>>>>
> > > >>>>>>> This removes the need for the signal/coredump hacks added in:
> > > >>>>>>>
> > > >>>>>>> Commit f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")
> > > >>>>>>>
> > > >>>>>>> when the vhost_task patches were initially merged and fix the issue
> > > >>>>>>> in this thread:
> > > >>>>>>>
> > > >>>>>>> https://lore.kernel.org/all/000000000000a41b82060e875721@google.com/
> > > >>>>>>>
> > > >>>>>>> Long Background:
> > > >>>>>>>
> > > >>>>>>> The original vhost worker code didn't support any signals. If the
> > > >>>>>>> userspace application that owned the worker got a SIGKILL, the app/
> > > >>>>>>> process would exit dropping all references to the device and then the
> > > >>>>>>> file operation's release function would be called. From there we would
> > > >>>>>>> wait on running IO then cleanup the device's memory.
> > > >>>>>>
> > > >>>>>> A dumb question.
> > > >>>>>>
> > > >>>>>> Is this a user space noticeable change? For example, with this series
> > > >>>>>> a SIGKILL may shutdown the datapath ...
> > > >>>>>
> > > >>>>> It already changed in 6.4. We basically added a new interface to shutdown
> > > >>>>> everything (userspace and vhost kernel parts). So we won't just shutdown
> > > >>>>> the data path while userspace is still running. We will shutdown everything
> > > >>>>> now if you send a SIGKILL to a vhost worker's thread.
> > > >>>>
> > > >>>> If I understand correctly, for example Qemu can still live is SIGKILL
> > > >>>> is just send to vhost thread.
> > > >>>
> > > >>> Pre-6.4 qemu could still survive if only the vhost thread got a SIGKILL.
> > > >>> We used kthreads which are special and can ignore it like how userspace
> > > >>> can ignore SIGHUP.
> > > >>>
> > > >>> 6.4 and newer kernels cannot survive. Even if the vhost thread sort of
> > > >>> ignores it like I described below where, the signal is still delivered
> > > >>> to the other qemu threads due to the shared signal handler. Userspace
> > > >>> can't ignore SIGKILL. It doesn't have any say in the matter, and the
> > > >>> kernel forces them to exit.
> > > >>
> > > >> Ok, I see, so the reason is that vhost belongs to the same thread
> > > >> group as the owner now.
> > > >>
> > > >>>
> > > >>>>
> > > >>>> If this is correct, guests may detect this (for example virtio-net has
> > > >>>> a watchdog).
> > > >>>>
> > > >>>
> > > >>> What did you mean by that part? Do you mean if the vhost thread were to
> > > >>> exit, so drivers/vhost/net.c couldn't process IO, then the watchdog in
> > > >>> the guest (virtio-net driver in the guest kernel) would detect that?
> > > >>
> > > >> I meant this one. But since we are using CLONE_THREAD, we won't see these.
> > > >>
> > > >>> Or
> > > >>> are you saying the watchdog in the guest can detect signals that the
> > > >>> host gets?
> > > >>>
> > > >>>
> > > >>>>>
> > > >>>>> Here are a lots of details:
> > > >>>>>
> > > >>>>> - Pre-6.4 kernel, when vhost workers used kthreads, if you sent any signal
> > > >>>>> to a vhost worker, we ignore it. Nothing happens. kthreads are special and
> > > >>>>> can ignore all signals.
> > > >>>>>
> > > >>>>> You could think of it as the worker is a completely different process than
> > > >>>>> qemu/userspace so they have completely different signal handlers. The
> > > >>>>> vhost worker signal handler ignores all signals even SIGKILL.
> > > >>>>
> > > >>>> Yes.
> > > >>>>
> > > >>>>>
> > > >>>>> If you send a SIGKILL to a qemu thread, then it just exits right away. We
> > > >>>>> don't get to do an explicit close() on the vhost device and we don't get
> > > >>>>> to do ioctls like VHOST_NET_SET_BACKEND to clear backends. The kernel exit
> > > >>>>> code runs and releases refcounts on the device/file, then the vhost device's
> > > >>>>> file_operations->release function is called. vhost_dev_cleanup then stops
> > > >>>>> the vhost worker.
> > > >>>>
> > > >>>> Right.
> > > >>>>
> > > >>>>>
> > > >>>>> - In 6.4 and newer kernels, vhost workers use vhost_tasks, so the worker
> > > >>>>> can be thought of as a thread within the userspace process. With that
> > > >>>>> change we have the same signal handler as the userspace process.
> > > >>>>>
> > > >>>>> If you send a SIGKILL to a qemu thread then it works like above.
> > > >>>>>
> > > >>>>> If you send a SIGKILL to a vhost worker, the vhost worker still sort of
> > > >>>>> ignores it (that is the hack that I mentioned at the beginning of this
> > > >>>>> thread). kernel/vhost_task.c:vhost_task_fn will see the signal and
> > > >>>>> then just continue to process works until file_operations->release
> > > >>>>> calls
> > > >>>>
> > > >>>> Yes, so this sticks to the behaviour before vhost_tasks.
> > > >>>
> > > >>> Not exactly. The vhost_task stays alive temporarily.
> > > >>>
> > > >>> The signal is still delivered to the userspace threads and they will
> > > >>> exit due to getting the SIGKILL also. SIGKILL goes to all the threads in
> > > >>> the process and all userspace threads exit like normal because the vhost
> > > >>> task and normal old userspace threads share a signal handler. When
> > > >>> userspace exits, the kernel force drops the refcounts on the vhost
> > > >>> devices and that runs the release function so the vhost_task will then exit.
> > > >>>
> > > >>> So what I'm trying to say is that in 6.4 we already changed the behavior.
> > > >>
> > > >> Yes. To say the truth, it looks even worse but it might be too late to fix.
> > > >
> > > > Andres (cced) has identified two other possible changes:
> > > >
> > > > 1) doesn't run in the global PID namespace but run in the namespace of owner
> > >
> > > Yeah, I mentioned that one in vhost.h like it's a feature and when posting
> > > the patches I mentioned it as a possible fix. I mean I thought we wanted it
> > > to work like qemu and iothreads where the iothread would inherit all those
> > > values automatically.
> >
> > Right, but it could be noticed by the userspace, especially for the
> > one that tries to do tweak on the performance.
> >
> > The root cause is the that now we do copy_processs() in the process of
> > Qemu instead of the kthreadd. Which result of the the differences of
> > namespace (I think PID namespace is not the only one we see
> > difference) and others for the vhost task.
>
> Leaking things out of a namespace looks more like a bug.
> If you really have to be pedantic, the thing to add would
> be a namespace flag not a qemu flag. Userspace running inside
> a namespace really must have no say about whether to leak
> info out of it.
>
> > >
> > > At the time, I thought we didn't inherit the namespace, like we did the cgroup,
> > > because there was no kernel function for it (like how we didn't inherit v2
> > > cgroups until recently when someone added some code for that).
> > >
> > > I don't know if it's allowed to have something like qemu in namespace N but then
> > > have it's children (vhost thread in this case) in the global namespace.
> > > I'll
> > > look into it.
> >
> > Instead of moving vhost thread between difference namespaces, I wonder
> > if the following is simpler:
> >
> > if (new_flag)
> >     vhost_task_create()
> > else
> >     kthread_create()
> >
> > New flag inherits the attributes of Qemu (namespaces, rlimit, cgroup,
> > scheduling attributes ...) which is what we want. Without the new
> > flag, we stick exactly to the behaviour as in the past to unbreak
> > existing userspace.
> >
> > >
> > > > 2) doesn't inherit kthreadd's scheduling attributes but the owner
> > >
> > > Same as above for this one. I thought I was fixing a bug where before
> > > we had to manually tune the vhost thread's values but for iothreads they
> > > automatically got setup.
> > >
> > > Just to clarify this one. When we used kthreads, kthread() will reset the
> > > scheduler priority for the kthread that's created, so we got the default
> > > values instead of inheriting kthreadd's values.  So we would want:
> > >
> > > +       sched_setscheduler_nocheck(current, SCHED_NORMAL, &param);
> > >
> > > in vhost_task_fn() instead of inheriting kthreadd's values.
> > >
> > > >
> > > > Though such a change makes more sense for some use cases, it may break others.
> > > >
> > > > I wonder if we need to introduce a new flag and bring the old kthread
> > >
> > > Do you mean something like a module param?
> >
> > This requires the management layer to know if it has a new user space
> > or not which is hard. A better place is to introduce backend features.
> >
> > >
> > > > codes if the flag is not set? Then we would not end up trying to align
> > > > the behaviour?
> > > >
> > >
> > > Let me know what you guys prefer. The sched part is easy. The namespace
> > > part might be more difficult, but I will look into it if you want it.
> >
> > Thanks a lot. I think it would be better to have the namespace part
> > (as well as other namespaces) then we don't need to answer hard
> > questions like if it can break user space or not.
> >
> > >
>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/9] vhost: Support SIGKILL by flushing and exiting
  2024-04-18  7:07                 ` Michael S. Tsirkin
  2024-04-18  9:25                   ` Andreas Karis
@ 2024-04-19  0:33                   ` Jason Wang
  1 sibling, 0 replies; 32+ messages in thread
From: Jason Wang @ 2024-04-19  0:33 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Mike Christie, oleg, ebiederm, virtualization, sgarzare,
	stefanha, brauner, Andreas Karis, Laurent Vivier

On Thu, Apr 18, 2024 at 3:07 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Thu, Apr 18, 2024 at 12:08:52PM +0800, Jason Wang wrote:
> > On Thu, Apr 18, 2024 at 12:10 AM Mike Christie
> > <michael.christie@oracle.com> wrote:
> > >
> > > On 4/16/24 10:50 PM, Jason Wang wrote:
> > > > On Mon, Apr 15, 2024 at 4:52 PM Jason Wang <jasowang@redhat.com> wrote:
> > > >>
> > > >> On Sat, Apr 13, 2024 at 12:53 AM <michael.christie@oracle.com> wrote:
> > > >>>
> > > >>> On 4/11/24 10:28 PM, Jason Wang wrote:
> > > >>>> On Fri, Apr 12, 2024 at 12:19 AM Mike Christie
> > > >>>> <michael.christie@oracle.com> wrote:
> > > >>>>>
> > > >>>>> On 4/11/24 3:39 AM, Jason Wang wrote:
> > > >>>>>> On Sat, Mar 16, 2024 at 8:47 AM Mike Christie
> > > >>>>>> <michael.christie@oracle.com> wrote:
> > > >>>>>>>
> > > >>>>>>> The following patches were made over Linus's tree and also apply over
> > > >>>>>>> mst's vhost branch. The patches add the ability for vhost_tasks to
> > > >>>>>>> handle SIGKILL by flushing queued works, stop new works from being
> > > >>>>>>> queued, and prepare the task for an early exit.
> > > >>>>>>>
> > > >>>>>>> This removes the need for the signal/coredump hacks added in:
> > > >>>>>>>
> > > >>>>>>> Commit f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")
> > > >>>>>>>
> > > >>>>>>> when the vhost_task patches were initially merged and fix the issue
> > > >>>>>>> in this thread:
> > > >>>>>>>
> > > >>>>>>> https://lore.kernel.org/all/000000000000a41b82060e875721@google.com/
> > > >>>>>>>
> > > >>>>>>> Long Background:
> > > >>>>>>>
> > > >>>>>>> The original vhost worker code didn't support any signals. If the
> > > >>>>>>> userspace application that owned the worker got a SIGKILL, the app/
> > > >>>>>>> process would exit dropping all references to the device and then the
> > > >>>>>>> file operation's release function would be called. From there we would
> > > >>>>>>> wait on running IO then cleanup the device's memory.
> > > >>>>>>
> > > >>>>>> A dumb question.
> > > >>>>>>
> > > >>>>>> Is this a user space noticeable change? For example, with this series
> > > >>>>>> a SIGKILL may shutdown the datapath ...
> > > >>>>>
> > > >>>>> It already changed in 6.4. We basically added a new interface to shutdown
> > > >>>>> everything (userspace and vhost kernel parts). So we won't just shutdown
> > > >>>>> the data path while userspace is still running. We will shutdown everything
> > > >>>>> now if you send a SIGKILL to a vhost worker's thread.
> > > >>>>
> > > >>>> If I understand correctly, for example Qemu can still live is SIGKILL
> > > >>>> is just send to vhost thread.
> > > >>>
> > > >>> Pre-6.4 qemu could still survive if only the vhost thread got a SIGKILL.
> > > >>> We used kthreads which are special and can ignore it like how userspace
> > > >>> can ignore SIGHUP.
> > > >>>
> > > >>> 6.4 and newer kernels cannot survive. Even if the vhost thread sort of
> > > >>> ignores it like I described below where, the signal is still delivered
> > > >>> to the other qemu threads due to the shared signal handler. Userspace
> > > >>> can't ignore SIGKILL. It doesn't have any say in the matter, and the
> > > >>> kernel forces them to exit.
> > > >>
> > > >> Ok, I see, so the reason is that vhost belongs to the same thread
> > > >> group as the owner now.
> > > >>
> > > >>>
> > > >>>>
> > > >>>> If this is correct, guests may detect this (for example virtio-net has
> > > >>>> a watchdog).
> > > >>>>
> > > >>>
> > > >>> What did you mean by that part? Do you mean if the vhost thread were to
> > > >>> exit, so drivers/vhost/net.c couldn't process IO, then the watchdog in
> > > >>> the guest (virtio-net driver in the guest kernel) would detect that?
> > > >>
> > > >> I meant this one. But since we are using CLONE_THREAD, we won't see these.
> > > >>
> > > >>> Or
> > > >>> are you saying the watchdog in the guest can detect signals that the
> > > >>> host gets?
> > > >>>
> > > >>>
> > > >>>>>
> > > >>>>> Here are a lots of details:
> > > >>>>>
> > > >>>>> - Pre-6.4 kernel, when vhost workers used kthreads, if you sent any signal
> > > >>>>> to a vhost worker, we ignore it. Nothing happens. kthreads are special and
> > > >>>>> can ignore all signals.
> > > >>>>>
> > > >>>>> You could think of it as the worker is a completely different process than
> > > >>>>> qemu/userspace so they have completely different signal handlers. The
> > > >>>>> vhost worker signal handler ignores all signals even SIGKILL.
> > > >>>>
> > > >>>> Yes.
> > > >>>>
> > > >>>>>
> > > >>>>> If you send a SIGKILL to a qemu thread, then it just exits right away. We
> > > >>>>> don't get to do an explicit close() on the vhost device and we don't get
> > > >>>>> to do ioctls like VHOST_NET_SET_BACKEND to clear backends. The kernel exit
> > > >>>>> code runs and releases refcounts on the device/file, then the vhost device's
> > > >>>>> file_operations->release function is called. vhost_dev_cleanup then stops
> > > >>>>> the vhost worker.
> > > >>>>
> > > >>>> Right.
> > > >>>>
> > > >>>>>
> > > >>>>> - In 6.4 and newer kernels, vhost workers use vhost_tasks, so the worker
> > > >>>>> can be thought of as a thread within the userspace process. With that
> > > >>>>> change we have the same signal handler as the userspace process.
> > > >>>>>
> > > >>>>> If you send a SIGKILL to a qemu thread then it works like above.
> > > >>>>>
> > > >>>>> If you send a SIGKILL to a vhost worker, the vhost worker still sort of
> > > >>>>> ignores it (that is the hack that I mentioned at the beginning of this
> > > >>>>> thread). kernel/vhost_task.c:vhost_task_fn will see the signal and
> > > >>>>> then just continue to process works until file_operations->release
> > > >>>>> calls
> > > >>>>
> > > >>>> Yes, so this sticks to the behaviour before vhost_tasks.
> > > >>>
> > > >>> Not exactly. The vhost_task stays alive temporarily.
> > > >>>
> > > >>> The signal is still delivered to the userspace threads and they will
> > > >>> exit due to getting the SIGKILL also. SIGKILL goes to all the threads in
> > > >>> the process and all userspace threads exit like normal because the vhost
> > > >>> task and normal old userspace threads share a signal handler. When
> > > >>> userspace exits, the kernel force drops the refcounts on the vhost
> > > >>> devices and that runs the release function so the vhost_task will then exit.
> > > >>>
> > > >>> So what I'm trying to say is that in 6.4 we already changed the behavior.
> > > >>
> > > >> Yes. To say the truth, it looks even worse but it might be too late to fix.
> > > >
> > > > Andres (cced) has identified two other possible changes:
> > > >
> > > > 1) doesn't run in the global PID namespace but run in the namespace of owner
> > >
> > > Yeah, I mentioned that one in vhost.h like it's a feature and when posting
> > > the patches I mentioned it as a possible fix. I mean I thought we wanted it
> > > to work like qemu and iothreads where the iothread would inherit all those
> > > values automatically.
> >
> > Right, but it could be noticed by the userspace, especially for the
> > one that tries to do tweak on the performance.
> >
> > The root cause is the that now we do copy_processs() in the process of
> > Qemu instead of the kthreadd. Which result of the the differences of
> > namespace (I think PID namespace is not the only one we see
> > difference) and others for the vhost task.
>
> Leaking things out of a namespace looks more like a bug.

Well, I'm not sure if it's a bug, as a kthread inherits the namespace of its
parent, which is kthreadd. It follows the semantics of copy_process and
there are plenty of users now. The problem is that it's too late to
judge whether it is a bug, as it has been noticed by userspace. And
the namespace is not the only thing that could be noticed by
userspace; any assumption about the kthreadd attributes will break.

Instead of waiting for complaints about the new behaviour, why not
simply stick to the old behaviour and use the flag for the new one?
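
To make the user-visible difference concrete, a small illustration
(only a sketch, not code from this thread): on a 6.4+ kernel the
worker created by VHOST_SET_OWNER appears as a "vhost-<pid>" thread of
the caller, inside its thread group and PID namespace, whereas before
it was a kthreadd child visible only from the global namespace.

    #include <dirent.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <linux/vhost.h>

    int main(void)
    {
            char path[64], comm[32];
            struct dirent *de;
            DIR *d;
            int fd = open("/dev/vhost-net", O_RDWR);

            if (fd < 0 || ioctl(fd, VHOST_SET_OWNER) < 0) {
                    perror("vhost-net");
                    return 1;
            }
            /* On 6.4+ one of these tids is the vhost worker. */
            d = opendir("/proc/self/task");
            while (d && (de = readdir(d))) {
                    FILE *f;

                    if (de->d_name[0] == '.')
                            continue;
                    snprintf(path, sizeof(path),
                             "/proc/self/task/%s/comm", de->d_name);
                    f = fopen(path, "r");
                    if (f && fgets(comm, sizeof(comm), f))
                            printf("tid %s comm %s", de->d_name, comm);
                    if (f)
                            fclose(f);
            }
            if (d)
                    closedir(d);
            close(fd);
            return 0;
    }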

Thanks


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/9] vhost: Support SIGKILL by flushing and exiting
  2024-04-18  9:25                   ` Andreas Karis
@ 2024-04-19  0:37                     ` Jason Wang
  2024-04-19  0:40                       ` Jason Wang
  0 siblings, 1 reply; 32+ messages in thread
From: Jason Wang @ 2024-04-19  0:37 UTC (permalink / raw)
  To: Andreas Karis
  Cc: Michael S. Tsirkin, Mike Christie, oleg, ebiederm,
	virtualization, sgarzare, stefanha, brauner, Laurent Vivier

On Thu, Apr 18, 2024 at 5:26 PM Andreas Karis <akaris@redhat.com> wrote:
>
> Not a kernel developer, so sorry if this here is pointless spam.
>
> But qemu is not the only thing consuming /dev/vhost-net:
> https://doc.dpdk.org/guides/howto/virtio_user_as_exception_path.html
>
> Before the series of patches around
> https://github.com/torvalds/linux/commit/e297cd54b3f81d652456ae6cb93941fc6b5c6683,
> you would run a DPDK application inside namespaces of container
> "blue", but then the vhost-... threads were spawned as children of
> kthreadd and run in the global namespaces. That seemed counter
> intuitive, and what's worse, the DPDK application running inside the
> "blue" container namespaces cannot see and thus cannot modify
> scheduling attributes and affinity of its own vhost-... threads. In
> scenarios with pretty strict isolation/pinning requirements, the
> vhost-... threads could easily be scheduled on the same CPUs as the
> DPDK poll mode drivers (because they inherit the calling process' CPU
> affinity), and there's no way for the DPDK process itself to move the
> vhost-... threads to other CPUs. Also, if the DPDK process ran as
> SCHED_RR, the vhost-.... thread would still be SCHED_NORMAL.
>
> After the patch series, the fact that the vhost-... threads run as
> tasks of the process that requests them seems more natural and gives
> us as users the kind of control that we'd want from within the
> container to modify the vhost-... threads' scheduling and affinity.
> The vhost-... thread as a child of the DPDK application inherits the
> same scheduling class, CPU set, etc and the DPDK process can easily
> change those attributes.
>
> However, if another user was used to the old behavior, and their
> entire tooling was created for the old behavior (imagine someone wrote
> a bunch of scripts/services to force affinity from the global PID
> namespace for those vhost-... threads), and now this change was
> introduced, it would break their tooling. So because _I_ might fancy
> the new behavior, but user _B_ might be all set up for the old
> kthreadd, shouldn't there be a flag or configuration for the user to:
>
> if (new_flag)
>     vhost_task_create()
> else
>     kthread_create()
>
> I don't know what kind of flag or knob I'd expect here, but it could
> be granular for the calling process (a new syscall?), or a kernel
> flag, etc. But something that let's the admin choose how the kernel
> spawns the vhost-... threads?

A flag via uAPI that could be controlled by the user of vhost-net.
For example, it could be done via the set/get_backend_features()
syscall.
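
As a rough sketch of what that negotiation could look like from
userspace (the two backend-feature ioctls below are existing uAPI; the
VHOST_BACKEND_F_KTHREAD_WORKER bit is made up purely for illustration):

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/vhost.h>

    /* Hypothetical feature bit, named here only to show the idea. */
    #define VHOST_BACKEND_F_KTHREAD_WORKER  (1ULL << 63)

    static int request_kthread_worker(int vhost_fd)
    {
            uint64_t features;

            if (ioctl(vhost_fd, VHOST_GET_BACKEND_FEATURES, &features) < 0)
                    return -1;
            if (!(features & VHOST_BACKEND_F_KTHREAD_WORKER))
                    return -1;  /* kernel does not offer the old behaviour */

            features = VHOST_BACKEND_F_KTHREAD_WORKER;
            return ioctl(vhost_fd, VHOST_SET_BACKEND_FEATURES, &features);
    }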

Thanks

>
> On Thu, Apr 18, 2024 at 9:07 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Thu, Apr 18, 2024 at 12:08:52PM +0800, Jason Wang wrote:
> > > On Thu, Apr 18, 2024 at 12:10 AM Mike Christie
> > > <michael.christie@oracle.com> wrote:
> > > >
> > > > On 4/16/24 10:50 PM, Jason Wang wrote:
> > > > > On Mon, Apr 15, 2024 at 4:52 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > >>
> > > > >> On Sat, Apr 13, 2024 at 12:53 AM <michael.christie@oracle.com> wrote:
> > > > >>>
> > > > >>> On 4/11/24 10:28 PM, Jason Wang wrote:
> > > > >>>> On Fri, Apr 12, 2024 at 12:19 AM Mike Christie
> > > > >>>> <michael.christie@oracle.com> wrote:
> > > > >>>>>
> > > > >>>>> On 4/11/24 3:39 AM, Jason Wang wrote:
> > > > >>>>>> On Sat, Mar 16, 2024 at 8:47 AM Mike Christie
> > > > >>>>>> <michael.christie@oracle.com> wrote:
> > > > >>>>>>>
> > > > >>>>>>> The following patches were made over Linus's tree and also apply over
> > > > >>>>>>> mst's vhost branch. The patches add the ability for vhost_tasks to
> > > > >>>>>>> handle SIGKILL by flushing queued works, stop new works from being
> > > > >>>>>>> queued, and prepare the task for an early exit.
> > > > >>>>>>>
> > > > >>>>>>> This removes the need for the signal/coredump hacks added in:
> > > > >>>>>>>
> > > > >>>>>>> Commit f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")
> > > > >>>>>>>
> > > > >>>>>>> when the vhost_task patches were initially merged and fix the issue
> > > > >>>>>>> in this thread:
> > > > >>>>>>>
> > > > >>>>>>> https://lore.kernel.org/all/000000000000a41b82060e875721@google.com/
> > > > >>>>>>>
> > > > >>>>>>> Long Background:
> > > > >>>>>>>
> > > > >>>>>>> The original vhost worker code didn't support any signals. If the
> > > > >>>>>>> userspace application that owned the worker got a SIGKILL, the app/
> > > > >>>>>>> process would exit dropping all references to the device and then the
> > > > >>>>>>> file operation's release function would be called. From there we would
> > > > >>>>>>> wait on running IO then cleanup the device's memory.
> > > > >>>>>>
> > > > >>>>>> A dumb question.
> > > > >>>>>>
> > > > >>>>>> Is this a user space noticeable change? For example, with this series
> > > > >>>>>> a SIGKILL may shutdown the datapath ...
> > > > >>>>>
> > > > >>>>> It already changed in 6.4. We basically added a new interface to shutdown
> > > > >>>>> everything (userspace and vhost kernel parts). So we won't just shutdown
> > > > >>>>> the data path while userspace is still running. We will shutdown everything
> > > > >>>>> now if you send a SIGKILL to a vhost worker's thread.
> > > > >>>>
> > > > >>>> If I understand correctly, for example Qemu can still live is SIGKILL
> > > > >>>> is just send to vhost thread.
> > > > >>>
> > > > >>> Pre-6.4 qemu could still survive if only the vhost thread got a SIGKILL.
> > > > >>> We used kthreads which are special and can ignore it like how userspace
> > > > >>> can ignore SIGHUP.
> > > > >>>
> > > > >>> 6.4 and newer kernels cannot survive. Even if the vhost thread sort of
> > > > >>> ignores it like I described below where, the signal is still delivered
> > > > >>> to the other qemu threads due to the shared signal handler. Userspace
> > > > >>> can't ignore SIGKILL. It doesn't have any say in the matter, and the
> > > > >>> kernel forces them to exit.
> > > > >>
> > > > >> Ok, I see, so the reason is that vhost belongs to the same thread
> > > > >> group as the owner now.
> > > > >>
> > > > >>>
> > > > >>>>
> > > > >>>> If this is correct, guests may detect this (for example virtio-net has
> > > > >>>> a watchdog).
> > > > >>>>
> > > > >>>
> > > > >>> What did you mean by that part? Do you mean if the vhost thread were to
> > > > >>> exit, so drivers/vhost/net.c couldn't process IO, then the watchdog in
> > > > >>> the guest (virtio-net driver in the guest kernel) would detect that?
> > > > >>
> > > > >> I meant this one. But since we are using CLONE_THREAD, we won't see these.
> > > > >>
> > > > >>> Or
> > > > >>> are you saying the watchdog in the guest can detect signals that the
> > > > >>> host gets?
> > > > >>>
> > > > >>>
> > > > >>>>>
> > > > >>>>> Here are a lots of details:
> > > > >>>>>
> > > > >>>>> - Pre-6.4 kernel, when vhost workers used kthreads, if you sent any signal
> > > > >>>>> to a vhost worker, we ignore it. Nothing happens. kthreads are special and
> > > > >>>>> can ignore all signals.
> > > > >>>>>
> > > > >>>>> You could think of it as the worker is a completely different process than
> > > > >>>>> qemu/userspace so they have completely different signal handlers. The
> > > > >>>>> vhost worker signal handler ignores all signals even SIGKILL.
> > > > >>>>
> > > > >>>> Yes.
> > > > >>>>
> > > > >>>>>
> > > > >>>>> If you send a SIGKILL to a qemu thread, then it just exits right away. We
> > > > >>>>> don't get to do an explicit close() on the vhost device and we don't get
> > > > >>>>> to do ioctls like VHOST_NET_SET_BACKEND to clear backends. The kernel exit
> > > > >>>>> code runs and releases refcounts on the device/file, then the vhost device's
> > > > >>>>> file_operations->release function is called. vhost_dev_cleanup then stops
> > > > >>>>> the vhost worker.
> > > > >>>>
> > > > >>>> Right.
> > > > >>>>
> > > > >>>>>
> > > > >>>>> - In 6.4 and newer kernels, vhost workers use vhost_tasks, so the worker
> > > > >>>>> can be thought of as a thread within the userspace process. With that
> > > > >>>>> change we have the same signal handler as the userspace process.
> > > > >>>>>
> > > > >>>>> If you send a SIGKILL to a qemu thread then it works like above.
> > > > >>>>>
> > > > >>>>> If you send a SIGKILL to a vhost worker, the vhost worker still sort of
> > > > >>>>> ignores it (that is the hack that I mentioned at the beginning of this
> > > > >>>>> thread). kernel/vhost_task.c:vhost_task_fn will see the signal and
> > > > >>>>> then just continue to process works until file_operations->release
> > > > >>>>> calls
> > > > >>>>
> > > > >>>> Yes, so this sticks to the behaviour before vhost_tasks.
> > > > >>>
> > > > >>> Not exactly. The vhost_task stays alive temporarily.
> > > > >>>
> > > > >>> The signal is still delivered to the userspace threads and they will
> > > > >>> exit due to getting the SIGKILL also. SIGKILL goes to all the threads in
> > > > >>> the process and all userspace threads exit like normal because the vhost
> > > > >>> task and normal old userspace threads share a signal handler. When
> > > > >>> userspace exits, the kernel force drops the refcounts on the vhost
> > > > >>> devices and that runs the release function so the vhost_task will then exit.
> > > > >>>
> > > > >>> So what I'm trying to say is that in 6.4 we already changed the behavior.
> > > > >>
> > > > >> Yes. To say the truth, it looks even worse but it might be too late to fix.
> > > > >
> > > > > Andres (cced) has identified two other possible changes:
> > > > >
> > > > > 1) doesn't run in the global PID namespace but run in the namespace of owner
> > > >
> > > > Yeah, I mentioned that one in vhost.h like it's a feature and when posting
> > > > the patches I mentioned it as a possible fix. I mean I thought we wanted it
> > > > to work like qemu and iothreads where the iothread would inherit all those
> > > > values automatically.
> > >
> > > Right, but it could be noticed by the userspace, especially for the
> > > one that tries to do tweak on the performance.
> > >
> > > The root cause is the that now we do copy_processs() in the process of
> > > Qemu instead of the kthreadd. Which result of the the differences of
> > > namespace (I think PID namespace is not the only one we see
> > > difference) and others for the vhost task.
> >
> > Leaking things out of a namespace looks more like a bug.
> > If you really have to be pedantic, the thing to add would
> > be a namespace flag not a qemu flag. Userspace running inside
> > a namespace really must have no say about whether to leak
> > info out of it.
> >
> > > >
> > > > At the time, I thought we didn't inherit the namespace, like we did the cgroup,
> > > > because there was no kernel function for it (like how we didn't inherit v2
> > > > cgroups until recently when someone added some code for that).
> > > >
> > > > I don't know if it's allowed to have something like qemu in namespace N but then
> > > > have it's children (vhost thread in this case) in the global namespace.
> > > > I'll
> > > > look into it.
> > >
> > > Instead of moving vhost thread between difference namespaces, I wonder
> > > if the following is simpler:
> > >
> > > if (new_flag)
> > >     vhost_task_create()
> > > else
> > >     kthread_create()
> > >
> > > New flag inherits the attributes of Qemu (namespaces, rlimit, cgroup,
> > > scheduling attributes ...) which is what we want. Without the new
> > > flag, we stick exactly to the behaviour as in the past to unbreak
> > > existing userspace.
> > >
> > > >
> > > > > 2) doesn't inherit kthreadd's scheduling attributes but the owner
> > > >
> > > > Same as above for this one. I thought I was fixing a bug where before
> > > > we had to manually tune the vhost thread's values but for iothreads they
> > > > automatically got setup.
> > > >
> > > > Just to clarify this one. When we used kthreads, kthread() will reset the
> > > > scheduler priority for the kthread that's created, so we got the default
> > > > values instead of inheriting kthreadd's values.  So we would want:
> > > >
> > > > +       sched_setscheduler_nocheck(current, SCHED_NORMAL, &param);
> > > >
> > > > in vhost_task_fn() instead of inheriting kthreadd's values.
> > > >
> > > > >
> > > > > Though such a change makes more sense for some use cases, it may break others.
> > > > >
> > > > > I wonder if we need to introduce a new flag and bring the old kthread
> > > >
> > > > Do you mean something like a module param?
> > >
> > > This requires the management layer to know if it has a new user space
> > > or not which is hard. A better place is to introduce backend features.
> > >
> > > >
> > > > > codes if the flag is not set? Then we would not end up trying to align
> > > > > the behaviour?
> > > > >
> > > >
> > > > Let me know what you guys prefer. The sched part is easy. The namespace
> > > > part might be more difficult, but I will look into it if you want it.
> > >
> > > Thanks a lot. I think it would be better to have the namespace part
> > > (as well as other namespaces) then we don't need to answer hard
> > > questions like if it can break user space or not.
> > >
> > > >
> >
>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/9] vhost: Support SIGKILL by flushing and exiting
  2024-04-19  0:37                     ` Jason Wang
@ 2024-04-19  0:40                       ` Jason Wang
  2024-05-15  6:27                         ` Jason Wang
  0 siblings, 1 reply; 32+ messages in thread
From: Jason Wang @ 2024-04-19  0:40 UTC (permalink / raw)
  To: Andreas Karis
  Cc: Michael S. Tsirkin, Mike Christie, oleg, ebiederm,
	virtualization, sgarzare, stefanha, brauner, Laurent Vivier

On Fri, Apr 19, 2024 at 8:37 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Thu, Apr 18, 2024 at 5:26 PM Andreas Karis <akaris@redhat.com> wrote:
> >
> > Not a kernel developer, so sorry if this here is pointless spam.
> >
> > But qemu is not the only thing consuming /dev/vhost-net:
> > https://doc.dpdk.org/guides/howto/virtio_user_as_exception_path.html
> >
> > Before the series of patches around
> > https://github.com/torvalds/linux/commit/e297cd54b3f81d652456ae6cb93941fc6b5c6683,
> > you would run a DPDK application inside namespaces of container
> > "blue", but then the vhost-... threads were spawned as children of
> > kthreadd and run in the global namespaces. That seemed counter
> > intuitive, and what's worse, the DPDK application running inside the
> > "blue" container namespaces cannot see and thus cannot modify
> > scheduling attributes and affinity of its own vhost-... threads. In
> > scenarios with pretty strict isolation/pinning requirements, the
> > vhost-... threads could easily be scheduled on the same CPUs as the
> > DPDK poll mode drivers (because they inherit the calling process' CPU
> > affinity), and there's no way for the DPDK process itself to move the
> > vhost-... threads to other CPUs. Also, if the DPDK process ran as
> > SCHED_RR, the vhost-.... thread would still be SCHED_NORMAL.
> >
> > After the patch series, the fact that the vhost-... threads run as
> > tasks of the process that requests them seems more natural and gives
> > us as users the kind of control that we'd want from within the
> > container to modify the vhost-... threads' scheduling and affinity.
> > The vhost-... thread as a child of the DPDK application inherits the
> > same scheduling class, CPU set, etc and the DPDK process can easily
> > change those attributes.
> >
> > However, if another user was used to the old behavior, and their
> > entire tooling was created for the old behavior (imagine someone wrote
> > a bunch of scripts/services to force affinity from the global PID
> > namespace for those vhost-... threads), and now this change was
> > introduced, it would break their tooling. So because _I_ might fancy
> > the new behavior, but user _B_ might be all set up for the old
> > kthreadd, shouldn't there be a flag or configuration for the user to:
> >
> > if (new_flag)
> >     vhost_task_create()
> > else
> >     kthread_create()
> >
> > I don't know what kind of flag or knob I'd expect here, but it could
> > be granular for the calling process (a new syscall?), or a kernel
> > flag, etc. But something that let's the admin choose how the kernel
> > spawns the vhost-... threads?
>
> A flag via uAPI that could be controlled by the user of vhost-net via
> syscall. For example, it could be done via set/get_backend_feautre()
> syscall.

So it can be controlled by the admin if we want.

Thanks

>
> Thanks
>
> >
> > On Thu, Apr 18, 2024 at 9:07 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Thu, Apr 18, 2024 at 12:08:52PM +0800, Jason Wang wrote:
> > > > On Thu, Apr 18, 2024 at 12:10 AM Mike Christie
> > > > <michael.christie@oracle.com> wrote:
> > > > >
> > > > > On 4/16/24 10:50 PM, Jason Wang wrote:
> > > > > > On Mon, Apr 15, 2024 at 4:52 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > > >>
> > > > > >> On Sat, Apr 13, 2024 at 12:53 AM <michael.christie@oracle.com> wrote:
> > > > > >>>
> > > > > >>> On 4/11/24 10:28 PM, Jason Wang wrote:
> > > > > >>>> On Fri, Apr 12, 2024 at 12:19 AM Mike Christie
> > > > > >>>> <michael.christie@oracle.com> wrote:
> > > > > >>>>>
> > > > > >>>>> On 4/11/24 3:39 AM, Jason Wang wrote:
> > > > > >>>>>> On Sat, Mar 16, 2024 at 8:47 AM Mike Christie
> > > > > >>>>>> <michael.christie@oracle.com> wrote:
> > > > > >>>>>>>
> > > > > >>>>>>> The following patches were made over Linus's tree and also apply over
> > > > > >>>>>>> mst's vhost branch. The patches add the ability for vhost_tasks to
> > > > > >>>>>>> handle SIGKILL by flushing queued works, stop new works from being
> > > > > >>>>>>> queued, and prepare the task for an early exit.
> > > > > >>>>>>>
> > > > > >>>>>>> This removes the need for the signal/coredump hacks added in:
> > > > > >>>>>>>
> > > > > >>>>>>> Commit f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")
> > > > > >>>>>>>
> > > > > >>>>>>> when the vhost_task patches were initially merged and fix the issue
> > > > > >>>>>>> in this thread:
> > > > > >>>>>>>
> > > > > >>>>>>> https://lore.kernel.org/all/000000000000a41b82060e875721@google.com/
> > > > > >>>>>>>
> > > > > >>>>>>> Long Background:
> > > > > >>>>>>>
> > > > > >>>>>>> The original vhost worker code didn't support any signals. If the
> > > > > >>>>>>> userspace application that owned the worker got a SIGKILL, the app/
> > > > > >>>>>>> process would exit dropping all references to the device and then the
> > > > > >>>>>>> file operation's release function would be called. From there we would
> > > > > >>>>>>> wait on running IO then cleanup the device's memory.
> > > > > >>>>>>
> > > > > >>>>>> A dumb question.
> > > > > >>>>>>
> > > > > >>>>>> Is this a user space noticeable change? For example, with this series
> > > > > >>>>>> a SIGKILL may shutdown the datapath ...
> > > > > >>>>>
> > > > > >>>>> It already changed in 6.4. We basically added a new interface to shutdown
> > > > > >>>>> everything (userspace and vhost kernel parts). So we won't just shutdown
> > > > > >>>>> the data path while userspace is still running. We will shutdown everything
> > > > > >>>>> now if you send a SIGKILL to a vhost worker's thread.
> > > > > >>>>
> > > > > >>>> If I understand correctly, for example Qemu can still live is SIGKILL
> > > > > >>>> is just send to vhost thread.
> > > > > >>>
> > > > > >>> Pre-6.4 qemu could still survive if only the vhost thread got a SIGKILL.
> > > > > >>> We used kthreads which are special and can ignore it like how userspace
> > > > > >>> can ignore SIGHUP.
> > > > > >>>
> > > > > >>> 6.4 and newer kernels cannot survive. Even if the vhost thread sort of
> > > > > >>> ignores it like I described below where, the signal is still delivered
> > > > > >>> to the other qemu threads due to the shared signal handler. Userspace
> > > > > >>> can't ignore SIGKILL. It doesn't have any say in the matter, and the
> > > > > >>> kernel forces them to exit.
> > > > > >>
> > > > > >> Ok, I see, so the reason is that vhost belongs to the same thread
> > > > > >> group as the owner now.
> > > > > >>
> > > > > >>>
> > > > > >>>>
> > > > > >>>> If this is correct, guests may detect this (for example virtio-net has
> > > > > >>>> a watchdog).
> > > > > >>>>
> > > > > >>>
> > > > > >>> What did you mean by that part? Do you mean if the vhost thread were to
> > > > > >>> exit, so drivers/vhost/net.c couldn't process IO, then the watchdog in
> > > > > >>> the guest (virtio-net driver in the guest kernel) would detect that?
> > > > > >>
> > > > > >> I meant this one. But since we are using CLONE_THREAD, we won't see these.
> > > > > >>
> > > > > >>> Or
> > > > > >>> are you saying the watchdog in the guest can detect signals that the
> > > > > >>> host gets?
> > > > > >>>
> > > > > >>>
> > > > > >>>>>
> > > > > >>>>> Here are a lots of details:
> > > > > >>>>>
> > > > > >>>>> - Pre-6.4 kernel, when vhost workers used kthreads, if you sent any signal
> > > > > >>>>> to a vhost worker, we ignore it. Nothing happens. kthreads are special and
> > > > > >>>>> can ignore all signals.
> > > > > >>>>>
> > > > > >>>>> You could think of it as the worker is a completely different process than
> > > > > >>>>> qemu/userspace so they have completely different signal handlers. The
> > > > > >>>>> vhost worker signal handler ignores all signals even SIGKILL.
> > > > > >>>>
> > > > > >>>> Yes.
> > > > > >>>>
> > > > > >>>>>
> > > > > >>>>> If you send a SIGKILL to a qemu thread, then it just exits right away. We
> > > > > >>>>> don't get to do an explicit close() on the vhost device and we don't get
> > > > > >>>>> to do ioctls like VHOST_NET_SET_BACKEND to clear backends. The kernel exit
> > > > > >>>>> code runs and releases refcounts on the device/file, then the vhost device's
> > > > > >>>>> file_operations->release function is called. vhost_dev_cleanup then stops
> > > > > >>>>> the vhost worker.
> > > > > >>>>
> > > > > >>>> Right.
> > > > > >>>>
> > > > > >>>>>
> > > > > >>>>> - In 6.4 and newer kernels, vhost workers use vhost_tasks, so the worker
> > > > > >>>>> can be thought of as a thread within the userspace process. With that
> > > > > >>>>> change we have the same signal handler as the userspace process.
> > > > > >>>>>
> > > > > >>>>> If you send a SIGKILL to a qemu thread then it works like above.
> > > > > >>>>>
> > > > > >>>>> If you send a SIGKILL to a vhost worker, the vhost worker still sort of
> > > > > >>>>> ignores it (that is the hack that I mentioned at the beginning of this
> > > > > >>>>> thread). kernel/vhost_task.c:vhost_task_fn will see the signal and
> > > > > >>>>> then just continue to process works until file_operations->release
> > > > > >>>>> calls
> > > > > >>>>
> > > > > >>>> Yes, so this sticks to the behaviour before vhost_tasks.
> > > > > >>>
> > > > > >>> Not exactly. The vhost_task stays alive temporarily.
> > > > > >>>
> > > > > >>> The signal is still delivered to the userspace threads and they will
> > > > > >>> exit due to getting the SIGKILL also. SIGKILL goes to all the threads in
> > > > > >>> the process and all userspace threads exit like normal because the vhost
> > > > > >>> task and normal old userspace threads share a signal handler. When
> > > > > >>> userspace exits, the kernel force drops the refcounts on the vhost
> > > > > >>> devices and that runs the release function so the vhost_task will then exit.
> > > > > >>>
> > > > > >>> So what I'm trying to say is that in 6.4 we already changed the behavior.
> > > > > >>
> > > > > >> Yes. To say the truth, it looks even worse but it might be too late to fix.
> > > > > >
> > > > > > Andres (cced) has identified two other possible changes:
> > > > > >
> > > > > > 1) doesn't run in the global PID namespace but run in the namespace of owner
> > > > >
> > > > > Yeah, I mentioned that one in vhost.h like it's a feature and when posting
> > > > > the patches I mentioned it as a possible fix. I mean I thought we wanted it
> > > > > to work like qemu and iothreads where the iothread would inherit all those
> > > > > values automatically.
> > > >
> > > > Right, but it could be noticed by the userspace, especially for the
> > > > one that tries to do tweak on the performance.
> > > >
> > > > The root cause is the that now we do copy_processs() in the process of
> > > > Qemu instead of the kthreadd. Which result of the the differences of
> > > > namespace (I think PID namespace is not the only one we see
> > > > difference) and others for the vhost task.
> > >
> > > Leaking things out of a namespace looks more like a bug.
> > > If you really have to be pedantic, the thing to add would
> > > be a namespace flag not a qemu flag. Userspace running inside
> > > a namespace really must have no say about whether to leak
> > > info out of it.
> > >
> > > > >
> > > > > At the time, I thought we didn't inherit the namespace, like we did the cgroup,
> > > > > because there was no kernel function for it (like how we didn't inherit v2
> > > > > cgroups until recently when someone added some code for that).
> > > > >
> > > > > I don't know if it's allowed to have something like qemu in namespace N but then
> > > > > have it's children (vhost thread in this case) in the global namespace.
> > > > > I'll
> > > > > look into it.
> > > >
> > > > Instead of moving vhost thread between difference namespaces, I wonder
> > > > if the following is simpler:
> > > >
> > > > if (new_flag)
> > > >     vhost_task_create()
> > > > else
> > > >     kthread_create()
> > > >
> > > > New flag inherits the attributes of Qemu (namespaces, rlimit, cgroup,
> > > > scheduling attributes ...) which is what we want. Without the new
> > > > flag, we stick exactly to the behaviour as in the past to unbreak
> > > > existing userspace.
> > > >
> > > > >
> > > > > > 2) doesn't inherit kthreadd's scheduling attributes but the owner
> > > > >
> > > > > Same as above for this one. I thought I was fixing a bug where before
> > > > > we had to manually tune the vhost thread's values but for iothreads they
> > > > > automatically got setup.
> > > > >
> > > > > Just to clarify this one. When we used kthreads, kthread() will reset the
> > > > > scheduler priority for the kthread that's created, so we got the default
> > > > > values instead of inheriting kthreadd's values.  So we would want:
> > > > >
> > > > > +       sched_setscheduler_nocheck(current, SCHED_NORMAL, &param);
> > > > >
> > > > > in vhost_task_fn() instead of inheriting kthreadd's values.
> > > > >
> > > > > >
> > > > > > Though such a change makes more sense for some use cases, it may break others.
> > > > > >
> > > > > > I wonder if we need to introduce a new flag and bring the old kthread
> > > > >
> > > > > Do you mean something like a module param?
> > > >
> > > > This requires the management layer to know if it has a new user space
> > > > or not which is hard. A better place is to introduce backend features.
> > > >
> > > > >
> > > > > > codes if the flag is not set? Then we would not end up trying to align
> > > > > > the behaviour?
> > > > > >
> > > > >
> > > > > Let me know what you guys prefer. The sched part is easy. The namespace
> > > > > part might be more difficult, but I will look into it if you want it.
> > > >
> > > > Thanks a lot. I think it would be better to have the namespace part
> > > > (as well as other namespaces) then we don't need to answer hard
> > > > questions like if it can break user space or not.
> > > >
> > > > >
> > >
> >


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/9] vhost: Support SIGKILL by flushing and exiting
  2024-04-19  0:40                       ` Jason Wang
@ 2024-05-15  6:27                         ` Jason Wang
  2024-05-15  7:24                           ` Michael S. Tsirkin
  0 siblings, 1 reply; 32+ messages in thread
From: Jason Wang @ 2024-05-15  6:27 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Andreas Karis, Mike Christie, oleg, ebiederm, virtualization,
	sgarzare, stefanha, brauner, Laurent Vivier

On Fri, Apr 19, 2024 at 8:40 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Fri, Apr 19, 2024 at 8:37 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Thu, Apr 18, 2024 at 5:26 PM Andreas Karis <akaris@redhat.com> wrote:
> > >
> > > Not a kernel developer, so sorry if this here is pointless spam.
> > >
> > > But qemu is not the only thing consuming /dev/vhost-net:
> > > https://doc.dpdk.org/guides/howto/virtio_user_as_exception_path.html
> > >
> > > Before the series of patches around
> > > https://github.com/torvalds/linux/commit/e297cd54b3f81d652456ae6cb93941fc6b5c6683,
> > > you would run a DPDK application inside namespaces of container
> > > "blue", but then the vhost-... threads were spawned as children of
> > > kthreadd and run in the global namespaces. That seemed counter
> > > intuitive, and what's worse, the DPDK application running inside the
> > > "blue" container namespaces cannot see and thus cannot modify
> > > scheduling attributes and affinity of its own vhost-... threads. In
> > > scenarios with pretty strict isolation/pinning requirements, the
> > > vhost-... threads could easily be scheduled on the same CPUs as the
> > > DPDK poll mode drivers (because they inherit the calling process' CPU
> > > affinity), and there's no way for the DPDK process itself to move the
> > > vhost-... threads to other CPUs. Also, if the DPDK process ran as
> > > SCHED_RR, the vhost-.... thread would still be SCHED_NORMAL.
> > >
> > > After the patch series, the fact that the vhost-... threads run as
> > > tasks of the process that requests them seems more natural and gives
> > > us as users the kind of control that we'd want from within the
> > > container to modify the vhost-... threads' scheduling and affinity.
> > > The vhost-... thread as a child of the DPDK application inherits the
> > > same scheduling class, CPU set, etc and the DPDK process can easily
> > > change those attributes.
> > >
> > > However, if another user was used to the old behavior, and their
> > > entire tooling was created for the old behavior (imagine someone wrote
> > > a bunch of scripts/services to force affinity from the global PID
> > > namespace for those vhost-... threads), and now this change was
> > > introduced, it would break their tooling. So because _I_ might fancy
> > > the new behavior, but user _B_ might be all set up for the old
> > > kthreadd, shouldn't there be a flag or configuration for the user to:
> > >
> > > if (new_flag)
> > >     vhost_task_create()
> > > else
> > >     kthread_create()
> > >
> > > I don't know what kind of flag or knob I'd expect here, but it could
> > > be granular for the calling process (a new syscall?), or a kernel
> > > flag, etc. But something that lets the admin choose how the kernel
> > > spawns the vhost-... threads?
> >
> > A flag via uAPI that could be controlled by the user of vhost-net via
> > syscall. For example, it could be done via set/get_backend_feature()
> > syscall.
>
> So it can be controlled by the admin if we want.
>
> Thanks

Michael, does the flag make sense or not?

Thanks

>
> >
> > Thanks
> >
> > >
> > > On Thu, Apr 18, 2024 at 9:07 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Thu, Apr 18, 2024 at 12:08:52PM +0800, Jason Wang wrote:
> > > > > On Thu, Apr 18, 2024 at 12:10 AM Mike Christie
> > > > > <michael.christie@oracle.com> wrote:
> > > > > >
> > > > > > On 4/16/24 10:50 PM, Jason Wang wrote:
> > > > > > > On Mon, Apr 15, 2024 at 4:52 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > >>
> > > > > > >> On Sat, Apr 13, 2024 at 12:53 AM <michael.christie@oracle.com> wrote:
> > > > > > >>>
> > > > > > >>> On 4/11/24 10:28 PM, Jason Wang wrote:
> > > > > > >>>> On Fri, Apr 12, 2024 at 12:19 AM Mike Christie
> > > > > > >>>> <michael.christie@oracle.com> wrote:
> > > > > > >>>>>
> > > > > > >>>>> On 4/11/24 3:39 AM, Jason Wang wrote:
> > > > > > >>>>>> On Sat, Mar 16, 2024 at 8:47 AM Mike Christie
> > > > > > >>>>>> <michael.christie@oracle.com> wrote:
> > > > > > >>>>>>>
> > > > > > >>>>>>> The following patches were made over Linus's tree and also apply over
> > > > > > >>>>>>> mst's vhost branch. The patches add the ability for vhost_tasks to
> > > > > > >>>>>>> handle SIGKILL by flushing queued works, stop new works from being
> > > > > > >>>>>>> queued, and prepare the task for an early exit.
> > > > > > >>>>>>>
> > > > > > >>>>>>> This removes the need for the signal/coredump hacks added in:
> > > > > > >>>>>>>
> > > > > > >>>>>>> Commit f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")
> > > > > > >>>>>>>
> > > > > > >>>>>>> when the vhost_task patches were initially merged and fix the issue
> > > > > > >>>>>>> in this thread:
> > > > > > >>>>>>>
> > > > > > >>>>>>> https://lore.kernel.org/all/000000000000a41b82060e875721@google.com/
> > > > > > >>>>>>>
> > > > > > >>>>>>> Long Background:
> > > > > > >>>>>>>
> > > > > > >>>>>>> The original vhost worker code didn't support any signals. If the
> > > > > > >>>>>>> userspace application that owned the worker got a SIGKILL, the app/
> > > > > > >>>>>>> process would exit dropping all references to the device and then the
> > > > > > >>>>>>> file operation's release function would be called. From there we would
> > > > > > >>>>>>> wait on running IO then cleanup the device's memory.
> > > > > > >>>>>>
> > > > > > >>>>>> A dumb question.
> > > > > > >>>>>>
> > > > > > >>>>>> Is this a user space noticeable change? For example, with this series
> > > > > > >>>>>> a SIGKILL may shutdown the datapath ...
> > > > > > >>>>>
> > > > > > >>>>> It already changed in 6.4. We basically added a new interface to shutdown
> > > > > > >>>>> everything (userspace and vhost kernel parts). So we won't just shutdown
> > > > > > >>>>> the data path while userspace is still running. We will shutdown everything
> > > > > > >>>>> now if you send a SIGKILL to a vhost worker's thread.
> > > > > > >>>>
> > > > > > >>>> If I understand correctly, for example Qemu can still live if SIGKILL
> > > > > > >>>> is just sent to the vhost thread.
> > > > > > >>>
> > > > > > >>> Pre-6.4 qemu could still survive if only the vhost thread got a SIGKILL.
> > > > > > >>> We used kthreads which are special and can ignore it like how userspace
> > > > > > >>> can ignore SIGHUP.
> > > > > > >>>
> > > > > > >>> 6.4 and newer kernels cannot survive. Even if the vhost thread sort of
> > > > > > >>> ignores it like I described below, where the signal is still delivered
> > > > > > >>> to the other qemu threads due to the shared signal handler. Userspace
> > > > > > >>> can't ignore SIGKILL. It doesn't have any say in the matter, and the
> > > > > > >>> kernel forces them to exit.
> > > > > > >>
> > > > > > >> Ok, I see, so the reason is that vhost belongs to the same thread
> > > > > > >> group as the owner now.
> > > > > > >>
> > > > > > >>>
> > > > > > >>>>
> > > > > > >>>> If this is correct, guests may detect this (for example virtio-net has
> > > > > > >>>> a watchdog).
> > > > > > >>>>
> > > > > > >>>
> > > > > > >>> What did you mean by that part? Do you mean if the vhost thread were to
> > > > > > >>> exit, so drivers/vhost/net.c couldn't process IO, then the watchdog in
> > > > > > >>> the guest (virtio-net driver in the guest kernel) would detect that?
> > > > > > >>
> > > > > > >> I meant this one. But since we are using CLONE_THREAD, we won't see these.
> > > > > > >>
> > > > > > >>> Or
> > > > > > >>> are you saying the watchdog in the guest can detect signals that the
> > > > > > >>> host gets?
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>>>
> > > > > > >>>>> Here are a lot of details:
> > > > > > >>>>>
> > > > > > >>>>> - Pre-6.4 kernel, when vhost workers used kthreads, if you sent any signal
> > > > > > >>>>> to a vhost worker, we ignore it. Nothing happens. kthreads are special and
> > > > > > >>>>> can ignore all signals.
> > > > > > >>>>>
> > > > > > >>>>> You could think of it as the worker is a completely different process than
> > > > > > >>>>> qemu/userspace so they have completely different signal handlers. The
> > > > > > >>>>> vhost worker signal handler ignores all signals even SIGKILL.
> > > > > > >>>>
> > > > > > >>>> Yes.
> > > > > > >>>>
> > > > > > >>>>>
> > > > > > >>>>> If you send a SIGKILL to a qemu thread, then it just exits right away. We
> > > > > > >>>>> don't get to do an explicit close() on the vhost device and we don't get
> > > > > > >>>>> to do ioctls like VHOST_NET_SET_BACKEND to clear backends. The kernel exit
> > > > > > >>>>> code runs and releases refcounts on the device/file, then the vhost device's
> > > > > > >>>>> file_operations->release function is called. vhost_dev_cleanup then stops
> > > > > > >>>>> the vhost worker.
> > > > > > >>>>
> > > > > > >>>> Right.
> > > > > > >>>>
> > > > > > >>>>>
> > > > > > >>>>> - In 6.4 and newer kernels, vhost workers use vhost_tasks, so the worker
> > > > > > >>>>> can be thought of as a thread within the userspace process. With that
> > > > > > >>>>> change we have the same signal handler as the userspace process.
> > > > > > >>>>>
> > > > > > >>>>> If you send a SIGKILL to a qemu thread then it works like above.
> > > > > > >>>>>
> > > > > > >>>>> If you send a SIGKILL to a vhost worker, the vhost worker still sort of
> > > > > > >>>>> ignores it (that is the hack that I mentioned at the beginning of this
> > > > > > >>>>> thread). kernel/vhost_task.c:vhost_task_fn will see the signal and
> > > > > > >>>>> then just continue to process works until file_operations->release
> > > > > > >>>>> calls
> > > > > > >>>>
> > > > > > >>>> Yes, so this sticks to the behaviour before vhost_tasks.
> > > > > > >>>
> > > > > > >>> Not exactly. The vhost_task stays alive temporarily.
> > > > > > >>>
> > > > > > >>> The signal is still delivered to the userspace threads and they will
> > > > > > >>> exit due to getting the SIGKILL also. SIGKILL goes to all the threads in
> > > > > > >>> the process and all userspace threads exit like normal because the vhost
> > > > > > >>> task and normal old userspace threads share a signal handler. When
> > > > > > >>> userspace exits, the kernel force drops the refcounts on the vhost
> > > > > > >>> devices and that runs the release function so the vhost_task will then exit.
> > > > > > >>>
> > > > > > >>> So what I'm trying to say is that in 6.4 we already changed the behavior.
> > > > > > >>
> > > > > > >> Yes. To tell the truth, it looks even worse but it might be too late to fix.
> > > > > > >
> > > > > > > Andres (cced) has identified two other possible changes:
> > > > > > >
> > > > > > > 1) doesn't run in the global PID namespace but runs in the namespace of the owner
> > > > > >
> > > > > > Yeah, I mentioned that one in vhost.h like it's a feature and when posting
> > > > > > the patches I mentioned it as a possible fix. I mean I thought we wanted it
> > > > > > to work like qemu and iothreads where the iothread would inherit all those
> > > > > > values automatically.
> > > > >
> > > > > Right, but it could be noticed by the userspace, especially for the
> > > > > one that tries to tweak the performance.
> > > > >
> > > > > The root cause is that now we do copy_process() in the process of
> > > > > Qemu instead of kthreadd, which results in differences of
> > > > > namespace (I think the PID namespace is not the only one where we see
> > > > > a difference) and other attributes for the vhost task.
> > > >
> > > > Leaking things out of a namespace looks more like a bug.
> > > > If you really have to be pedantic, the thing to add would
> > > > be a namespace flag not a qemu flag. Userspace running inside
> > > > a namespace really must have no say about whether to leak
> > > > info out of it.
> > > >
> > > > > >
> > > > > > At the time, I thought we didn't inherit the namespace, like we did the cgroup,
> > > > > > because there was no kernel function for it (like how we didn't inherit v2
> > > > > > cgroups until recently when someone added some code for that).
> > > > > >
> > > > > > I don't know if it's allowed to have something like qemu in namespace N but then
> > > > > > have its children (vhost thread in this case) in the global namespace.
> > > > > > I'll
> > > > > > look into it.
> > > > >
> > > > > Instead of moving the vhost thread between different namespaces, I wonder
> > > > > if the following is simpler:
> > > > >
> > > > > if (new_flag)
> > > > >     vhost_task_create()
> > > > > else
> > > > >     kthread_create()
> > > > >
> > > > > New flag inherits the attributes of Qemu (namespaces, rlimit, cgroup,
> > > > > scheduling attributes ...) which is what we want. Without the new
> > > > > flag, we stick exactly to the behaviour as in the past to unbreak
> > > > > existing userspace.
> > > > >
> > > > > >
> > > > > > > 2) doesn't inherit kthreadd's scheduling attributes but the owner's
> > > > > >
> > > > > > Same as above for this one. I thought I was fixing a bug where before
> > > > > > we had to manually tune the vhost thread's values but for iothreads they
> > > > > > automatically got setup.
> > > > > >
> > > > > > Just to clarify this one. When we used kthreads, kthread() would reset the
> > > > > > scheduler priority for the kthread that's created, so we got the default
> > > > > > values instead of inheriting kthreadd's values.  So we would want:
> > > > > >
> > > > > > +       sched_setscheduler_nocheck(current, SCHED_NORMAL, &param);
> > > > > >
> > > > > > in vhost_task_fn() instead of inheriting kthreadd's values.
> > > > > >
> > > > > > >
> > > > > > > Though such a change makes more sense for some use cases, it may break others.
> > > > > > >
> > > > > > > I wonder if we need to introduce a new flag and bring the old kthread
> > > > > >
> > > > > > Do you mean something like a module param?
> > > > >
> > > > > This requires the management layer to know if it has a new user space
> > > > > or not which is hard. A better place is to introduce backend features.
> > > > >
> > > > > >
> > > > > > > codes if the flag is not set? Then we would not end up trying to align
> > > > > > > the behaviour?
> > > > > > >
> > > > > >
> > > > > > Let me know what you guys prefer. The sched part is easy. The namespace
> > > > > > part might be more difficult, but I will look into it if you want it.
> > > > >
> > > > > Thanks a lot. I think it would be better to have the namespace part
> > > > > (as well as other namespaces) so we don't need to answer hard
> > > > > questions like whether it can break user space or not.
> > > > >
> > > > > >
> > > >
> > >


^ permalink raw reply	[flat|nested] 32+ messages in thread
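
To make the point about in-container control concrete: because the worker created by vhost_task_create() is now a thread of the owner, it shows up under the owner's /proc/self/task, so the owning process (a DPDK application, for example) can retune it directly, roughly as sketched below. The "vhost-" comm prefix match, the target CPU and the SCHED_OTHER choice are assumptions made only for the example.

/*
 * Example only: re-tune the vhost worker threads of the calling process.
 * This is meant to be called from inside the process that owns the vhost
 * device; the "vhost-" comm prefix, CPU 2 and SCHED_OTHER below are
 * arbitrary choices for illustration.
 */
#define _GNU_SOURCE
#include <dirent.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

static void retune_tid(pid_t tid)
{
    const struct sched_param sp = { .sched_priority = 0 };
    cpu_set_t set;

    /* Keep the worker off the CPUs used by the polling threads. */
    CPU_ZERO(&set);
    CPU_SET(2, &set);
    sched_setaffinity(tid, sizeof(set), &set);

    /* And keep it SCHED_OTHER so it cannot preempt RT PMD threads. */
    sched_setscheduler(tid, SCHED_OTHER, &sp);
}

static void retune_own_vhost_workers(void)
{
    DIR *d = opendir("/proc/self/task");
    struct dirent *de;

    if (!d)
        return;

    while ((de = readdir(d)) != NULL) {
        char path[64], comm[32] = "";
        FILE *f;

        if (de->d_name[0] == '.')
            continue;

        snprintf(path, sizeof(path), "/proc/self/task/%s/comm", de->d_name);
        f = fopen(path, "r");
        if (!f)
            continue;
        if (fgets(comm, sizeof(comm), f) && !strncmp(comm, "vhost-", 6))
            retune_tid((pid_t)atoi(de->d_name));
        fclose(f);
    }
    closedir(d);
}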

* Re: [PATCH 0/9] vhost: Support SIGKILL by flushing and exiting
  2024-05-15  6:27                         ` Jason Wang
@ 2024-05-15  7:24                           ` Michael S. Tsirkin
  0 siblings, 0 replies; 32+ messages in thread
From: Michael S. Tsirkin @ 2024-05-15  7:24 UTC (permalink / raw)
  To: Jason Wang
  Cc: Andreas Karis, Mike Christie, oleg, ebiederm, virtualization,
	sgarzare, stefanha, brauner, Laurent Vivier

On Wed, May 15, 2024 at 02:27:32PM +0800, Jason Wang wrote:
> On Fri, Apr 19, 2024 at 8:40 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Fri, Apr 19, 2024 at 8:37 AM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > > On Thu, Apr 18, 2024 at 5:26 PM Andreas Karis <akaris@redhat.com> wrote:
> > > >
> > > > Not a kernel developer, so sorry if this here is pointless spam.
> > > >
> > > > But qemu is not the only thing consuming /dev/vhost-net:
> > > > https://doc.dpdk.org/guides/howto/virtio_user_as_exception_path.html
> > > >
> > > > Before the series of patches around
> > > > https://github.com/torvalds/linux/commit/e297cd54b3f81d652456ae6cb93941fc6b5c6683,
> > > > you would run a DPDK application inside namespaces of container
> > > > "blue", but then the vhost-... threads were spawned as children of
> > > > kthreadd and run in the global namespaces. That seemed counter
> > > > intuitive, and what's worse, the DPDK application running inside the
> > > > "blue" container namespaces cannot see and thus cannot modify
> > > > scheduling attributes and affinity of its own vhost-... threads. In
> > > > scenarios with pretty strict isolation/pinning requirements, the
> > > > vhost-... threads could easily be scheduled on the same CPUs as the
> > > > DPDK poll mode drivers (because they inherit the calling process' CPU
> > > > affinity), and there's no way for the DPDK process itself to move the
> > > > vhost-... threads to other CPUs. Also, if the DPDK process ran as
> > > > SCHED_RR, the vhost-.... thread would still be SCHED_NORMAL.
> > > >
> > > > After the patch series, the fact that the vhost-... threads run as
> > > > tasks of the process that requests them seems more natural and gives
> > > > us as users the kind of control that we'd want from within the
> > > > container to modify the vhost-... threads' scheduling and affinity.
> > > > The vhost-... thread as a child of the DPDK application inherits the
> > > > same scheduling class, CPU set, etc and the DPDK process can easily
> > > > change those attributes.
> > > >
> > > > However, if another user was used to the old behavior, and their
> > > > entire tooling was created for the old behavior (imagine someone wrote
> > > > a bunch of scripts/services to force affinity from the global PID
> > > > namespace for those vhost-... threads), and now this change was
> > > > introduced, it would break their tooling. So because _I_ might fancy
> > > > the new behavior, but user _B_ might be all set up for the old
> > > > kthreadd, shouldn't there be a flag or configuration for the user to:
> > > >
> > > > if (new_flag)
> > > >     vhost_task_create()
> > > > else
> > > >     kthread_create()
> > > >
> > > > I don't know what kind of flag or knob I'd expect here, but it could
> > > > be granular for the calling process (a new syscall?), or a kernel
> > > > flag, etc. But something that lets the admin choose how the kernel
> > > > spawns the vhost-... threads?
> > >
> > > A flag via uAPI that could be controlled by the user of vhost-net via
> > > syscall. For example, it could be done via set/get_backend_feature()
> > > syscall.
> >
> > So it can be controlled by the admin if we want.
> >
> > Thanks
> 
> Michael, does the flag make sense or not?
> 
> Thanks

So you want a flag to bring back the pre-6.4 behaviour?
Given the behaviour has been in an upstream kernel already,
I'd be inclined to say let's wait until there's an actual
report of fallout, and then look for work-arounds.
Otherwise we don't really know if we are even fixing anything.

> >
> > >
> > > Thanks
> > >
> > > >
> > > > On Thu, Apr 18, 2024 at 9:07 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Thu, Apr 18, 2024 at 12:08:52PM +0800, Jason Wang wrote:
> > > > > > On Thu, Apr 18, 2024 at 12:10 AM Mike Christie
> > > > > > <michael.christie@oracle.com> wrote:
> > > > > > >
> > > > > > > On 4/16/24 10:50 PM, Jason Wang wrote:
> > > > > > > > On Mon, Apr 15, 2024 at 4:52 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > >>
> > > > > > > >> On Sat, Apr 13, 2024 at 12:53 AM <michael.christie@oracle.com> wrote:
> > > > > > > >>>
> > > > > > > >>> On 4/11/24 10:28 PM, Jason Wang wrote:
> > > > > > > >>>> On Fri, Apr 12, 2024 at 12:19 AM Mike Christie
> > > > > > > >>>> <michael.christie@oracle.com> wrote:
> > > > > > > >>>>>
> > > > > > > >>>>> On 4/11/24 3:39 AM, Jason Wang wrote:
> > > > > > > >>>>>> On Sat, Mar 16, 2024 at 8:47 AM Mike Christie
> > > > > > > >>>>>> <michael.christie@oracle.com> wrote:
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> The following patches were made over Linus's tree and also apply over
> > > > > > > >>>>>>> mst's vhost branch. The patches add the ability for vhost_tasks to
> > > > > > > >>>>>>> handle SIGKILL by flushing queued works, stop new works from being
> > > > > > > >>>>>>> queued, and prepare the task for an early exit.
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> This removes the need for the signal/coredump hacks added in:
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> Commit f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> when the vhost_task patches were initially merged and fix the issue
> > > > > > > >>>>>>> in this thread:
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> https://lore.kernel.org/all/000000000000a41b82060e875721@google.com/
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> Long Background:
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> The original vhost worker code didn't support any signals. If the
> > > > > > > >>>>>>> userspace application that owned the worker got a SIGKILL, the app/
> > > > > > > >>>>>>> process would exit dropping all references to the device and then the
> > > > > > > >>>>>>> file operation's release function would be called. From there we would
> > > > > > > >>>>>>> wait on running IO then cleanup the device's memory.
> > > > > > > >>>>>>
> > > > > > > >>>>>> A dumb question.
> > > > > > > >>>>>>
> > > > > > > >>>>>> Is this a user space noticeable change? For example, with this series
> > > > > > > >>>>>> a SIGKILL may shutdown the datapath ...
> > > > > > > >>>>>
> > > > > > > >>>>> It already changed in 6.4. We basically added a new interface to shutdown
> > > > > > > >>>>> everything (userspace and vhost kernel parts). So we won't just shutdown
> > > > > > > >>>>> the data path while userspace is still running. We will shutdown everything
> > > > > > > >>>>> now if you send a SIGKILL to a vhost worker's thread.
> > > > > > > >>>>
> > > > > > > >>>> If I understand correctly, for example Qemu can still live if SIGKILL
> > > > > > > >>>> is just sent to the vhost thread.
> > > > > > > >>>
> > > > > > > >>> Pre-6.4 qemu could still survive if only the vhost thread got a SIGKILL.
> > > > > > > >>> We used kthreads which are special and can ignore it like how userspace
> > > > > > > >>> can ignore SIGHUP.
> > > > > > > >>>
> > > > > > > >>> 6.4 and newer kernels cannot survive. Even if the vhost thread sort of
> > > > > > > >>> ignores it like I described below, where the signal is still delivered
> > > > > > > >>> to the other qemu threads due to the shared signal handler. Userspace
> > > > > > > >>> can't ignore SIGKILL. It doesn't have any say in the matter, and the
> > > > > > > >>> kernel forces them to exit.
> > > > > > > >>
> > > > > > > >> Ok, I see, so the reason is that vhost belongs to the same thread
> > > > > > > >> group as the owner now.
> > > > > > > >>
> > > > > > > >>>
> > > > > > > >>>>
> > > > > > > >>>> If this is correct, guests may detect this (for example virtio-net has
> > > > > > > >>>> a watchdog).
> > > > > > > >>>>
> > > > > > > >>>
> > > > > > > >>> What did you mean by that part? Do you mean if the vhost thread were to
> > > > > > > >>> exit, so drivers/vhost/net.c couldn't process IO, then the watchdog in
> > > > > > > >>> the guest (virtio-net driver in the guest kernel) would detect that?
> > > > > > > >>
> > > > > > > >> I meant this one. But since we are using CLONE_THREAD, we won't see these.
> > > > > > > >>
> > > > > > > >>> Or
> > > > > > > >>> are you saying the watchdog in the guest can detect signals that the
> > > > > > > >>> host gets?
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>>>>
> > > > > > > >>>>> Here are a lot of details:
> > > > > > > >>>>>
> > > > > > > >>>>> - Pre-6.4 kernel, when vhost workers used kthreads, if you sent any signal
> > > > > > > >>>>> to a vhost worker, we ignore it. Nothing happens. kthreads are special and
> > > > > > > >>>>> can ignore all signals.
> > > > > > > >>>>>
> > > > > > > >>>>> You could think of it as the worker is a completely different process than
> > > > > > > >>>>> qemu/userspace so they have completely different signal handlers. The
> > > > > > > >>>>> vhost worker signal handler ignores all signals even SIGKILL.
> > > > > > > >>>>
> > > > > > > >>>> Yes.
> > > > > > > >>>>
> > > > > > > >>>>>
> > > > > > > >>>>> If you send a SIGKILL to a qemu thread, then it just exits right away. We
> > > > > > > >>>>> don't get to do an explicit close() on the vhost device and we don't get
> > > > > > > >>>>> to do ioctls like VHOST_NET_SET_BACKEND to clear backends. The kernel exit
> > > > > > > >>>>> code runs and releases refcounts on the device/file, then the vhost device's
> > > > > > > >>>>> file_operations->release function is called. vhost_dev_cleanup then stops
> > > > > > > >>>>> the vhost worker.
> > > > > > > >>>>
> > > > > > > >>>> Right.
> > > > > > > >>>>
> > > > > > > >>>>>
> > > > > > > >>>>> - In 6.4 and newer kernels, vhost workers use vhost_tasks, so the worker
> > > > > > > >>>>> can be thought of as a thread within the userspace process. With that
> > > > > > > >>>>> change we have the same signal handler as the userspace process.
> > > > > > > >>>>>
> > > > > > > >>>>> If you send a SIGKILL to a qemu thread then it works like above.
> > > > > > > >>>>>
> > > > > > > >>>>> If you send a SIGKILL to a vhost worker, the vhost worker still sort of
> > > > > > > >>>>> ignores it (that is the hack that I mentioned at the beginning of this
> > > > > > > >>>>> thread). kernel/vhost_task.c:vhost_task_fn will see the signal and
> > > > > > > >>>>> then just continue to process works until file_operations->release
> > > > > > > >>>>> calls
> > > > > > > >>>>
> > > > > > > >>>> Yes, so this sticks to the behaviour before vhost_tasks.
> > > > > > > >>>
> > > > > > > >>> Not exactly. The vhost_task stays alive temporarily.
> > > > > > > >>>
> > > > > > > >>> The signal is still delivered to the userspace threads and they will
> > > > > > > >>> exit due to getting the SIGKILL also. SIGKILL goes to all the threads in
> > > > > > > >>> the process and all userspace threads exit like normal because the vhost
> > > > > > > >>> task and normal old userspace threads share a signal handler. When
> > > > > > > >>> userspace exits, the kernel force drops the refcounts on the vhost
> > > > > > > >>> devices and that runs the release function so the vhost_task will then exit.
> > > > > > > >>>
> > > > > > > >>> So what I'm trying to say is that in 6.4 we already changed the behavior.
> > > > > > > >>
> > > > > > > >> Yes. To tell the truth, it looks even worse but it might be too late to fix.
> > > > > > > >
> > > > > > > > Andres (cced) has identified two other possible changes:
> > > > > > > >
> > > > > > > > 1) doesn't run in the global PID namespace but runs in the namespace of the owner
> > > > > > >
> > > > > > > Yeah, I mentioned that one in vhost.h like it's a feature and when posting
> > > > > > > the patches I mentioned it as a possible fix. I mean I thought we wanted it
> > > > > > > to work like qemu and iothreads where the iothread would inherit all those
> > > > > > > values automatically.
> > > > > >
> > > > > > Right, but it could be noticed by the userspace, especially for the
> > > > > > one that tries to tweak the performance.
> > > > > >
> > > > > > The root cause is that now we do copy_process() in the process of
> > > > > > Qemu instead of kthreadd, which results in differences of
> > > > > > namespace (I think the PID namespace is not the only one where we see
> > > > > > a difference) and other attributes for the vhost task.
> > > > >
> > > > > Leaking things out of a namespace looks more like a bug.
> > > > > If you really have to be pedantic, the thing to add would
> > > > > be a namespace flag not a qemu flag. Userspace running inside
> > > > > a namespace really must have no say about whether to leak
> > > > > info out of it.
> > > > >
> > > > > > >
> > > > > > > At the time, I thought we didn't inherit the namespace, like we did the cgroup,
> > > > > > > because there was no kernel function for it (like how we didn't inherit v2
> > > > > > > cgroups until recently when someone added some code for that).
> > > > > > >
> > > > > > > I don't know if it's allowed to have something like qemu in namespace N but then
> > > > > > > have its children (vhost thread in this case) in the global namespace.
> > > > > > > I'll
> > > > > > > look into it.
> > > > > >
> > > > > > Instead of moving the vhost thread between different namespaces, I wonder
> > > > > > if the following is simpler:
> > > > > >
> > > > > > if (new_flag)
> > > > > >     vhost_task_create()
> > > > > > else
> > > > > >     kthread_create()
> > > > > >
> > > > > > New flag inherits the attributes of Qemu (namespaces, rlimit, cgroup,
> > > > > > scheduling attributes ...) which is what we want. Without the new
> > > > > > flag, we stick exactly to the behaviour as in the past to unbreak
> > > > > > existing userspace.
> > > > > >
> > > > > > >
> > > > > > > > 2) doesn't inherit kthreadd's scheduling attributes but the owner's
> > > > > > >
> > > > > > > Same as above for this one. I thought I was fixing a bug where before
> > > > > > > we had to manually tune the vhost thread's values but for iothreads they
> > > > > > > automatically got setup.
> > > > > > >
> > > > > > > Just to clarify this one. When we used kthreads, kthread() would reset the
> > > > > > > scheduler priority for the kthread that's created, so we got the default
> > > > > > > values instead of inheriting kthreadd's values.  So we would want:
> > > > > > >
> > > > > > > +       sched_setscheduler_nocheck(current, SCHED_NORMAL, &param);
> > > > > > >
> > > > > > > in vhost_task_fn() instead of inheriting kthreadd's values.
> > > > > > >
> > > > > > > >
> > > > > > > > Though such a change makes more sense for some use cases, it may break others.
> > > > > > > >
> > > > > > > > I wonder if we need to introduce a new flag and bring the old kthread
> > > > > > >
> > > > > > > Do you mean something like a module param?
> > > > > >
> > > > > > This requires the management layer to know if it has a new user space
> > > > > > or not which is hard. A better place is to introduce backend features.
> > > > > >
> > > > > > >
> > > > > > > > codes if the flag is not set? Then we would not end up trying to align
> > > > > > > > the behaviour?
> > > > > > > >
> > > > > > >
> > > > > > > Let me know what you guys prefer. The sched part is easy. The namespace
> > > > > > > part might be more difficult, but I will look into it if you want it.
> > > > > >
> > > > > > Thanks a lot. I think it would be better to have the namespace part
> > > > > > (as well as other namespaces) so we don't need to answer hard
> > > > > > questions like whether it can break user space or not.
> > > > > >
> > > > > > >
> > > > >
> > > >


^ permalink raw reply	[flat|nested] 32+ messages in thread
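
For reference, if the knob being discussed were ever added as a backend feature, the userspace side could look roughly like the sketch below, reusing the existing VHOST_GET_BACKEND_FEATURES/VHOST_SET_BACKEND_FEATURES ioctls. VHOST_BACKEND_F_WORKER_KTHREAD is a made-up bit for illustration; no such flag exists in any kernel, and whether it would have to be acked before VHOST_SET_OWNER (which creates the worker) would be part of the actual uAPI design.

/*
 * Hypothetical flow: ask the kernel to keep the old kthread-style worker
 * via a backend feature bit. VHOST_BACKEND_F_WORKER_KTHREAD does not
 * exist; it stands in for whatever bit such a uAPI would define. Current
 * kernels never offer the bit, so the ack below is simply skipped.
 */
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vhost.h>

#define VHOST_BACKEND_F_WORKER_KTHREAD (1ULL << 63) /* made up for the example */

int main(void)
{
    uint64_t features = 0;
    int fd = open("/dev/vhost-net", O_RDWR);

    if (fd < 0)
        return 1;

    if (ioctl(fd, VHOST_GET_BACKEND_FEATURES, &features) == 0 &&
        (features & VHOST_BACKEND_F_WORKER_KTHREAD)) {
        /* Ack the bit before the worker is created. */
        features = VHOST_BACKEND_F_WORKER_KTHREAD;
        ioctl(fd, VHOST_SET_BACKEND_FEATURES, &features);
    }

    /* VHOST_SET_OWNER is what actually creates the worker task. */
    ioctl(fd, VHOST_SET_OWNER);

    close(fd);
    return 0;
}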

end of thread, other threads:[~2024-05-15  7:24 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-16  0:46 [PATCH 0/9] vhost: Support SIGKILL by flushing and exiting Mike Christie
2024-03-16  0:46 ` [PATCH 1/9] vhost-scsi: Handle vhost_vq_work_queue failures for events Mike Christie
2024-03-16  0:47 ` [PATCH 2/9] vhost-scsi: Handle vhost_vq_work_queue failures for cmds Mike Christie
2024-03-16  0:47 ` [PATCH 3/9] vhost-scsi: Use system wq to flush dev for TMFs Mike Christie
2024-03-16  0:47 ` [PATCH 4/9] vhost: Remove vhost_vq_flush Mike Christie
2024-03-16  0:47 ` [PATCH 5/9] vhost_scsi: Handle vhost_vq_work_queue failures for TMFs Mike Christie
2024-03-16  0:47 ` [PATCH 6/9] vhost: Use virtqueue mutex for swapping worker Mike Christie
2024-03-16  0:47 ` [PATCH 7/9] vhost: Release worker mutex during flushes Mike Christie
2024-03-16  0:47 ` [PATCH 8/9] vhost_task: Handle SIGKILL by flushing work and exiting Mike Christie
2024-03-16  0:47 ` [PATCH 9/9] kernel: Remove signal hacks for vhost_tasks Mike Christie
2024-04-09  4:16 ` [PATCH 0/9] vhost: Support SIGKILL by flushing and exiting Jason Wang
2024-04-09 14:57   ` Mike Christie
2024-04-09 16:40     ` Michael S. Tsirkin
2024-04-09 21:55       ` michael.christie
2024-04-10  4:21         ` Michael S. Tsirkin
2024-04-18  7:10   ` Michael S. Tsirkin
2024-04-11  8:39 ` Jason Wang
2024-04-11 16:19   ` Mike Christie
2024-04-12  3:28     ` Jason Wang
2024-04-12 16:52       ` michael.christie
2024-04-15  8:52         ` Jason Wang
2024-04-17  3:50           ` Jason Wang
2024-04-17 16:03             ` Mike Christie
2024-04-18  4:08               ` Jason Wang
2024-04-18  7:07                 ` Michael S. Tsirkin
2024-04-18  9:25                   ` Andreas Karis
2024-04-19  0:37                     ` Jason Wang
2024-04-19  0:40                       ` Jason Wang
2024-05-15  6:27                         ` Jason Wang
2024-05-15  7:24                           ` Michael S. Tsirkin
2024-04-19  0:33                   ` Jason Wang
2024-04-18  7:01               ` Michael S. Tsirkin
