Re: [RFC] kvm: reverse call order of kvm_arch_destroy_vm() and kvm_destroy_devices()

From: Matthew Rosato <mjrosato@linux.ibm.com>
To: Tony Krowiak <akrowiak@linux.ibm.com>,
	linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org
Cc: jjherne@linux.ibm.com, borntraeger@de.ibm.com, cohuck@redhat.com,
	pasic@linux.ibm.com, pbonzini@redhat.com, frankja@linux.ibm.com,
	imbrenda@linux.ibm.com, david@redhat.com
Subject: Re: [RFC] kvm: reverse call order of kvm_arch_destroy_vm() and kvm_destroy_devices()
Date: Tue, 5 Jul 2022 15:30:26 -0400
Message-ID: <c4062e02-4b35-e130-b653-e467bef2eb4f@linux.ibm.com>
In-Reply-To: <20220705185430.499688-1-akrowiak@linux.ibm.com>

On 7/5/22 2:54 PM, Tony Krowiak wrote:
> There is a new requirement for s390 secure execution guests that the
> hypervisor ensures all AP queues are reset and disassociated from the
> KVM guest before the secure configuration is torn down. It is the
> responsibility of the vfio_ap device driver to handle this.
> 
> Prior to commit ("vfio: remove VFIO_GROUP_NOTIFY_SET_KVM"),
> the driver reset all AP queues passed through to a KVM guest when notified
> that the KVM pointer was being set to NULL. Subsequently, the AP queues
> are only reset when the fd for the mediated device used to pass the queues
> through to the guest is closed (the vfio_ap_mdev_close_device() callback).
> This is not a problem when userspace is well-behaved and uses the
> KVM_DEV_VFIO_GROUP_DEL attribute to remove the VFIO group; however, if
> userspace for some reason does not close the mdev fd, a secure execution
> guest will tear down its configuration before the AP queues are
> reset because the teardown is done in the kvm_arch_destroy_vm function
> which is invoked prior to kvm_destroy_devices.

To clarify, even before "vfio: remove VFIO_GROUP_NOTIFY_SET_KVM", if
userspace did not delete the group via KVM_DEV_VFIO_GROUP_DEL, the
old callback would not have been triggered until
kvm_destroy_devices() either (the callback would have been invoked
with a NULL kvm pointer via a call from kvm_vfio_destroy(), which is
itself called from kvm_destroy_devices()).

My point being: this behavior did not start with "vfio: remove
VFIO_GROUP_NOTIFY_SET_KVM"; that patch just removed the notifier since
both actions always took place at device open/close time anyway.  So if
destroying the devices before the vm isn't doable, a new notifier (or
similar mechanism) that sets the KVM association to NULL would also have
to fire at an earlier point in time than VFIO_GROUP_NOTIFY_SET_KVM did
(and should maybe be something that is optional/opt-in and used only by
vfio drivers that need to clean up a KVM association at a point prior
to the device being destroyed).  There should still be no need for any
sort of notifier to set the (non-NULL) KVM association, as it's already
associated with the vfio group before device_open.

But let's first see if anyone can shed some understanding on the 
ordering between kvm_arch_destroy_vm and kvm_destroy_devices...
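
For anyone following along, here is a tiny userspace toy model of the
ordering question.  It is only a sketch: the toy_* names are made up for
illustration and are not kernel functions, and it assumes (as the RFC
does) that destroying the devices is what ends up resetting the AP
queues.  It just records whether the queues were already reset at the
moment the secure configuration is torn down.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state only; this does not mirror struct kvm. */
struct toy_vm {
	bool secure_config_present;	/* secure configuration still exists */
	bool queues_reset;		/* AP queues reset and disassociated */
	bool reset_before_teardown;	/* sampled when the config is torn down */
};

/* Stand-in for kvm_arch_destroy_vm(): tears down the secure configuration. */
static void toy_arch_destroy_vm(struct toy_vm *vm)
{
	vm->reset_before_teardown = vm->queues_reset;
	vm->secure_config_present = false;
}

/*
 * Stand-in for kvm_destroy_devices().  Assumption: this path is what
 * leads to the driver resetting the queues.
 */
static void toy_destroy_devices(struct toy_vm *vm)
{
	vm->queues_reset = true;
}

/*
 * Runs the two teardown steps in the requested order and returns true
 * if the queues had been reset before the secure config went away.
 */
static bool toy_destroy_vm(bool devices_first)
{
	struct toy_vm vm = { .secure_config_present = true };

	if (devices_first) {
		toy_destroy_devices(&vm);
		toy_arch_destroy_vm(&vm);
	} else {
		toy_arch_destroy_vm(&vm);
		toy_destroy_devices(&vm);
	}

	/* Either order ends in the same final state; only the sequence differs. */
	assert(vm.queues_reset && !vm.secure_config_present);
	return vm.reset_before_teardown;
}
```

With devices_first == false (the current order) the function returns
false, and with devices_first == true (the order proposed in the RFC) it
returns true.  The model says nothing about whether other architectures
or devices depend on the current order, which is the open question.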

> 
> This patch proposes a simple solution; rather than introducing a new
> notifier into vfio or callback into KVM, what about reversing the order
> in which kvm_arch_destroy_vm and kvm_destroy_devices are called? In
> some very limited testing (i.e., the automated regression tests for
> the vfio_ap device driver) this did not seem to cause any problems.
> 
> The question remains, is there a good technical reason why the VM
> is destroyed before the devices it is using? This is not intuitive, so
> this is a request for comments on this proposed patch. The assumption
> here is that the mdev fd will get closed when the devices are destroyed.
> 
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> ---
>   virt/kvm/kvm_main.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index a49df8988cd6..edaf2918be9b 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1248,8 +1248,8 @@ static void kvm_destroy_vm(struct kvm *kvm)
>   #else
>   	kvm_flush_shadow_all(kvm);
>   #endif
> -	kvm_arch_destroy_vm(kvm);
>   	kvm_destroy_devices(kvm);
> +	kvm_arch_destroy_vm(kvm);
>   	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
>   		kvm_free_memslots(kvm, &kvm->__memslots[i][0]);
>   		kvm_free_memslots(kvm, &kvm->__memslots[i][1]);