linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Anthony Krowiak <akrowiak@linux.ibm.com>
To: Halil Pasic <pasic@linux.ibm.com>
Cc: linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org, jjherne@linux.ibm.com,
	borntraeger@de.ibm.com, cohuck@redhat.com,
	mjrosato@linux.ibm.com, pbonzini@redhat.com,
	frankja@linux.ibm.com, imbrenda@linux.ibm.com, david@redhat.com,
	"scottwood@freescale.com agraf"@suse.de
Subject: Re: [RFC] kvm: reverse call order of kvm_arch_destroy_vm() and kvm_destroy_devices()
Date: Thu, 11 Aug 2022 10:39:50 -0400	[thread overview]
Message-ID: <62595a25-61a9-cc83-3941-6001fee512af@linux.ibm.com> (raw)
In-Reply-To: <20220801135310.62c34c63.pasic@linux.ibm.com>


On 8/1/22 7:53 AM, Halil Pasic wrote:
> On Wed, 27 Jul 2022 15:00:02 -0400
> Anthony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> Any Takers??????
>>
>> On 7/5/22 2:54 PM, Tony Krowiak wrote:
>>> There is a new requirement for s390 secure execution guests that the
>>> hypervisor ensures all AP queues are reset and disassociated from the
>>> KVM guest before the secure configuration is torn down. It is the
>>> responsibility of the vfio_ap device driver to handle this.
>>>
>>> Prior to commit ("vfio: remove VFIO_GROUP_NOTIFY_SET_KVM"),
>>> the driver reset all AP queues passed through to a KVM guest when notified
>>> that the KVM pointer was being set to NULL. Subsequently, the AP queues
>>> are only reset when the fd for the mediated device used to pass the queues
>>> through to the guest is closed (the vfio_ap_mdev_close_device() callback).
>>> This is not a problem when userspace is well-behaved and uses the
>>> KVM_DEV_VFIO_GROUP_DEL attribute to remove the VFIO group; however, if
>>> userspace for some reason does not close the mdev fd, a secure execution
>>> guest will tear down its configuration before the AP queues are
>>> reset because the teardown is done in the kvm_arch_destroy_vm function
>>> which is invoked prior to kvm_destroy_devices.
> As Matt has pointed out: we did not have the guarantee we need prior
> that commit. Please for the next version drop the digression about
> the old behavior.
>
>>> This patch proposes a simple solution; rather than introducing a new
>>> notifier into vfio or callback into KVM, what aoubt reversing the order
>>> in which the kvm_arch_destroy_vm and kvm_destroy_devices are called. In
>>> some very limited testing (i.e., the automated regression tests for
>>> the vfio_ap device driver) this did not seem to cause any problems.
>>>
>>> The question remains, is there a good technical reason why the VM
>>> is destroyed before the devices it is using? This is not intuitive, so
>>> this is a request for comments on this proposed patch. The assumption
>>> here is that the medev fd will get closed when the devices are destroyed.
> I did some digging! The function and the corresponding mechanism was
> introduced by  07f0a7bdec5c ("kvm: destroy emulated devices on VM
> exit"). Before that patch we used to have ref-counting, and the refcound
> got decremented in kvmppc_mpic_disconnect_vcpu() which in turn was
> called by kvm_arch_vcpu_free(). So this was basically arch specific
> stuff. For power (the patch came form power) the refcount was decremented
> before calling kvmppc_core_vcpu_free(). So I conclude the old scheme
> would have worked for us.
>
> Since the patch does not state any technical reasons, my guess is, that
> the choice was made somewhat arbitrarily under the assumption, that
> there is no requirements or dependency with regards to the destruction
> of devices or with regards towards severing the connection between
> the devices and the VM. Under these assumptions the placement of
> the invocation of kvm_destroy_devices after kvm_arch_destroy_vm()
> did made sense, because if something that is destroyed in destroy_vm()
> did hold a live reference to the device, this reference will be cleaned
> up before kvm_destroy_devices() is invoked. So basically unless the
> devices hold references to each other, things look good. If the
> positions of  kvm_arch_destroy_vm() and kvm_destroy_devices() are
> changed, then we basically need to assume that nothing that is destroyed
> in kvm_arch_destoy_vm() may logically hold a live reference (remember
> the refcount is gone, but pointers may still exist) to a kvm device.
> Does that hold? @Antony, maybe you can answer this question for us...


I do not have an answer for this without doing a deep dive into the 
code. I am not very familiar with the VM lifecycle. My hope was that 
someone who knows this area would respond to this RFC. I am copying the 
Signed-off-by email addresses for the patch (07f0a7bdec) you mentioned 
above; maybe they can provide some insight as to for their choice in 
ordering of the kvm_arch_destroy_vm() and kvm_destroy_devices() functions.



> Otherwise I will continue the digging from here, eventually.
>
> Also I have concerns about the following comments:
>
> static void kvm_destroy_devices(struct kvm *kvm)
> {
>          struct kvm_device *dev, *tmp;
>                                                                                  
>          /*
>           * We do not need to take the kvm->lock here, because nobody else
>           * has a reference to the struct kvm at this point and therefore
>           * cannot access the devices list anyhow.
> [..]
>
> Would this till hold when the order is changed?
>
> struct kvm_device_ops {
> [..]
>          /*
>           * Destroy is responsible for freeing dev.
>           *
>           * Destroy may be called before or after destructors are called
>           * on emulated I/O regions, depending on whether a reference is
>           * held by a vcpu or other kvm component that gets destroyed
>           * after the emulated I/O.
>           */
>          void (*destroy)(struct kvm_device *dev);
>
> This seems to document the order of things as is.
>
> Btw I would like to understand more about the lifecycle of these
> emulated I/O regions....
>
> @Paolo: I believe this is ultimately your truff. I'm just digging
> through the code, and the history to try to help along with this. We
> definitely need a solution for our problem. We would very much appreciate
> having your opinion!
>
> Regards,
> Halil
>
>>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
>>> ---
>>>    virt/kvm/kvm_main.c | 2 +-
>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>>> index a49df8988cd6..edaf2918be9b 100644
>>> --- a/virt/kvm/kvm_main.c
>>> +++ b/virt/kvm/kvm_main.c
>>> @@ -1248,8 +1248,8 @@ static void kvm_destroy_vm(struct kvm *kvm)
>>>    #else
>>>    	kvm_flush_shadow_all(kvm);
>>>    #endif
>>> -	kvm_arch_destroy_vm(kvm);
>>>    	kvm_destroy_devices(kvm);
>>> +	kvm_arch_destroy_vm(kvm);
>>>    	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
>>>    		kvm_free_memslots(kvm, &kvm->__memslots[i][0]);
>>>    		kvm_free_memslots(kvm, &kvm->__memslots[i][1]);

      reply	other threads:[~2022-08-11 14:41 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-05 18:54 [RFC] kvm: reverse call order of kvm_arch_destroy_vm() and kvm_destroy_devices() Tony Krowiak
2022-07-05 19:30 ` Matthew Rosato
2022-07-18 14:11 ` Anthony Krowiak
2022-07-27 19:00 ` Anthony Krowiak
2022-08-01 11:53   ` Halil Pasic
2022-08-11 14:39     ` Anthony Krowiak [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=62595a25-61a9-cc83-3941-6001fee512af@linux.ibm.com \
    --to=akrowiak@linux.ibm.com \
    --cc="scottwood@freescale.com agraf"@suse.de \
    --cc=borntraeger@de.ibm.com \
    --cc=cohuck@redhat.com \
    --cc=david@redhat.com \
    --cc=frankja@linux.ibm.com \
    --cc=imbrenda@linux.ibm.com \
    --cc=jjherne@linux.ibm.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=mjrosato@linux.ibm.com \
    --cc=pasic@linux.ibm.com \
    --cc=pbonzini@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).