All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kirti Wankhede <kwankhede@nvidia.com>
To: Cornelia Huck <cohuck@redhat.com>
Cc: cjia@nvidia.com, aik@ozlabs.ru, Zhengxiao.zx@alibaba-inc.com,
	shuangtai.tst@alibaba-inc.com, qemu-devel@nongnu.org,
	peterx@redhat.com, eauger@redhat.com, yi.l.liu@intel.com,
	quintela@redhat.com, ziye.yang@intel.com, armbru@redhat.com,
	mlevitsk@redhat.com, pasic@linux.ibm.com, felipe@nutanix.com,
	zhi.a.wang@intel.com, kevin.tian@intel.com, yan.y.zhao@intel.com,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	alex.williamson@redhat.com, changpeng.liu@intel.com,
	eskultet@redhat.com, Ken.Xue@amd.com,
	jonathan.davies@nutanix.com, pbonzini@redhat.com
Subject: Re: [PATCH v26 05/17] vfio: Add VM state change handler to know state of VM
Date: Wed, 21 Oct 2020 11:03:23 +0530	[thread overview]
Message-ID: <245abdf6-245d-5f88-e04b-35fad763560c@nvidia.com> (raw)
In-Reply-To: <20201020125105.5cd790df.cohuck@redhat.com>



On 10/20/2020 4:21 PM, Cornelia Huck wrote:
> On Sun, 18 Oct 2020 01:54:56 +0530
> Kirti Wankhede <kwankhede@nvidia.com> wrote:
> 
>> On 9/29/2020 4:33 PM, Dr. David Alan Gilbert wrote:
>>> * Cornelia Huck (cohuck@redhat.com) wrote:
>>>> On Wed, 23 Sep 2020 04:54:07 +0530
>>>> Kirti Wankhede <kwankhede@nvidia.com> wrote:
>>>>   
>>>>> VM state change handler gets called on change in VM's state. This is used to set
>>>>> VFIO device state to _RUNNING.
>>>>>
>>>>> Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
>>>>> Reviewed-by: Neo Jia <cjia@nvidia.com>
>>>>> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>>>> ---
>>>>>    hw/vfio/migration.c           | 136 ++++++++++++++++++++++++++++++++++++++++++
>>>>>    hw/vfio/trace-events          |   3 +-
>>>>>    include/hw/vfio/vfio-common.h |   4 ++
>>>>>    3 files changed, 142 insertions(+), 1 deletion(-)
>>>>>   
>>>>
>>>> (...)
>>>>   
>>>>> +static int vfio_migration_set_state(VFIODevice *vbasedev, uint32_t mask,
>>>>> +                                    uint32_t value)
>>>>
>>>> I think I've mentioned that before, but this function could really
>>>> benefit from a comment what mask and value mean.
>>>>   
>>
>> Adding a comment as:
>>
>> /*
>>    *  Write device_state field to inform the vendor driver about the
>> device state
>>    *  to be transitioned to.
>>    *  vbasedev: VFIO device
>>    *  mask : bits set in the mask are preserved in device_state
>>    *  value: bits set in the value are set in device_state
>>    *  Remaining bits in device_state are cleared.
>>    */
> 
> Maybe:
> 
> "Change the device_state register for device @vbasedev. Bits set in
> @mask are preserved, bits set in @value are set, and bits not set in
> either @mask or @value are cleared in device_state. If the register
> cannot be accessed, the resulting state would be invalid, or the device
> enters an error state, an error is returned." ?
> 

Ok.
>>
>>
>>>>> +{
>>>>> +    VFIOMigration *migration = vbasedev->migration;
>>>>> +    VFIORegion *region = &migration->region;
>>>>> +    off_t dev_state_off = region->fd_offset +
>>>>> +                      offsetof(struct vfio_device_migration_info, device_state);
>>>>> +    uint32_t device_state;
>>>>> +    int ret;
>>>>> +
>>>>> +    ret = vfio_mig_read(vbasedev, &device_state, sizeof(device_state),
>>>>> +                        dev_state_off);
>>>>> +    if (ret < 0) {
>>>>> +        return ret;
>>>>> +    }
>>>>> +
>>>>> +    device_state = (device_state & mask) | value;
>>>>> +
>>>>> +    if (!VFIO_DEVICE_STATE_VALID(device_state)) {
>>>>> +        return -EINVAL;
>>>>> +    }
>>>>> +
>>>>> +    ret = vfio_mig_write(vbasedev, &device_state, sizeof(device_state),
>>>>> +                         dev_state_off);
>>>>> +    if (ret < 0) {
>>>>> +        ret = vfio_mig_read(vbasedev, &device_state, sizeof(device_state),
>>>>> +                          dev_state_off);
>>>>> +        if (ret < 0) {
>>>>> +            return ret;
>>>>> +        }
>>>>> +
>>>>> +        if (VFIO_DEVICE_STATE_IS_ERROR(device_state)) {
>>>>> +            hw_error("%s: Device is in error state 0x%x",
>>>>> +                     vbasedev->name, device_state);
>>>>> +            return -EFAULT;
>>>>
>>>> Is -EFAULT a good return value here? Maybe -EIO?
>>>>   
>>
>> Ok. Changing to -EIO.
>>
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>> +    vbasedev->device_state = device_state;
>>>>> +    trace_vfio_migration_set_state(vbasedev->name, device_state);
>>>>> +    return 0;
>>>>> +}
>>>>> +
>>>>> +static void vfio_vmstate_change(void *opaque, int running, RunState state)
>>>>> +{
>>>>> +    VFIODevice *vbasedev = opaque;
>>>>> +
>>>>> +    if ((vbasedev->vm_running != running)) {
>>>>> +        int ret;
>>>>> +        uint32_t value = 0, mask = 0;
>>>>> +
>>>>> +        if (running) {
>>>>> +            value = VFIO_DEVICE_STATE_RUNNING;
>>>>> +            if (vbasedev->device_state & VFIO_DEVICE_STATE_RESUMING) {
>>>>> +                mask = ~VFIO_DEVICE_STATE_RESUMING;
>>>>
>>>> I've been staring at this for some time and I think that the desired
>>>> result is
>>>> - set _RUNNING
>>>> - if _RESUMING was set, clear it, but leave the other bits intact
>>
>> Upto here, you're correct.
>>
>>>> - if _RESUMING was not set, clear everything previously set
>>>> This would really benefit from a comment (or am I the only one
>>>> struggling here?)
>>>>   
>>
>> Here mask should be ~0. Correcting it.
> 
> Hm, now I'm confused. With value == _RUNNING, ~_RUNNING and ~0 as mask
> should be equivalent, shouldn't they?
> 

I too got confused after reading your comment.
Lets walk through the device states and transitions can happen here:

if running
  - device state could be either _SAVING or _RESUMING or _STOP. Both 
_SAVING and _RESUMING can't be set at a time, that is the error state. 
_STOP means 0.
  - Transition from _SAVING to _RUNNING can happen if there is migration 
failure, in that case we have to clear _SAVING
- Transition from _RESUMING to _RUNNING can happen on resuming and we 
have to clear _RESUMING.
- In both the above cases, we have to set _RUNNING and clear rest 2 bits.
Then:
mask = ~VFIO_DEVICE_STATE_MASK;
value = VFIO_DEVICE_STATE_RUNNING;

if !running
- device state could be either _RUNNING or _SAVING|_RUNNING. Here we 
have to reset running bit.
Then:
mask = ~VFIO_DEVICE_STATE_RUNNING;
value = 0;

I'll add comment in the code above.


>>
>>
>>>>> +            }
>>>>> +        } else {
>>>>> +            mask = ~VFIO_DEVICE_STATE_RUNNING;
>>>>> +        }
>>>>> +
>>>>> +        ret = vfio_migration_set_state(vbasedev, mask, value);
>>>>> +        if (ret) {
>>>>> +            /*
>>>>> +             * vm_state_notify() doesn't support reporting failure. If such
>>>>> +             * error reporting support added in furure, migration should be
>>>>> +             * aborted.
>>>>
>>>>
>>>> "We should abort the migration in this case, but vm_state_notify()
>>>> currently does not support reporting failures."
>>>>
>>>> ?
>>>>   
>>
>> Ok. Updating comment as suggested here.
>>
>>>> Can/should we mark the failing device in some way?
>>>
>>> I think you can call qemu_file_set_error on the migration stream to
>>> force an error.
>>>    
>>
>> It should be as below, right?
>> qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
> 
> Does this indicate in any way which device was causing problems? (I'm
> not sure how visible the error_report would be?)
> 

I think it doesn't indicate which device caused problem but it sets 
error on migration stream

Thanks,
Kirti

>>
>>
>> Thanks,
>> Kirti
>>
>>> Dave
>>>    
>>>>> +             */
>>>>> +            error_report("%s: Failed to set device state 0x%x",
>>>>> +                         vbasedev->name, value & mask);
>>>>> +        }
>>>>> +        vbasedev->vm_running = running;
>>>>> +        trace_vfio_vmstate_change(vbasedev->name, running, RunState_str(state),
>>>>> +                                  value & mask);
>>>>> +    }
>>>>> +}
>>>>> +
>>>>>    static int vfio_migration_init(VFIODevice *vbasedev,
>>>>>                                   struct vfio_region_info *info)
>>>>>    {
>>>>
>>>> (...)
>>
> 


  reply	other threads:[~2020-10-21  5:34 UTC|newest]

Thread overview: 72+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-22 23:24 [PATCH QEMU v25 00/17] Add migration support for VFIO devices Kirti Wankhede
2020-09-22 23:24 ` [PATCH v26 01/17] vfio: Add function to unmap VFIO region Kirti Wankhede
2020-09-22 23:24 ` [PATCH v26 02/17] vfio: Add vfio_get_object callback to VFIODeviceOps Kirti Wankhede
2020-09-22 23:24 ` [PATCH v26 03/17] vfio: Add save and load functions for VFIO PCI devices Kirti Wankhede
2020-09-23  6:38   ` Zenghui Yu
2020-09-24 22:49   ` Alex Williamson
2020-10-21  9:30     ` Zenghui Yu
2020-10-21 19:03       ` Alex Williamson
2020-09-22 23:24 ` [PATCH v26 04/17] vfio: Add migration region initialization and finalize function Kirti Wankhede
2020-09-24 14:08   ` Cornelia Huck
2020-10-17 20:14     ` Kirti Wankhede
2020-09-25 20:20   ` Alex Williamson
2020-09-28  9:39     ` Cornelia Huck
2020-10-17 20:17     ` Kirti Wankhede
2020-09-22 23:24 ` [PATCH v26 05/17] vfio: Add VM state change handler to know state of VM Kirti Wankhede
2020-09-24 15:02   ` Cornelia Huck
2020-09-29 11:03     ` Dr. David Alan Gilbert
2020-10-17 20:24       ` Kirti Wankhede
2020-10-20 10:51         ` Cornelia Huck
2020-10-21  5:33           ` Kirti Wankhede [this message]
2020-10-22  7:51             ` Cornelia Huck
2020-10-22 15:42               ` Kirti Wankhede
2020-10-22 15:49                 ` Cornelia Huck
2020-09-25 20:20   ` Alex Williamson
2020-10-17 20:30     ` Kirti Wankhede
2020-10-17 23:44       ` Alex Williamson
2020-10-18 17:43         ` Kirti Wankhede
2020-10-19 17:51           ` Alex Williamson
2020-10-20 10:23             ` Cornelia Huck
2020-09-22 23:24 ` [PATCH v26 06/17] vfio: Add migration state change notifier Kirti Wankhede
2020-09-25 20:20   ` Alex Williamson
2020-10-17 20:35     ` Kirti Wankhede
2020-10-19 17:57       ` Alex Williamson
2020-10-20 10:55         ` Cornelia Huck
2020-09-22 23:24 ` [PATCH v26 07/17] vfio: Register SaveVMHandlers for VFIO device Kirti Wankhede
2020-09-24 15:15   ` Philippe Mathieu-Daudé
2020-09-29 10:19     ` Dr. David Alan Gilbert
2020-10-17 20:36       ` Kirti Wankhede
2020-09-25 11:53   ` Cornelia Huck
2020-10-18 20:55     ` Kirti Wankhede
2020-10-20 15:51       ` Cornelia Huck
2020-09-25 20:20   ` Alex Williamson
2020-10-18 17:40     ` Kirti Wankhede
2020-09-22 23:24 ` [PATCH v26 08/17] vfio: Add save state functions to SaveVMHandlers Kirti Wankhede
2020-09-23 11:42   ` Wang, Zhi A
2020-10-21 14:30     ` Kirti Wankhede
2020-09-25 21:02   ` Alex Williamson
2020-10-18 18:00     ` Kirti Wankhede
2020-09-22 23:24 ` [PATCH v26 09/17] vfio: Add load " Kirti Wankhede
2020-10-01 10:07   ` Cornelia Huck
2020-10-18 20:47     ` Kirti Wankhede
2020-10-20 16:25       ` Cornelia Huck
2020-09-22 23:24 ` [PATCH v26 10/17] memory: Set DIRTY_MEMORY_MIGRATION when IOMMU is enabled Kirti Wankhede
2020-09-22 23:24 ` [PATCH v26 11/17] vfio: Get migration capability flags for container Kirti Wankhede
2020-09-22 23:24 ` [PATCH v26 12/17] vfio: Add function to start and stop dirty pages tracking Kirti Wankhede
2020-09-25 21:55   ` Alex Williamson
2020-10-18 20:52     ` Kirti Wankhede
2020-09-22 23:24 ` [PATCH v26 13/17] vfio: create mapped iova list when vIOMMU is enabled Kirti Wankhede
2020-09-25 22:23   ` Alex Williamson
2020-10-19  6:01     ` Kirti Wankhede
2020-10-19 17:24       ` Alex Williamson
2020-10-19 19:15         ` Kirti Wankhede
2020-10-19 20:07           ` Alex Williamson
2020-09-22 23:24 ` [PATCH v26 14/17] vfio: Add vfio_listener_log_sync to mark dirty pages Kirti Wankhede
2020-09-22 23:24 ` [PATCH v26 15/17] vfio: Add ioctl to get dirty pages bitmap during dma unmap Kirti Wankhede
2020-09-22 23:24 ` [PATCH v26 16/17] vfio: Make vfio-pci device migration capable Kirti Wankhede
2020-09-25 12:17   ` Cornelia Huck
2020-09-22 23:24 ` [PATCH v26 17/17] qapi: Add VFIO devices migration stats in Migration stats Kirti Wankhede
2020-09-24 15:14   ` Eric Blake
2020-09-25 22:55   ` Alex Williamson
2020-09-29 10:40   ` Dr. David Alan Gilbert
2020-09-23  7:06 ` [PATCH QEMU v25 00/17] Add migration support for VFIO devices Zenghui Yu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=245abdf6-245d-5f88-e04b-35fad763560c@nvidia.com \
    --to=kwankhede@nvidia.com \
    --cc=Ken.Xue@amd.com \
    --cc=Zhengxiao.zx@alibaba-inc.com \
    --cc=aik@ozlabs.ru \
    --cc=alex.williamson@redhat.com \
    --cc=armbru@redhat.com \
    --cc=changpeng.liu@intel.com \
    --cc=cjia@nvidia.com \
    --cc=cohuck@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=eauger@redhat.com \
    --cc=eskultet@redhat.com \
    --cc=felipe@nutanix.com \
    --cc=jonathan.davies@nutanix.com \
    --cc=kevin.tian@intel.com \
    --cc=mlevitsk@redhat.com \
    --cc=pasic@linux.ibm.com \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=shuangtai.tst@alibaba-inc.com \
    --cc=yan.y.zhao@intel.com \
    --cc=yi.l.liu@intel.com \
    --cc=zhi.a.wang@intel.com \
    --cc=ziye.yang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.