linux-kernel.vger.kernel.org archive mirror
From: Abhishek Sahu <abhsahu@nvidia.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Cornelia Huck <cohuck@redhat.com>,
	Yishai Hadas <yishaih@nvidia.com>,
	Jason Gunthorpe <jgg@nvidia.com>,
	Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>,
	Kevin Tian <kevin.tian@intel.com>,
	"Rafael J . Wysocki" <rafael@kernel.org>,
	Max Gurtovoy <mgurtovoy@nvidia.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	linux-pm@vger.kernel.org, linux-pci@vger.kernel.org
Subject: Re: [PATCH v5 5/5] vfio/pci: Implement VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP
Date: Mon, 25 Jul 2022 20:34:40 +0530
Message-ID: <9c9b9a7a-bee4-305a-019b-54b96ffba3af@nvidia.com>
In-Reply-To: <20220721163442.7d2ae47f.alex.williamson@redhat.com>

On 7/22/2022 4:04 AM, Alex Williamson wrote:
> On Tue, 19 Jul 2022 17:45:23 +0530
> Abhishek Sahu <abhsahu@nvidia.com> wrote:
> 
>> This patch implements the VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP
>> device feature. With VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY, any access to
>> the VFIO device from the host side moves the device out of the low power
>> state without involving the user's guest driver. Once the access has
>> finished, the device is moved back into the low power state. When low
>> power entry happens through VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP,
>> the device is not moved back into the low power state, and the user is
>> notified through the wakeup eventfd.
>>
>> vfio_pci_runtime_pm_entry() is called for both variants of low power
>> entry, so add an extra argument for the wakeup eventfd context and
>> store it in 'struct vfio_pci_core_device'.
>>
>> For entry without a wakeup eventfd, all the exit related handling is
>> done by the LOW_POWER_EXIT device feature only. When LOW_POWER_EXIT is
>> called, the vfio core layer vfio_device_pm_runtime_get() increments the
>> usage count and resumes the device. In the driver runtime_resume
>> callback, 'pm_wake_eventfd_ctx' is NULL, so vfio_pci_runtime_pm_exit()
>> returns early. Then vfio_pci_core_pm_exit() calls
>> vfio_pci_runtime_pm_exit() again and the exit related handling is done.
>>
>> For entry with a wakeup eventfd, the driver resume callback triggers
>> the eventfd and does all the exit related handling. When
>> vfio_pci_core_pm_exit() later calls vfio_pci_runtime_pm_exit(), it
>> returns early. But if the user has disabled runtime PM on the host
>> side, the device never enters the runtime suspended state, and in this
>> case all the exit related handling is done during
>> vfio_pci_core_pm_exit() only. Also, the eventfd is not triggered since
>> the device power state has not been changed by the host driver.
>>
>> For vfio_pci_core_disable() also, all the exit related handling
>> needs to be done if the user has closed the device after putting it
>> into low power. In this case the eventfd is not triggered since
>> the device close has been initiated by the user.
>>
>> Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
>> ---
>>  drivers/vfio/pci/vfio_pci_core.c | 78 ++++++++++++++++++++++++++++++--
>>  include/linux/vfio_pci_core.h    |  1 +
>>  2 files changed, 74 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
>> index 726a6f282496..dbe942bcaa67 100644
>> --- a/drivers/vfio/pci/vfio_pci_core.c
>> +++ b/drivers/vfio/pci/vfio_pci_core.c
>> @@ -259,7 +259,8 @@ int vfio_pci_set_power_state(struct vfio_pci_core_device *vdev, pci_power_t stat
>>  	return ret;
>>  }
>>  
>> -static int vfio_pci_runtime_pm_entry(struct vfio_pci_core_device *vdev)
>> +static int vfio_pci_runtime_pm_entry(struct vfio_pci_core_device *vdev,
>> +				     struct eventfd_ctx *efdctx)
>>  {
>>  	/*
>>  	 * The vdev power related flags are protected with 'memory_lock'
>> @@ -272,6 +273,7 @@ static int vfio_pci_runtime_pm_entry(struct vfio_pci_core_device *vdev)
>>  	}
>>  
>>  	vdev->pm_runtime_engaged = true;
>> +	vdev->pm_wake_eventfd_ctx = efdctx;
>>  	pm_runtime_put_noidle(&vdev->pdev->dev);
>>  	up_write(&vdev->memory_lock);
>>  
>> @@ -295,21 +297,67 @@ static int vfio_pci_core_pm_entry(struct vfio_device *device, u32 flags,
>>  	 * while returning from the ioctl and then the device can go into
>>  	 * runtime suspended state.
>>  	 */
>> -	return vfio_pci_runtime_pm_entry(vdev);
>> +	return vfio_pci_runtime_pm_entry(vdev, NULL);
>>  }
>>  
>> -static void vfio_pci_runtime_pm_exit(struct vfio_pci_core_device *vdev)
>> +static int
>> +vfio_pci_core_pm_entry_with_wakeup(struct vfio_device *device, u32 flags,
>> +				   void __user *arg, size_t argsz)
>> +{
>> +	struct vfio_pci_core_device *vdev =
>> +		container_of(device, struct vfio_pci_core_device, vdev);
>> +	struct vfio_device_low_power_entry_with_wakeup entry;
>> +	struct eventfd_ctx *efdctx;
>> +	int ret;
>> +
>> +	ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_SET,
>> +				 sizeof(entry));
>> +	if (ret != 1)
>> +		return ret;
>> +
>> +	if (copy_from_user(&entry, arg, sizeof(entry)))
>> +		return -EFAULT;
>> +
>> +	if (entry.wakeup_eventfd < 0)
>> +		return -EINVAL;
>> +
>> +	efdctx = eventfd_ctx_fdget(entry.wakeup_eventfd);
>> +	if (IS_ERR(efdctx))
>> +		return PTR_ERR(efdctx);
>> +
>> +	ret = vfio_pci_runtime_pm_entry(vdev, efdctx);
>> +	if (ret)
>> +		eventfd_ctx_put(efdctx);
>> +
>> +	return ret;
>> +}
>> +
>> +static void vfio_pci_runtime_pm_exit(struct vfio_pci_core_device *vdev,
>> +				     bool resume_callback)
>>  {
>>  	/*
>>  	 * The vdev power related flags are protected with 'memory_lock'
>>  	 * semaphore.
>>  	 */
>>  	down_write(&vdev->memory_lock);
>> +	if (resume_callback && !vdev->pm_wake_eventfd_ctx) {
>> +		up_write(&vdev->memory_lock);
>> +		return;
>> +	}
>> +
>>  	if (vdev->pm_runtime_engaged) {
>>  		vdev->pm_runtime_engaged = false;
>>  		pm_runtime_get_noresume(&vdev->pdev->dev);
>>  	}
>>  
>> +	if (vdev->pm_wake_eventfd_ctx) {
>> +		if (resume_callback)
>> +			eventfd_signal(vdev->pm_wake_eventfd_ctx, 1);
>> +
>> +		eventfd_ctx_put(vdev->pm_wake_eventfd_ctx);
>> +		vdev->pm_wake_eventfd_ctx = NULL;
>> +	}
>> +
>>  	up_write(&vdev->memory_lock);
>>  }
>>  
> 
> I find the pm_exit handling here confusing.  We only have one caller
> that can signal the eventfd, so it seems cleaner to me to have that
> caller do the eventfd signal.  We can then remove the arg to pm_exit
> and pull the core of it out to a pre-locked function for that call
> path.  Something like below (applies on top of this patch).  Also moved
> the intx unmasking until after the eventfd signaling.  What do you
> think?  Thanks,
> 
> Alex
> 

 Thanks Alex. The updated code looks cleaner.
 I will make the above changes.

 Regards,
 Abhishek

> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index dbe942bcaa67..93169b7d6da2 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -332,32 +332,27 @@ vfio_pci_core_pm_entry_with_wakeup(struct vfio_device *device, u32 flags,
>  	return ret;
>  }
>  
> -static void vfio_pci_runtime_pm_exit(struct vfio_pci_core_device *vdev,
> -				     bool resume_callback)
> +static void __vfio_pci_runtime_pm_exit(struct vfio_pci_core_device *vdev)
>  {
> -	/*
> -	 * The vdev power related flags are protected with 'memory_lock'
> -	 * semaphore.
> -	 */
> -	down_write(&vdev->memory_lock);
> -	if (resume_callback && !vdev->pm_wake_eventfd_ctx) {
> -		up_write(&vdev->memory_lock);
> -		return;
> -	}
> -
>  	if (vdev->pm_runtime_engaged) {
>  		vdev->pm_runtime_engaged = false;
>  		pm_runtime_get_noresume(&vdev->pdev->dev);
> -	}
> -
> -	if (vdev->pm_wake_eventfd_ctx) {
> -		if (resume_callback)
> -			eventfd_signal(vdev->pm_wake_eventfd_ctx, 1);
>  
> -		eventfd_ctx_put(vdev->pm_wake_eventfd_ctx);
> -		vdev->pm_wake_eventfd_ctx = NULL;
> +		if (vdev->pm_wake_eventfd_ctx) {
> +			eventfd_ctx_put(vdev->pm_wake_eventfd_ctx);
> +			vdev->pm_wake_eventfd_ctx = NULL;
> +		}
>  	}
> +}
>  
> +static void vfio_pci_runtime_pm_exit(struct vfio_pci_core_device *vdev)
> +{
> +	/*
> +	 * The vdev power related flags are protected with 'memory_lock'
> +	 * semaphore.
> +	 */
> +	down_write(&vdev->memory_lock);
> +	__vfio_pci_runtime_pm_exit(vdev);
>  	up_write(&vdev->memory_lock);
>  }
>  
> @@ -373,22 +368,13 @@ static int vfio_pci_core_pm_exit(struct vfio_device *device, u32 flags,
>  		return ret;
>  
>  	/*
> -	 * The device should already be resumed by the vfio core layer.
> -	 * vfio_pci_runtime_pm_exit() will internally increment the usage
> -	 * count corresponding to pm_runtime_put() called during low power
> -	 * feature entry.
> -	 *
> -	 * For the low power entry happened with wakeup eventfd, there will
> -	 * be two cases:
> -	 *
> -	 * 1. The device has gone into runtime suspended state. In this case,
> -	 *    the runtime resume by the vfio core layer should already have
> -	 *    performed all exit related handling and the
> -	 *    vfio_pci_runtime_pm_exit() will return early.
> -	 * 2. The device was in runtime active state. In this case, the
> -	 *    vfio_pci_runtime_pm_exit() will do all the required handling.
> +	 * The device is always in the active state here due to pm wrappers
> +	 * around ioctls.  If the device had entered a low power state and
> +	 * pm_wake_eventfd_ctx is valid, vfio_pci_core_runtime_resume() has
> +	 * already signaled the eventfd and exited low power mode itself.
> +	 * pm_runtime_engaged protects the redundant call here.
>  	 */
> -	vfio_pci_runtime_pm_exit(vdev, false);
> +	vfio_pci_runtime_pm_exit(vdev);
>  	return 0;
>  }
>  
> @@ -425,15 +411,19 @@ static int vfio_pci_core_runtime_resume(struct device *dev)
>  {
>  	struct vfio_pci_core_device *vdev = dev_get_drvdata(dev);
>  
> -	if (vdev->pm_intx_masked)
> -		vfio_pci_intx_unmask(vdev);
> -
>  	/*
> -	 * Only for the low power entry happened with wakeup eventfd,
> -	 * the vfio_pci_runtime_pm_exit() will perform exit related handling
> -	 * and will trigger eventfd. For the other cases, it will return early.
> +	 * Resume with a pm_wake_eventfd_ctx signals the eventfd and exits
> +	 * low power mode.
>  	 */
> -	vfio_pci_runtime_pm_exit(vdev, true);
> +	down_write(&vdev->memory_lock);
> +	if (vdev->pm_wake_eventfd_ctx) {
> +		eventfd_signal(vdev->pm_wake_eventfd_ctx, 1);
> +		__vfio_pci_runtime_pm_exit(vdev);
> +	}
> +	up_write(&vdev->memory_lock);
> +
> +	if (vdev->pm_intx_masked)
> +		vfio_pci_intx_unmask(vdev);
>  
>  	return 0;
>  }
> @@ -553,7 +543,7 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
>  	 * the vfio_pci_set_power_state() will change the device power state
>  	 * to D0.
>  	 */
> -	vfio_pci_runtime_pm_exit(vdev, false);
> +	vfio_pci_runtime_pm_exit(vdev);
>  	pm_runtime_resume(&pdev->dev);
>  
>  	/*
> 
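
The split into a locking wrapper and a pre-locked helper in the diff above
can be modelled in user space. What follows is only a sketch under stated
assumptions, not the kernel code: struct model_dev and every model_*
function are hypothetical stand-ins (a pthread mutex for memory_lock, a
plain counter for the runtime PM usage count, a raw eventfd descriptor for
pm_wake_eventfd_ctx). Rejecting a second entry while already engaged is
also an assumption, since that part of the entry path is not visible in
the quoted hunks.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical user-space stand-in for struct vfio_pci_core_device. */
struct model_dev {
	pthread_mutex_t lock;	/* plays the role of memory_lock */
	bool engaged;		/* plays the role of pm_runtime_engaged */
	int wake_fd;		/* plays the role of pm_wake_eventfd_ctx, -1 if unset */
	int usage_count;	/* plays the role of the runtime PM usage count */
};

static void model_init(struct model_dev *d)
{
	int rc = pthread_mutex_init(&d->lock, NULL);

	assert(rc == 0);
	(void)rc;
	d->engaged = false;
	d->wake_fd = -1;
	d->usage_count = 1;
}

/* Low power entry: drop one usage reference, remember the optional eventfd. */
static int model_pm_entry(struct model_dev *d, int wake_fd)
{
	pthread_mutex_lock(&d->lock);
	if (d->engaged) {
		/* Assumption: a second entry while engaged is rejected. */
		pthread_mutex_unlock(&d->lock);
		return -EINVAL;
	}
	d->engaged = true;
	d->wake_fd = wake_fd;
	d->usage_count--;
	pthread_mutex_unlock(&d->lock);
	return 0;
}

/* Pre-locked core of the exit path; the caller must hold d->lock. */
static void model_pm_exit_locked(struct model_dev *d)
{
	if (d->engaged) {
		d->engaged = false;
		d->usage_count++;
		/*
		 * The kernel code drops its eventfd context reference here.
		 * The model only forgets the descriptor; the caller owns it.
		 */
		d->wake_fd = -1;
	}
}

/* Locking wrapper used by the explicit exit and the disable paths. */
static void model_pm_exit(struct model_dev *d)
{
	pthread_mutex_lock(&d->lock);
	model_pm_exit_locked(d);
	pthread_mutex_unlock(&d->lock);
}

/* Resume path: the only caller that signals the eventfd. */
static void model_runtime_resume(struct model_dev *d)
{
	pthread_mutex_lock(&d->lock);
	if (d->wake_fd >= 0) {
		uint64_t one = 1;
		ssize_t n = write(d->wake_fd, &one, sizeof(one));

		(void)n;	/* a failed signal is ignored in this sketch */
		model_pm_exit_locked(d);
	}
	pthread_mutex_unlock(&d->lock);
}
```

With a descriptor from eventfd(0, EFD_NONBLOCK), model_pm_entry() followed
by model_runtime_resume() leaves a count of 1 readable on the descriptor
and usage_count back at its initial value. A later model_pm_exit() then
changes nothing, because the engaged flag is already clear. For an entry
without an eventfd (wake_fd of -1), the resume path does nothing and
model_pm_exit() does all the exit handling.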

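On the user side, the wakeup notification described in the commit message
arrives as an ordinary eventfd signal. The sketch below shows one way a
user could create such a descriptor and wait on it with poll(). It leaves
out the VFIO ioctl entirely; the wakeup_eventfd_* helper names are
hypothetical, and wakeup_eventfd_simulate_signal() only imitates, from
user space, the eventfd_signal() call the kernel makes on runtime resume.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Create the descriptor that would be passed as wakeup_eventfd. */
static int wakeup_eventfd_create(void)
{
	return eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);
}

/*
 * Wait up to timeout_ms for a wakeup notification.
 * Returns the number of signals collected, 0 on timeout, -1 on error.
 */
static int64_t wakeup_eventfd_wait(int efd, int timeout_ms)
{
	struct pollfd pfd = { .fd = efd, .events = POLLIN };
	uint64_t count;
	int rc;

	assert(efd >= 0);

	/* Retry if a signal handler interrupts the wait. */
	do {
		rc = poll(&pfd, 1, timeout_ms);
	} while (rc < 0 && errno == EINTR);
	if (rc < 0)
		return -1;
	if (rc == 0)
		return 0;

	/* Reading an eventfd returns the counter and resets it to zero. */
	if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
		return -1;
	return (int64_t)count;
}

/* Stand-in for the kernel signaling the eventfd on runtime resume. */
static int wakeup_eventfd_simulate_signal(int efd)
{
	uint64_t one = 1;

	return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}
```

A zero timeout turns wakeup_eventfd_wait() into a non-blocking check, which
may suit a caller that already runs its own event loop.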
