linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Abhishek Sahu <abhsahu@nvidia.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Cornelia Huck <cohuck@redhat.com>,
	Yishai Hadas <yishaih@nvidia.com>,
	Jason Gunthorpe <jgg@nvidia.com>,
	Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>,
	Kevin Tian <kevin.tian@intel.com>,
	"Rafael J . Wysocki" <rafael@kernel.org>,
	Max Gurtovoy <mgurtovoy@nvidia.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	linux-pm@vger.kernel.org, linux-pci@vger.kernel.org
Subject: Re: [PATCH v4 2/6] vfio: Add a new device feature for the power management
Date: Fri, 8 Jul 2022 15:09:22 +0530	[thread overview]
Message-ID: <ad80eb14-18a1-8895-ecfb-32687a4ba021@nvidia.com> (raw)
In-Reply-To: <20220706093959.3bd2cbbb.alex.williamson@redhat.com>

On 7/6/2022 9:09 PM, Alex Williamson wrote:
> On Fri, 1 Jul 2022 16:38:10 +0530
> Abhishek Sahu <abhsahu@nvidia.com> wrote:
> 
>> This patch adds the new feature VFIO_DEVICE_FEATURE_POWER_MANAGEMENT
>> for the power management in the header file. The implementation for the
>> same will be added in the subsequent patches.
>>
>> With the standard registers, all power states cannot be achieved. The
>> platform-based power management needs to be involved to go into the
>> lowest power state. For all the platform-based power management, this
>> device feature can be used.
>>
>> This device feature uses flags to specify the different operations. In
>> the future, if any more power management functionality is needed then
>> a new flag can be added to it. It supports both GET and SET operations.
>>
>> Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
>> ---
>>  include/uapi/linux/vfio.h | 55 +++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 55 insertions(+)
>>
>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>> index 733a1cddde30..7e00de5c21ea 100644
>> --- a/include/uapi/linux/vfio.h
>> +++ b/include/uapi/linux/vfio.h
>> @@ -986,6 +986,61 @@ enum vfio_device_mig_state {
>>  	VFIO_DEVICE_STATE_RUNNING_P2P = 5,
>>  };
>>  
>> +/*
>> + * Perform power management-related operations for the VFIO device.
>> + *
>> + * The low power feature uses platform-based power management to move the
>> + * device into the low power state.  This low power state is device-specific.
>> + *
>> + * This device feature uses flags to specify the different operations.
>> + * It supports both the GET and SET operations.
>> + *
>> + * - VFIO_PM_LOW_POWER_ENTER flag moves the VFIO device into the low power
>> + *   state with platform-based power management.  This low power state will be
>> + *   internal to the VFIO driver and the user will not come to know which power
>> + *   state is chosen.  Once the user has moved the VFIO device into the low
>> + *   power state, then the user should not do any device access without moving
>> + *   the device out of the low power state.
> 
> Except we're wrapping device accesses to make this possible.  This
> should probably describe how any discrete access will wake the device
> but ongoing access through mmaps will generate user faults.
> 

 Sure. I will add that details also.

>> + *
>> + * - VFIO_PM_LOW_POWER_EXIT flag moves the VFIO device out of the low power
>> + *    state.  This flag should only be set if the user has previously put the
>> + *    device into low power state with the VFIO_PM_LOW_POWER_ENTER flag.
> 
> Indenting.
> 
 
 I will fix this.

>> + *
>> + * - VFIO_PM_LOW_POWER_ENTER and VFIO_PM_LOW_POWER_EXIT are mutually exclusive.
>> + *
>> + * - VFIO_PM_LOW_POWER_REENTERY_DISABLE flag is only valid with
>> + *   VFIO_PM_LOW_POWER_ENTER.  If there is any access for the VFIO device on
>> + *   the host side, then the device will be moved out of the low power state
>> + *   without the user's guest driver involvement.  Some devices require the
>> + *   user's guest driver involvement for each low-power entry.  If this flag is
>> + *   set, then the re-entry to the low power state will be disabled, and the
>> + *   host kernel will not move the device again into the low power state.
>> + *   The VFIO driver internally maintains a list of devices for which low
>> + *   power re-entry is disabled by default and for those devices, the
>> + *   re-entry will be disabled even if the user has not set this flag
>> + *   explicitly.
> 
> Wrong polarity.  The kernel should not maintain the policy.  By default
> every wakeup, whether from host kernel accesses or via user accesses
> that do a pm-get should signal a wakeup to userspace.  Userspace needs
> to opt-out of that wakeup to let the kernel automatically re-enter low
> power and userspace needs to maintain the policy for which devices it
> wants that to occur.
> 
 
 Okay. So that means, in the kernel side, we don’t have to maintain
 the list which currently contains NVIDIA device ID. Also, in our
 updated approach, this opt-out of that wake-up means that user
 has not provided eventfd in the feature SET ioctl. Correct ?
 
>> + *
>> + * For the IOCTL call with VFIO_DEVICE_FEATURE_GET:
>> + *
>> + * - VFIO_PM_LOW_POWER_ENTER will be set if the user has put the device into
>> + *   the low power state, otherwise, VFIO_PM_LOW_POWER_EXIT will be set.
>> + *
>> + * - If the device is in a normal power state currently, then
>> + *   VFIO_PM_LOW_POWER_REENTERY_DISABLE will be set for the devices where low
>> + *   power re-entry is disabled by default.  If the device is in the low power
>> + *   state currently, then VFIO_PM_LOW_POWER_REENTERY_DISABLE will be set
>> + *   according to the current transition.
> 
> Very confusing semantics.
> 
> What if the feature SET ioctl took an eventfd and that eventfd was one
> time use.  Calling the ioctl would setup the eventfd to notify the user
> on wakeup and call pm-put.  Any access to the device via host, ioctl,
> or region would be wrapped in pm-get/put and the pm-resume handler
> would perform the matching pm-get to balance the feature SET and signal
> the eventfd. 

 This seems a better option. It will help in making the ioctl simpler
 and we don’t have to add a separate index for PME which I added in
 patch 6. 

> If the user opts-out by not providing a wakeup eventfd,
> then the pm-resume handler does not perform a pm-get. Possibly we
> could even allow mmap access if a wake-up eventfd is provided.

 Sorry. I am not clear on this mmap part. We currently invalidates
 mapping before going into runtime-suspend. Now, if use tries do
 mmap then do we need some extra handling in the fault handler ?
 Need your help in understanding this part.

> The
> feature GET ioctl would be used to exit low power behavior and would be
> a no-op if the wakeup eventfd had already been signaled.  Thanks,
>
 
 I will use the GET ioctl for low power exit instead of returning the
 current status.
 
 Regards,
 Abhishek

> Alex
> 
>> + */
>> +struct vfio_device_feature_power_management {
>> +	__u32	flags;
>> +#define VFIO_PM_LOW_POWER_ENTER			(1 << 0)
>> +#define VFIO_PM_LOW_POWER_EXIT			(1 << 1)
>> +#define VFIO_PM_LOW_POWER_REENTERY_DISABLE	(1 << 2)
>> +	__u32	reserved;
>> +};
>> +
>> +#define VFIO_DEVICE_FEATURE_POWER_MANAGEMENT	3
>> +
>>  /* -------- API for Type1 VFIO IOMMU -------- */
>>  
>>  /**
> 


  reply	other threads:[~2022-07-08  9:39 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-01 11:08 [PATCH v4 0/6] vfio/pci: power management changes Abhishek Sahu
2022-07-01 11:08 ` [PATCH v4 1/6] vfio/pci: Mask INTx during runtime suspend Abhishek Sahu
2022-07-06 15:39   ` Alex Williamson
2022-07-08  9:21     ` Abhishek Sahu
2022-07-08 15:45       ` Alex Williamson
2022-07-11  9:18         ` Abhishek Sahu
2022-07-11 12:57           ` Alex Williamson
2022-07-01 11:08 ` [PATCH v4 2/6] vfio: Add a new device feature for the power management Abhishek Sahu
2022-07-06 15:39   ` Alex Williamson
2022-07-08  9:39     ` Abhishek Sahu [this message]
2022-07-08 16:36       ` Alex Williamson
2022-07-11  9:43         ` Abhishek Sahu
2022-07-11 13:04           ` Alex Williamson
2022-07-11 17:30             ` Abhishek Sahu
2022-07-01 11:08 ` [PATCH v4 3/6] vfio: Increment the runtime PM usage count during IOCTL call Abhishek Sahu
2022-07-06 15:40   ` Alex Williamson
2022-07-08  9:43     ` Abhishek Sahu
2022-07-08 16:49       ` Alex Williamson
2022-07-11  9:50         ` Abhishek Sahu
2022-07-01 11:08 ` [PATCH v4 4/6] vfio/pci: Add the support for PCI D3cold state Abhishek Sahu
2022-07-06 15:40   ` Alex Williamson
2022-07-01 11:08 ` [PATCH v4 5/6] vfio/pci: Prevent low power re-entry without guest driver Abhishek Sahu
2022-07-06 15:40   ` Alex Williamson
2022-07-01 11:08 ` [PATCH v4 6/6] vfio/pci: Add support for virtual PME Abhishek Sahu
2022-07-06 15:40   ` Alex Williamson
2022-07-08  9:45     ` Abhishek Sahu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ad80eb14-18a1-8895-ecfb-32687a4ba021@nvidia.com \
    --to=abhsahu@nvidia.com \
    --cc=alex.williamson@redhat.com \
    --cc=bhelgaas@google.com \
    --cc=cohuck@redhat.com \
    --cc=jgg@nvidia.com \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=mgurtovoy@nvidia.com \
    --cc=rafael@kernel.org \
    --cc=shameerali.kolothum.thodi@huawei.com \
    --cc=yishaih@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).