All of lore.kernel.org
 help / color / mirror / Atom feed
From: Abhishek Sahu <abhsahu@nvidia.com>
To: <kvm@vger.kernel.org>,
	Alex Williamson <alex.williamson@redhat.com>,
	Cornelia Huck <cohuck@redhat.com>
Cc: Max Gurtovoy <mgurtovoy@nvidia.com>,
	Yishai Hadas <yishaih@nvidia.com>,
	Zhen Lei <thunder.leizhen@huawei.com>,
	Jason Gunthorpe <jgg@nvidia.com>, <linux-kernel@vger.kernel.org>,
	Abhishek Sahu <abhsahu@nvidia.com>
Subject: [PATCH v3 2/2] vfio/pci: wake-up devices around reset functions
Date: Thu, 17 Feb 2022 17:51:07 +0530	[thread overview]
Message-ID: <20220217122107.22434-3-abhsahu@nvidia.com> (raw)
In-Reply-To: <20220217122107.22434-1-abhsahu@nvidia.com>

If 'vfio_pci_core_device::needs_pm_restore' is set (PCI device does
not have No_Soft_Reset bit set in its PMCSR config register), then the
current PCI state will be saved locally in
'vfio_pci_core_device::pm_save' during D0->D3hot transition and same
will be restored back during D3hot->D0 transition. For reset-related
functionalities, vfio driver uses PCI reset API's. These
API's internally change the PCI power state back to D0 first if
the device power state is non-D0. This state change to D0 will happen
without the involvement of vfio driver.

Let's consider the following example:

1. The device is in D3hot.
2. User invokes VFIO_DEVICE_RESET ioctl.
3. pci_try_reset_function() will be called which internally
   invokes pci_dev_save_and_disable().
4. pci_set_power_state(dev, PCI_D0) will be called first.
5. pci_save_state() will happen then.

Now, for the devices which has NoSoftRst-, the pci_set_power_state()
can trigger soft reset and the original PCI config state will be lost
at step (4) and this state cannot be restored again. This original PCI
state can include any setting which is performed by SBIOS or host
linux kernel (for example LTR, ASPM L1 substates, etc.). When this
soft reset will be triggered, then all these settings will be reset,
and the device state saved at step (5) will also have this setting
cleared so it cannot be restored. Since the vfio driver only exposes
limited PCI capabilities to its user, so the vfio driver user also
won't have the option to save and restore these capabilities state
either and these original settings will be permanently lost.

For pci_reset_bus() also, we can have the above situation.
The other functions/devices can be in D3hot and the reset will change
the power state of all devices to D0 without the involvement of vfio
driver.

So, before calling any reset-related API's, we need to make sure that
the device state is D0. This is mainly to preserve the state around
soft reset.

For vfio_pci_core_disable(), we use __pci_reset_function_locked()
which internally can use pci_pm_reset() for the function reset.
pci_pm_reset() requires the device power state to be in D0, otherwise
it returns error.

This patch changes the device power state to D0 by invoking
vfio_pci_set_power_state() explicitly before calling any reset related
API's.

Fixes: 51ef3a004b1e ("vfio/pci: Restore device state on PM transition")
Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
---
 drivers/vfio/pci/vfio_pci_core.c | 48 ++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 87b288affc13..2e6409cc11ad 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -335,6 +335,17 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
 	/* For needs_reset */
 	lockdep_assert_held(&vdev->vdev.dev_set->lock);
 
+	/*
+	 * This function can be invoked while the power state is non-D0.
+	 * This function calls __pci_reset_function_locked() which internally
+	 * can use pci_pm_reset() for the function reset. pci_pm_reset() will
+	 * fail if the power state is non-D0. Also, for the devices which
+	 * have NoSoftRst-, the reset function can cause the PCI config space
+	 * reset without restoring the original state (saved locally in
+	 * 'vdev->pm_save').
+	 */
+	vfio_pci_set_power_state(vdev, PCI_D0);
+
 	/* Stop the device from further DMA */
 	pci_clear_master(pdev);
 
@@ -934,6 +945,19 @@ long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
 			return -EINVAL;
 
 		vfio_pci_zap_and_down_write_memory_lock(vdev);
+
+		/*
+		 * This function can be invoked while the power state is non-D0.
+		 * If pci_try_reset_function() has been called while the power
+		 * state is non-D0, then pci_try_reset_function() will
+		 * internally set the power state to D0 without vfio driver
+		 * involvement. For the devices which have NoSoftRst-, the
+		 * reset function can cause the PCI config space reset without
+		 * restoring the original state (saved locally in
+		 * 'vdev->pm_save').
+		 */
+		vfio_pci_set_power_state(vdev, PCI_D0);
+
 		ret = pci_try_reset_function(vdev->pdev);
 		up_write(&vdev->memory_lock);
 
@@ -2068,6 +2092,18 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
 	}
 	cur_mem = NULL;
 
+	/*
+	 * The pci_reset_bus() will reset all the devices in the bus.
+	 * The power state can be non-D0 for some of the devices in the bus.
+	 * For these devices, the pci_reset_bus() will internally set
+	 * the power state to D0 without vfio driver involvement.
+	 * For the devices which have NoSoftRst-, the reset function can
+	 * cause the PCI config space reset without restoring the original
+	 * state (saved locally in 'vdev->pm_save').
+	 */
+	list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list)
+		vfio_pci_set_power_state(cur, PCI_D0);
+
 	ret = pci_reset_bus(pdev);
 
 err_undo:
@@ -2121,6 +2157,18 @@ static bool vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set)
 	if (!pdev)
 		return false;
 
+	/*
+	 * The pci_reset_bus() will reset all the devices in the bus.
+	 * The power state can be non-D0 for some of the devices in the bus.
+	 * For these devices, the pci_reset_bus() will internally set
+	 * the power state to D0 without vfio driver involvement.
+	 * For the devices which have NoSoftRst-, the reset function can
+	 * cause the PCI config space reset without restoring the original
+	 * state (saved locally in 'vdev->pm_save').
+	 */
+	list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list)
+		vfio_pci_set_power_state(cur, PCI_D0);
+
 	ret = pci_reset_bus(pdev);
 	if (ret)
 		return false;
-- 
2.17.1


      parent reply	other threads:[~2022-02-17 12:22 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-17 12:21 [PATCH v3 0/2] vfio/pci: fixes related with power management Abhishek Sahu
2022-02-17 12:21 ` [PATCH v3 1/2] vfio/pci: fix memory leak during D3hot to D0 transition Abhishek Sahu
2022-02-17 12:21 ` Abhishek Sahu [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220217122107.22434-3-abhsahu@nvidia.com \
    --to=abhsahu@nvidia.com \
    --cc=alex.williamson@redhat.com \
    --cc=cohuck@redhat.com \
    --cc=jgg@nvidia.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgurtovoy@nvidia.com \
    --cc=thunder.leizhen@huawei.com \
    --cc=yishaih@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.