linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1 0/2] Add ablility of VFIO driver to ignore reset when device don't need it
@ 2021-10-14  9:57 Zhenguo Yao
  2021-10-14  9:57 ` [PATCH v1 1/2] PCI: Add ignore_reset sysfs interface to control whether to do device reset in PCI drivers Zhenguo Yao
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Zhenguo Yao @ 2021-10-14  9:57 UTC (permalink / raw)
  To: bhelgaas, alex.williamson
  Cc: cohuck, jgg, mgurtovoy, yishaih, kvm, linux-pci, linux-kernel,
	yaozhenguo, Zhenguo Yao

In some scenarios, vfio device can't do any reset in initialization
process. For example: Nvswitch and GPU A100 working in Shared NVSwitch
Virtualization Model. In such mode, there are two type VMs: service
VM and Guest VM. The GPU devices are initialized in the following steps:

1. Service VM boot up. GPUs and Nvswitchs are passthrough to service VM.
Nvidia driver and manager software will do some settings in service VM.

2. The selected GPUs are unpluged from service VM.

3. Guest VM boots up with the selected GPUs passthrough.

The selected GPUs can't do any reset in step3, or they will be initialized
failed in Guest VM.

This patchset add a PCI sysfs interface:ignore_reset which drivers can
use it to control whether to do PCI reset or not. For example: In Shared
NVSwitch Virtualization Model. Hypervisor can disable PCI reset by setting
ignore_reset to 1 before Gust VM booting up.

Zhenguo Yao (2):
  PCI: Add ignore_reset sysfs interface to control whether do device
    reset in PCI drivers
  vfio-pci: Don't do device reset when ignore_reset is setting

 drivers/pci/pci-sysfs.c          | 25 +++++++++++++++++
 drivers/vfio/pci/vfio_pci_core.c | 48 ++++++++++++++++++++------------
 include/linux/pci.h              |  1 +
 3 files changed, 56 insertions(+), 18 deletions(-)

-- 
2.27.0


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v1 1/2] PCI: Add ignore_reset sysfs interface to control whether to do device reset in PCI drivers
  2021-10-14  9:57 [PATCH v1 0/2] Add ablility of VFIO driver to ignore reset when device don't need it Zhenguo Yao
@ 2021-10-14  9:57 ` Zhenguo Yao
  2021-10-14  9:57 ` [PATCH v1 2/2] vfio-pci: Don't do device reset when ignore_reset is setting Zhenguo Yao
  2021-10-14 12:48 ` [PATCH v1 0/2] Add ablility of VFIO driver to ignore reset when device don't need it Alex Williamson
  2 siblings, 0 replies; 5+ messages in thread
From: Zhenguo Yao @ 2021-10-14  9:57 UTC (permalink / raw)
  To: bhelgaas, alex.williamson
  Cc: cohuck, jgg, mgurtovoy, yishaih, kvm, linux-pci, linux-kernel,
	yaozhenguo, Zhenguo Yao

Some PCI devices can't do device reset in enable and disable operations.
So, and ignore_reset sysfs interface to ignore reset in those operations.
For example:

echo 1 > /sys/bus/pci/xxxx/ignore_reset

PCI drivers can ignore reset for this device based on ignore_reset.

Signed-off-by: Zhenguo Yao <yaozhenguo1@gmail.com>
---
 drivers/pci/pci-sysfs.c | 25 +++++++++++++++++++++++++
 include/linux/pci.h     |  1 +
 2 files changed, 26 insertions(+)

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 7fb5cd17cc98..c2fa2ed3ae55 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -255,6 +255,30 @@ static ssize_t ari_enabled_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(ari_enabled);
 
+static ssize_t ignore_reset_show(struct device *dev,
+		struct device_attribute *attr,
+		char *buf)
+{
+	struct pci_dev *pci_dev = to_pci_dev(dev);
+
+	return sprintf(buf, "%u\n", pci_dev->ignore_reset);
+}
+static ssize_t ignore_reset_store(struct device *dev,
+		struct device_attribute *attr,
+		const char *buf, size_t count)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+	unsigned long val;
+
+	if (kstrtoul(buf, 0, &val) < 0)
+		return -EINVAL;
+
+	pdev->ignore_reset = !!val;
+
+	return count;
+}
+static DEVICE_ATTR_RW(ignore_reset);
+
 static ssize_t modalias_show(struct device *dev, struct device_attribute *attr,
 			     char *buf)
 {
@@ -618,6 +642,7 @@ static struct attribute *pci_dev_attrs[] = {
 #endif
 	&dev_attr_driver_override.attr,
 	&dev_attr_ari_enabled.attr,
+	&dev_attr_ignore_reset.attr,
 	NULL,
 };
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index cd8aa6fce204..ac026acd4572 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -508,6 +508,7 @@ struct pci_dev {
 
 	/* These methods index pci_reset_fn_methods[] */
 	u8 reset_methods[PCI_NUM_RESET_METHODS]; /* In priority order */
+	u8 ignore_reset; /* ignore reset control in driver */
 };
 
 static inline struct pci_dev *pci_physfn(struct pci_dev *dev)
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v1 2/2] vfio-pci: Don't do device reset when ignore_reset is setting
  2021-10-14  9:57 [PATCH v1 0/2] Add ablility of VFIO driver to ignore reset when device don't need it Zhenguo Yao
  2021-10-14  9:57 ` [PATCH v1 1/2] PCI: Add ignore_reset sysfs interface to control whether to do device reset in PCI drivers Zhenguo Yao
@ 2021-10-14  9:57 ` Zhenguo Yao
  2021-10-14 12:48 ` [PATCH v1 0/2] Add ablility of VFIO driver to ignore reset when device don't need it Alex Williamson
  2 siblings, 0 replies; 5+ messages in thread
From: Zhenguo Yao @ 2021-10-14  9:57 UTC (permalink / raw)
  To: bhelgaas, alex.williamson
  Cc: cohuck, jgg, mgurtovoy, yishaih, kvm, linux-pci, linux-kernel,
	yaozhenguo, Zhenguo Yao

In some scenarios, vfio device can't do any reset in initialization
process. For example: Nvswitch and GPU A100 working in Shared NVSwitch
Virtualization Model. In such mode, The GPUs can't do any reset when
Guest VM is booting up.

So, Using ignore_reset to control whether to do PCI reset in
initialization. In Shared NVSwitch Virtualization Model, GPUs will
ignore reset when Gust VM booting up.

Signed-off-by: Zhenguo Yao <yaozhenguo1@gmail.com>
---
 drivers/vfio/pci/vfio_pci_core.c | 48 ++++++++++++++++++++------------
 1 file changed, 30 insertions(+), 18 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 68198e0f2a63..83d3ef5d3a9c 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -254,11 +254,13 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
 	if (ret)
 		return ret;
 
-	/* If reset fails because of the device lock, fail this path entirely */
-	ret = pci_try_reset_function(pdev);
-	if (ret == -EAGAIN) {
-		pci_disable_device(pdev);
-		return ret;
+	if (!pdev->ignore_reset) {
+		/* If reset fails because of the device lock, fail this path entirely */
+		ret = pci_try_reset_function(pdev);
+		if (ret == -EAGAIN) {
+			pci_disable_device(pdev);
+			return ret;
+		}
 	}
 
 	vdev->reset_works = !ret;
@@ -388,25 +390,30 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
 	 */
 	pci_write_config_word(pdev, PCI_COMMAND, PCI_COMMAND_INTX_DISABLE);
 
-	/*
-	 * Try to get the locks ourselves to prevent a deadlock. The
-	 * success of this is dependent on being able to lock the device,
-	 * which is not always possible.
-	 * We can not use the "try" reset interface here, which will
-	 * overwrite the previously restored configuration information.
-	 */
-	if (vdev->reset_works && pci_dev_trylock(pdev)) {
-		if (!__pci_reset_function_locked(pdev))
-			vdev->needs_reset = false;
-		pci_dev_unlock(pdev);
+	if (!pdev->ignore_reset) {
+		/*
+		 * Try to get the locks ourselves to prevent a deadlock. The
+		 * success of this is dependent on being able to lock the device,
+		 * which is not always possible.
+		 * We can not use the "try" reset interface here, which will
+		 * overwrite the previously restored configuration information.
+		 */
+		if (vdev->reset_works && pci_dev_trylock(pdev)) {
+			if (!__pci_reset_function_locked(pdev))
+				vdev->needs_reset = false;
+			pci_dev_unlock(pdev);
+		}
 	}
 
 	pci_restore_state(pdev);
 out:
 	pci_disable_device(pdev);
 
-	if (!vfio_pci_dev_set_try_reset(vdev->vdev.dev_set) && !disable_idle_d3)
-		vfio_pci_set_power_state(vdev, PCI_D3hot);
+	if (!pdev->ignore_reset) {
+		if (!vfio_pci_dev_set_try_reset(vdev->vdev.dev_set) &&
+					!disable_idle_d3)
+			vfio_pci_set_power_state(vdev, PCI_D3hot);
+	}
 }
 EXPORT_SYMBOL_GPL(vfio_pci_core_disable);
 
@@ -919,6 +926,8 @@ long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
 
 		if (!vdev->reset_works)
 			return -EINVAL;
+		if (vdev->pdev->ignore_reset)
+			return -EINVAL;
 
 		vfio_pci_zap_and_down_write_memory_lock(vdev);
 		ret = pci_try_reset_function(vdev->pdev);
@@ -1007,6 +1016,9 @@ long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
 		bool slot = false;
 		int group_idx, count = 0, ret = 0;
 
+		if (vdev->pdev->ignore_reset)
+			return -EINVAL;
+
 		minsz = offsetofend(struct vfio_pci_hot_reset, count);
 
 		if (copy_from_user(&hdr, (void __user *)arg, minsz))
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v1 0/2] Add ablility of VFIO driver to ignore reset when device don't need it
  2021-10-14  9:57 [PATCH v1 0/2] Add ablility of VFIO driver to ignore reset when device don't need it Zhenguo Yao
  2021-10-14  9:57 ` [PATCH v1 1/2] PCI: Add ignore_reset sysfs interface to control whether to do device reset in PCI drivers Zhenguo Yao
  2021-10-14  9:57 ` [PATCH v1 2/2] vfio-pci: Don't do device reset when ignore_reset is setting Zhenguo Yao
@ 2021-10-14 12:48 ` Alex Williamson
  2021-10-14 13:37   ` Zhenguo Yao
  2 siblings, 1 reply; 5+ messages in thread
From: Alex Williamson @ 2021-10-14 12:48 UTC (permalink / raw)
  To: Zhenguo Yao
  Cc: bhelgaas, cohuck, jgg, mgurtovoy, yishaih, kvm, linux-pci,
	linux-kernel, yaozhenguo

On Thu, 14 Oct 2021 17:57:46 +0800
Zhenguo Yao <yaozhenguo1@gmail.com> wrote:

> In some scenarios, vfio device can't do any reset in initialization
> process. For example: Nvswitch and GPU A100 working in Shared NVSwitch
> Virtualization Model. In such mode, there are two type VMs: service
> VM and Guest VM. The GPU devices are initialized in the following steps:
> 
> 1. Service VM boot up. GPUs and Nvswitchs are passthrough to service VM.
> Nvidia driver and manager software will do some settings in service VM.
> 
> 2. The selected GPUs are unpluged from service VM.
> 
> 3. Guest VM boots up with the selected GPUs passthrough.
> 
> The selected GPUs can't do any reset in step3, or they will be initialized
> failed in Guest VM.
> 
> This patchset add a PCI sysfs interface:ignore_reset which drivers can
> use it to control whether to do PCI reset or not. For example: In Shared
> NVSwitch Virtualization Model. Hypervisor can disable PCI reset by setting
> ignore_reset to 1 before Gust VM booting up.
> 
> Zhenguo Yao (2):
>   PCI: Add ignore_reset sysfs interface to control whether do device
>     reset in PCI drivers
>   vfio-pci: Don't do device reset when ignore_reset is setting
> 
>  drivers/pci/pci-sysfs.c          | 25 +++++++++++++++++
>  drivers/vfio/pci/vfio_pci_core.c | 48 ++++++++++++++++++++------------
>  include/linux/pci.h              |  1 +
>  3 files changed, 56 insertions(+), 18 deletions(-)
> 

This all seems like code to mask that these NVSwitch configurations are
probably insecure because we can't factor and manage NVSwitch isolation
into IOMMU grouping.  I'm guessing this "service VM" pokes proprietary
registers to manage that isolation and perhaps later resetting devices
negates that programming.  A more proper solution is probably to do our
best to guess the span of an NVSwitch configuration and make the IOMMU
group include all the devices, until NVIDIA provides proper code for
the kernel to understand this interconnect and how it affects DMA
isolation.  Nak on disabling resets for the purpose of preventing a
user from undoing proprietary device programming.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v1 0/2] Add ablility of VFIO driver to ignore reset when device don't need it
  2021-10-14 12:48 ` [PATCH v1 0/2] Add ablility of VFIO driver to ignore reset when device don't need it Alex Williamson
@ 2021-10-14 13:37   ` Zhenguo Yao
  0 siblings, 0 replies; 5+ messages in thread
From: Zhenguo Yao @ 2021-10-14 13:37 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Bjorn Helgaas, cohuck, jgg, mgurtovoy, yishaih, kvm, linux-pci,
	linux-kernel, 姚振国

OK.  Thank you.  Let's waitting for NVIDIA's solution.

Alex Williamson <alex.williamson@redhat.com> 于2021年10月14日周四 下午8:48写道:
>
> On Thu, 14 Oct 2021 17:57:46 +0800
> Zhenguo Yao <yaozhenguo1@gmail.com> wrote:
>
> > In some scenarios, vfio device can't do any reset in initialization
> > process. For example: Nvswitch and GPU A100 working in Shared NVSwitch
> > Virtualization Model. In such mode, there are two type VMs: service
> > VM and Guest VM. The GPU devices are initialized in the following steps:
> >
> > 1. Service VM boot up. GPUs and Nvswitchs are passthrough to service VM.
> > Nvidia driver and manager software will do some settings in service VM.
> >
> > 2. The selected GPUs are unpluged from service VM.
> >
> > 3. Guest VM boots up with the selected GPUs passthrough.
> >
> > The selected GPUs can't do any reset in step3, or they will be initialized
> > failed in Guest VM.
> >
> > This patchset add a PCI sysfs interface:ignore_reset which drivers can
> > use it to control whether to do PCI reset or not. For example: In Shared
> > NVSwitch Virtualization Model. Hypervisor can disable PCI reset by setting
> > ignore_reset to 1 before Gust VM booting up.
> >
> > Zhenguo Yao (2):
> >   PCI: Add ignore_reset sysfs interface to control whether do device
> >     reset in PCI drivers
> >   vfio-pci: Don't do device reset when ignore_reset is setting
> >
> >  drivers/pci/pci-sysfs.c          | 25 +++++++++++++++++
> >  drivers/vfio/pci/vfio_pci_core.c | 48 ++++++++++++++++++++------------
> >  include/linux/pci.h              |  1 +
> >  3 files changed, 56 insertions(+), 18 deletions(-)
> >
>
> This all seems like code to mask that these NVSwitch configurations are
> probably insecure because we can't factor and manage NVSwitch isolation
> into IOMMU grouping.  I'm guessing this "service VM" pokes proprietary
> registers to manage that isolation and perhaps later resetting devices
> negates that programming.  A more proper solution is probably to do our
> best to guess the span of an NVSwitch configuration and make the IOMMU
> group include all the devices, until NVIDIA provides proper code for
> the kernel to understand this interconnect and how it affects DMA
> isolation.  Nak on disabling resets for the purpose of preventing a
> user from undoing proprietary device programming.  Thanks,
>
> Alex
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-10-14 13:38 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-14  9:57 [PATCH v1 0/2] Add ablility of VFIO driver to ignore reset when device don't need it Zhenguo Yao
2021-10-14  9:57 ` [PATCH v1 1/2] PCI: Add ignore_reset sysfs interface to control whether to do device reset in PCI drivers Zhenguo Yao
2021-10-14  9:57 ` [PATCH v1 2/2] vfio-pci: Don't do device reset when ignore_reset is setting Zhenguo Yao
2021-10-14 12:48 ` [PATCH v1 0/2] Add ablility of VFIO driver to ignore reset when device don't need it Alex Williamson
2021-10-14 13:37   ` Zhenguo Yao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).