* [RFC PATCH v4 00/10] VFIO support for platform devices
From: Antonios Motakis @ 2014-02-08 17:29 UTC
  To: alex.williamson, kvmarm, iommu, linux-kernel, gregkh
  Cc: tech, a.rigo, B08248, kim.phillips, jan.kiszka, kvm, R65777,
	B07421, christoffer.dall, agraf, B16395, will.deacon,
	Antonios Motakis

v4 of this series is functionally identical to v3 for VFIO_PLATFORM. The only
changes are the inclusion of Kim Phillips' patch to export driver_probe_device()
and the implementation of a mechanism for binding arbitrary devices via a file
in sysfs. The latter has been folded into the skeleton patch (04/10).

This patch series aims to implement VFIO support for platform devices that
reside behind an IOMMU, such as devices behind an ARM SMMU or a Samsung Exynos
System MMU.

The API is based on the existing VFIO API that is also used with PCI
devices. Only devices with a basic set of IRQs and memory regions are
targeted; devices with complex relationships to other devices in the device
tree are not taken into account at this stage.

A copy with all the dependencies applied can be cloned from branch
vfio-platform-v4 at git@github.com:virtualopensystems/linux-kvm-arm.git

This code can also be tested on ARM FastModels using the following test cases:
 - A user space driver for the PL330 on the FastModels, built on VFIO:
   git@github.com:virtualopensystems/pl330-vfio-test.git
 - A QEMU prototype, also based on the PL330:
   git@github.com:virtualopensystems/qemu.git pl330-vfio-dev

   We have written detailed instructions on how to build and run these tests:
   http://www.virtualopensystems.com/en/solutions/guides/vfio-on-arm/

The following IOCTLs have been verified to work on FastModels with an
ARM SMMU (MMU400). Testing was based on the ARM PL330 DMA Controller featured
on those models.
 - VFIO_GET_API_VERSION
 - VFIO_CHECK_EXTENSION

The TYPE1 fix proposed here enables the following IOCTLs:
 - VFIO_GROUP_GET_STATUS
 - VFIO_GROUP_SET_CONTAINER
 - VFIO_SET_IOMMU
 - VFIO_IOMMU_GET_INFO
 - VFIO_IOMMU_MAP_DMA
     For this ioctl specifically, a new flag has been added:
     VFIO_DMA_MAP_FLAG_EXEC. This flag is taken into account on systems with an
     ARM SMMU.
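A user of VFIO_DMA_MAP_FLAG_EXEC would typically request it only for buffers the device fetches instructions from, leaving all other mappings execute-never. The following is a minimal user-space sketch. It assumes a local mirror of struct vfio_iommu_type1_dma_map (same field layout as the uapi header) and uses the hypothetical names prepare_dma_map() and DMA_MAP_FLAG_*, so it builds without the patched kernel headers:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct vfio_iommu_type1_dma_map from the patched
 * include/uapi/linux/vfio.h (hypothetical copy for this sketch). */
struct dma_map_req {
	uint32_t argsz;
	uint32_t flags;
	uint64_t vaddr;	/* process virtual address */
	uint64_t iova;	/* IO virtual address seen by the device */
	uint64_t size;	/* size of mapping (bytes) */
};

#define DMA_MAP_FLAG_READ  (1u << 0)	/* readable from device */
#define DMA_MAP_FLAG_WRITE (1u << 1)	/* writable from device */
#define DMA_MAP_FLAG_EXEC  (1u << 2)	/* executable from device */

/* Fill in a request that maps 'buf' at 'iova'. Set 'executable' only for
 * buffers the device executes from, e.g. a PL330 instruction buffer. */
static void prepare_dma_map(struct dma_map_req *req, void *buf,
			    uint64_t iova, uint64_t size, int executable)
{
	assert(req != NULL);
	memset(req, 0, sizeof(*req));
	req->argsz = sizeof(*req);
	req->flags = DMA_MAP_FLAG_READ | DMA_MAP_FLAG_WRITE;
	if (executable)
		req->flags |= DMA_MAP_FLAG_EXEC;
	req->vaddr = (uint64_t)(uintptr_t)buf;
	req->iova = iova;
	req->size = size;
}
```

The filled request would then be passed to ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &req) on a container whose IOMMU type was already set with VFIO_SET_IOMMU. That step needs real hardware and is not shown here.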

The VFIO platform driver proposed here implements the following:
 - VFIO_GROUP_GET_DEVICE_FD
 - VFIO_DEVICE_GET_INFO
 - VFIO_DEVICE_GET_REGION_INFO
 - VFIO_DEVICE_GET_IRQ_INFO
 - VFIO_DEVICE_SET_IRQS
     IRQ support through this ioctl is partial: signaling incoming
     interrupts to an eventfd is supported, as are masking and unmasking.
     Level-sensitive interrupts are automasked. Masking/unmasking via an
     eventfd is not yet implemented.

In addition, the VFIO platform driver implements the following through
the VFIO device file descriptor:
 - mmap() of memory regions into the virtual address space of the VFIO user.
 - Read / write of memory regions directly through the file descriptor.

What still needs to be done includes:
 - Eventfd for masking/unmasking
 - Extend the driver and API for device tree metadata
 - QEMU / KVM integration
 - Device specific functionality (e.g. VFIO_DEVICE_RESET)
 - Improve VFIO_IOMMU_TYPE1 driver to support multiple buses at the same time
 - Bind to ARM AMBA devices
 - IOMMUs with nested page tables (Stage 1 & 2 translation on ARM SMMUs)

Changes since v3:
 - Use Kim Phillips' driver_probe_device()
Changes since v2:
 - Fixed Read/Write and MMAP on device regions
 - Removed dependency on Device Tree
 - Interrupt support
 - Interrupt masking/unmasking
 - Automasking of level-sensitive interrupts
 - Introduced VFIO_DMA_MAP_FLAG_EXEC
 - Code cleanups

Antonios Motakis (9):
  VFIO_IOMMU_TYPE1: Introduce the VFIO_DMA_MAP_FLAG_EXEC flag
  VFIO_IOMMU_TYPE1: workaround to build for platform devices
  VFIO_PLATFORM: Initial skeleton of VFIO support for platform devices
  VFIO_PLATFORM: Return info for device and its memory mapped IO regions
  VFIO_PLATFORM: Read and write support for the device fd
  VFIO_PLATFORM: Support MMAP of MMIO regions
  VFIO_PLATFORM: Return IRQ info
  VFIO_PLATFORM: Initial interrupts support
  VFIO_PLATFORM: Support for maskable and automasked interrupts

Kim Phillips (1):
  driver core: export driver_probe_device()

 drivers/base/base.h                           |   1 -
 drivers/base/dd.c                             |   1 +
 drivers/vfio/Kconfig                          |   3 +-
 drivers/vfio/Makefile                         |   1 +
 drivers/vfio/platform/Kconfig                 |   9 +
 drivers/vfio/platform/Makefile                |   4 +
 drivers/vfio/platform/vfio_platform.c         | 493 ++++++++++++++++++++++++++
 drivers/vfio/platform/vfio_platform_irq.c     | 289 +++++++++++++++
 drivers/vfio/platform/vfio_platform_private.h |  50 +++
 drivers/vfio/vfio_iommu_type1.c               |  27 +-
 include/linux/device.h                        |   1 +
 include/uapi/linux/vfio.h                     |   2 +
 12 files changed, 874 insertions(+), 7 deletions(-)
 create mode 100644 drivers/vfio/platform/Kconfig
 create mode 100644 drivers/vfio/platform/Makefile
 create mode 100644 drivers/vfio/platform/vfio_platform.c
 create mode 100644 drivers/vfio/platform/vfio_platform_irq.c
 create mode 100644 drivers/vfio/platform/vfio_platform_private.h

-- 
1.8.3.2


* [RFC PATCH v4 01/10] driver core: export driver_probe_device()
From: Antonios Motakis @ 2014-02-08 17:29 UTC
  To: alex.williamson, kvmarm, iommu, linux-kernel, gregkh
  Cc: tech, a.rigo, B08248, kim.phillips, jan.kiszka, kvm, R65777,
	B07421, christoffer.dall, agraf, B16395, will.deacon, Tejun Heo,
	Rafael J. Wysocki, Guenter Roeck, Toshi Kani, Joe Perches,
	Dmitry Kasatkin, Michal Hocko, Bjorn Helgaas

From: Kim Phillips <kim.phillips@linaro.org>

Needed by drivers, such as the vfio platform driver [1], that must
bypass the driver_match_device() check in bind_store() and bind to any
device via a private sysfs bind file.

[1] https://lkml.org/lkml/2013/12/11/522

Note: the EXPORT_SYMBOL_GPL() is needed because vfio-platform can be
built as a module.
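To illustrate why the export is needed, here is a toy user-space model. It assumes simplified, hypothetical stand-ins (toy_driver, toy_device, toy_match(), toy_probe()) for the driver core types and functions, and a hypothetical compatible string. The generic bind path refuses a device the bus does not match. A private bind file that calls the probe step directly can bind any device.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for struct device_driver and struct device. */
struct toy_driver {
	const char *name;
	const char *match_id;	/* what the bus would match against */
};

struct toy_device {
	const char *name;
	const char *compatible;
	const struct toy_driver *driver;	/* NULL while unbound */
};

/* Plays the role of driver_match_device(): the bus decides. */
static bool toy_match(const struct toy_driver *drv,
		      const struct toy_device *dev)
{
	return strcmp(drv->match_id, dev->compatible) == 0;
}

/* Plays the role of driver_probe_device(): binds without matching. */
static int toy_probe(const struct toy_driver *drv, struct toy_device *dev)
{
	assert(drv != NULL && dev != NULL);
	if (dev->driver)
		return -EBUSY;
	dev->driver = drv;
	return 0;
}

/* Generic sysfs 'bind' path: match first, then probe. */
static int toy_bind_store(const struct toy_driver *drv,
			  struct toy_device *dev)
{
	if (!toy_match(drv, dev))
		return -ENODEV;
	return toy_probe(drv, dev);
}

/* Private vfio-style bind file: calls the probe step directly. */
static int toy_vfio_bind(const struct toy_driver *drv,
			 struct toy_device *dev)
{
	return toy_probe(drv, dev);
}
```

Because the probe step is all a private bind file needs, a modular driver has to reach driver_probe_device() from outside drivers/base/, which requires the export.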

Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
---
 drivers/base/base.h    | 1 -
 drivers/base/dd.c      | 1 +
 include/linux/device.h | 1 +
 3 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/base/base.h b/drivers/base/base.h
index 24f4242..fe25ad87 100644
--- a/drivers/base/base.h
+++ b/drivers/base/base.h
@@ -112,7 +112,6 @@ extern int bus_add_driver(struct device_driver *drv);
 extern void bus_remove_driver(struct device_driver *drv);
 
 extern void driver_detach(struct device_driver *drv);
-extern int driver_probe_device(struct device_driver *drv, struct device *dev);
 extern void driver_deferred_probe_del(struct device *dev);
 static inline int driver_match_device(struct device_driver *drv,
 				      struct device *dev)
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 0605176..44f6184 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -384,6 +384,7 @@ int driver_probe_device(struct device_driver *drv, struct device *dev)
 
 	return ret;
 }
+EXPORT_SYMBOL_GPL(driver_probe_device);
 
 static int __device_attach(struct device_driver *drv, void *data)
 {
diff --git a/include/linux/device.h b/include/linux/device.h
index 952b010..ad80dd2 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -257,6 +257,7 @@ extern struct device_driver *driver_find(const char *name,
 					 struct bus_type *bus);
 extern int driver_probe_done(void);
 extern void wait_for_device_probe(void);
+extern int driver_probe_device(struct device_driver *drv, struct device *dev);
 
 
 /* sysfs interface for exporting driver attributes */
-- 
1.8.3.2


* [RFC PATCH v4 02/10] VFIO_IOMMU_TYPE1: Introduce the VFIO_DMA_MAP_FLAG_EXEC flag
From: Antonios Motakis @ 2014-02-08 17:29 UTC
  To: alex.williamson, kvmarm, iommu, linux-kernel, gregkh
  Cc: tech, a.rigo, B08248, kim.phillips, jan.kiszka, kvm, R65777,
	B07421, christoffer.dall, agraf, B16395, will.deacon,
	Antonios Motakis

The ARM SMMU driver marks a device's page table entries as XN (execute
never) unless the IOMMU_EXEC flag is set. This affects devices such as
the ARM PL330 DMA Controller, which fails to operate if XN is set on the
memory it fetches its instructions from.

We introduce the VFIO_DMA_MAP_FLAG_EXEC flag to VFIO and use it in
VFIO_IOMMU_TYPE1 to set the IOMMU_EXEC flag. This way the user can
control whether the XN flag is set on the requested mappings.

Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
---
 drivers/vfio/vfio_iommu_type1.c | 5 ++++-
 include/uapi/linux/vfio.h       | 1 +
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 4fb7a8f..ad7a1f6 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -557,6 +557,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
 		prot |= IOMMU_WRITE;
 	if (map->flags & VFIO_DMA_MAP_FLAG_READ)
 		prot |= IOMMU_READ;
+	if (map->flags & VFIO_DMA_MAP_FLAG_EXEC)
+		prot |= IOMMU_EXEC;
 
 	if (!prot)
 		return -EINVAL; /* No READ/WRITE? */
@@ -865,7 +867,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 	} else if (cmd == VFIO_IOMMU_MAP_DMA) {
 		struct vfio_iommu_type1_dma_map map;
 		uint32_t mask = VFIO_DMA_MAP_FLAG_READ |
-				VFIO_DMA_MAP_FLAG_WRITE;
+				VFIO_DMA_MAP_FLAG_WRITE |
+				VFIO_DMA_MAP_FLAG_EXEC;
 
 		minsz = offsetofend(struct vfio_iommu_type1_dma_map, size);
 
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 0fd47f5..d8e9e99 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -392,6 +392,7 @@ struct vfio_iommu_type1_dma_map {
 	__u32	flags;
 #define VFIO_DMA_MAP_FLAG_READ (1 << 0)		/* readable from device */
 #define VFIO_DMA_MAP_FLAG_WRITE (1 << 1)	/* writable from device */
+#define VFIO_DMA_MAP_FLAG_EXEC (1 << 2)		/* executable from device */
 	__u64	vaddr;				/* Process virtual address */
 	__u64	iova;				/* IO virtual address */
 	__u64	size;				/* Size of mapping (bytes) */
-- 
1.8.3.2


* [RFC PATCH v4 03/10] VFIO_IOMMU_TYPE1: workaround to build for platform devices
From: Antonios Motakis @ 2014-02-08 17:29 UTC
  To: alex.williamson, kvmarm, iommu, linux-kernel, gregkh
  Cc: tech, a.rigo, B08248, kim.phillips, jan.kiszka, kvm, R65777,
	B07421, christoffer.dall, agraf, B16395, will.deacon,
	Antonios Motakis

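This is a workaround to make the VFIO_IOMMU_TYPE1 driver usable with
platform devices as well as PCI devices, though not both at once. The
PCI bus is used if it has an IOMMU; otherwise the platform bus is used.
IOMMU_CAP_INTR_REMAP is only required for PCI. A future permanent fix
should support both buses simultaneously. This is required in order to
use the Exynos System MMU or ARM SMMU drivers with VFIO.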

Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
---
 drivers/vfio/Kconfig            |  2 +-
 drivers/vfio/vfio_iommu_type1.c | 22 ++++++++++++++++++----
 2 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index 26b3d9d..bd50721 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -11,7 +11,7 @@ config VFIO_IOMMU_SPAPR_TCE
 menuconfig VFIO
 	tristate "VFIO Non-Privileged userspace driver framework"
 	depends on IOMMU_API
-	select VFIO_IOMMU_TYPE1 if X86
+	select VFIO_IOMMU_TYPE1 if X86 || ARM
 	select VFIO_IOMMU_SPAPR_TCE if (PPC_POWERNV || PPC_PSERIES)
 	help
 	  VFIO provides a framework for secure userspace device drivers.
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index ad7a1f6..81e65f4 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -30,7 +30,8 @@
 #include <linux/iommu.h>
 #include <linux/module.h>
 #include <linux/mm.h>
-#include <linux/pci.h>		/* pci_bus_type */
+#include <linux/pci.h>			/* pci_bus_type */
+#include <linux/platform_device.h>	/* platform_bus_type */
 #include <linux/rbtree.h>
 #include <linux/sched.h>
 #include <linux/slab.h>
@@ -47,6 +48,8 @@ module_param_named(allow_unsafe_interrupts,
 		   allow_unsafe_interrupts, bool, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(allow_unsafe_interrupts,
 		 "Enable VFIO IOMMU support for on platforms without interrupt remapping support.");
+static struct bus_type *iommu_bus_type;
+static bool require_cap_intr_remap;
 
 static bool disable_hugepages;
 module_param_named(disable_hugepages,
@@ -785,7 +788,8 @@ static void *vfio_iommu_type1_open(unsigned long arg)
 	/*
 	 * Wish we didn't have to know about bus_type here.
 	 */
-	iommu->domain = iommu_domain_alloc(&pci_bus_type);
+	iommu->domain = iommu_domain_alloc(iommu_bus_type);
+
 	if (!iommu->domain) {
 		kfree(iommu);
 		return ERR_PTR(-EIO);
@@ -797,7 +801,7 @@ static void *vfio_iommu_type1_open(unsigned long arg)
 	 * the way.  Fortunately we know interrupt remapping is global for
 	 * our iommus.
 	 */
-	if (!allow_unsafe_interrupts &&
+	if (require_cap_intr_remap && !allow_unsafe_interrupts &&
 	    !iommu_domain_has_cap(iommu->domain, IOMMU_CAP_INTR_REMAP)) {
 		pr_warn("%s: No interrupt remapping support.  Use the module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU support on this platform\n",
 		       __func__);
@@ -914,7 +918,17 @@ static const struct vfio_iommu_driver_ops vfio_iommu_driver_ops_type1 = {
 
 static int __init vfio_iommu_type1_init(void)
 {
-	if (!iommu_present(&pci_bus_type))
+#ifdef CONFIG_PCI
+	if (iommu_present(&pci_bus_type)) {
+		iommu_bus_type = &pci_bus_type;
+		/* For PCI targets, IOMMU_CAP_INTR_REMAP is required */
+		require_cap_intr_remap = true;
+	}
+#endif
+	if (!iommu_bus_type && iommu_present(&platform_bus_type))
+		iommu_bus_type = &platform_bus_type;
+
+	if (!iommu_bus_type)
 		return -ENODEV;
 
 	return vfio_register_iommu_driver(&vfio_iommu_driver_ops_type1);
-- 
1.8.3.2


* [RFC PATCH v4 04/10] VFIO_PLATFORM: Initial skeleton of VFIO support for platform devices
From: Antonios Motakis @ 2014-02-08 17:29 UTC
  To: alex.williamson, kvmarm, iommu, linux-kernel, gregkh
  Cc: tech, a.rigo, B08248, kim.phillips, jan.kiszka, kvm, R65777,
	B07421, christoffer.dall, agraf, B16395, will.deacon,
	Antonios Motakis, Catalin Marinas, Mark Rutland

This patch forms the skeleton of VFIO support for platform devices.
We implement a new 'vfio_bind' sysfs file which, when a device ID is
written to it, binds the VFIO platform driver directly to that device.
Bypassing the driver core's standard bind file allows this platform
driver to bind to any device via sysfs.
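From user space, binding would amount to writing a device name into the new file. The following is a minimal sketch. It assumes a hypothetical location such as /sys/bus/platform/drivers/vfio-platform/vfio_bind and a hypothetical device name. Both the exact path and the accepted format are assumptions, not taken from the patch, and sysfs_write() is an illustrative helper name.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write 'value' to a sysfs attribute such as the vfio_bind file.
 * Returns 0 on success or a negative errno value. */
static int sysfs_write(const char *path, const char *value)
{
	assert(path != NULL && value != NULL);

	int fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	size_t len = strlen(value);
	ssize_t n = write(fd, value, len);
	/* Capture the result before close(), which may overwrite errno. */
	int err = n < 0 ? -errno : ((size_t)n == len ? 0 : -EIO);

	close(fd);
	return err;
}
```

For example, sysfs_write("/sys/bus/platform/drivers/vfio-platform/vfio_bind", "fff51000.dma") would return 0 if the write is accepted, or a negative errno such as -ENOENT if the file does not exist.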

Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
---
 drivers/vfio/Kconfig                          |   1 +
 drivers/vfio/Makefile                         |   1 +
 drivers/vfio/platform/Kconfig                 |   9 ++
 drivers/vfio/platform/Makefile                |   4 +
 drivers/vfio/platform/vfio_platform.c         | 219 ++++++++++++++++++++++++++
 drivers/vfio/platform/vfio_platform_private.h |  22 +++
 include/uapi/linux/vfio.h                     |   1 +
 7 files changed, 257 insertions(+)
 create mode 100644 drivers/vfio/platform/Kconfig
 create mode 100644 drivers/vfio/platform/Makefile
 create mode 100644 drivers/vfio/platform/vfio_platform.c
 create mode 100644 drivers/vfio/platform/vfio_platform_private.h

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index bd50721..8156fcf 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -20,3 +20,4 @@ menuconfig VFIO
 	  If you don't know what to do here, say N.
 
 source "drivers/vfio/pci/Kconfig"
+source "drivers/vfio/platform/Kconfig"
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
index 72bfabc..b5e4a33 100644
--- a/drivers/vfio/Makefile
+++ b/drivers/vfio/Makefile
@@ -2,3 +2,4 @@ obj-$(CONFIG_VFIO) += vfio.o
 obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
 obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
 obj-$(CONFIG_VFIO_PCI) += pci/
+obj-$(CONFIG_VFIO_PLATFORM) += platform/
diff --git a/drivers/vfio/platform/Kconfig b/drivers/vfio/platform/Kconfig
new file mode 100644
index 0000000..42b0022
--- /dev/null
+++ b/drivers/vfio/platform/Kconfig
@@ -0,0 +1,9 @@
+config VFIO_PLATFORM
+	tristate "VFIO support for platform devices"
+	depends on VFIO && EVENTFD
+	help
+	  Support for platform devices with VFIO. This is required to make
+	  use of platform devices present on the system using the VFIO
+	  framework.
+
+	  If you don't know what to do here, say N.
diff --git a/drivers/vfio/platform/Makefile b/drivers/vfio/platform/Makefile
new file mode 100644
index 0000000..df3a014
--- /dev/null
+++ b/drivers/vfio/platform/Makefile
@@ -0,0 +1,4 @@
+
+vfio-platform-y := vfio_platform.o
+
+obj-$(CONFIG_VFIO_PLATFORM) += vfio-platform.o
diff --git a/drivers/vfio/platform/vfio_platform.c b/drivers/vfio/platform/vfio_platform.c
new file mode 100644
index 0000000..a3d8f29
--- /dev/null
+++ b/drivers/vfio/platform/vfio_platform.c
@@ -0,0 +1,219 @@
+/*
+ * Copyright (C) 2013 - Virtual Open Systems
+ * Author: Antonios Motakis <a.motakis@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/device.h>
+#include <linux/eventfd.h>
+#include <linux/interrupt.h>
+#include <linux/iommu.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/notifier.h>
+#include <linux/pm_runtime.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <linux/uaccess.h>
+#include <linux/vfio.h>
+#include <linux/io.h>
+#include <linux/platform_device.h>
+#include <linux/irq.h>
+
+#include "vfio_platform_private.h"
+
+#define DRIVER_VERSION  "0.2"
+#define DRIVER_AUTHOR   "Antonios Motakis <a.motakis@virtualopensystems.com>"
+#define DRIVER_DESC     "VFIO for platform devices - User Level meta-driver"
+
+static void vfio_platform_release(void *device_data)
+{
+	module_put(THIS_MODULE);
+}
+
+static int vfio_platform_open(void *device_data)
+{
+	if (!try_module_get(THIS_MODULE))
+		return -ENODEV;
+
+	return 0;
+}
+
+static long vfio_platform_ioctl(void *device_data,
+			   unsigned int cmd, unsigned long arg)
+{
+	struct vfio_platform_device *vdev = device_data;
+	unsigned long minsz;
+
+	if (cmd == VFIO_DEVICE_GET_INFO) {
+		struct vfio_device_info info;
+
+		minsz = offsetofend(struct vfio_device_info, num_irqs);
+
+		if (copy_from_user(&info, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (info.argsz < minsz)
+			return -EINVAL;
+
+		info.flags = VFIO_DEVICE_FLAGS_PLATFORM;
+		info.num_regions = 0;
+		info.num_irqs = 0;
+
+		return copy_to_user((void __user *)arg, &info, minsz);
+
+	} else if (cmd == VFIO_DEVICE_GET_REGION_INFO)
+		return -EINVAL;
+
+	else if (cmd == VFIO_DEVICE_GET_IRQ_INFO)
+		return -EINVAL;
+
+	else if (cmd == VFIO_DEVICE_SET_IRQS)
+		return -EINVAL;
+
+	else if (cmd == VFIO_DEVICE_RESET)
+		return -EINVAL;
+
+	return -ENOTTY;
+}
+
+static ssize_t vfio_platform_read(void *device_data, char __user *buf,
+			     size_t count, loff_t *ppos)
+{
+	return 0;
+}
+
+static ssize_t vfio_platform_write(void *device_data, const char __user *buf,
+			      size_t count, loff_t *ppos)
+{
+	return 0;
+}
+
+static int vfio_platform_mmap(void *device_data, struct vm_area_struct *vma)
+{
+	return -EINVAL;
+}
+
+static const struct vfio_device_ops vfio_platform_ops = {
+	.name		= "vfio-platform",
+	.open		= vfio_platform_open,
+	.release	= vfio_platform_release,
+	.ioctl		= vfio_platform_ioctl,
+	.read		= vfio_platform_read,
+	.write		= vfio_platform_write,
+	.mmap		= vfio_platform_mmap,
+};
+
+static ssize_t vfio_bind_store(struct device_driver *driver, const char *buf,
+			       size_t count)
+{
+	struct device *dev;
+	int ret;
+
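+	dev = bus_find_device_by_name(&platform_bus_type, NULL, buf);
+	if (!dev)
+		return -ENODEV;
+
+	device_lock(dev);
+	ret = driver_probe_device(driver, dev);
+	device_unlock(dev);
+	put_device(dev);	/* ref from bus_find_device_by_name() */
+
+	/* driver_probe_device() returns 1 on a bind, 0 if nothing bound */
+	if (ret > 0)
+		return count;
+	return ret ? ret : -ENODEV;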
+}
+static DRIVER_ATTR_WO(vfio_bind);
+
+static int vfio_platform_probe(struct platform_device *pdev)
+{
+	struct vfio_platform_device *vdev;
+	struct iommu_group *group;
+	int ret;
+
+	group = iommu_group_get(&pdev->dev);
+	if (!group) {
+		pr_err("VFIO: No IOMMU group for device %s\n", pdev->name);
+		return -EINVAL;
+	}
+
+	vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);
+	if (!vdev) {
+		iommu_group_put(group);
+		return -ENOMEM;
+	}
+
+	vdev->pdev = pdev;
+
+	ret = vfio_add_group_dev(&pdev->dev, &vfio_platform_ops, vdev);
+	if (ret) {
+		iommu_group_put(group);
+		kfree(vdev);
+	}
+
+	return ret;
+}
+
+static int vfio_platform_remove(struct platform_device *pdev)
+{
+	struct vfio_platform_device *vdev;
+
+	vdev = vfio_del_group_dev(&pdev->dev);
+	if (!vdev)
+		return -EINVAL;
+
+	iommu_group_put(pdev->dev.iommu_group);
+	kfree(vdev);
+
+	return 0;
+}
+
+static struct platform_driver vfio_platform_driver = {
+	.probe		= vfio_platform_probe,
+	.remove		= vfio_platform_remove,
+	.driver	= {
+		.name	= "vfio-platform",
+		.owner	= THIS_MODULE,
+	},
+};
+
+static int __init vfio_platform_driver_init(void)
+{
+	int ret;
+
+	ret = platform_driver_register(&vfio_platform_driver);
+	if (ret) {
+		pr_err("Failed to register vfio platform driver, error: %d\n",
+		       ret);
+		return ret;
+	}
+
+	ret = driver_create_file(&vfio_platform_driver.driver,
+				 &driver_attr_vfio_bind);
+	if (ret)
+		pr_err("Failed to create vfio_bind file, error: %d\n", ret);
+
+	return ret;
+}
+
+static void __exit vfio_platform_driver_exit(void)
+{
+	platform_driver_unregister(&vfio_platform_driver);
+}
+
+module_init(vfio_platform_driver_init);
+module_exit(vfio_platform_driver_exit);
+
+MODULE_VERSION(DRIVER_VERSION);
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR(DRIVER_AUTHOR);
+MODULE_DESCRIPTION(DRIVER_DESC);
diff --git a/drivers/vfio/platform/vfio_platform_private.h b/drivers/vfio/platform/vfio_platform_private.h
new file mode 100644
index 0000000..6df8084
--- /dev/null
+++ b/drivers/vfio/platform/vfio_platform_private.h
@@ -0,0 +1,22 @@
+/*
+ * Copyright (C) 2013 - Virtual Open Systems
+ * Author: Antonios Motakis <a.motakis@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef VFIO_PLATFORM_PRIVATE_H
+#define VFIO_PLATFORM_PRIVATE_H
+
+struct vfio_platform_device {
+	struct platform_device		*pdev;
+};
+
+#endif /* VFIO_PCI_PRIVATE_H */
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index d8e9e99..51ebf65 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -148,6 +148,7 @@ struct vfio_device_info {
 	__u32	flags;
 #define VFIO_DEVICE_FLAGS_RESET	(1 << 0)	/* Device supports reset */
 #define VFIO_DEVICE_FLAGS_PCI	(1 << 1)	/* vfio-pci device */
+#define VFIO_DEVICE_FLAGS_PLATFORM (1 << 2)	/* vfio-platform device */
 	__u32	num_regions;	/* Max region index + 1 */
 	__u32	num_irqs;	/* Max IRQ index + 1 */
 };
-- 
1.8.3.2


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [RFC PATCH v4 05/10] VFIO_PLATFORM: Return info for device and its memory mapped IO regions
@ 2014-02-08 17:29   ` Antonios Motakis
  0 siblings, 0 replies; 92+ messages in thread
From: Antonios Motakis @ 2014-02-08 17:29 UTC (permalink / raw)
  To: alex.williamson, kvmarm, iommu, linux-kernel, gregkh
  Cc: tech, a.rigo, B08248, kim.phillips, jan.kiszka, kvm, R65777,
	B07421, christoffer.dall, agraf, B16395, will.deacon,
	Antonios Motakis, Catalin Marinas, Mark Rutland

A VFIO userspace driver will start by opening the IOMMU group that the
device belongs to, obtaining a file descriptor for the VFIO device from
it, and then using the ioctl interface to get basic device information,
such as the number of memory regions and interrupts and their properties.
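The open sequence described above follows the generic VFIO flow (container, group, device fd). A minimal sketch, assuming the type1 IOMMU backend and a caller-supplied group node and device name; vfio_open_device() and struct vfio_handles are hypothetical names, and the group number of a given platform device has to be looked up on the target system.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vfio.h>

struct vfio_handles {
	int container;
	int group;
	int device;
};

/*
 * Open a device through its IOMMU group. group_path (e.g. "/dev/vfio/<N>")
 * and dev_name are supplied by the caller. Returns 0 and fills *h, or -1
 * on failure with every fd closed and all handles set to -1.
 */
static int vfio_open_device(const char *group_path, const char *dev_name,
			    struct vfio_handles *h)
{
	struct vfio_group_status status = { .argsz = sizeof(status) };

	h->container = h->group = h->device = -1;

	h->container = open("/dev/vfio/vfio", O_RDWR);
	if (h->container < 0)
		goto fail;
	if (ioctl(h->container, VFIO_GET_API_VERSION) != VFIO_API_VERSION)
		goto fail;
	if (!ioctl(h->container, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU))
		goto fail;

	h->group = open(group_path, O_RDWR);
	if (h->group < 0)
		goto fail;
	/* The group is only viable if all its devices use VFIO drivers. */
	if (ioctl(h->group, VFIO_GROUP_GET_STATUS, &status) < 0 ||
	    !(status.flags & VFIO_GROUP_FLAGS_VIABLE))
		goto fail;

	if (ioctl(h->group, VFIO_GROUP_SET_CONTAINER, &h->container) < 0)
		goto fail;
	if (ioctl(h->container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU) < 0)
		goto fail;

	h->device = ioctl(h->group, VFIO_GROUP_GET_DEVICE_FD, dev_name);
	if (h->device < 0)
		goto fail;
	return 0;

fail:
	if (h->group >= 0)
		close(h->group);
	if (h->container >= 0)
		close(h->container);
	h->container = h->group = h->device = -1;
	return -1;
}
```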

This patch enables the IOCTLs:
 - VFIO_DEVICE_GET_INFO
 - VFIO_DEVICE_GET_REGION_INFO

IRQ info is provided by a later patch in this series.
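Once the device fd is available, the two ioctls listed above can be used to enumerate the memory regions. This is a sketch, not part of the patch: dump_regions() is a hypothetical helper, and the fallback definition of VFIO_DEVICE_FLAGS_PLATFORM only copies the value this series adds, for uapi headers that predate it. With this RFC the reported offset is the region's physical base address.

```c
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

#ifndef VFIO_DEVICE_FLAGS_PLATFORM
#define VFIO_DEVICE_FLAGS_PLATFORM (1 << 2)	/* value from this series */
#endif

/*
 * Print every memory region of an already-opened VFIO device fd.
 * Returns the number of regions, or -1 if an ioctl fails.
 */
static int dump_regions(int device_fd)
{
	struct vfio_device_info dinfo = { .argsz = sizeof(dinfo) };
	unsigned int i;

	if (ioctl(device_fd, VFIO_DEVICE_GET_INFO, &dinfo) < 0)
		return -1;
	if (!(dinfo.flags & VFIO_DEVICE_FLAGS_PLATFORM))
		fprintf(stderr, "note: not a vfio-platform device\n");

	for (i = 0; i < dinfo.num_regions; i++) {
		struct vfio_region_info r = { .argsz = sizeof(r), .index = i };

		if (ioctl(device_fd, VFIO_DEVICE_GET_REGION_INFO, &r) < 0)
			return -1;
		/* With this RFC, r.offset holds the region's physical base. */
		printf("region %u: offset 0x%llx size 0x%llx flags 0x%x\n",
		       i, (unsigned long long)r.offset,
		       (unsigned long long)r.size, r.flags);
	}
	return (int)dinfo.num_regions;
}
```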

Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
---
 drivers/vfio/platform/vfio_platform.c         | 74 ++++++++++++++++++++++++---
 drivers/vfio/platform/vfio_platform_private.h |  8 +++
 2 files changed, 76 insertions(+), 6 deletions(-)

diff --git a/drivers/vfio/platform/vfio_platform.c b/drivers/vfio/platform/vfio_platform.c
index a3d8f29..f7db5c0 100644
--- a/drivers/vfio/platform/vfio_platform.c
+++ b/drivers/vfio/platform/vfio_platform.c
@@ -34,15 +34,62 @@
 #define DRIVER_AUTHOR   "Antonios Motakis <a.motakis@virtualopensystems.com>"
 #define DRIVER_DESC     "VFIO for platform devices - User Level meta-driver"
 
+static int vfio_platform_regions_init(struct vfio_platform_device *vdev)
+{
+	int cnt = 0, i;
+
+	while (platform_get_resource(vdev->pdev, IORESOURCE_MEM, cnt))
+		cnt++;
+
+	vdev->num_regions = cnt;
+
+	vdev->region = kzalloc(sizeof(struct vfio_platform_region) * cnt,
+				GFP_KERNEL);
+	if (!vdev->region)
+		return -ENOMEM;
+
+	for (i = 0; i < cnt; i++) {
+		struct vfio_platform_region region;
+		struct resource *res =
+			platform_get_resource(vdev->pdev, IORESOURCE_MEM, i);
+
+		region.addr = res->start;
+		region.size = resource_size(res);
+		region.flags = 0;
+
+		vdev->region[i] = region;
+	}
+
+	return 0;
+}
+
+static void vfio_platform_regions_cleanup(struct vfio_platform_device *vdev)
+{
+	kfree(vdev->region);
+}
+
 static void vfio_platform_release(void *device_data)
 {
+	struct vfio_platform_device *vdev = device_data;
+
+	vfio_platform_regions_cleanup(vdev);
+
 	module_put(THIS_MODULE);
 }
 
 static int vfio_platform_open(void *device_data)
 {
-	if (!try_module_get(THIS_MODULE))
+	struct vfio_platform_device *vdev = device_data;
+	int ret;
+
+	ret = vfio_platform_regions_init(vdev);
+	if (ret)
+		return ret;
+
+	if (!try_module_get(THIS_MODULE)) {
+		vfio_platform_regions_cleanup(vdev);
 		return -ENODEV;
+	}
 
 	return 0;
 }
@@ -65,18 +112,33 @@ static long vfio_platform_ioctl(void *device_data,
 			return -EINVAL;
 
 		info.flags = VFIO_DEVICE_FLAGS_PLATFORM;
-		info.num_regions = 0;
+		info.num_regions = vdev->num_regions;
 		info.num_irqs = 0;
 
 		return copy_to_user((void __user *)arg, &info, minsz);
 
-	} else if (cmd == VFIO_DEVICE_GET_REGION_INFO)
-		return -EINVAL;
+	} else if (cmd == VFIO_DEVICE_GET_REGION_INFO) {
+		struct vfio_region_info info;
+
+		minsz = offsetofend(struct vfio_region_info, offset);
+
+		if (copy_from_user(&info, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (info.argsz < minsz || info.index >= vdev->num_regions)
+			return -EINVAL;
+
+		/* map offset to the physical address */
+		info.offset = vdev->region[info.index].addr;
+		info.size = vdev->region[info.index].size;
+		info.flags = vdev->region[info.index].flags;
+
+		return copy_to_user((void __user *)arg, &info, minsz);
 
-	else if (cmd == VFIO_DEVICE_GET_IRQ_INFO)
+	} else if (cmd == VFIO_DEVICE_GET_IRQ_INFO) {
 		return -EINVAL;
 
-	else if (cmd == VFIO_DEVICE_SET_IRQS)
+	} else if (cmd == VFIO_DEVICE_SET_IRQS)
 		return -EINVAL;
 
 	else if (cmd == VFIO_DEVICE_RESET)
diff --git a/drivers/vfio/platform/vfio_platform_private.h b/drivers/vfio/platform/vfio_platform_private.h
index 6df8084..4705aa5 100644
--- a/drivers/vfio/platform/vfio_platform_private.h
+++ b/drivers/vfio/platform/vfio_platform_private.h
@@ -15,8 +15,16 @@
 #ifndef VFIO_PLATFORM_PRIVATE_H
 #define VFIO_PLATFORM_PRIVATE_H
 
+struct vfio_platform_region {
+	u64			addr;
+	resource_size_t		size;
+	u32			flags;
+};
+
 struct vfio_platform_device {
 	struct platform_device		*pdev;
+	struct vfio_platform_region	*region;
+	u32				num_regions;
 };
 
 #endif /* VFIO_PCI_PRIVATE_H */
-- 
1.8.3.2


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [RFC PATCH v4 06/10] VFIO_PLATFORM: Read and write support for the device fd
@ 2014-02-08 17:29   ` Antonios Motakis
  0 siblings, 0 replies; 92+ messages in thread
From: Antonios Motakis @ 2014-02-08 17:29 UTC (permalink / raw)
  To: alex.williamson, kvmarm, iommu, linux-kernel, gregkh
  Cc: tech, a.rigo, B08248, kim.phillips, jan.kiszka, kvm, R65777,
	B07421, christoffer.dall, agraf, B16395, will.deacon,
	Antonios Motakis

VFIO returns a file descriptor which we can use to manipulate the memory
regions of the device. Since some memory regions cannot be mmapped due to
security concerns, we also allow reading from and writing to this file
descriptor directly.
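For illustration, the sketch below reads a single 32-bit register through the device fd and restates the range check performed by vfio_platform_read() and vfio_platform_write(). It assumes, as this RFC does, that a region's file offset equals its physical base address; access_fits_region() and vfio_read32() are hypothetical helpers. pread() is used because the driver does not advance *ppos.

```c
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Same test as the driver: an access of count bytes at file position pos
 * must lie entirely inside one region.
 */
static int access_fits_region(uint64_t pos, uint64_t count,
			      uint64_t region_addr, uint64_t region_size)
{
	if (count == 0)
		return 0;
	return pos >= region_addr &&
	       pos + count - 1 < region_addr + region_size;
}

/*
 * Read one 32-bit register at reg_off inside the region whose file offset
 * (its physical base in this RFC) is region_offset. An aligned 4-byte
 * access is served by a single ioread32() in the driver.
 * Returns 0 on success, -1 on failure.
 */
static int vfio_read32(int device_fd, uint64_t region_offset,
		       uint64_t reg_off, uint32_t *val)
{
	ssize_t n = pread(device_fd, val, sizeof(*val),
			  (off_t)(region_offset + reg_off));

	return n == (ssize_t)sizeof(*val) ? 0 : -1;
}
```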

Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Tested-by: Alvise Rigo <a.rigo@virtualopensystems.com>
---
 drivers/vfio/platform/vfio_platform.c | 128 +++++++++++++++++++++++++++++++++-
 1 file changed, 125 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/platform/vfio_platform.c b/drivers/vfio/platform/vfio_platform.c
index f7db5c0..ee96078 100644
--- a/drivers/vfio/platform/vfio_platform.c
+++ b/drivers/vfio/platform/vfio_platform.c
@@ -55,7 +55,8 @@ static int vfio_platform_regions_init(struct vfio_platform_device *vdev)
 
 		region.addr = res->start;
 		region.size = resource_size(res);
-		region.flags = 0;
+		region.flags = VFIO_REGION_INFO_FLAG_READ
+				| VFIO_REGION_INFO_FLAG_WRITE;
 
 		vdev->region[i] = region;
 	}
@@ -150,13 +151,134 @@ static long vfio_platform_ioctl(void *device_data,
 static ssize_t vfio_platform_read(void *device_data, char __user *buf,
 			     size_t count, loff_t *ppos)
 {
-	return 0;
+	struct vfio_platform_device *vdev = device_data;
+	void __iomem *io;
+	int i;
+
+	for (i = 0; i < vdev->num_regions; i++) {
+		struct vfio_platform_region region = vdev->region[i];
+		unsigned int done = 0;
+		loff_t off;
+
+		if ((*ppos < region.addr)
+		     || (*ppos + count - 1) >= (region.addr + region.size))
+			continue;
+
+		io = ioremap_nocache(region.addr, region.size);
+
+		off = *ppos - region.addr;
+
+		while (count) {
+			size_t filled;
+
+			if (count >= 4 && !(off % 4)) {
+				u32 val;
+
+				val = ioread32(io + off);
+				if (copy_to_user(buf, &val, 4))
+					goto err;
+
+				filled = 4;
+			} else if (count >= 2 && !(off % 2)) {
+				u16 val;
+
+				val = ioread16(io + off);
+				if (copy_to_user(buf, &val, 2))
+					goto err;
+
+				filled = 2;
+			} else {
+				u8 val;
+
+				val = ioread8(io + off);
+				if (copy_to_user(buf, &val, 1))
+					goto err;
+
+				filled = 1;
+			}
+
+
+			count -= filled;
+			done += filled;
+			off += filled;
+			buf += filled;
+		}
+
+		iounmap(io);
+		return done;
+	}
+
+	return -EINVAL;
+
+err:
+	iounmap(io);
+	return -EFAULT;
 }
 
 static ssize_t vfio_platform_write(void *device_data, const char __user *buf,
 			      size_t count, loff_t *ppos)
 {
-	return 0;
+	struct vfio_platform_device *vdev = device_data;
+	void __iomem *io;
+	int i;
+
+	for (i = 0; i < vdev->num_regions; i++) {
+		struct vfio_platform_region region = vdev->region[i];
+		unsigned int done = 0;
+		loff_t off;
+
+		if ((*ppos < region.addr)
+		     || (*ppos + count - 1) >= (region.addr + region.size))
+			continue;
+
+		io = ioremap_nocache(region.addr, region.size);
+
+		off = *ppos - region.addr;
+
+		while (count) {
+			size_t filled;
+
+			if (count >= 4 && !(off % 4)) {
+				u32 val;
+
+				if (copy_from_user(&val, buf, 4))
+					goto err;
+				iowrite32(val, io + off);
+
+				filled = 4;
+			} else if (count >= 2 && !(off % 2)) {
+				u16 val;
+
+				if (copy_from_user(&val, buf, 2))
+					goto err;
+				iowrite16(val, io + off);
+
+				filled = 2;
+			} else {
+				u8 val;
+
+				if (copy_from_user(&val, buf, 1))
+					goto err;
+				iowrite8(val, io + off);
+
+				filled = 1;
+			}
+
+			count -= filled;
+			done += filled;
+			off += filled;
+			buf += filled;
+		}
+
+		iounmap(io);
+		return done;
+	}
+
+	return -EINVAL;
+
+err:
+	iounmap(io);
+	return -EFAULT;
 }
 
 static int vfio_platform_mmap(void *device_data, struct vm_area_struct *vma)
-- 
1.8.3.2


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [RFC PATCH v4 07/10] VFIO_PLATFORM: Support MMAP of MMIO regions
@ 2014-02-08 17:29   ` Antonios Motakis
  0 siblings, 0 replies; 92+ messages in thread
From: Antonios Motakis @ 2014-02-08 17:29 UTC (permalink / raw)
  To: alex.williamson, kvmarm, iommu, linux-kernel, gregkh
  Cc: tech, a.rigo, B08248, kim.phillips, jan.kiszka, kvm, R65777,
	B07421, christoffer.dall, agraf, B16395, will.deacon,
	Antonios Motakis

Allow the MMIO regions of the device to be memory mapped, so that
userspace can access them directly.
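A possible userspace counterpart to this patch, sketched under the same assumption that the mmap offset of a region is its physical base address. region_is_page_granular() restates the test that decides whether VFIO_REGION_INFO_FLAG_MMAP is set, and vfio_map_region() (a hypothetical helper) maps a whole region with MAP_SHARED, which vfio_platform_mmap() requires.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <linux/vfio.h>

/* A region is mmap-able only if both its base and size are page multiples. */
static int region_is_page_granular(uint64_t addr, uint64_t size,
				   uint64_t page_size)
{
	return (addr % page_size) == 0 && (size % page_size) == 0;
}

/*
 * Map a whole region of a VFIO device fd. Private mappings are rejected
 * by the driver, hence MAP_SHARED. Returns NULL on failure.
 */
static void *vfio_map_region(int device_fd, const struct vfio_region_info *r)
{
	void *p;

	if (!(r->flags & VFIO_REGION_INFO_FLAG_MMAP))
		return NULL;
	p = mmap(NULL, (size_t)r->size, PROT_READ | PROT_WRITE, MAP_SHARED,
		 device_fd, (off_t)r->offset);
	return p == MAP_FAILED ? NULL : p;
}
```

Regions that fail the page-granularity test remain reachable through read() and write() on the same fd.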

Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Tested-by: Alvise Rigo <a.rigo@virtualopensystems.com>
---
 drivers/vfio/platform/vfio_platform.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/drivers/vfio/platform/vfio_platform.c b/drivers/vfio/platform/vfio_platform.c
index ee96078..6b4b033 100644
--- a/drivers/vfio/platform/vfio_platform.c
+++ b/drivers/vfio/platform/vfio_platform.c
@@ -58,6 +58,11 @@ static int vfio_platform_regions_init(struct vfio_platform_device *vdev)
 		region.flags = VFIO_REGION_INFO_FLAG_READ
 				| VFIO_REGION_INFO_FLAG_WRITE;
 
+		/* Only regions addressed with PAGE granularity can be MMAPed
+		 * securely. */
+		if (!(region.addr & ~PAGE_MASK) && !(region.size & ~PAGE_MASK))
+			region.flags |= VFIO_REGION_INFO_FLAG_MMAP;
+
 		vdev->region[i] = region;
 	}
 
@@ -283,6 +288,34 @@ err:
 
 static int vfio_platform_mmap(void *device_data, struct vm_area_struct *vma)
 {
+	struct vfio_platform_device *vdev = device_data;
+	u64 req_len = vma->vm_end - vma->vm_start;
+	u64 addr = (u64)vma->vm_pgoff << PAGE_SHIFT;
+	int i;
+
+	if (vma->vm_end < vma->vm_start)
+		return -EINVAL;
+	if (vma->vm_start & ~PAGE_MASK)
+		return -EINVAL;
+	if (vma->vm_end & ~PAGE_MASK)
+		return -EINVAL;
+	if ((vma->vm_flags & VM_SHARED) == 0)
+		return -EINVAL;
+
+	for (i = 0; i < vdev->num_regions; i++) {
+		struct vfio_platform_region region = vdev->region[i];
+
+		if ((addr < region.addr)
+		     || (addr + req_len - 1) >= (region.addr + region.size))
+			continue;
+
+		vma->vm_private_data = vdev;
+		vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+
+		return remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
+				       req_len, vma->vm_page_prot);
+	}
+
 	return -EINVAL;
 }
 
-- 
1.8.3.2


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [RFC PATCH v4 08/10] VFIO_PLATFORM: Return IRQ info
@ 2014-02-08 17:29   ` Antonios Motakis
  0 siblings, 0 replies; 92+ messages in thread
From: Antonios Motakis @ 2014-02-08 17:29 UTC (permalink / raw)
  To: alex.williamson, kvmarm, iommu, linux-kernel, gregkh
  Cc: tech, a.rigo, B08248, kim.phillips, jan.kiszka, kvm, R65777,
	B07421, christoffer.dall, agraf, B16395, will.deacon,
	Antonios Motakis, Catalin Marinas, Mark Rutland

Report information about the interrupts exposed by the device.
This patch makes VFIO_DEVICE_GET_INFO return the number of IRQs
and implements VFIO_DEVICE_GET_IRQ_INFO.
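The `argsz`/`minsz` handshake used by `VFIO_DEVICE_GET_IRQ_INFO` can be modelled in plain C. The code below is a sketch and does not reproduce the driver. The structure mirrors the field order of `struct vfio_irq_info` from `<linux/vfio.h>` (argsz, flags, index, count). `sketch_platform_irq` and `sketch_get_irq_info()` are hypothetical stand-ins for `struct vfio_platform_irq` and the ioctl branch, and `copy_from_user()`/`copy_to_user()` are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Same idea as the kernel's offsetofend(): offset of the first byte
 * past MEMBER inside TYPE. */
#define sketch_offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* Field order follows struct vfio_irq_info in <linux/vfio.h>. */
struct sketch_irq_info {
	uint32_t argsz;
	uint32_t flags;
	uint32_t index;
	uint32_t count;
};

/* Hypothetical stand-in for struct vfio_platform_irq. */
struct sketch_platform_irq {
	uint32_t flags;
	uint32_t count;
};

/* Models the VFIO_DEVICE_GET_IRQ_INFO branch: reject a too-small argsz
 * or an out-of-range index, then fill in flags and count. */
static int sketch_get_irq_info(const struct sketch_platform_irq *irqs,
			       uint32_t num_irqs, struct sketch_irq_info *info)
{
	size_t minsz = sketch_offsetofend(struct sketch_irq_info, count);

	assert(info != NULL);

	if (info->argsz < minsz)
		return -EINVAL;
	if (info->index >= num_irqs)
		return -EINVAL;

	info->flags = irqs[info->index].flags;
	info->count = irqs[info->index].count;
	return 0;
}
```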

Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
---
 drivers/vfio/platform/Makefile                |  2 +-
 drivers/vfio/platform/vfio_platform.c         | 35 +++++++++++++--
 drivers/vfio/platform/vfio_platform_irq.c     | 63 +++++++++++++++++++++++++++
 drivers/vfio/platform/vfio_platform_private.h | 11 +++++
 4 files changed, 106 insertions(+), 5 deletions(-)
 create mode 100644 drivers/vfio/platform/vfio_platform_irq.c

diff --git a/drivers/vfio/platform/Makefile b/drivers/vfio/platform/Makefile
index df3a014..2c53327 100644
--- a/drivers/vfio/platform/Makefile
+++ b/drivers/vfio/platform/Makefile
@@ -1,4 +1,4 @@
 
-vfio-platform-y := vfio_platform.o
+vfio-platform-y := vfio_platform.o vfio_platform_irq.o
 
 obj-$(CONFIG_VFIO_PLATFORM) += vfio-platform.o
diff --git a/drivers/vfio/platform/vfio_platform.c b/drivers/vfio/platform/vfio_platform.c
index 6b4b033..ef1ac17 100644
--- a/drivers/vfio/platform/vfio_platform.c
+++ b/drivers/vfio/platform/vfio_platform.c
@@ -79,6 +79,7 @@ static void vfio_platform_release(void *device_data)
 	struct vfio_platform_device *vdev = device_data;
 
 	vfio_platform_regions_cleanup(vdev);
+	vfio_platform_irq_cleanup(vdev);
 
 	module_put(THIS_MODULE);
 }
@@ -92,12 +93,22 @@ static int vfio_platform_open(void *device_data)
 	if (ret)
 		return ret;
 
+	ret = vfio_platform_irq_init(vdev);
+	if (ret)
+		goto err_irq;
+
 	if (!try_module_get(THIS_MODULE)) {
-		vfio_platform_regions_cleanup(vdev);
-		return -ENODEV;
+		ret = -ENODEV;
+		goto err_mod;
 	}
 
 	return 0;
+
+err_mod:
+	vfio_platform_irq_cleanup(vdev);
+err_irq:
+	vfio_platform_regions_cleanup(vdev);
+	return ret;
 }
 
 static long vfio_platform_ioctl(void *device_data,
@@ -119,7 +130,7 @@ static long vfio_platform_ioctl(void *device_data,
 
 		info.flags = VFIO_DEVICE_FLAGS_PLATFORM;
 		info.num_regions = vdev->num_regions;
-		info.num_irqs = 0;
+		info.num_irqs = vdev->num_irqs;
 
 		return copy_to_user((void __user *)arg, &info, minsz);
 
@@ -142,7 +153,23 @@ static long vfio_platform_ioctl(void *device_data,
 		return copy_to_user((void __user *)arg, &info, minsz);
 
 	} else if (cmd == VFIO_DEVICE_GET_IRQ_INFO) {
-		return -EINVAL;
+		struct vfio_irq_info info;
+
+		minsz = offsetofend(struct vfio_irq_info, count);
+
+		if (copy_from_user(&info, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (info.argsz < minsz)
+			return -EINVAL;
+
+		if (info.index >= vdev->num_irqs)
+			return -EINVAL;
+
+		info.flags = vdev->irq[info.index].flags;
+		info.count = vdev->irq[info.index].count;
+
+		return copy_to_user((void __user *)arg, &info, minsz);
 
 	} else if (cmd == VFIO_DEVICE_SET_IRQS)
 		return -EINVAL;
diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
new file mode 100644
index 0000000..075c401
--- /dev/null
+++ b/drivers/vfio/platform/vfio_platform_irq.c
@@ -0,0 +1,63 @@
+/*
+ * VFIO platform devices interrupt handling
+ *
+ * Copyright (C) 2013 - Virtual Open Systems
+ * Author: Antonios Motakis <a.motakis@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/device.h>
+#include <linux/eventfd.h>
+#include <linux/interrupt.h>
+#include <linux/iommu.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/notifier.h>
+#include <linux/pm_runtime.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <linux/uaccess.h>
+#include <linux/vfio.h>
+#include <linux/platform_device.h>
+#include <linux/irq.h>
+
+#include "vfio_platform_private.h"
+
+int vfio_platform_irq_init(struct vfio_platform_device *vdev)
+{
+	int cnt = 0, i;
+
+	while (platform_get_irq(vdev->pdev, cnt) > 0)
+		cnt++;
+
+	vdev->num_irqs = cnt;
+
+	vdev->irq = kzalloc(sizeof(struct vfio_platform_irq) * vdev->num_irqs,
+				GFP_KERNEL);
+	if (!vdev->irq)
+		return -ENOMEM;
+
+	for (i = 0; i < cnt; i++) {
+		struct vfio_platform_irq irq;
+
+		irq.flags = 0;
+		irq.count = 1;
+
+		vdev->irq[i] = irq;
+	}
+
+	return 0;
+}
+
+void vfio_platform_irq_cleanup(struct vfio_platform_device *vdev)
+{
+	kfree(vdev->irq);
+}
diff --git a/drivers/vfio/platform/vfio_platform_private.h b/drivers/vfio/platform/vfio_platform_private.h
index 4705aa5..726f5d1 100644
--- a/drivers/vfio/platform/vfio_platform_private.h
+++ b/drivers/vfio/platform/vfio_platform_private.h
@@ -15,6 +15,11 @@
 #ifndef VFIO_PLATFORM_PRIVATE_H
 #define VFIO_PLATFORM_PRIVATE_H
 
+struct vfio_platform_irq {
+	u32			flags;
+	u32			count;
+};
+
 struct vfio_platform_region {
 	u64			addr;
 	resource_size_t		size;
@@ -25,6 +30,12 @@ struct vfio_platform_device {
 	struct platform_device		*pdev;
 	struct vfio_platform_region	*region;
 	u32				num_regions;
+	struct vfio_platform_irq	*irq;
+	u32				num_irqs;
 };
 
+extern int vfio_platform_irq_init(struct vfio_platform_device *vdev);
+
+extern void vfio_platform_irq_cleanup(struct vfio_platform_device *vdev);
+
 #endif /* VFIO_PCI_PRIVATE_H */
-- 
1.8.3.2


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [RFC PATCH v4 09/10] VFIO_PLATFORM: Initial interrupts support
@ 2014-02-08 17:29   ` Antonios Motakis
  0 siblings, 0 replies; 92+ messages in thread
From: Antonios Motakis @ 2014-02-08 17:29 UTC (permalink / raw)
  To: alex.williamson, kvmarm, iommu, linux-kernel, gregkh
  Cc: tech, a.rigo, B08248, kim.phillips, jan.kiszka, kvm, R65777,
	B07421, christoffer.dall, agraf, B16395, will.deacon,
	Antonios Motakis, Catalin Marinas, Mark Rutland

This patch allows setting an eventfd for a platform device's interrupt,
and also triggering the interrupt eventfd from userspace for testing.
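On the user side, binding an eventfd means passing a `struct vfio_irq_set` header followed by one `int32_t`. The driver reads that payload at `arg + minsz`, where `minsz` ends at `count`. The sketch below builds such a request in memory. It assumes the layout of `struct vfio_irq_set` and the flag values from `<linux/vfio.h>`, mirrored under hypothetical `sketch_*` names so that it compiles without kernel headers. On a real device the buffer would then be passed to `ioctl(device_fd, VFIO_DEVICE_SET_IRQS, set)`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Flag values mirror VFIO_IRQ_SET_DATA_EVENTFD and
 * VFIO_IRQ_SET_ACTION_TRIGGER in <linux/vfio.h>. */
#define SKETCH_IRQ_SET_DATA_EVENTFD   (1u << 2)
#define SKETCH_IRQ_SET_ACTION_TRIGGER (1u << 5)

/* Layout follows struct vfio_irq_set in <linux/vfio.h>. */
struct sketch_irq_set {
	uint32_t argsz;
	uint32_t flags;
	uint32_t index;
	uint32_t start;
	uint32_t count;
	uint8_t data[];
};

/* The payload must start exactly where the driver looks: arg + minsz. */
static_assert(offsetof(struct sketch_irq_set, data) ==
	      offsetof(struct sketch_irq_set, count) + sizeof(uint32_t),
	      "payload must start right after count");

/* Build a request that binds eventfd 'fd' to IRQ 'index' with start 0
 * and count 1, the only shape this patch accepts for an eventfd.
 * Returns NULL on allocation failure; the caller frees the buffer. */
static struct sketch_irq_set *sketch_build_trigger(uint32_t index, int32_t fd)
{
	size_t argsz = sizeof(struct sketch_irq_set) + sizeof(fd);
	struct sketch_irq_set *set = calloc(1, argsz);

	if (!set)
		return NULL;

	set->argsz = (uint32_t)argsz;
	set->flags = SKETCH_IRQ_SET_DATA_EVENTFD | SKETCH_IRQ_SET_ACTION_TRIGGER;
	set->index = index;
	set->start = 0;
	set->count = 1;
	memcpy(set->data, &fd, sizeof(fd));
	return set;
}
```

Passing an fd of -1 in the same request should tear the trigger down, since `vfio_set_trigger()` treats a negative fd as disable-only.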

Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Tested-by: Alvise Rigo <a.rigo@virtualopensystems.com>
---
 drivers/vfio/platform/vfio_platform.c         |  36 +++++++-
 drivers/vfio/platform/vfio_platform_irq.c     | 123 +++++++++++++++++++++++++-
 drivers/vfio/platform/vfio_platform_private.h |   7 ++
 3 files changed, 162 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/platform/vfio_platform.c b/drivers/vfio/platform/vfio_platform.c
index ef1ac17..ed5d678 100644
--- a/drivers/vfio/platform/vfio_platform.c
+++ b/drivers/vfio/platform/vfio_platform.c
@@ -171,10 +171,40 @@ static long vfio_platform_ioctl(void *device_data,
 
 		return copy_to_user((void __user *)arg, &info, minsz);
 
-	} else if (cmd == VFIO_DEVICE_SET_IRQS)
-		return -EINVAL;
+	} else if (cmd == VFIO_DEVICE_SET_IRQS) {
+		struct vfio_irq_set hdr;
+		int ret = 0;
+
+		minsz = offsetofend(struct vfio_irq_set, count);
+
+		if (copy_from_user(&hdr, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (hdr.argsz < minsz)
+			return -EINVAL;
+
+		if (hdr.index >= vdev->num_irqs)
+			return -EINVAL;
+
+		if (hdr.start != 0 || hdr.count > 1)
+			return -EINVAL;
+
+		if (hdr.count == 0 &&
+			(!(hdr.flags & VFIO_IRQ_SET_DATA_NONE) ||
+			 !(hdr.flags & VFIO_IRQ_SET_ACTION_TRIGGER)))
+			return -EINVAL;
+
+		if (hdr.flags & ~(VFIO_IRQ_SET_DATA_TYPE_MASK |
+				  VFIO_IRQ_SET_ACTION_TYPE_MASK))
+			return -EINVAL;
+
+		ret = vfio_platform_set_irqs_ioctl(vdev, hdr.flags, hdr.index,
+						   hdr.start, hdr.count,
+						   (void *)arg+minsz);
+
+		return ret;
 
-	else if (cmd == VFIO_DEVICE_RESET)
+	} else if (cmd == VFIO_DEVICE_RESET)
 		return -EINVAL;
 
 	return -ENOTTY;
diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
index 075c401..433edc1 100644
--- a/drivers/vfio/platform/vfio_platform_irq.c
+++ b/drivers/vfio/platform/vfio_platform_irq.c
@@ -31,6 +31,9 @@
 
 #include "vfio_platform_private.h"
 
+static int vfio_set_trigger(struct vfio_platform_device *vdev,
+			    int index, int fd);
+
 int vfio_platform_irq_init(struct vfio_platform_device *vdev)
 {
 	int cnt = 0, i;
@@ -47,9 +50,11 @@ int vfio_platform_irq_init(struct vfio_platform_device *vdev)
 
 	for (i = 0; i < cnt; i++) {
 		struct vfio_platform_irq irq;
+		int hwirq = platform_get_irq(vdev->pdev, i);
 
-		irq.flags = 0;
+		irq.flags = VFIO_IRQ_INFO_EVENTFD;
 		irq.count = 1;
+		irq.hwirq = hwirq;
 
 		vdev->irq[i] = irq;
 	}
@@ -59,5 +64,121 @@ int vfio_platform_irq_init(struct vfio_platform_device *vdev)
 
 void vfio_platform_irq_cleanup(struct vfio_platform_device *vdev)
 {
+	int i;
+
+	for (i = 0; i < vdev->num_irqs; i++)
+		vfio_set_trigger(vdev, i, -1);
+
 	kfree(vdev->irq);
 }
+
+static irqreturn_t vfio_irq_handler(int irq, void *dev_id)
+{
+	struct eventfd_ctx *trigger = dev_id;
+
+	eventfd_signal(trigger, 1);
+
+	return IRQ_HANDLED;
+}
+
+static int vfio_set_trigger(struct vfio_platform_device *vdev,
+			    int index, int fd)
+{
+	struct vfio_platform_irq *irq = &vdev->irq[index];
+	struct eventfd_ctx *trigger;
+	int ret;
+
+	if (irq->trigger) {
+		free_irq(irq->hwirq, irq);
+		kfree(irq->name);
+		eventfd_ctx_put(irq->trigger);
+		irq->trigger = NULL;
+	}
+
+	if (fd < 0) /* Disable only */
+		return 0;
+
+	irq->name = kasprintf(GFP_KERNEL, "vfio-irq[%d](%s)",
+						irq->hwirq, vdev->pdev->name);
+	if (!irq->name)
+		return -ENOMEM;
+
+	trigger = eventfd_ctx_fdget(fd);
+	if (IS_ERR(trigger)) {
+		kfree(irq->name);
+		return PTR_ERR(trigger);
+	}
+
+	irq->trigger = trigger;
+
+	ret = request_irq(irq->hwirq, vfio_irq_handler, 0, irq->name, irq);
+	if (ret) {
+		kfree(irq->name);
+		eventfd_ctx_put(trigger);
+		irq->trigger = NULL;
+		return ret;
+	}
+
+	return 0;
+}
+
+static int vfio_platform_set_irq_trigger(struct vfio_platform_device *vdev,
+				     unsigned index, unsigned start,
+				     unsigned count, uint32_t flags, void *data)
+{
+	struct vfio_platform_irq *irq = &vdev->irq[index];
+	uint8_t arr;
+	int32_t fd;
+
+	switch (flags & VFIO_IRQ_SET_DATA_TYPE_MASK) {
+	case VFIO_IRQ_SET_DATA_NONE:
+		if (count == 0)
+			return vfio_set_trigger(vdev, index, -1);
+
+		vfio_irq_handler(irq->hwirq, irq);
+		return 0;
+
+	case VFIO_IRQ_SET_DATA_BOOL:
+		if (copy_from_user(&arr, data, sizeof(uint8_t)))
+			return -EFAULT;
+
+		if (arr == 0x1) {
+			vfio_irq_handler(irq->hwirq, irq);
+			return 0;
+		}
+
+		return -EINVAL;
+
+	case VFIO_IRQ_SET_DATA_EVENTFD:
+		if (copy_from_user(&fd, data, sizeof(int32_t)))
+			return -EFAULT;
+
+		return vfio_set_trigger(vdev, index, fd);
+	}
+
+	return -EFAULT;
+}
+
+int vfio_platform_set_irqs_ioctl(struct vfio_platform_device *vdev,
+				 uint32_t flags, unsigned index, unsigned start,
+				 unsigned count, void *data)
+{
+	int (*func)(struct vfio_platform_device *vdev, unsigned index,
+		    unsigned start, unsigned count, uint32_t flags,
+		    void *data) = NULL;
+
+	switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
+	case VFIO_IRQ_SET_ACTION_MASK:
+	case VFIO_IRQ_SET_ACTION_UNMASK:
+		/* XXX not implemented */
+		break;
+	case VFIO_IRQ_SET_ACTION_TRIGGER:
+		func = vfio_platform_set_irq_trigger;
+		break;
+	}
+
+	if (!func)
+		return -ENOTTY;
+
+	return func(vdev, index, start, count, flags, data);
+}
diff --git a/drivers/vfio/platform/vfio_platform_private.h b/drivers/vfio/platform/vfio_platform_private.h
index 726f5d1..befef01 100644
--- a/drivers/vfio/platform/vfio_platform_private.h
+++ b/drivers/vfio/platform/vfio_platform_private.h
@@ -16,8 +16,11 @@
 #define VFIO_PLATFORM_PRIVATE_H
 
 struct vfio_platform_irq {
+	struct eventfd_ctx	*trigger;
 	u32			flags;
 	u32			count;
+	int			hwirq;
+	char			*name;
 };
 
 struct vfio_platform_region {
@@ -38,4 +41,8 @@ extern int vfio_platform_irq_init(struct vfio_platform_device *vdev);
 
 extern void vfio_platform_irq_cleanup(struct vfio_platform_device *vdev);
 
+extern int vfio_platform_set_irqs_ioctl(struct vfio_platform_device *vdev,
+			uint32_t flags, unsigned index, unsigned start,
+			unsigned count, void *data);
+
 #endif /* VFIO_PCI_PRIVATE_H */
-- 
1.8.3.2
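`vfio_irq_handler()` calls `eventfd_signal(trigger, 1)`, so interrupts that arrive before userspace reads the eventfd are coalesced into a single counter value. The userspace sketch below demonstrates this with glibc's `eventfd()`, `eventfd_write()` and `eventfd_read()`. It uses `eventfd_write(efd, 1)` as a stand-in for the kernel-side signal. `sketch_coalesced_count()` is a hypothetical helper name.

```c
#include <assert.h>
#include <errno.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Signal a fresh eventfd n times (one per simulated interrupt), then
 * read it once as a consumer would.  Returns the value read, 0 if
 * nothing was pending, or -1 on error. */
static long sketch_coalesced_count(int n)
{
	eventfd_t value = 0;
	long ret;
	int efd;

	assert(n >= 0);

	/* Non-blocking, so that a read with nothing pending fails with
	 * EAGAIN instead of hanging. */
	efd = eventfd(0, EFD_NONBLOCK);
	if (efd < 0)
		return -1;

	for (int i = 0; i < n; i++) {
		if (eventfd_write(efd, 1) < 0) {
			close(efd);
			return -1;
		}
	}

	/* Without EFD_SEMAPHORE, a read returns the whole counter and
	 * resets it to zero. */
	if (eventfd_read(efd, &value) == 0)
		ret = (long)value;
	else
		ret = (errno == EAGAIN) ? 0 : -1;

	close(efd);
	return ret;
}
```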


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [RFC PATCH v4 10/10] VFIO_PLATFORM: Support for maskable and automasked interrupts
@ 2014-02-08 17:29   ` Antonios Motakis
  0 siblings, 0 replies; 92+ messages in thread
From: Antonios Motakis @ 2014-02-08 17:29 UTC (permalink / raw)
  To: alex.williamson, kvmarm, iommu, linux-kernel, gregkh
  Cc: tech, a.rigo, B08248, kim.phillips, jan.kiszka, kvm, R65777,
	B07421, christoffer.dall, agraf, B16395, will.deacon,
	Antonios Motakis, Catalin Marinas, Mark Rutland

Add support for masking interrupts, and for automasked interrupts.
Level-sensitive interrupts are exposed as automasked interrupts: they
are masked (the host IRQ is disabled) automatically when they fire.
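The masking rules added here can be checked with a single-threaded model, built on the following assumptions. `struct sketch_irq` and the `sketch_*` functions are hypothetical, the spinlock is omitted, and `disable_depth` only stands in for the host IRQ's disable count, which the driver's disable and enable calls adjust. The model shows three behaviours. A level-sensitive (automasked) line delivers one signal and then stays masked until userspace unmasks it. An edge line keeps signalling. Repeated mask or unmask requests leave the disable count balanced.

```c
#include <assert.h>
#include <stdbool.h>

/* Single-threaded model of the per-IRQ state added by this patch. */
struct sketch_irq {
	bool automasked;    /* level-sensitive line */
	bool masked;
	int signals;        /* number of eventfd signals delivered */
	int disable_depth;  /* stand-in for the host IRQ disable count */
};

/* Models vfio_irq_handler(): do nothing while masked; otherwise signal,
 * and mask level-sensitive lines before returning. */
static bool sketch_fire(struct sketch_irq *irq)
{
	if (irq->masked)
		return false;               /* IRQ_NONE */

	if (irq->automasked) {
		irq->disable_depth++;       /* disable_irq_nosync() */
		irq->masked = true;
	}
	irq->signals++;                     /* eventfd_signal() */
	return true;                        /* IRQ_HANDLED */
}

/* Models the mask path: disable only if not already masked. */
static void sketch_mask(struct sketch_irq *irq)
{
	if (!irq->masked) {
		irq->disable_depth++;
		irq->masked = true;
	}
}

/* Models the unmask path: re-enable only if currently masked. */
static void sketch_unmask(struct sketch_irq *irq)
{
	if (irq->masked) {
		irq->disable_depth--;       /* enable_irq() */
		irq->masked = false;
	}
	assert(irq->disable_depth >= 0);
}
```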

Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Tested-by: Alvise Rigo <a.rigo@virtualopensystems.com>
---
 drivers/vfio/platform/vfio_platform_irq.c     | 117 ++++++++++++++++++++++++--
 drivers/vfio/platform/vfio_platform_private.h |   2 +
 2 files changed, 113 insertions(+), 6 deletions(-)

diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
index 433edc1..e38982f 100644
--- a/drivers/vfio/platform/vfio_platform_irq.c
+++ b/drivers/vfio/platform/vfio_platform_irq.c
@@ -52,9 +52,16 @@ int vfio_platform_irq_init(struct vfio_platform_device *vdev)
 		struct vfio_platform_irq irq;
 		int hwirq = platform_get_irq(vdev->pdev, i);
 
-		irq.flags = VFIO_IRQ_INFO_EVENTFD;
+		spin_lock_init(&irq.lock);
+
+		irq.flags = VFIO_IRQ_INFO_EVENTFD | VFIO_IRQ_INFO_MASKABLE;
+
+		if (irq_get_trigger_type(hwirq) & IRQ_TYPE_LEVEL_MASK)
+			irq.flags |= VFIO_IRQ_INFO_AUTOMASKED;
+
 		irq.count = 1;
 		irq.hwirq = hwirq;
+		irq.masked = false;
 
 		vdev->irq[i] = irq;
 	}
@@ -66,19 +73,39 @@ void vfio_platform_irq_cleanup(struct vfio_platform_device *vdev)
 {
 	int i;
 
-	for (i = 0; i < vdev->num_irqs; i++)
+	for (i = 0; i < vdev->num_irqs; i++) {
 		vfio_set_trigger(vdev, i, -1);
 
+		if (vdev->irq[i].masked)
+			enable_irq(vdev->irq[i].hwirq);
+	}
+
 	kfree(vdev->irq);
 }
 
 static irqreturn_t vfio_irq_handler(int irq, void *dev_id)
 {
-	struct eventfd_ctx *trigger = dev_id;
+	struct vfio_platform_irq *irq_ctx = dev_id;
+	unsigned long flags;
+	int ret = IRQ_NONE;
+
+	spin_lock_irqsave(&irq_ctx->lock, flags);
+
+	if (!irq_ctx->masked) {
+		ret = IRQ_HANDLED;
+
+		if (irq_ctx->flags & VFIO_IRQ_INFO_AUTOMASKED) {
+			disable_irq_nosync(irq_ctx->hwirq);
+			irq_ctx->masked = true;
+		}
+	}
 
-	eventfd_signal(trigger, 1);
+	spin_unlock_irqrestore(&irq_ctx->lock, flags);
 
-	return IRQ_HANDLED;
+	if (ret == IRQ_HANDLED)
+		eventfd_signal(irq_ctx->trigger, 1);
+
+	return ret;
 }
 
 static int vfio_set_trigger(struct vfio_platform_device *vdev,
@@ -159,6 +186,82 @@ static int vfio_platform_set_irq_trigger(struct vfio_platform_device *vdev,
 	return -EFAULT;
 }
 
+static int vfio_platform_set_irq_unmask(struct vfio_platform_device *vdev,
+				    unsigned index, unsigned start,
+				    unsigned count, uint32_t flags, void *data)
+{
+	uint8_t arr;
+
+	if (start != 0 || count != 1)
+		return -EINVAL;
+
+	switch (flags & VFIO_IRQ_SET_DATA_TYPE_MASK) {
+	case VFIO_IRQ_SET_DATA_BOOL:
+		if (copy_from_user(&arr, data, sizeof(uint8_t)))
+			return -EFAULT;
+
+		if (arr != 0x1)
+			return -EINVAL;
+
+	case VFIO_IRQ_SET_DATA_NONE:
+
+		spin_lock_irq(&vdev->irq[index].lock);
+
+		if (vdev->irq[index].masked) {
+			enable_irq(vdev->irq[index].hwirq);
+			vdev->irq[index].masked = false;
+		}
+
+		spin_unlock_irq(&vdev->irq[index].lock);
+
+		return 0;
+
+	case VFIO_IRQ_SET_DATA_EVENTFD: /* XXX not implemented yet */
+	default:
+		return -ENOTTY;
+	}
+
+	return 0;
+}
+
+static int vfio_platform_set_irq_mask(struct vfio_platform_device *vdev,
+				    unsigned index, unsigned start,
+				    unsigned count, uint32_t flags, void *data)
+{
+	uint8_t arr;
+
+	if (start != 0 || count != 1)
+		return -EINVAL;
+
+	switch (flags & VFIO_IRQ_SET_DATA_TYPE_MASK) {
+	case VFIO_IRQ_SET_DATA_BOOL:
+		if (copy_from_user(&arr, data, sizeof(uint8_t)))
+			return -EFAULT;
+
+		if (arr != 0x1)
+			return -EINVAL;
+
+	case VFIO_IRQ_SET_DATA_NONE:
+
+		spin_lock_irq(&vdev->irq[index].lock);
+
+		if (!vdev->irq[index].masked) {
+			disable_irq_nosync(vdev->irq[index].hwirq);
+			vdev->irq[index].masked = true;
+		}
+
+		spin_unlock_irq(&vdev->irq[index].lock);
+
+		return 0;
+
+	case VFIO_IRQ_SET_DATA_EVENTFD: /* XXX not implemented yet */
+	default:
+		return -ENOTTY;
+	}
+
+	return 0;
+}
+
 int vfio_platform_set_irqs_ioctl(struct vfio_platform_device *vdev,
 				 uint32_t flags, unsigned index, unsigned start,
 				 unsigned count, void *data)
@@ -169,8 +272,10 @@ int vfio_platform_set_irqs_ioctl(struct vfio_platform_device *vdev,
 
 	switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
 	case VFIO_IRQ_SET_ACTION_MASK:
+		func = vfio_platform_set_irq_mask;
+		break;
 	case VFIO_IRQ_SET_ACTION_UNMASK:
-		/* XXX not implemented */
+		func = vfio_platform_set_irq_unmask;
 		break;
 	case VFIO_IRQ_SET_ACTION_TRIGGER:
 		func = vfio_platform_set_irq_trigger;
diff --git a/drivers/vfio/platform/vfio_platform_private.h b/drivers/vfio/platform/vfio_platform_private.h
index befef01..5721313 100644
--- a/drivers/vfio/platform/vfio_platform_private.h
+++ b/drivers/vfio/platform/vfio_platform_private.h
@@ -21,6 +21,8 @@ struct vfio_platform_irq {
 	u32			count;
 	int			hwirq;
 	char			*name;
+	bool			masked;
+	spinlock_t		lock;
 };
 
 struct vfio_platform_region {
-- 
1.8.3.2


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* Re: [RFC PATCH v4 02/10] VFIO_IOMMU_TYPE1: Introduce the VFIO_DMA_MAP_FLAG_EXEC flag
@ 2014-02-10 20:04     ` Alex Williamson
  0 siblings, 0 replies; 92+ messages in thread
From: Alex Williamson @ 2014-02-10 20:04 UTC (permalink / raw)
  To: Antonios Motakis
  Cc: kvmarm, iommu, linux-kernel, gregkh, tech, a.rigo, B08248,
	kim.phillips, jan.kiszka, kvm, R65777, B07421, christoffer.dall,
	agraf, B16395, will.deacon

On Sat, 2014-02-08 at 18:29 +0100, Antonios Motakis wrote:
> The ARM SMMU driver expects the IOMMU_EXEC flag; otherwise it will
> mark the page tables for a device as XN (execute never). This affects
> devices such as the ARM PL330 DMA Controller, which fails to operate
> if the XN flag is set on the memory it fetches its instructions
> from.
> 
> We introduce the VFIO_DMA_MAP_FLAG_EXEC flag to VFIO, and use it in
> VFIO_IOMMU_TYPE1 to set the IOMMU_EXEC flag. This way the user can
> control whether the XN flag will be set on the requested mappings.

Should the user be told whether this flag is available?  It looks like
existing iommu drivers for x86 ignore the flag; can we count on that?
Thanks,

Alex

> Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
> ---
>  drivers/vfio/vfio_iommu_type1.c | 5 ++++-
>  include/uapi/linux/vfio.h       | 1 +
>  2 files changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 4fb7a8f..ad7a1f6 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -557,6 +557,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>  		prot |= IOMMU_WRITE;
>  	if (map->flags & VFIO_DMA_MAP_FLAG_READ)
>  		prot |= IOMMU_READ;
> +	if (map->flags & VFIO_DMA_MAP_FLAG_EXEC)
> +		prot |= IOMMU_EXEC;
>  
>  	if (!prot)
>  		return -EINVAL; /* No READ/WRITE? */
> @@ -865,7 +867,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>  	} else if (cmd == VFIO_IOMMU_MAP_DMA) {
>  		struct vfio_iommu_type1_dma_map map;
>  		uint32_t mask = VFIO_DMA_MAP_FLAG_READ |
> -				VFIO_DMA_MAP_FLAG_WRITE;
> +				VFIO_DMA_MAP_FLAG_WRITE |
> +				VFIO_DMA_MAP_FLAG_EXEC;
>  
>  		minsz = offsetofend(struct vfio_iommu_type1_dma_map, size);
>  
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 0fd47f5..d8e9e99 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -392,6 +392,7 @@ struct vfio_iommu_type1_dma_map {
>  	__u32	flags;
>  #define VFIO_DMA_MAP_FLAG_READ (1 << 0)		/* readable from device */
>  #define VFIO_DMA_MAP_FLAG_WRITE (1 << 1)	/* writable from device */
> +#define VFIO_DMA_MAP_FLAG_EXEC (1 << 2)		/* executable from device */
>  	__u64	vaddr;				/* Process virtual address */
>  	__u64	iova;				/* IO virtual address */
>  	__u64	size;				/* Size of mapping (bytes) */

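The type1 change amounts to validating `map.flags` against the accepted mask (the comparison itself is presumably done in the ioctl path, which the hunk does not show) and translating each VFIO flag into an IOMMU protection bit. A standalone sketch of that translation, using the flag values from the uapi hunk above; the `MODEL_IOMMU_*` values and `model_map_flags_to_prot()` are hypothetical placeholders, not the kernel's `IOMMU_READ`/`IOMMU_WRITE`/`IOMMU_EXEC` definitions:

```c
#include <errno.h>
#include <stdint.h>

/* Flag values as defined in the include/uapi/linux/vfio.h hunk above. */
#define VFIO_DMA_MAP_FLAG_READ  (1 << 0)
#define VFIO_DMA_MAP_FLAG_WRITE (1 << 1)
#define VFIO_DMA_MAP_FLAG_EXEC  (1 << 2)

/* Hypothetical protection bits standing in for IOMMU_READ/WRITE/EXEC. */
#define MODEL_IOMMU_READ  0x1
#define MODEL_IOMMU_WRITE 0x2
#define MODEL_IOMMU_EXEC  0x4

/* Returns the protection bits, or -EINVAL as the map path would. */
static int model_map_flags_to_prot(uint32_t flags)
{
	const uint32_t mask = VFIO_DMA_MAP_FLAG_READ |
			      VFIO_DMA_MAP_FLAG_WRITE |
			      VFIO_DMA_MAP_FLAG_EXEC;
	int prot = 0;

	if (flags & ~mask)		/* unknown flag bits are rejected */
		return -EINVAL;
	if (flags & VFIO_DMA_MAP_FLAG_WRITE)
		prot |= MODEL_IOMMU_WRITE;
	if (flags & VFIO_DMA_MAP_FLAG_READ)
		prot |= MODEL_IOMMU_READ;
	if (flags & VFIO_DMA_MAP_FLAG_EXEC)
		prot |= MODEL_IOMMU_EXEC;
	if (!prot)			/* mirrors the "No READ/WRITE?" check */
		return -EINVAL;
	return prot;
}
```

Worth noting when reading the hunk: once EXEC contributes to `prot`, an EXEC-only request passes the `!prot` test, so that check no longer strictly means "no READ/WRITE".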

* Re: [RFC PATCH v4 05/10] VFIO_PLATFORM: Return info for device and its memory mapped IO regions
@ 2014-02-10 22:32     ` Alex Williamson
  0 siblings, 0 replies; 92+ messages in thread
From: Alex Williamson @ 2014-02-10 22:32 UTC (permalink / raw)
  To: Antonios Motakis
  Cc: kvmarm, iommu, linux-kernel, gregkh, tech, a.rigo, B08248,
	kim.phillips, jan.kiszka, kvm, R65777, B07421, christoffer.dall,
	agraf, B16395, will.deacon, Catalin Marinas, Mark Rutland

On Sat, 2014-02-08 at 18:29 +0100, Antonios Motakis wrote:
> A VFIO userspace driver will start by opening the VFIO device
> that corresponds to an IOMMU group, and will use the ioctl interface
> to get the basic device info, such as number of memory regions and
> interrupts, and their properties.
> 
> This patch enables the IOCTLs:
>  - VFIO_DEVICE_GET_INFO
>  - VFIO_DEVICE_GET_REGION_INFO
> 
> IRQ info is provided by one of the later patches.
> 
> Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
> ---
>  drivers/vfio/platform/vfio_platform.c         | 74 ++++++++++++++++++++++++---
>  drivers/vfio/platform/vfio_platform_private.h |  8 +++
>  2 files changed, 76 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/vfio/platform/vfio_platform.c b/drivers/vfio/platform/vfio_platform.c
> index a3d8f29..f7db5c0 100644
> --- a/drivers/vfio/platform/vfio_platform.c
> +++ b/drivers/vfio/platform/vfio_platform.c
> @@ -34,15 +34,62 @@
>  #define DRIVER_AUTHOR   "Antonios Motakis <a.motakis@virtualopensystems.com>"
>  #define DRIVER_DESC     "VFIO for platform devices - User Level meta-driver"
>  
> +static int vfio_platform_regions_init(struct vfio_platform_device *vdev)
> +{
> +	int cnt = 0, i;
> +
> +	while (platform_get_resource(vdev->pdev, IORESOURCE_MEM, cnt))
> +		cnt++;
> +
> +	vdev->num_regions = cnt;
> +
> +	vdev->region = kzalloc(sizeof(struct vfio_platform_region) * cnt,
> +				GFP_KERNEL);
> +	if (!vdev->region)
> +		return -ENOMEM;
> +
> +	for (i = 0; i < cnt;  i++) {
> +		struct vfio_platform_region region;
> +		struct resource *res =
> +			platform_get_resource(vdev->pdev, IORESOURCE_MEM, i);
> +
> +		region.addr = res->start;
> +		region.size = resource_size(res);
> +		region.flags = 0;
> +
> +		vdev->region[i] = region;
> +	}
> +
> +	return 0;
> +}
> +
> +static void vfio_platform_regions_cleanup(struct vfio_platform_device *vdev)
> +{
> +	kfree(vdev->region);
> +}
> +
>  static void vfio_platform_release(void *device_data)
>  {
> +	struct vfio_platform_device *vdev = device_data;
> +
> +	vfio_platform_regions_cleanup(vdev);
> +
>  	module_put(THIS_MODULE);
>  }
>  
>  static int vfio_platform_open(void *device_data)
>  {
> -	if (!try_module_get(THIS_MODULE))
> +	struct vfio_platform_device *vdev = device_data;
> +	int ret;
> +
> +	ret = vfio_platform_regions_init(vdev);
> +	if (ret)
> +		return ret;
> +
> +	if (!try_module_get(THIS_MODULE)) {
> +		vfio_platform_regions_cleanup(vdev);
>  		return -ENODEV;
> +	}
>  
>  	return 0;
>  }
> @@ -65,18 +112,33 @@ static long vfio_platform_ioctl(void *device_data,
>  			return -EINVAL;
>  
>  		info.flags = VFIO_DEVICE_FLAGS_PLATFORM;
> -		info.num_regions = 0;
> +		info.num_regions = vdev->num_regions;
>  		info.num_irqs = 0;
>  
>  		return copy_to_user((void __user *)arg, &info, minsz);
>  
> -	} else if (cmd == VFIO_DEVICE_GET_REGION_INFO)
> -		return -EINVAL;
> +	} else if (cmd == VFIO_DEVICE_GET_REGION_INFO) {
> +		struct vfio_region_info info;
> +
> +		minsz = offsetofend(struct vfio_region_info, offset);
> +
> +		if (copy_from_user(&info, (void __user *)arg, minsz))
> +			return -EFAULT;
> +
> +		if (info.argsz < minsz)
> +			return -EINVAL;

Missing a bounds check on info.index; the user could be reading back
kernel data here.  Thanks,

Alex

> +
> +		/* map offset to the physical address  */
> +		info.offset = vdev->region[info.index].addr;
> +		info.size = vdev->region[info.index].size;
> +		info.flags = vdev->region[info.index].flags;
> +
> +		return copy_to_user((void __user *)arg, &info, minsz);
>  
> -	else if (cmd == VFIO_DEVICE_GET_IRQ_INFO)
> +	} else if (cmd == VFIO_DEVICE_GET_IRQ_INFO) {
>  		return -EINVAL;
>  
> -	else if (cmd == VFIO_DEVICE_SET_IRQS)
> +	} else if (cmd == VFIO_DEVICE_SET_IRQS)
>  		return -EINVAL;
>  
>  	else if (cmd == VFIO_DEVICE_RESET)
> diff --git a/drivers/vfio/platform/vfio_platform_private.h b/drivers/vfio/platform/vfio_platform_private.h
> index 6df8084..4705aa5 100644
> --- a/drivers/vfio/platform/vfio_platform_private.h
> +++ b/drivers/vfio/platform/vfio_platform_private.h
> @@ -15,8 +15,16 @@
>  #ifndef VFIO_PLATFORM_PRIVATE_H
>  #define VFIO_PLATFORM_PRIVATE_H
>  
> +struct vfio_platform_region {
> +	u64			addr;
> +	resource_size_t		size;
> +	u32			flags;
> +};
> +
>  struct vfio_platform_device {
>  	struct platform_device		*pdev;
> +	struct vfio_platform_region	*region;
> +	u32				num_regions;
>  };
>  
>  #endif /* VFIO_PCI_PRIVATE_H */

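The review point is that `info.index` comes straight from userspace and is used to index `vdev->region` unchecked. A minimal sketch of the missing check as standalone C, with simplified stand-in structures; `model_region`, `model_region_info` and `model_get_region_info()` are hypothetical and do not reproduce the VFIO uapi layout:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical simplified stand-ins for vfio_platform_region / vfio_region_info. */
struct model_region {
	uint64_t addr;
	uint64_t size;
	uint32_t flags;
};

struct model_region_info {
	uint32_t index;		/* supplied by userspace */
	uint32_t flags;
	uint64_t size;
	uint64_t offset;
};

/* Fill *info for info->index, refusing indices past the region array. */
static int model_get_region_info(const struct model_region *regions,
				 uint32_t num_regions,
				 struct model_region_info *info)
{
	if (info->index >= num_regions)	/* the bounds check the review asks for */
		return -EINVAL;

	info->offset = regions[info->index].addr;
	info->size = regions[info->index].size;
	info->flags = regions[info->index].flags;
	return 0;
}
```

Because `index` is unsigned, the single `>=` comparison also rejects values that would look negative if interpreted as a signed `int`.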

* Re: [RFC PATCH v4 06/10] VFIO_PLATFORM: Read and write support for the device fd
@ 2014-02-10 22:45     ` Alex Williamson
  0 siblings, 0 replies; 92+ messages in thread
From: Alex Williamson @ 2014-02-10 22:45 UTC (permalink / raw)
  To: Antonios Motakis
  Cc: kvmarm, iommu, linux-kernel, gregkh, tech, a.rigo, B08248,
	kim.phillips, jan.kiszka, kvm, R65777, B07421, christoffer.dall,
	agraf, B16395, will.deacon

On Sat, 2014-02-08 at 18:29 +0100, Antonios Motakis wrote:
> VFIO returns a file descriptor which we can use to manipulate the memory
> regions of the device. Since some memory regions cannot be mmapped due to
> security concerns, we also allow reading from and writing to this file
> descriptor directly.
> 
> Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
> Tested-by: Alvise Rigo <a.rigo@virtualopensystems.com>
> ---
>  drivers/vfio/platform/vfio_platform.c | 128 +++++++++++++++++++++++++++++++++-
>  1 file changed, 125 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/vfio/platform/vfio_platform.c b/drivers/vfio/platform/vfio_platform.c
> index f7db5c0..ee96078 100644
> --- a/drivers/vfio/platform/vfio_platform.c
> +++ b/drivers/vfio/platform/vfio_platform.c
> @@ -55,7 +55,8 @@ static int vfio_platform_regions_init(struct vfio_platform_device *vdev)
>  
>  		region.addr = res->start;
>  		region.size = resource_size(res);
> -		region.flags = 0;
> +		region.flags = VFIO_REGION_INFO_FLAG_READ
> +				| VFIO_REGION_INFO_FLAG_WRITE;
>  
>  		vdev->region[i] = region;
>  	}
> @@ -150,13 +151,134 @@ static long vfio_platform_ioctl(void *device_data,
>  static ssize_t vfio_platform_read(void *device_data, char __user *buf,
>  			     size_t count, loff_t *ppos)
>  {
> -	return 0;
> +	struct vfio_platform_device *vdev = device_data;
> +	unsigned int *io;
> +	int i;
> +
> +	for (i = 0; i < vdev->num_regions; i++) {
> +		struct vfio_platform_region region = vdev->region[i];
> +		unsigned int done = 0;
> +		loff_t off;
> +
> +		if ((*ppos < region.addr)
> +		     || (*ppos + count - 1) >= (region.addr + region.size))
> +			continue;

Perhaps there's something to be said for vfio-pci's use of fixed offsets,
which give a direct offset-to-index lookup.

> +
> +		io = ioremap_nocache(region.addr, region.size);

This must incur some overhead per access.

> +
> +		off = *ppos - region.addr;
> +
> +		while (count) {
> +			size_t filled;
> +
> +			if (count >= 4 && !(off % 4)) {
> +				u32 val;
> +
> +				val = ioread32(io + off);
> +				if (copy_to_user(buf, &val, 4))
> +					goto err;

For vfio-pci we've decided that these interfaces are always little
endian; have you considered whether it makes sense to do something
similar here?  Thanks,

Alex

> +
> +				filled = 4;
> +			} else if (count >= 2 && !(off % 2)) {
> +				u16 val;
> +
> +				val = ioread16(io + off);
> +				if (copy_to_user(buf, &val, 2))
> +					goto err;
> +
> +				filled = 2;
> +			} else {
> +				u8 val;
> +
> +				val = ioread8(io + off);
> +				if (copy_to_user(buf, &val, 1))
> +					goto err;
> +
> +				filled = 1;
> +			}
> +
> +
> +			count -= filled;
> +			done += filled;
> +			off += filled;
> +			buf += filled;
> +		}
> +
> +		iounmap(io);
> +		return done;
> +	}
> +
> +	return -EFAULT;
> +
> +err:
> +	iounmap(io);
> +	return -EFAULT;
>  }
>  
>  static ssize_t vfio_platform_write(void *device_data, const char __user *buf,
>  			      size_t count, loff_t *ppos)
>  {
> -	return 0;
> +	struct vfio_platform_device *vdev = device_data;
> +	unsigned int *io;
> +	int i;
> +
> +	for (i = 0; i < vdev->num_regions; i++) {
> +		struct vfio_platform_region region = vdev->region[i];
> +		unsigned int done = 0;
> +		loff_t off;
> +
> +		if ((*ppos < region.addr)
> +		     || (*ppos + count - 1) >= (region.addr + region.size))
> +			continue;
> +
> +		io = ioremap_nocache(region.addr, region.size);
> +
> +		off = *ppos - region.addr;
> +
> +		while (count) {
> +			size_t filled;
> +
> +			if (count >= 4 && !(off % 4)) {
> +				u32 val;
> +
> +				if (copy_from_user(&val, buf, 4))
> +					goto err;
> +				iowrite32(val, io + off);
> +
> +				filled = 4;
> +			} else if (count >= 2 && !(off % 2)) {
> +				u16 val;
> +
> +				if (copy_from_user(&val, buf, 2))
> +					goto err;
> +				iowrite16(val, io + off);
> +
> +				filled = 2;
> +			} else {
> +				u8 val;
> +
> +				if (copy_from_user(&val, buf, 1))
> +					goto err;
> +				iowrite8(val, io + off);
> +
> +				filled = 1;
> +			}
> +
> +			count -= filled;
> +			done += filled;
> +			off += filled;
> +			buf += filled;
> +		}
> +
> +		iounmap(io);
> +		return done;
> +	}
> +
> +	return -EINVAL;
> +
> +err:
> +	iounmap(io);
> +	return -EFAULT;
>  }
>  
>  static int vfio_platform_mmap(void *device_data, struct vm_area_struct *vma)

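On the fixed-offset suggestion: vfio-pci encodes the region index in the upper bits of the file offset (it uses a 40-bit shift), so a read or write finds its region with a shift instead of scanning every region's physical address range. A sketch of the same idea for platform devices; the `MODEL_OFFSET_*` macros and `model_lookup()` are hypothetical, and the shift value is borrowed from vfio-pci purely for illustration:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding modeled on vfio-pci: index in the bits above 40. */
#define MODEL_OFFSET_SHIFT		40
#define MODEL_OFFSET_TO_INDEX(off)	((off) >> MODEL_OFFSET_SHIFT)
#define MODEL_INDEX_TO_OFFSET(idx)	((uint64_t)(idx) << MODEL_OFFSET_SHIFT)
#define MODEL_OFFSET_MASK		((UINT64_C(1) << MODEL_OFFSET_SHIFT) - 1)

/*
 * Resolve a file position to (region index, offset within region) and make
 * sure [off, off + count) lies inside that region. sizes[] holds region sizes.
 */
static int model_lookup(uint64_t pos, size_t count, const uint64_t *sizes,
			uint32_t num_regions, uint32_t *index, uint64_t *off)
{
	uint64_t idx = MODEL_OFFSET_TO_INDEX(pos);
	uint64_t o = pos & MODEL_OFFSET_MASK;

	if (idx >= num_regions)
		return -EINVAL;
	/* Written as a subtraction so the range check cannot overflow. */
	if (o >= sizes[idx] || count > sizes[idx] - o)
		return -EINVAL;

	*index = (uint32_t)idx;
	*off = o;
	return 0;
}
```

With this layout the per-access loop over all regions disappears, and userspace computes the file offset from the index it got from `VFIO_DEVICE_GET_REGION_INFO`.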

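Both loops in the patch split an access into the widest naturally aligned chunk that still fits: 4 bytes when at least 4 remain and the offset is 4-byte aligned, else 2 bytes under the same rule, else 1. Pulling that choice into a helper makes it easy to check in isolation; `model_access_width()` and `model_split()` are hypothetical helpers, not part of the patch:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: same rule as the read/write loops in the patch. */
static size_t model_access_width(size_t count, uint64_t off)
{
	if (count >= 4 && !(off % 4))
		return 4;
	if (count >= 2 && !(off % 2))
		return 2;
	return 1;
}

/* Hypothetical helper: record the widths used for [off, off + count). */
static size_t model_split(size_t count, uint64_t off, size_t *widths,
			  size_t max)
{
	size_t n = 0;

	while (count && n < max) {
		size_t w = model_access_width(count, off);

		widths[n++] = w;
		count -= w;
		off += w;
	}
	return n;
}
```

A 7-byte access at byte offset 1 is therefore issued as 1 + 2 + 4 bytes. Note that these offsets are in bytes, while the patch adds `off` to an `unsigned int *`, which C scales by `sizeof(unsigned int)`; it appears a byte-granular `void __iomem *` mapping would keep the two consistent.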

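On the endianness question: if the device fd is defined as little endian, as vfio-pci's interfaces are, userspace sees the same byte layout whatever the host byte order (in the kernel, `cpu_to_le32()`/`le32_to_cpu()` are the usual tools for the conversion). The userspace side of that convention is to serialize byte by byte; `model_put_le32()` and `model_get_le32()` are hypothetical helpers that do so independently of host endianness:

```c
#include <stdint.h>

/* Hypothetical helper: store v into buf as little endian on any host. */
static void model_put_le32(uint8_t buf[4], uint32_t v)
{
	buf[0] = (uint8_t)(v & 0xff);
	buf[1] = (uint8_t)((v >> 8) & 0xff);
	buf[2] = (uint8_t)((v >> 16) & 0xff);
	buf[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Hypothetical helper: load a little-endian 32-bit value into host order. */
static uint32_t model_get_le32(const uint8_t buf[4])
{
	return (uint32_t)buf[0] |
	       ((uint32_t)buf[1] << 8) |
	       ((uint32_t)buf[2] << 16) |
	       ((uint32_t)buf[3] << 24);
}
```

A userspace driver written this way reads register values the same way on little- and big-endian hosts, provided the kernel side fixes the fd byte order.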

* Re: [RFC PATCH v4 06/10] VFIO_PLATFORM: Read and write support for the device fd
@ 2014-02-10 23:12       ` Scott Wood
  0 siblings, 0 replies; 92+ messages in thread
From: Scott Wood @ 2014-02-10 23:12 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Antonios Motakis, kvmarm, iommu, linux-kernel, gregkh, tech,
	a.rigo, B08248, kim.phillips, jan.kiszka, kvm, R65777, B07421,
	christoffer.dall, agraf, B16395, will.deacon

On Mon, 2014-02-10 at 15:45 -0700, Alex Williamson wrote:
> On Sat, 2014-02-08 at 18:29 +0100, Antonios Motakis wrote:
> > VFIO returns a file descriptor which we can use to manipulate the memory
> > regions of the device. Since some memory regions we cannot mmap due to
> > security concerns, we also allow to read and write to this file descriptor
> > directly.
> > 
> > Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
> > Tested-by: Alvise Rigo <a.rigo@virtualopensystems.com>
> > ---
> >  drivers/vfio/platform/vfio_platform.c | 128 +++++++++++++++++++++++++++++++++-
> >  1 file changed, 125 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/vfio/platform/vfio_platform.c b/drivers/vfio/platform/vfio_platform.c
> > index f7db5c0..ee96078 100644
> > --- a/drivers/vfio/platform/vfio_platform.c
> > +++ b/drivers/vfio/platform/vfio_platform.c
> > @@ -55,7 +55,8 @@ static int vfio_platform_regions_init(struct vfio_platform_device *vdev)
> >  
> >  		region.addr = res->start;
> >  		region.size = resource_size(res);
> > -		region.flags = 0;
> > +		region.flags = VFIO_REGION_INFO_FLAG_READ
> > +				| VFIO_REGION_INFO_FLAG_WRITE;
> >  
> >  		vdev->region[i] = region;
> >  	}
> > @@ -150,13 +151,134 @@ static long vfio_platform_ioctl(void *device_data,
> >  static ssize_t vfio_platform_read(void *device_data, char __user *buf,
> >  			     size_t count, loff_t *ppos)
> >  {
> > -	return 0;
> > +	struct vfio_platform_device *vdev = device_data;
> > +	unsigned int *io;
> > +	int i;
> > +
> > +	for (i = 0; i < vdev->num_regions; i++) {
> > +		struct vfio_platform_region region = vdev->region[i];
> > +		unsigned int done = 0;
> > +		loff_t off;
> > +
> > +		if ((*ppos < region.addr)
> > +		     || (*ppos + count - 1) >= (region.addr + region.size))
> > +			continue;
> 
> Perhaps there's something to be said for vfio-pci's use of fixed offsets
> to have a direct offset to index lookup.
> 
> > +
> > +		io = ioremap_nocache(region.addr, region.size);
> 
> This must incur some overhead per access.

There's mmap() if you want fast...  Given the limited ioremap space on
32-bit, I can see not wanting to map everything that the user has open
all the time -- but in that case, wouldn't it be better to just map one
page here rather than the whole region?
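A rough sketch of the one-page alternative: for each access, compute the page-aligned physical address to map and how many bytes of the request fall inside that page, then map, access and unmap only that page. The 4 KiB page size and the names below are assumptions for illustration; in the kernel the mapping step would presumably use PAGE_SIZE/PAGE_MASK with ioremap on map_base.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size */
#define SKETCH_PAGE_MASK (~(uint64_t)(SKETCH_PAGE_SIZE - 1))

struct page_window {
	uint64_t map_base;	/* page-aligned physical address to map */
	uint64_t map_off;	/* offset of the access inside that page */
	uint64_t len;		/* bytes of the request inside this page */
};

/*
 * Given a physical address and the remaining byte count, return the single
 * page to map and how much of the request it covers.  A caller would loop,
 * advancing phys by len, until the request is complete.
 */
static struct page_window page_window_for(uint64_t phys, uint64_t count)
{
	struct page_window w;
	uint64_t room;

	w.map_base = phys & SKETCH_PAGE_MASK;
	w.map_off = phys - w.map_base;
	assert(w.map_off < SKETCH_PAGE_SIZE);
	room = SKETCH_PAGE_SIZE - w.map_off;
	w.len = count < room ? count : room;
	return w;
}
```

An 8-byte access at 0x10000ffc therefore needs two iterations: 4 bytes from the page at 0x10000000, then 4 bytes from the page at 0x10001000.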

> > +
> > +		off = *ppos - region.addr;
> > +
> > +		while (count) {
> > +			size_t filled;
> > +
> > +			if (count >= 4 && !(off % 4)) {
> > +				u32 val;
> > +
> > +				val = ioread32(io + off);
> > +				if (copy_to_user(buf, &val, 4))
> > +					goto err;
> 
> For vfio-pci we've decided that these interfaces are always little
> endian, have you considered whether it makes sense to do something
> similar here?  Thanks,

ioread32() is little endian -- but since read() puts its result in the
caller's memory buffer (rather than a register return), I think it makes
more sense to preserve byte-invariance -- similar to the conclusion of
the recent KVM MMIO API clarification discussion.  Then the VFIO user
would use the same type of access (byte swapped or not) to access the
read() buffer that they would have used to access the register directly.

Forcing little endian is a better fit for PCI (which is inherently
little endian) than for platform devices which can be either endianness.
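A minimal model of the byte-invariant option: the read() buffer receives the register's bytes in ascending address order with no swapping, and the VFIO user then decodes the buffer with the same little- or big-endian access it would have used on the register directly. The helpers are hypothetical and model only the buffer contents, not the kernel accessors used to produce them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-invariant read: the buffer gets the register's bytes in ascending
 * address order, identically on little- and big-endian hosts. */
static void read_byte_invariant(const uint8_t *reg, uint8_t *buf, size_t len)
{
	assert(reg != NULL && buf != NULL);
	memcpy(buf, reg, len);
}

/* Decoders the VFIO user would apply, matching the access type it
 * would have used on the device register itself. */
static uint32_t load_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint32_t load_be32(const uint8_t *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | (uint32_t)p[3];
}
```

Because the buffer is a byte-for-byte copy, a big-endian device register reads back correctly with load_be32() and a little-endian one with load_le32(), without the interface having to pick one endianness for every platform device.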

-Scott



^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC PATCH v4 06/10] VFIO_PLATFORM: Read and write support for the device fd
@ 2014-02-10 23:20         ` Alex Williamson
  0 siblings, 0 replies; 92+ messages in thread
From: Alex Williamson @ 2014-02-10 23:20 UTC (permalink / raw)
  To: Scott Wood
  Cc: Antonios Motakis, kvmarm, iommu, linux-kernel, gregkh, tech,
	a.rigo, B08248, kim.phillips, jan.kiszka, kvm, R65777, B07421,
	christoffer.dall, agraf, B16395, will.deacon

On Mon, 2014-02-10 at 17:12 -0600, Scott Wood wrote:
> On Mon, 2014-02-10 at 15:45 -0700, Alex Williamson wrote:
> > On Sat, 2014-02-08 at 18:29 +0100, Antonios Motakis wrote:
> > > VFIO returns a file descriptor which we can use to manipulate the memory
> > > regions of the device. Since some memory regions we cannot mmap due to
> > > security concerns, we also allow to read and write to this file descriptor
> > > directly.
> > > 
> > > Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
> > > Tested-by: Alvise Rigo <a.rigo@virtualopensystems.com>
> > > ---
> > >  drivers/vfio/platform/vfio_platform.c | 128 +++++++++++++++++++++++++++++++++-
> > >  1 file changed, 125 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/drivers/vfio/platform/vfio_platform.c b/drivers/vfio/platform/vfio_platform.c
> > > index f7db5c0..ee96078 100644
> > > --- a/drivers/vfio/platform/vfio_platform.c
> > > +++ b/drivers/vfio/platform/vfio_platform.c
> > > @@ -55,7 +55,8 @@ static int vfio_platform_regions_init(struct vfio_platform_device *vdev)
> > >  
> > >  		region.addr = res->start;
> > >  		region.size = resource_size(res);
> > > -		region.flags = 0;
> > > +		region.flags = VFIO_REGION_INFO_FLAG_READ
> > > +				| VFIO_REGION_INFO_FLAG_WRITE;
> > >  
> > >  		vdev->region[i] = region;
> > >  	}
> > > @@ -150,13 +151,134 @@ static long vfio_platform_ioctl(void *device_data,
> > >  static ssize_t vfio_platform_read(void *device_data, char __user *buf,
> > >  			     size_t count, loff_t *ppos)
> > >  {
> > > -	return 0;
> > > +	struct vfio_platform_device *vdev = device_data;
> > > +	unsigned int *io;
> > > +	int i;
> > > +
> > > +	for (i = 0; i < vdev->num_regions; i++) {
> > > +		struct vfio_platform_region region = vdev->region[i];
> > > +		unsigned int done = 0;
> > > +		loff_t off;
> > > +
> > > +		if ((*ppos < region.addr)
> > > +		     || (*ppos + count - 1) >= (region.addr + region.size))
> > > +			continue;
> > 
> > Perhaps there's something to be said for vfio-pci's use of fixed offsets
> > to have a direct offset to index lookup.
> > 
> > > +
> > > +		io = ioremap_nocache(region.addr, region.size);
> > 
> > This must incur some overhead per access.
> 
> There's mmap() if you want fast...  Given the limited ioremap space on
> 32-bit, I can see not wanting to map everything that the user has open
> all the time -- but in that case, wouldn't it be better to just map one
> page here rather than the whole region?
> 
> > > +
> > > +		off = *ppos - region.addr;
> > > +
> > > +		while (count) {
> > > +			size_t filled;
> > > +
> > > +			if (count >= 4 && !(off % 4)) {
> > > +				u32 val;
> > > +
> > > +				val = ioread32(io + off);
> > > +				if (copy_to_user(buf, &val, 4))
> > > +					goto err;
> > 
> > For vfio-pci we've decided that these interfaces are always little
> > endian, have you considered whether it makes sense to do something
> > similar here?  Thanks,
> 
> ioread32() is little endian -- but since read() puts its result in the
> caller's memory buffer (rather than a register return), I think it makes
> more sense to preserve byte-invariance -- similar to the conclusion of
> the recent KVM MMIO API clarification discussion.  Then the VFIO user
> would use the same type of access (byte swapped or not) to access the
> read() buffer that they would have used to access the register directly.
> 
> Forcing little endian is a better fit for PCI (which is inherently
> little endian) than for platform devices which can be either endianness.

Ok, works for me.  Thanks,

Alex




^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC PATCH v4 01/10] driver core: export driver_probe_device()
@ 2014-02-14 22:27     ` Greg KH
  0 siblings, 0 replies; 92+ messages in thread
From: Greg KH @ 2014-02-14 22:27 UTC (permalink / raw)
  To: Antonios Motakis
  Cc: alex.williamson, kvmarm, iommu, linux-kernel, tech, a.rigo,
	B08248, kim.phillips, jan.kiszka, kvm, R65777, B07421,
	christoffer.dall, agraf, B16395, will.deacon, Tejun Heo,
	Rafael J. Wysocki, Guenter Roeck, Toshi Kani, Joe Perches,
	Dmitry Kasatkin, Michal Hocko, Bjorn Helgaas

On Sat, Feb 08, 2014 at 06:29:31PM +0100, Antonios Motakis wrote:
> From: Kim Phillips <kim.phillips@linaro.org>
> 
> Needed by drivers, such as the vfio platform driver [1], seeking to
> bypass bind_store()'s driver_match_device(), and bind to any device
> via a private sysfs bind file.
> 
> [1] https://lkml.org/lkml/2013/12/11/522
> 
> note: the EXPORT_SYMBOL is needed because vfio-platform can be built
> as a module.

No code outside of drivers/base/ should be calling this function, you
are doing something wrong in your bus if you want to do this, please fix
your bus code.

sorry, I can't accept this at all.

greg k-h

^ permalink raw reply	[flat|nested] 92+ messages in thread

* RE: [RFC PATCH v4 01/10] driver core: export driver_probe_device()
       [not found]     ` <20140214222716.GA11838-U8xfFu+wG4EAvxtiuMwx3w@public.gmane.org>
@ 2014-02-14 23:00       ` Stuart Yoder
       [not found]         ` <ba7597fd8c9f4d91bbccfb42e31a165e-ufbTtyGzTTT8GZusEWM6WuO6mTEJWrR4XA4E9RH9d+qIuWR1G4zioA@public.gmane.org>
  0 siblings, 1 reply; 92+ messages in thread
From: Stuart Yoder @ 2014-02-14 23:00 UTC (permalink / raw)
  To: Greg KH, Antonios Motakis
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ,
	will.deacon-5wv7dgnIgG8, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	Michal Hocko, Bjorn Helgaas, Varun Sethi,
	kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg, Rafael J. Wysocki,
	agraf-l3A5Bk7waGM, Guenter Roeck, Dmitry Kasatkin, Tejun Heo,
	Scott Wood, tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J, Toshi Kani,
	a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Joe Perches,



> -----Original Message-----
> From: Greg KH [mailto:gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org]
> Sent: Friday, February 14, 2014 4:27 PM
> To: Antonios Motakis
> Cc: alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org; kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg@public.gmane.org;
> iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org; linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org;
> tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J@public.gmane.org; a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J@public.gmane.org; Yoder Stuart-
> B08248; kim.phillips-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org; jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ@public.gmane.org;
> kvm-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Bhushan Bharat-R65777; Wood Scott-B07421;
> christoffer.dall-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org; agraf-l3A5Bk7waGM@public.gmane.org; Sethi Varun-B16395;
> will.deacon-5wv7dgnIgG8@public.gmane.org; Tejun Heo; Rafael J. Wysocki; Guenter Roeck; Toshi
> Kani; Joe Perches; Dmitry Kasatkin; Michal Hocko; Bjorn Helgaas
> Subject: Re: [RFC PATCH v4 01/10] driver core: export
> driver_probe_device()
> 
> On Sat, Feb 08, 2014 at 06:29:31PM +0100, Antonios Motakis wrote:
> > From: Kim Phillips <kim.phillips-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
> >
> > Needed by drivers, such as the vfio platform driver [1], seeking to
> > bypass bind_store()'s driver_match_device(), and bind to any device
> > via a private sysfs bind file.
> >
> > [1] https://lkml.org/lkml/2013/12/11/522
> >
> > note: the EXPORT_SYMBOL is needed because vfio-platform can be built
> > as a module.
> 
> No code outside of drivers/base/ should be calling this function

Why?  driver_probe_device() allows a driver to explicitly bind
to a specific device.   What is conceptually wrong with allowing
that?

Thanks,
Stuart

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC PATCH v4 01/10] driver core: export driver_probe_device()
       [not found]         ` <ba7597fd8c9f4d91bbccfb42e31a165e-ufbTtyGzTTT8GZusEWM6WuO6mTEJWrR4XA4E9RH9d+qIuWR1G4zioA@public.gmane.org>
@ 2014-02-15  2:47             ` Greg KH
  0 siblings, 0 replies; 92+ messages in thread
From: Greg KH @ 2014-02-15  2:47 UTC (permalink / raw)
  To: Stuart Yoder
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ,
	will.deacon-5wv7dgnIgG8, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	Bjorn Helgaas, Varun Sethi,
	kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg, Rafael J. Wysocki,
	agraf-l3A5Bk7waGM, Guenter Roeck, Dmitry Kasatkin, Tejun Heo,
	Scott Wood, Antonios Motakis,
	tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J, Michal Hocko, Toshi Kani,
	a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Fri, Feb 14, 2014 at 11:00:31PM +0000, Stuart Yoder wrote:
> 
> 
> > -----Original Message-----
> > From: Greg KH [mailto:gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org]
> > Sent: Friday, February 14, 2014 4:27 PM
> > To: Antonios Motakis
> > Cc: alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org; kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg@public.gmane.org;
> > iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org; linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org;
> > tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J@public.gmane.org; a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J@public.gmane.org; Yoder Stuart-
> > B08248; kim.phillips-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org; jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ@public.gmane.org;
> > kvm-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Bhushan Bharat-R65777; Wood Scott-B07421;
> > christoffer.dall-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org; agraf-l3A5Bk7waGM@public.gmane.org; Sethi Varun-B16395;
> > will.deacon-5wv7dgnIgG8@public.gmane.org; Tejun Heo; Rafael J. Wysocki; Guenter Roeck; Toshi
> > Kani; Joe Perches; Dmitry Kasatkin; Michal Hocko; Bjorn Helgaas
> > Subject: Re: [RFC PATCH v4 01/10] driver core: export
> > driver_probe_device()
> > 
> > On Sat, Feb 08, 2014 at 06:29:31PM +0100, Antonios Motakis wrote:
> > > From: Kim Phillips <kim.phillips-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
> > >
> > > Needed by drivers, such as the vfio platform driver [1], seeking to
> > > bypass bind_store()'s driver_match_device(), and bind to any device
> > > via a private sysfs bind file.
> > >
> > > [1] https://lkml.org/lkml/2013/12/11/522
> > >
> > > note: the EXPORT_SYMBOL is needed because vfio-platform can be built
> > > as a module.
> > 
> > No code outside of drivers/base/ should be calling this function
> 
> Why?  driver_probe_device() allows a driver to explicitly bind
> to a specific device.   What is conceptually wrong with allowing
> that?

Because that's not how a bus should work, and the fact that no other
subsystem in the kernel does that might be a hint you are trying to do
something a bit "wrong" here.

greg k-h

^ permalink raw reply	[flat|nested] 92+ messages in thread

* RE: [RFC PATCH v4 01/10] driver core: export driver_probe_device()
       [not found]             ` <20140215024725.GA2542-U8xfFu+wG4EAvxtiuMwx3w@public.gmane.org>
@ 2014-02-15 16:33                 ` Stuart Yoder
  0 siblings, 0 replies; 92+ messages in thread
From: Stuart Yoder @ 2014-02-15 16:33 UTC (permalink / raw)
  To: Greg KH
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ,
	will.deacon-5wv7dgnIgG8, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	Bjorn Helgaas, Varun Sethi,
	kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg, Rafael J. Wysocki,
	agraf-l3A5Bk7waGM, Guenter Roeck, Dmitry Kasatkin, Tejun Heo,
	Scott Wood, Antonios Motakis,
	tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J, Michal Hocko, Toshi Kani,
	a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

> > Why?  driver_probe_device() allows a driver to explicitly bind
> > to a specific device.   What is conceptually wrong with allowing
> > that?
> 
> Because that's not how a bus should work, and the fact that no other
> subsystem in the kernel does that might be a hint you are trying to do
> something a bit "wrong" here.

Let me try to describe, as succinctly as I can, the problem we are trying
to solve here...

The vfio mechanism in the kernel (e.g. vfio-pci) allows devices to be
exposed to user space (via file descriptors), enabling user space
drivers.  For example, to export an e1000 card to user space, I do
this:

   echo 0001:03:00.0 > /sys/bus/pci/devices/0001:03:00.0/driver/unbind
   echo 8086 10d3 > /sys/bus/pci/drivers/vfio-pci/new_id

The first step unbinds the target device (0001:03:00.0) from the normal
e1000 driver.

The second step causes the vfio-pci driver to bind to device 0001:03:00.0.
This second step tells vfio-pci that it now handles e1000 device IDs,
and the vfio-pci driver registers with the PCI bus to handle '8086 10d3'.

That works, but it is ugly.  We now have 2 active drivers handling
the same device type...which introduces various possible race conditions.

We never want vfio-pci to auto-bind to any new device that shows up
on the PCI bus.  Binding a device to vfio-pci must be an explicit
action by an administrator.

You mentioned previously that user space can sort out the problem
of multiple drivers registered for handling the same device type.
That is true, but doesn't help here.   We don't want vfio-pci
to handle _all_ e1000 cards, just explicitly selected e1000 cards.

We want the normal e1000 driver to be loaded and to bind to new
devices that may be hot-plugged.

Two mechanisms have been put forth, both of which you have now
rejected:

   1.  sysfs_bind_only flag was proposed which would allow a vfio
       driver (like vfio-pci) to only bind by explicit request through
       the sysfs 'bind' file.

   2.  Have the vfio driver call driver_probe_device() to explicitly bind
       a particular device instance to the driver.  Only change we need
       here is the EXPORT_SYMBOL.
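A simplified userspace model of how option 1 might behave: automatic matching skips any driver flagged sysfs_bind_only, while an explicit bind request that names that driver still succeeds. The struct and function names are hypothetical and are not the driver core API; an id of "*" stands for a driver willing to take any device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model type; not the kernel's struct device_driver. */
struct model_driver {
	const char *name;
	const char *id;		/* device ID claimed, or "*" for any device */
	bool sysfs_bind_only;	/* option 1: never bind automatically */
};

static bool id_matches(const struct model_driver *drv, const char *dev_id)
{
	return strcmp(drv->id, "*") == 0 || strcmp(drv->id, dev_id) == 0;
}

/* Automatic probing, e.g. on hotplug or driver registration. */
static const struct model_driver *
auto_match(const struct model_driver *drv, size_t n, const char *dev_id)
{
	assert(dev_id != NULL);
	for (size_t i = 0; i < n; i++) {
		if (drv[i].sysfs_bind_only)
			continue;	/* only bound via the sysfs 'bind' file */
		if (id_matches(&drv[i], dev_id))
			return &drv[i];
	}
	return NULL;
}

/* Explicit administrator request through the sysfs 'bind' file. */
static const struct model_driver *
explicit_bind(const struct model_driver *drv, size_t n, const char *name,
	      const char *dev_id)
{
	assert(name != NULL && dev_id != NULL);
	for (size_t i = 0; i < n; i++)
		if (strcmp(drv[i].name, name) == 0 &&
		    id_matches(&drv[i], dev_id))
			return &drv[i];
	return NULL;
}
```

With vfio-pci listed first but flagged, a hot-plugged 8086:10d3 device still goes to e1000, while an explicit bind can hand one chosen device to vfio-pci.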

Are you in principle opposed to any mechanism that would allow 2 drivers
to be resident/active and allow a sysadmin to explicitly bind a 
particular device instance to the driver of their choice?

Thanks,
Stuart

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC PATCH v4 01/10] driver core: export driver_probe_device()
       [not found]                 ` <7043e1edd9974de590dcb392cd8aff14-ufbTtyGzTTT8GZusEWM6WuO6mTEJWrR4XA4E9RH9d+qIuWR1G4zioA@public.gmane.org>
@ 2014-02-15 17:33                     ` Greg KH
  0 siblings, 0 replies; 92+ messages in thread
From: Greg KH @ 2014-02-15 17:33 UTC (permalink / raw)
  To: Stuart Yoder
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ,
	will.deacon-5wv7dgnIgG8, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	Bjorn Helgaas, Varun Sethi,
	kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg, Rafael J. Wysocki,
	agraf-l3A5Bk7waGM, Guenter Roeck, Dmitry Kasatkin, Tejun Heo,
	Scott Wood, Antonios Motakis,
	tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J, Michal Hocko, Toshi Kani,
	a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Sat, Feb 15, 2014 at 04:33:44PM +0000, Stuart Yoder wrote:
> > > Why?  driver_probe_device() allows a driver to explicitly bind
> > > to a specific device.   What is conceptually wrong with allowing
> > > that?
> > 
> > Because that's not how a bus should work, and the fact that no other
> > subsystem in the kernel does that might be a hint you are trying to do
> > something a bit "wrong" here.
> 
> Let me try to succinctly as I can describe the problem we are trying to
> solve here...
> 
> The vfio mechanism in the kernel (e.g. vfio-pci) allows devices to be
> exposed user space (via file descriptors), enabling user space
> drivers.  So, for example to export an e1000 card to user space, I do
> this:
> 
>    echo 0001:03:00.0 > /sys/bus/pci/devices/0001:03:00.0/driver/unbind
>    echo 8086 10d3 > /sys/bus/pci/drivers/vfio-pci/new_id

What's wrong with using the "bind" file instead?  That picks a specific
device and binds it to a specific driver.  Or have we been down this
path before?  :)

And that is for a PCI "driver", not a totally separate bus, which it
looks like you are wanting to do here.
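
The two echo commands quoted above are nothing more than two writes to sysfs attribute files. A minimal userspace C sketch of the same sequence follows, assuming the standard /sys layout shown in the thread. The helper names sysfs_write() and unbind_then_new_id() are hypothetical, and the root parameter exists only so the sequence can be exercised against a scratch directory instead of the real /sys.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Write one value to a sysfs attribute, like `echo VALUE > PATH`.
 * Returns 0 on success or a negative errno value. */
static int sysfs_write(const char *path, const char *value)
{
    assert(path != NULL && value != NULL);
    FILE *f = fopen(path, "w");
    if (!f)
        return -errno;

    size_t len = strlen(value);
    int rc = (fwrite(value, 1, len, f) == len) ? 0 : -EIO;
    /* stdio buffers the data; on sysfs the attribute's store() handler
     * runs when fclose() flushes it, so its error is reported here. */
    if (fclose(f) == EOF && rc == 0)
        rc = -errno;
    return rc;
}

/* Hypothetical helper replaying the two-step sequence under `root`
 * (normally "/sys"). */
static int unbind_then_new_id(const char *root, const char *bdf,
                              const char *vendor_device)
{
    char path[512];
    int rc;

    /* Step 1: release the device from its current driver (e1000 here). */
    snprintf(path, sizeof(path), "%s/bus/pci/devices/%s/driver/unbind", root, bdf);
    rc = sysfs_write(path, bdf);
    if (rc)
        return rc;

    /* Step 2: add the "vendor device" pair to vfio-pci; the PCI core then
     * tries to attach vfio-pci to every unbound device with that ID. */
    snprintf(path, sizeof(path), "%s/bus/pci/drivers/vfio-pci/new_id", root);
    return sysfs_write(path, vendor_device);
}
```

On a live system this would be called as unbind_then_new_id("/sys", "0001:03:00.0", "8086 10d3") with root privileges. It inherits exactly the drawback debated in this thread: the new_id write applies to every unbound 8086:10d3 device, not just to 0001:03:00.0.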

> The first step unbinds the target device (0001:03:00.0) from the normal
> e1000 driver.
> 
> The second step causes the vfio-pci driver to bind to device 0001:03:00.0.
> This second step tells vfio-pci that it now handles e1000 device IDs,
> and the vfio-pci drivers registers with the PCI bus to handle '8086 10d3'. 
> 
> That works, but it is ugly.  We now have 2 active drivers handling
> the same device type...which introduces various possible race conditions.
> 
> We never want vfio-pci to auto-bind to any new device that shows up
> on the PCI bus.  Binding a device to vfio-pci must be an explicit
> action by an administrator.

Then use the "bind" file.

> You mentioned previously that user space can sort out the problem
> of multiple drivers registered for handling the same device type.
> That is true, but doesn't help here.   We don't want vfio-pci
> to handle _all_ e1000 cards, just explicitly selected e1000 cards.
> 
> We want the normal e1000 driver to be loaded and to bind to new
> devices that may be hot-plugged.

I want a pony too...

> There are 2 proposed mechanisms that have been put forth, both of
> which you have now rejected:
> 
>    1.  sysfs_bind_only flag was proposed which would allow a vfio
>        driver (like vfio-pci) to only bind by explicit request through
>        the sysfs 'bind' file.

Why did I reject this?  What did the patch look like?

>    2.  Have the vfio driver call driver_probe_device() to explicitly bind
>        a particular device instance to the driver.  Only change we need
>        here is the EXPORT_SYMBOL.

How are you going to prevent the driver from being bound to the device
in the core with this change?  How are you going to call this function?
When?  On what action of the user?

> Are you in principle opposed to any mechanism that would allow 2 drivers
> to be resident/active and allow a sysadmin to explicitly bind a 
> particular device instance to the driver of their choice?

No, that works today with the bind/unbind/new_id files, it's just that
you don't like it :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 92+ messages in thread

* RE: [RFC PATCH v4 01/10] driver core: export driver_probe_device()
       [not found]                     ` <20140215173348.GA8056-U8xfFu+wG4EAvxtiuMwx3w@public.gmane.org>
@ 2014-02-15 18:19                         ` Stuart Yoder
  0 siblings, 0 replies; 92+ messages in thread
From: Stuart Yoder @ 2014-02-15 18:19 UTC (permalink / raw)
  To: Greg KH
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ,
	will.deacon-5wv7dgnIgG8, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	Bjorn Helgaas, Varun Sethi,
	kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg, Rafael J. Wysocki,
	agraf-l3A5Bk7waGM, Guenter Roeck, Dmitry Kasatkin, Tejun Heo,
	Scott Wood, Antonios Motakis,
	tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J, Michal Hocko, Toshi Kani,
	a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA



> -----Original Message-----
> From: Greg KH [mailto:gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org]
> Sent: Saturday, February 15, 2014 11:34 AM
> To: Yoder Stuart-B08248
> Cc: Antonios Motakis; alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org;
> kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg@public.gmane.org; iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org; linux-
> kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J@public.gmane.org;
> a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J@public.gmane.org; kim.phillips-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org;
> jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ@public.gmane.org; kvm-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Bhushan Bharat-R65777; Wood
> Scott-B07421; christoffer.dall-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org; agraf-l3A5Bk7waGM@public.gmane.org; Sethi Varun-
> B16395; will.deacon-5wv7dgnIgG8@public.gmane.org; Tejun Heo; Rafael J. Wysocki; Guenter Roeck;
> Toshi Kani; Joe Perches; Dmitry Kasatkin; Michal Hocko; Bjorn Helgaas
> Subject: Re: [RFC PATCH v4 01/10] driver core: export
> driver_probe_device()
> 
> On Sat, Feb 15, 2014 at 04:33:44PM +0000, Stuart Yoder wrote:
> > > > Why?  driver_probe_device() allows a driver to explicitly bind
> > > > to a specific device.   What is conceptually wrong with allowing
> > > > that?
> > >
> > > Because that's not how a bus should work, and the fact that no other
> > > subsystem in the kernel does that might be a hint you are trying to do
> > > something a bit "wrong" here.
> >
> > Let me try to succinctly as I can describe the problem we are trying to
> > solve here...
> >
> > The vfio mechanism in the kernel (e.g. vfio-pci) allows devices to be
> > exposed user space (via file descriptors), enabling user space
> > drivers.  So, for example to export an e1000 card to user space, I do
> > this:
> >
> >    echo 0001:03:00.0 > /sys/bus/pci/devices/0001:03:00.0/driver/unbind
> >    echo 8086 10d3 > /sys/bus/pci/drivers/vfio-pci/new_id
> 
> What's wrong with using the "bind" file instead?  That picks a specific
> device and binds it to a specific driver.  Or have we been down this
> path before?  :)

Yes we have :)  The "bind" file does not bypass device ID checks, so
it wouldn't work without new_id or a wildcard match of some kind.
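
To make the point about ID checks concrete, here is a deliberately simplified userspace model (not kernel code) of the matching step that, in this model, both automatic probing and the "bind" file perform before a driver is attached. The types and names are invented for illustration; ANY_ID stands in for the kernel's PCI_ANY_ID wildcard. The model assumes the vfio driver starts with an empty ID table, so it matches nothing until a new_id write (add_new_id() here) or a wildcard entry is added.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ANY_ID  0xffffffffu  /* stands in for the kernel's PCI_ANY_ID (~0) */
#define MAX_IDS 8

struct id_entry {
    unsigned vendor, device;
};

/* Toy PCI driver: just a name and the list of IDs it claims. */
struct toy_driver {
    const char *name;
    struct id_entry ids[MAX_IDS];
    size_t n_ids;
};

static bool id_matches(const struct id_entry *e, unsigned vendor, unsigned device)
{
    return (e->vendor == ANY_ID || e->vendor == vendor) &&
           (e->device == ANY_ID || e->device == device);
}

/* In this model, both automatic probing and a write to the "bind" file
 * must pass this check before the driver's probe routine is called. */
static bool driver_matches(const struct toy_driver *drv, unsigned vendor, unsigned device)
{
    for (size_t i = 0; i < drv->n_ids; i++)
        if (id_matches(&drv->ids[i], vendor, device))
            return true;
    return false;
}

/* Models writing "VENDOR DEVICE" to the driver's new_id file. */
static void add_new_id(struct toy_driver *drv, unsigned vendor, unsigned device)
{
    assert(drv->n_ids < MAX_IDS);
    drv->ids[drv->n_ids++] = (struct id_entry){ vendor, device };
}
```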

> And that is for a PCI "driver" not a totally separate bus, which it
> looks like you are wanting to do here.

vfio-pci is a PCI driver, not a bus (drivers/vfio/pci/vfio_pci.c).

> > The first step unbinds the target device (0001:03:00.0) from the normal
> > e1000 driver.
> >
> > The second step causes the vfio-pci driver to bind to device 0001:03:00.0.
> > This second step tells vfio-pci that it now handles e1000 device IDs,
> > and the vfio-pci drivers registers with the PCI bus to handle '8086 10d3'.
> >
> > That works, but it is ugly.  We now have 2 active drivers handling
> > the same device type...which introduces various possible race conditions.
> >
> > We never want vfio-pci to auto-bind to any new device that shows up
> > on the PCI bus.  Binding a device to vfio-pci must be an explicit
> > action by an administrator.
> 
> Then use the "bind" file.

See above.

> > You mentioned previously that user space can sort out the problem
> > of multiple drivers registered for handling the same device type.
> > That is true, but doesn't help here.   We don't want vfio-pci
> > to handle _all_ e1000 cards, just explicitly selected e1000 cards.
> >
> > We want the normal e1000 driver to be loaded and to bind to new
> > devices that may be hot-plugged.
> 
> I want a pony too...

It's not that difficult...this patch accomplishes it by
simply allowing drivers to call driver_probe_device().

> > There are 2 proposed mechanisms that have been put forth, both of
> > which you have now rejected:
> >
> >    1.  sysfs_bind_only flag was proposed which would allow a vfio
> >        driver (like vfio-pci) to only bind by explicit request through
> >        the sysfs 'bind' file.
> 
> Why did I reject this?  What did the patch look like?

https://lkml.org/lkml/2013/12/3/253
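
The first proposal can be illustrated with another toy model. This is userspace C, not the patch at the link above, whose details are not reproduced here. Each driver carries a hypothetical bind_only flag: automatic probing (as on hotplug) skips flagged drivers, while an explicit bind request still honours the ID check. The test places vfio-pci ahead of e1000 in registration order to show the worst case: without the flag, a hot-plugged 8086:10d3 card lands on vfio-pci once it has that ID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy driver with a single ID and the proposed bind-only flag. */
struct toy_driver {
    const char *name;
    unsigned vendor, device;
    bool bind_only;      /* hypothetical: skipped by automatic probing */
};

struct toy_device {
    unsigned vendor, device;
    const struct toy_driver *driver;   /* NULL while unbound */
};

static bool matches(const struct toy_driver *drv, const struct toy_device *dev)
{
    return drv->vendor == dev->vendor && drv->device == dev->device;
}

/* Hotplug path: offer the device to each driver in registration order,
 * stopping at the first match but never choosing a bind-only driver. */
static void auto_probe(struct toy_device *dev, const struct toy_driver *drvs, size_t n)
{
    assert(dev != NULL);
    for (size_t i = 0; i < n && !dev->driver; i++)
        if (!drvs[i].bind_only && matches(&drvs[i], dev))
            dev->driver = &drvs[i];
}

/* "bind" file path: the administrator names one device and one driver.
 * The ID check still applies; the bind-only flag does not. */
static bool explicit_bind(struct toy_device *dev, const struct toy_driver *drv)
{
    assert(dev != NULL && drv != NULL);
    if (dev->driver || !matches(drv, dev))
        return false;
    dev->driver = drv;
    return true;
}
```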


> >    2.  Have the vfio driver call driver_probe_device() to explicitly bind
> >        a particular device instance to the driver.  Only change we need
> >        here is the EXPORT_SYMBOL.
> 
> How are you going to prevent the driver from being bound to the device
> in the core with this change?  How are you going to call this function?
> When?  On what action of the user?

The vfio-pci driver would create a sysfs attribute file, "vfio_bind".

The user would do:
   echo 0001:03:00.0 > /sys/bus/pci/drivers/vfio-pci/vfio_bind

vfio-pci would then call driver_probe_device(), which binds that
specific device to the vfio-pci driver... and there is no ambiguity.
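
The difference between the two behaviours under discussion can be sketched with one more simplified model. It assumes, as described earlier in the thread, that a new_id write makes the core try the driver against every currently unbound device with that ID, whereas the proposed vfio_bind write would look up a single device by its address and probe only that one. All names are illustrative: vfio_bind_store() is not the actual patch, and real kernel code would take the device lock and call driver_probe_device() where this model simply records the driver name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct toy_device {
    const char *name;        /* PCI address, e.g. "0001:03:00.0" */
    unsigned vendor, device;
    const char *driver;      /* bound driver name, NULL if unbound */
};

/* new_id semantics: after adding the ID, attach the driver to every
 * unbound device that now matches. Returns how many were grabbed. */
static size_t new_id_attach(struct toy_device *devs, size_t n, const char *drv,
                            unsigned vendor, unsigned device)
{
    size_t grabbed = 0;
    for (size_t i = 0; i < n; i++) {
        if (!devs[i].driver && devs[i].vendor == vendor && devs[i].device == device) {
            devs[i].driver = drv;
            grabbed++;
        }
    }
    return grabbed;
}

/* Proposed vfio_bind semantics (hypothetical): find exactly one device by
 * name and bind it if unbound; no ID table is consulted in this model. */
static bool vfio_bind_store(struct toy_device *devs, size_t n, const char *drv,
                            const char *addr)
{
    assert(addr != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(devs[i].name, addr) == 0) {
            if (devs[i].driver)
                return false;           /* already owned by some driver */
            devs[i].driver = drv;       /* stands in for driver_probe_device() */
            return true;
        }
    }
    return false;                       /* no such device */
}
```

One simplification to keep in mind: echo appends a newline, which the kernel's sysfs_streq() tolerates when comparing device names, while the plain strcmp() in this model does not.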

> > Are you in principle opposed to any mechanism that would allow 2 drivers
> > to be resident/active and allow a sysadmin to explicitly bind a
> > particular device instance to the driver of their choice?
> 
> No, that works today with the bind/unbind/new_id files, it's just that
> you don't like it :)

We don't like it because of the ambiguities/race-conditions with
the current situation.

A vfio driver, like vfio-pci, certainly is a bit different from a normal
driver, in that it really is not device-ID aware.  It simply passes
through device resources (mappable regions, IRQs) to user space without
interpreting or understanding them.  It is a kind of "meta" driver, but
it is not a bus.  Every bus type would need its own vfio driver to
do this type of device passthrough.

Thanks,
Stuart

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC PATCH v4 01/10] driver core: export driver_probe_device()
       [not found]                         ` <38f0473542954fe8b312a1f7b61a3d21-ufbTtyGzTTT8GZusEWM6WuO6mTEJWrR4XA4E9RH9d+qIuWR1G4zioA@public.gmane.org>
@ 2014-02-18  0:38                             ` Scott Wood
  0 siblings, 0 replies; 92+ messages in thread
From: Scott Wood @ 2014-02-18  0:38 UTC (permalink / raw)
  To: Yoder Stuart-B08248
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ,
	will.deacon-5wv7dgnIgG8, a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J,
	Sethi Varun-B16395, kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg,
	Rafael J. Wysocki, agraf-l3A5Bk7waGM, Guenter Roeck,
	Dmitry Kasatkin, Tejun Heo, Bjorn Helgaas, Antonios Motakis,
	tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J, Michal Hocko, Toshi Kani,
	Greg KH, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org

On Sat, 2014-02-15 at 12:19 -0600, Yoder Stuart-B08248 wrote:
> 
> > -----Original Message-----
> > From: Greg KH [mailto:gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org]
> > Sent: Saturday, February 15, 2014 11:34 AM
> > To: Yoder Stuart-B08248
> > Cc: Antonios Motakis; alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org;
> > kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg@public.gmane.org; iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org; linux-
> > kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J@public.gmane.org;
> > a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J@public.gmane.org; kim.phillips-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org;
> > jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ@public.gmane.org; kvm-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Bhushan Bharat-R65777; Wood
> > Scott-B07421; christoffer.dall-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org; agraf-l3A5Bk7waGM@public.gmane.org; Sethi Varun-
> > B16395; will.deacon-5wv7dgnIgG8@public.gmane.org; Tejun Heo; Rafael J. Wysocki; Guenter Roeck;
> > Toshi Kani; Joe Perches; Dmitry Kasatkin; Michal Hocko; Bjorn Helgaas
> > Subject: Re: [RFC PATCH v4 01/10] driver core: export
> > driver_probe_device()
> > 
> > On Sat, Feb 15, 2014 at 04:33:44PM +0000, Stuart Yoder wrote:
> > > Are you in principle opposed to any mechanism that would allow 2 drivers
> > > to be resident/active and allow a sysadmin to explicitly bind a
> > > particular device instance to the driver of their choice?
> > 
> > No, that works today with the bind/unbind/new_id files, it's just that
> > you don't like it :)
> 
> We don't like it because of the ambiguities/race-conditions with
> the current situation.

Plus, it's semantically weird (a.k.a. a hack).  The user isn't trying to
bind an entire type of device to the vfio driver, but rather a specific
device.  Races and similar ugliness are often what you get when you try
to pile things on top of the wrong abstraction.  That you can hack
around the races with a userspace loop (and hope that no damage was done
by the wrong driver in the meantime -- packets sent, filesystems
automounted, other inappropriate I/O performed, driver unbind
bugs/unwillingness encountered, etc) is not a particularly satisfying
answer.  At best the race fixup will end up being a poorly tested code
path (if the person scripting userspace thinks of doing it at all).

It also doesn't "work today" because there is no new_id for platform
devices, and the matching situation for platform devices is more
complicated than on PCI, so it would be more awkward to implement and
more awkward to use.
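
Scott's point that platform-device matching is more complicated than PCI's can be seen from the order in which the platform bus tries to pair a device with a driver. Below is a simplified userspace model of the cascade in platform_match() (drivers/base/platform.c) from kernels around the time of this thread: device-tree compatible string, then ACPI ID, then the driver's id_table, then a plain name comparison. The structures and the "acme" names are invented for illustration, and each stage is reduced to a single string comparison. Even so, there is no single key comparable to PCI's "vendor device" pair for a new_id-style file to accept.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins: each matching source reduced to one optional string. */
struct toy_pdev {
    const char *name;         /* platform device name */
    const char *of_compat;    /* device-tree "compatible", or NULL */
    const char *acpi_hid;     /* ACPI hardware ID, or NULL */
};

struct toy_pdrv {
    const char *name;         /* driver name */
    const char *of_compat;    /* one of_match_table entry, or NULL */
    const char *acpi_hid;     /* one ACPI match table entry, or NULL */
    const char *id_name;      /* one id_table entry, or NULL */
};

static int streq_nonnull(const char *a, const char *b)
{
    return a && b && strcmp(a, b) == 0;
}

enum match_kind { NO_MATCH, BY_OF, BY_ACPI, BY_ID_TABLE, BY_NAME };

/* Same precedence as platform_match() of that era: OF, ACPI, id_table,
 * then driver name. A driver with an id_table never falls back to name. */
static enum match_kind toy_platform_match(const struct toy_pdev *dev,
                                          const struct toy_pdrv *drv)
{
    assert(dev != NULL && drv != NULL);
    if (streq_nonnull(dev->of_compat, drv->of_compat))
        return BY_OF;
    if (streq_nonnull(dev->acpi_hid, drv->acpi_hid))
        return BY_ACPI;
    if (drv->id_name)
        return streq_nonnull(dev->name, drv->id_name) ? BY_ID_TABLE : NO_MATCH;
    return streq_nonnull(dev->name, drv->name) ? BY_NAME : NO_MATCH;
}
```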

We can apply enough grease and pound the square peg through the round
hole if we must, but we'd like to first exhaust our options for doing it
in a simple, straightforward, robust, and semantically sensible manner
-- especially since once we start supporting the new_id approach for
vfio binding on platform devices it'll be ABI that we're stuck with.

-Scott

^ permalink raw reply	[flat|nested] 92+ messages in thread

* RE: [RFC PATCH v4 01/10] driver core: export driver_probe_device()
  2014-02-15 17:33                     ` Greg KH
@ 2014-02-20 22:34                       ` Stuart Yoder
  -1 siblings, 0 replies; 92+ messages in thread
From: Stuart Yoder @ 2014-02-20 22:34 UTC (permalink / raw)
  To: Greg KH
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ,
	will.deacon-5wv7dgnIgG8, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	Bjorn Helgaas, Varun Sethi,
	kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg, Rafael J. Wysocki,
	agraf-l3A5Bk7waGM, Guenter Roeck, Dmitry Kasatkin, Tejun Heo,
	Scott Wood, Antonios Motakis,
	tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J, Michal Hocko, Toshi Kani,
	a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA



> -----Original Message-----
> From: Yoder Stuart-B08248
> Sent: Saturday, February 15, 2014 12:19 PM
> To: 'Greg KH'
> Cc: Antonios Motakis; alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org;
> kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg@public.gmane.org; iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org; linux-
> kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J@public.gmane.org;
> a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J@public.gmane.org; kim.phillips-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org;
> jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ@public.gmane.org; kvm-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Bhushan Bharat-R65777; Wood
> Scott-B07421; christoffer.dall-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org; agraf-l3A5Bk7waGM@public.gmane.org; Sethi Varun-
> B16395; will.deacon-5wv7dgnIgG8@public.gmane.org; Tejun Heo; Rafael J. Wysocki; Guenter Roeck;
> Toshi Kani; Joe Perches; Dmitry Kasatkin; Michal Hocko; Bjorn Helgaas
> Subject: RE: [RFC PATCH v4 01/10] driver core: export
> driver_probe_device()
> 
> 
> 
> > -----Original Message-----
> > From: Greg KH [mailto:gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org]
> > Sent: Saturday, February 15, 2014 11:34 AM
> > To: Yoder Stuart-B08248
> > Cc: Antonios Motakis; alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org;
> > kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg@public.gmane.org; iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org; linux-
> > kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J@public.gmane.org;
> > a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J@public.gmane.org; kim.phillips-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org;
> > jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ@public.gmane.org; kvm-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Bhushan Bharat-R65777; Wood
> > Scott-B07421; christoffer.dall-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org; agraf-l3A5Bk7waGM@public.gmane.org; Sethi Varun-
> > B16395; will.deacon-5wv7dgnIgG8@public.gmane.org; Tejun Heo; Rafael J. Wysocki; Guenter Roeck;
> > Toshi Kani; Joe Perches; Dmitry Kasatkin; Michal Hocko; Bjorn Helgaas
> > Subject: Re: [RFC PATCH v4 01/10] driver core: export
> > driver_probe_device()
> >
> > On Sat, Feb 15, 2014 at 04:33:44PM +0000, Stuart Yoder wrote:
> > > > > Why?  driver_probe_device() allows a driver to explicitly bind
> > > > > to a specific device.   What is conceptually wrong with allowing
> > > > > that?
> > > >
> > > > Because that's not how a bus should work, and the fact that no other
> > > > subsystem in the kernel does that might be a hint you are trying to do
> > > > something a bit "wrong" here.
> > >
> > > Let me try to describe, as succinctly as I can, the problem we are trying to
> > > solve here...
> > >
> > > The vfio mechanism in the kernel (e.g. vfio-pci) allows devices to be
> > > exposed to user space (via file descriptors), enabling user space
> > > drivers.  So, for example, to export an e1000 card to user space, I do
> > > this:
> > >
> > >    echo 0001:03:00.0 > /sys/bus/pci/devices/0001:03:00.0/driver/unbind
> > >    echo 8086 10d3 > /sys/bus/pci/drivers/vfio-pci/new_id
> >
> > What's wrong with using the "bind" file instead?  That picks a specific
> > device and binds it to a specific driver.  Or have we been down this
> > path before?  :)
> 
> Yes we have :)  The "bind" file does not bypass device ID checks, so
> it wouldn't work without new_id or a wildcard match of some kind.
> 
> > And that is for a PCI "driver" not a totally separate bus, which it
> > looks like you are wanting to do here.
> 
> vfio-pci is a PCI driver, not a bus (drivers/vfio/pci/vfio_pci.c).
> 
> > > The first step unbinds the target device (0001:03:00.0) from the normal
> > > e1000 driver.
> > >
> > > The second step causes the vfio-pci driver to bind to device 0001:03:00.0.
> > > This second step tells vfio-pci that it now handles e1000 device IDs,
> > > and the vfio-pci driver registers with the PCI bus to handle '8086 10d3'.
> > >
> > > That works, but it is ugly.  We now have 2 active drivers handling
> > > the same device type...which introduces various possible race conditions.
> > >
> > > We never want vfio-pci to auto-bind to any new device that shows up
> > > on the PCI bus.  Binding a device to vfio-pci must be an explicit
> > > action by an administrator.
> >
> > Then use the "bind" file.
> 
> See above.
> 
> > > You mentioned previously that user space can sort out the problem
> > > of multiple drivers registered for handling the same device type.
> > > That is true, but doesn't help here.   We don't want vfio-pci
> > > to handle _all_ e1000 cards, just explicitly selected e1000 cards.
> > >
> > > We want the normal e1000 driver to be loaded and to bind to new
> > > devices that may be hot-plugged.
> >
> > I want a pony too...
> 
> It's not that difficult...this patch accomplishes it by
> simply allowing drivers to call driver_probe_device().
> 
> > > There are 2 proposed mechanisms that have been put forth, both of
> > > which you have now rejected:
> > >
> > >    1.  A sysfs_bind_only flag was proposed which would allow a vfio
> > >        driver (like vfio-pci) to only bind by explicit request through
> > >        the sysfs 'bind' file.
> >
> > Why did I reject this?  What did the patch look like?
> 
> https://lkml.org/lkml/2013/12/3/253
> 
> 
> > >    2.  Have the vfio driver call driver_probe_device() to explicitly bind
> > >        a particular device instance to the driver.  The only change we need
> > >        here is the EXPORT_SYMBOL.
> >
> > How are you going to prevent the driver from being bound to the device
> > in the core with this change?  How are you going to call this function?
> > When?  On what action of the user?
> 
> The vfio-pci driver would create a sysfs object "vfio_bind".
> 
> User would do:
>    echo 0001:03:00.0 > /sys/bus/pci/drivers/vfio-pci/vfio_bind
> 
> vfio-pci would call driver_probe_device() which binds
> the specific device to the vfio-pci driver...and there is
> no ambiguity.
> 
> > > Are you in principle opposed to any mechanism that would allow 2 drivers
> > > to be resident/active and allow a sysadmin to explicitly bind a
> > > particular device instance to the driver of their choice?
> >
> > No, that works today with the bind/unbind/new_id files, it's just that
> > you don't like it :)
> 
> We don't like it because of the ambiguities/race-conditions with
> the current situation.
> 
> A vfio driver, like vfio-pci, certainly is a bit different than a normal
> driver, in that it really is not device ID aware.  It simply passes
> through device resources (mappable regions, IRQs) to user space without
> interpreting or understanding them.  It is kind of a "meta" driver, but
> it is not a bus.  Every bus type would need its own vfio driver to
> do this type of device pass through.

Hi Greg,

Any further thoughts on this?

Thanks,
Stuart

^ permalink raw reply	[flat|nested] 92+ messages in thread

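For reference, the two-step rebind Stuart quotes above (unbind from the current driver, then add the ID to vfio-pci's `new_id`) can be wrapped in a small shell helper. This is a minimal sketch, not part of the patch series: the function name `vfio_pci_rebind` and the `SYSFS_ROOT` override (used only so the helper can be exercised against a fake directory tree) are hypothetical; the `driver/unbind` and `drivers/vfio-pci/new_id` paths are the ones used in the thread.

```shell
#!/bin/sh
# Sketch of the two-step rebind described in the thread:
#   1. unbind the device from its current driver (e.g. e1000)
#   2. add its vendor/device ID to vfio-pci's dynamic ID table (new_id)
# SYSFS_ROOT is a hypothetical override for testing against a fake
# tree; it defaults to the real /sys.
vfio_pci_rebind() {
    bdf=$1      # PCI address, e.g. 0001:03:00.0
    vendor=$2   # vendor ID without 0x prefix, e.g. 8086
    device=$3   # device ID without 0x prefix, e.g. 10d3
    root=${SYSFS_ROOT:-/sys}

    # Step 1: detach from the current driver, if one is bound.
    # (In real sysfs, "driver" is a symlink to the driver's directory.)
    if [ -e "$root/bus/pci/devices/$bdf/driver/unbind" ]; then
        echo "$bdf" > "$root/bus/pci/devices/$bdf/driver/unbind" || return 1
    fi

    # Step 2: teach vfio-pci the ID. From here on the driver core may
    # attach vfio-pci to *any* unbound device with this ID, not only
    # $bdf; that window is the ambiguity discussed in this thread.
    echo "$vendor $device" > "$root/bus/pci/drivers/vfio-pci/new_id"
}
```

The comment on step 2 is the crux of the objection: the helper names one device, but the `new_id` write is keyed on a device type, so the kernel has no way to know which instance the administrator actually meant.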
* Re: [RFC PATCH v4 01/10] driver core: export driver_probe_device()
       [not found]                       ` <b6374a0f30194969ba4622ff2f58ae65-ufbTtyGzTTT8GZusEWM6WuO6mTEJWrR4XA4E9RH9d+qIuWR1G4zioA@public.gmane.org>
@ 2014-02-20 22:43                           ` Greg KH
  0 siblings, 0 replies; 92+ messages in thread
From: Greg KH @ 2014-02-20 22:43 UTC (permalink / raw)
  To: Stuart Yoder
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ,
	will.deacon-5wv7dgnIgG8, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	Bjorn Helgaas, Varun Sethi,
	kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg, Rafael J. Wysocki,
	agraf-l3A5Bk7waGM, Guenter Roeck, Dmitry Kasatkin, Tejun Heo,
	Scott Wood, Antonios Motakis,
	tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J, Michal Hocko, Toshi Kani,
	a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Thu, Feb 20, 2014 at 10:34:35PM +0000, Stuart Yoder wrote:
> 
> 
> > -----Original Message-----
> > From: Yoder Stuart-B08248
> > Sent: Saturday, February 15, 2014 12:19 PM
> > To: 'Greg KH'
> > Cc: Antonios Motakis; alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org;
> > kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg@public.gmane.org; iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org; linux-
> > kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J@public.gmane.org;
> > a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J@public.gmane.org; kim.phillips-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org;
> > jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ@public.gmane.org; kvm-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Bhushan Bharat-R65777; Wood
> > Scott-B07421; christoffer.dall-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org; agraf-l3A5Bk7waGM@public.gmane.org; Sethi Varun-
> > B16395; will.deacon-5wv7dgnIgG8@public.gmane.org; Tejun Heo; Rafael J. Wysocki; Guenter Roeck;
> > Toshi Kani; Joe Perches; Dmitry Kasatkin; Michal Hocko; Bjorn Helgaas
> > Subject: RE: [RFC PATCH v4 01/10] driver core: export
> > driver_probe_device()
> > 
> > 
> > 
> > > -----Original Message-----
> > > From: Greg KH [mailto:gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org]
> > > Sent: Saturday, February 15, 2014 11:34 AM
> > > To: Yoder Stuart-B08248
> > > Cc: Antonios Motakis; alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org;
> > > kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg@public.gmane.org; iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org; linux-
> > > kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J@public.gmane.org;
> > > a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J@public.gmane.org; kim.phillips-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org;
> > > jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ@public.gmane.org; kvm-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Bhushan Bharat-R65777; Wood
> > > Scott-B07421; christoffer.dall-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org; agraf-l3A5Bk7waGM@public.gmane.org; Sethi Varun-
> > > B16395; will.deacon-5wv7dgnIgG8@public.gmane.org; Tejun Heo; Rafael J. Wysocki; Guenter Roeck;
> > > Toshi Kani; Joe Perches; Dmitry Kasatkin; Michal Hocko; Bjorn Helgaas
> > > Subject: Re: [RFC PATCH v4 01/10] driver core: export
> > > driver_probe_device()
> > >
> > > On Sat, Feb 15, 2014 at 04:33:44PM +0000, Stuart Yoder wrote:
> > > > > > Why?  driver_probe_device() allows a driver to explicitly bind
> > > > > > to a specific device.   What is conceptually wrong with allowing
> > > > > > that?
> > > > >
> > > > > Because that's not how a bus should work, and the fact that no other
> > > > > subsystem in the kernel does that might be a hint you are trying to do
> > > > > something a bit "wrong" here.
> > > >
> > > > Let me try to describe, as succinctly as I can, the problem we are trying to
> > > > solve here...
> > > >
> > > > The vfio mechanism in the kernel (e.g. vfio-pci) allows devices to be
> > > > exposed to user space (via file descriptors), enabling user space
> > > > drivers.  So, for example, to export an e1000 card to user space, I do
> > > > this:
> > > >
> > > >    echo 0001:03:00.0 > /sys/bus/pci/devices/0001:03:00.0/driver/unbind
> > > >    echo 8086 10d3 > /sys/bus/pci/drivers/vfio-pci/new_id
> > >
> > > What's wrong with using the "bind" file instead?  That picks a specific
> > > device and binds it to a specific driver.  Or have we been down this
> > > path before?  :)
> > 
> > Yes we have :)  The "bind" file does not bypass device ID checks, so
> > it wouldn't work without new_id or a wildcard match of some kind.
> > 
> > > And that is for a PCI "driver" not a totally separate bus, which it
> > > looks like you are wanting to do here.
> > 
> > vfio-pci is a PCI driver, not a bus (drivers/vfio/pci/vfio_pci.c).
> > 
> > > > The first step unbinds the target device (0001:03:00.0) from the normal
> > > > e1000 driver.
> > > >
> > > > The second step causes the vfio-pci driver to bind to device 0001:03:00.0.
> > > > This second step tells vfio-pci that it now handles e1000 device IDs,
> > > > and the vfio-pci driver registers with the PCI bus to handle '8086 10d3'.
> > > >
> > > > That works, but it is ugly.  We now have 2 active drivers handling
> > > > the same device type...which introduces various possible race conditions.
> > > >
> > > > We never want vfio-pci to auto-bind to any new device that shows up
> > > > on the PCI bus.  Binding a device to vfio-pci must be an explicit
> > > > action by an administrator.
> > >
> > > Then use the "bind" file.
> > 
> > See above.
> > 
> > > > You mentioned previously that user space can sort out the problem
> > > > of multiple drivers registered for handling the same device type.
> > > > That is true, but doesn't help here.   We don't want vfio-pci
> > > > to handle _all_ e1000 cards, just explicitly selected e1000 cards.
> > > >
> > > > We want the normal e1000 driver to be loaded and to bind to new
> > > > devices that may be hot-plugged.
> > >
> > > I want a pony too...
> > 
> > It's not that difficult...this patch accomplishes it by
> > simply allowing drivers to call driver_probe_device().
> > 
> > > > There are 2 proposed mechanisms that have been put forth, both of
> > > > which you have now rejected:
> > > >
> > > >    1.  A sysfs_bind_only flag was proposed which would allow a vfio
> > > >        driver (like vfio-pci) to only bind by explicit request through
> > > >        the sysfs 'bind' file.
> > >
> > > Why did I reject this?  What did the patch look like?
> > 
> > https://lkml.org/lkml/2013/12/3/253
> > 
> > 
> > > >    2.  Have the vfio driver call driver_probe_device() to explicitly bind
> > > >        a particular device instance to the driver.  The only change we need
> > > >        here is the EXPORT_SYMBOL.
> > >
> > > How are you going to prevent the driver from being bound to the device
> > > in the core with this change?  How are you going to call this function?
> > > When?  On what action of the user?
> > 
> > The vfio-pci driver would create a sysfs object "vfio_bind".
> > 
> > User would do:
> >    echo 0001:03:00.0 > /sys/bus/pci/drivers/vfio-pci/vfio_bind
> > 
> > vfio-pci would call driver_probe_device() which binds
> > the specific device to the vfio-pci driver...and there is
> > no ambiguity.
> > 
> > > > Are you in principle opposed to any mechanism that would allow 2 drivers
> > > > to be resident/active and allow a sysadmin to explicitly bind a
> > > > particular device instance to the driver of their choice?
> > >
> > > No, that works today with the bind/unbind/new_id files, it's just that
> > > you don't like it :)
> > 
> > We don't like it because of the ambiguities/race-conditions with
> > the current situation.
> > 
> > A vfio driver, like vfio-pci, certainly is a bit different than a normal
> > driver, in that it really is not device ID aware.  It simply passes
> > through device resources (mappable regions, IRQs) to user space without
> > interpreting or understanding them.  It is kind of a "meta" driver, but
> > it is not a bus.  Every bus type would need its own vfio driver to
> > do this type of device pass through.
> 
> Hi Greg,
> 
> Any further thoughts on this?

Sorry, been swamped with other patches and stable stuff and haven't had
time to look at it.  Give me a few days...

^ permalink raw reply	[flat|nested] 92+ messages in thread

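To see the concern raised above in practice (vfio-pci should claim only the explicitly selected e1000 card, not every card of that type), one can walk the standard PCI sysfs attributes: each device directory has `vendor` and `device` files holding 0x-prefixed hex IDs, and a `driver` symlink when a driver is bound. The helper below is a hypothetical sketch; `list_pci_by_id` and the `SYSFS_ROOT` test override are illustrative names, not anything proposed in the thread.

```shell
#!/bin/sh
# List every PCI device whose vendor/device IDs match, together with
# the driver currently bound to it ("none" if unbound). Run after a
# new_id write, it shows which devices of that type vfio-pci claimed.
# SYSFS_ROOT is a hypothetical override for testing; defaults to /sys.
list_pci_by_id() {
    want_vendor=0x$1    # sysfs stores IDs 0x-prefixed, e.g. 0x8086
    want_device=0x$2
    root=${SYSFS_ROOT:-/sys}

    for dev in "$root"/bus/pci/devices/*; do
        # Skip anything that is not a device directory (including an
        # unexpanded glob when no devices exist).
        [ -r "$dev/vendor" ] || continue
        [ "$(cat "$dev/vendor")" = "$want_vendor" ] || continue
        [ "$(cat "$dev/device")" = "$want_device" ] || continue
        if [ -L "$dev/driver" ]; then
            # "driver" points into .../bus/pci/drivers/<name>
            drv=$(basename "$(readlink "$dev/driver")")
        else
            drv=none
        fi
        echo "$(basename "$dev") $drv"
    done
}
```

For example, `list_pci_by_id 8086 10d3` would print one `<address> <driver>` line per matching device; seeing `vfio-pci` next to an address the administrator never selected is exactly the ambiguity the proposed explicit-bind mechanisms aim to rule out.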
* RE: [RFC PATCH v4 01/10] driver core: export driver_probe_device()
       [not found]                           ` <20140220224337.GA20097-U8xfFu+wG4EAvxtiuMwx3w@public.gmane.org>
@ 2014-03-06 22:25                               ` Stuart Yoder
  0 siblings, 0 replies; 92+ messages in thread
From: Stuart Yoder @ 2014-03-06 22:25 UTC (permalink / raw)
  To: Greg KH
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ,
	will.deacon-5wv7dgnIgG8, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	Bjorn Helgaas, Varun Sethi,
	kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg, Rafael J. Wysocki,
	agraf-l3A5Bk7waGM, Guenter Roeck, Dmitry Kasatkin, Tejun Heo,
	Scott Wood, Antonios Motakis,
	tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J, Michal Hocko, Toshi Kani,
	a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA



> -----Original Message-----
> From: Greg KH [mailto:gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org]
> Sent: Thursday, February 20, 2014 4:44 PM
> To: Yoder Stuart-B08248
> Cc: Antonios Motakis; alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org;
> kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg@public.gmane.org; iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org;
> linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J@public.gmane.org;
> a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J@public.gmane.org; kim.phillips-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org;
> jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ@public.gmane.org; kvm-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Bhushan Bharat-R65777;
> Wood Scott-B07421; christoffer.dall-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org; agraf-l3A5Bk7waGM@public.gmane.org;
> Sethi Varun-B16395; will.deacon-5wv7dgnIgG8@public.gmane.org; Tejun Heo; Rafael J. Wysocki; Guenter Roeck;
> Toshi Kani; Joe Perches; Dmitry Kasatkin; Michal Hocko; Bjorn Helgaas
> Subject: Re: [RFC PATCH v4 01/10] driver core: export driver_probe_device()
> 
> On Thu, Feb 20, 2014 at 10:34:35PM +0000, Stuart Yoder wrote:
> >
> > > -----Original Message-----
> > > From: Yoder Stuart-B08248
> > > Sent: Saturday, February 15, 2014 12:19 PM
> > > To: 'Greg KH'
> > > Cc: Antonios Motakis; alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org;
> > > kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg@public.gmane.org; iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org;
> > > linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J@public.gmane.org;
> > > a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J@public.gmane.org; kim.phillips-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org;
> > > jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ@public.gmane.org; kvm-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Bhushan Bharat-R65777;
> > > Wood Scott-B07421; christoffer.dall-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org; agraf-l3A5Bk7waGM@public.gmane.org;
> > > Sethi Varun-B16395; will.deacon-5wv7dgnIgG8@public.gmane.org; Tejun Heo; Rafael J. Wysocki; Guenter Roeck;
> > > Toshi Kani; Joe Perches; Dmitry Kasatkin; Michal Hocko; Bjorn Helgaas
> > > Subject: RE: [RFC PATCH v4 01/10] driver core: export driver_probe_device()
> > >
> > > > -----Original Message-----
> > > > From: Greg KH [mailto:gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org]
> > > > Sent: Saturday, February 15, 2014 11:34 AM
> > > > To: Yoder Stuart-B08248
> > > > Cc: Antonios Motakis; alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org;
> > > > kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg@public.gmane.org; iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org;
> > > > linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J@public.gmane.org;
> > > > a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J@public.gmane.org; kim.phillips-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org;
> > > > jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ@public.gmane.org; kvm-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Bhushan Bharat-R65777;
> > > > Wood Scott-B07421; christoffer.dall-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org; agraf-l3A5Bk7waGM@public.gmane.org;
> > > > Sethi Varun-B16395; will.deacon-5wv7dgnIgG8@public.gmane.org; Tejun Heo; Rafael J. Wysocki; Guenter Roeck;
> > > > Toshi Kani; Joe Perches; Dmitry Kasatkin; Michal Hocko; Bjorn Helgaas
> > > > Subject: Re: [RFC PATCH v4 01/10] driver core: export driver_probe_device()
> > > >
> > > > On Sat, Feb 15, 2014 at 04:33:44PM +0000, Stuart Yoder wrote:
> > > > > > > Why?  driver_probe_device() allows a driver to explicitly bind
> > > > > > > to a specific device.   What is conceptually wrong with allowing
> > > > > > > that?
> > > > > >
> > > > > > Because that's not how a bus should work, and the fact that no other
> > > > > > subsystem in the kernel does that might be a hint you are trying to do
> > > > > > something a bit "wrong" here.
> > > > >
> > > > > Let me try to describe as succinctly as I can the problem we are
> > > > > trying to solve here...
> > > > >
> > > > > The vfio mechanism in the kernel (e.g. vfio-pci) allows devices to be
> > > > > exposed to user space (via file descriptors), enabling user space
> > > > > drivers.  So, for example, to export an e1000 card to user space, I do
> > > > > this:
> > > > >
> > > > >    echo 0001:03:00.0 > /sys/bus/pci/devices/0001:03:00.0/driver/unbind
> > > > >    echo 8086 10d3 > /sys/bus/pci/drivers/vfio-pci/new_id
> > > >
> > > > What's wrong with using the "bind" file instead?  That picks a specific
> > > > device and binds it to a specific driver.  Or have we been down this
> > > > path before?  :)
> > >
> > > Yes we have :)  The "bind" file does not bypass device ID checks, so
> > > it wouldn't work without new_id or a wildcard match of some kind.
> > >
> > > > And that is for a PCI "driver" not a totally separate bus, which it
> > > > looks like you are wanting to do here.
> > >
> > > vfio-pci is a PCI driver, not a bus (drivers/vfio/pci/vfio_pci.c).
> > >
> > > > > The first step unbinds the target device (0001:03:00.0) from the
> > > > > normal e1000 driver.
> > > > >
> > > > > The second step causes the vfio-pci driver to bind to device
> > > > > 0001:03:00.0.  This second step tells vfio-pci that it now handles
> > > > > e1000 device IDs, and the vfio-pci driver registers with the PCI bus
> > > > > to handle '8086 10d3'.
> > > > >
> > > > > That works, but it is ugly.  We now have 2 active drivers handling
> > > > > the same device type...which introduces various possible race
> > > > > conditions.
> > > > >
> > > > > We never want vfio-pci to auto-bind to any new device that shows up
> > > > > on the PCI bus.  Binding a device to vfio-pci must be an explicit
> > > > > action by an administrator.
> > > >
> > > > Then use the "bind" file.
> > >
> > > See above.
> > >
> > > > > You mentioned previously that user space can sort out the problem
> > > > > of multiple drivers registered for handling the same device type.
> > > > > That is true, but doesn't help here.   We don't want vfio-pci
> > > > > to handle _all_ e1000 cards, just explicitly selected e1000 cards.
> > > > >
> > > > > We want the normal e1000 driver to be loaded and to bind to new
> > > > > devices that may be hot-plugged.
> > > >
> > > > I want a pony too...
> > >
> > > It's not that difficult...this patch accomplishes it by
> > > simply allowing drivers to call driver_probe_device().
> > >
> > > > > There are 2 proposed mechanisms that have been put forth, both of
> > > > > which you have now rejected:
> > > > >
> > > > >    1.  sysfs_bind_only flag was proposed which would allow a vfio
> > > > >        driver (like vfio-pci) to only bind by explicit request
> > > > >        through the sysfs 'bind' file.
> > > >
> > > > Why did I reject this?  What did the patch look like?
> > >
> > > https://lkml.org/lkml/2013/12/3/253
> > >
> > >
> > > > >    2.  Have the vfio driver call driver_probe_device() to explicitly
> > > > >        bind a particular device instance to the driver.  The only
> > > > >        change we need here is the EXPORT_SYMBOL.
> > > >
> > > > How are you going to prevent the driver from being bound to the device
> > > > in the core with this change?  How are you going to call this function?
> > > > When?  On what action of the user?
> > >
> > > The vfio-pci driver would create a sysfs object "vfio_bind".
> > >
> > > User would do:
> > >    echo 0001:03:00.0 > /sys/bus/pci/drivers/vfio-pci/vfio_bind
> > >
> > > vfio-pci would call driver_probe_device() which binds
> > > the specific device to the vfio-pci driver...and there is
> > > no ambiguity.
> > >
> > > > > Are you in principle opposed to any mechanism that would allow 2
> > > > > drivers to be resident/active and allow a sysadmin to explicitly bind
> > > > > a particular device instance to the driver of their choice?
> > > >
> > > > No, that works today with the bind/unbind/new_id files, it's just that
> > > > you don't like it :)
> > >
> > > We don't like it because of the ambiguities/race-conditions with
> > > the current situation.
> > >
> > > A vfio driver, like vfio-pci, certainly is a bit different than a
> > > normal driver, in that it really is not device ID aware.  It simply
> > > passes through device resources (mappable regions, IRQs) to user space
> > > without interpreting or understanding them.  It is kind of a "meta"
> > > driver, but it is not a bus.  Every bus type would need its own vfio
> > > driver to do this type of device pass through.
> >
> > Hi Greg,
> >
> > Any further thoughts on this?
> 
> Sorry, been swamped with other patches and stable stuff and not had a
> time to look at it.  Give me a few days...

Hi Greg, wanted to ping you on this again...

I know some days have gone by, so let me summarize the issue: vfio
drivers in the kernel (regardless of bus type) need to bind to
devices of any type.  There seem to be 3 approaches:

   1.  new_id -- (current approach) the user explicitly registers
       each new device type with the vfio driver using the new_id
       mechanism.

       Problem: multiple drivers will be resident that handle the
       same device type...and there is nothing user space hotplug
       infrastructure can do to help.

   2.  "any id" -- the vfio driver could specify a wildcard match of
       some kind so that it can bind to any possible device id.  However,
       we don't want vfio grabbing all devices...just the ones we
       explicitly want to pass to user space.

       The proposed patch to support this was to create a new flag
       "sysfs_bind_only" in struct device_driver.  When this flag
       is set, the driver can only bind to devices via the sysfs
       bind file.  This would allow the wildcard match to work.

       Patch is here:
       https://lkml.org/lkml/2013/12/3/253

   3.  Driver initiated explicit bind -- with this approach the
       vfio driver would create a private 'bind' sysfs object
       and the user would echo the requested device into it:
 
       echo 0001:03:00.0 > /sys/bus/pci/drivers/vfio-pci/vfio_bind

       In order to make that work, the driver would need to call
       driver_probe_device() and thus we need this patch:
       https://lkml.org/lkml/2014/2/8/175
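Option 2 can be illustrated with a small user-space model of the matching logic. This is not kernel code: the structures, the MODEL_ANY_ID wildcard value and the model_* functions are hypothetical simplifications, assuming only what the summary states (a wildcard ID table plus a sysfs_bind_only flag that keeps the driver out of automatic probing). It shows hot-plug autoprobe leaving the device with its normal driver, while an explicit write to the bind file can still hand it to the vfio-style driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wildcard value that matches any vendor or device ID. */
#define MODEL_ANY_ID 0xffffu

struct model_id {
	uint16_t vendor;
	uint16_t device;
};

struct model_driver {
	const char *name;
	const struct model_id *ids;  /* terminated by an all-zero entry */
	bool sysfs_bind_only;        /* the flag proposed in option 2 */
};

struct model_device {
	uint16_t vendor;
	uint16_t device;
	const struct model_driver *bound;  /* NULL while unbound */
};

/* ID-table match that honours the wildcard entry. */
static bool model_match(const struct model_driver *drv,
			const struct model_device *dev)
{
	assert(drv != NULL && dev != NULL);
	for (const struct model_id *id = drv->ids; id->vendor || id->device; id++) {
		bool vendor_ok = id->vendor == MODEL_ANY_ID || id->vendor == dev->vendor;
		bool device_ok = id->device == MODEL_ANY_ID || id->device == dev->device;
		if (vendor_ok && device_ok)
			return true;
	}
	return false;
}

/* Automatic probing (boot or hot-plug): the first matching driver wins,
 * but drivers flagged sysfs_bind_only are skipped entirely. */
static bool model_autoprobe(const struct model_driver *const *drvs, size_t n,
			    struct model_device *dev)
{
	for (size_t i = 0; i < n; i++) {
		if (drvs[i]->sysfs_bind_only)
			continue;
		if (model_match(drvs[i], dev)) {
			dev->bound = drvs[i];
			return true;
		}
	}
	return false;
}

/* Explicit write to the driver's sysfs 'bind' file: the ID match still
 * applies, so the wildcard entry is what lets vfio accept any device. */
static bool model_sysfs_bind(const struct model_driver *drv,
			     struct model_device *dev)
{
	if (dev->bound != NULL || !model_match(drv, dev))
		return false;
	dev->bound = drv;
	return true;
}
```

For example, even with a wildcard vfio-pci registered ahead of e1000, an 8086:10d3 card autoprobes to e1000; only after it is unbound and written to vfio-pci's bind file does it move to vfio-pci.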


Thanks,
Stuart


 

^ permalink raw reply	[flat|nested] 92+ messages in thread

* mechanism to allow a driver to bind to any device
  2014-02-20 22:43                           ` Greg KH
@ 2014-03-26  1:40                             ` Stuart Yoder
  -1 siblings, 0 replies; 92+ messages in thread
From: Stuart Yoder @ 2014-03-26  1:40 UTC (permalink / raw)
  To: Greg KH
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ,
	will.deacon-5wv7dgnIgG8, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	Bjorn Helgaas, Varun Sethi,
	kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg, Rafael J. Wysocki,
	agraf-l3A5Bk7waGM, Guenter Roeck, Dmitry Kasatkin, Tejun Heo,
	Scott Wood, Antonios Motakis,
	tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J, Michal Hocko, Toshi Kani,
	a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Hi Greg,

We (Linaro, Freescale, Virtual Open Systems) are trying to get an issue
closed that has been percolating for a while around creating a mechanism
that will allow kernel drivers like vfio to bind to devices of any type.

This thread with you:
http://www.spinics.net/lists/kvm-arm/msg08370.html
...seems to have died out, so I am trying to get your response
and will summarize again.  Vfio drivers in the kernel (regardless of
bus type) need to bind to devices of any type.  The driver's function
is to simply export hardware resources of any type to user space.

There are several approaches that have been proposed:

   1.  new_id -- (current approach) the user explicitly registers
       each new device type with the vfio driver using the new_id
       mechanism.

       Problem: multiple drivers will be resident that handle the
       same device type...and there is nothing user space hotplug
       infrastructure can do to help.

   2.  "any id" -- the vfio driver could specify a wildcard match
       of some kind in its ID match table which would allow it to
       match and bind to any possible device id.  However,
       we don't want the vfio driver grabbing _all_ devices...just the ones we
       explicitly want to pass to user space.

       The proposed patch to support this was to create a new flag
       "sysfs_bind_only" in struct device_driver.  When this flag
       is set, the driver can only bind to devices via the sysfs
       bind file.  This would allow the wildcard match to work.

       Patch is here:
       https://lkml.org/lkml/2013/12/3/253

   3.  "Driver initiated explicit bind" -- with this approach the
       vfio driver would create a private 'bind' sysfs object
       and the user would echo the requested device into it:
 
       echo 0001:03:00.0 > /sys/bus/pci/drivers/vfio-pci/vfio_bind

       In order to make that work, the driver would need to call
       driver_probe_device() and thus we need this patch:
       https://lkml.org/lkml/2014/2/8/175
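Option 3 can be sketched the same way. In this user-space model (all names hypothetical; model_probe_device() merely stands in for an exported driver_probe_device()), the vfio-style driver has no ID table at all, so automatic probing can never select it, and the private vfio_bind store handler looks up the named device and binds it directly, bypassing the ID match.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct model_driver {
	const char *name;  /* no ID table: autoprobe never selects it */
};

struct model_device {
	const char *name;                  /* e.g. "0001:03:00.0" */
	const struct model_driver *bound;  /* NULL while unbound */
};

/* Stand-in for an exported driver_probe_device(): no ID-table check.
 * Model only: the "already bound" check lives here for brevity. */
static int model_probe_device(const struct model_driver *drv,
			      struct model_device *dev)
{
	assert(drv != NULL && dev != NULL);
	if (dev->bound != NULL)  /* must be unbound from its old driver first */
		return -EBUSY;
	dev->bound = drv;
	return 0;
}

/* Hypothetical store handler for .../drivers/vfio-pci/vfio_bind. */
static int model_vfio_bind_store(const struct model_driver *vfio,
				 struct model_device *devs, size_t n,
				 const char *buf)
{
	size_t len = strcspn(buf, "\n");  /* ignore the newline echo appends */

	for (size_t i = 0; i < n; i++) {
		if (strlen(devs[i].name) == len &&
		    strncmp(devs[i].name, buf, len) == 0)
			return model_probe_device(vfio, &devs[i]);
	}
	return -ENODEV;
}
```

The design point is that the ID check disappears for this one driver, and the only way in is the administrator's explicit write. In the real driver core, the callers of driver_probe_device() are responsible for holding the device lock and for checking that the device has no driver yet; the model folds the latter check into model_probe_device().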


I would like your comments on these options; option #3 is preferred
and is literally a 2-line patch.

Thanks,
Stuart

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: mechanism to allow a driver to bind to any device
       [not found]                             ` <54cd150235ba4954becdd12f725c5ebd-ufbTtyGzTTT8GZusEWM6WuO6mTEJWrR4XA4E9RH9d+qIuWR1G4zioA@public.gmane.org>
@ 2014-03-26 14:40                               ` Konrad Rzeszutek Wilk
       [not found]                                 ` <20140326144025.GA18387-6K5HmflnPlqSPmnEAIUT9EEOCMrvLtNR@public.gmane.org>
  2014-03-26 21:39                                 ` Antonios Motakis
  2014-03-26 21:42                                 ` Antonios Motakis
  2 siblings, 1 reply; 92+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-03-26 14:40 UTC (permalink / raw)
  To: Stuart Yoder
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ,
	will.deacon-5wv7dgnIgG8, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	Michal Hocko, Bjorn Helgaas, Varun Sethi,
	kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg, Rafael J. Wysocki,
	agraf-l3A5Bk7waGM, Guenter Roeck, Dmitry Kasatkin, Joe Perches,
	Scott Wood, Antonios Motakis,
	tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J, Toshi Kani, Greg KH,
	a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J,

On Wed, Mar 26, 2014 at 01:40:32AM +0000, Stuart Yoder wrote:
> Hi Greg,
> 
> We (Linaro, Freescale, Virtual Open Systems) are trying to get an issue
> closed that has been percolating for a while around creating a mechanism
> that will allow kernel drivers like vfio to bind to devices of any type.
> 
> This thread with you:
> http://www.spinics.net/lists/kvm-arm/msg08370.html
> ...seems to have died out, so I am trying to get your response
> and will summarize again.  Vfio drivers in the kernel (regardless of
> bus type) need to bind to devices of any type.  The driver's function
> is to simply export hardware resources of any type to user space.
> 
> There are several approaches that have been proposed:

You seem to have missed the one I proposed.
> 
>    1.  new_id -- (current approach) the user explicitly registers
>        each new device type with the vfio driver using the new_id
>        mechanism.
> 
>        Problem: multiple drivers will be resident that handle the
>        same device type...and there is nothing user space hotplug
>        infrastructure can do to help.
> 
>    2.  "any id" -- the vfio driver could specify a wildcard match
>        of some kind in its ID match table which would allow it to
>        match and bind to any possible device id.  However,
>        we don't want the vfio driver grabbing _all_ devices...just the ones we
>        explicitly want to pass to user space.
> 
>        The proposed patch to support this was to create a new flag
>        "sysfs_bind_only" in struct device_driver.  When this flag
>        is set, the driver can only bind to devices via the sysfs
>        bind file.  This would allow the wildcard match to work.
> 
>        Patch is here:
>        https://lkml.org/lkml/2013/12/3/253
> 
>    3.  "Driver initiated explicit bind" -- with this approach the
>        vfio driver would create a private 'bind' sysfs object
>        and the user would echo the requested device into it:
>  
>        echo 0001:03:00.0 > /sys/bus/pci/drivers/vfio-pci/vfio_bind
> 
>        In order to make that work, the driver would need to call
>        driver_probe_device() and thus we need this patch:
>        https://lkml.org/lkml/2014/2/8/175
> 

4). Use 'unbind' (to detach the device from its original driver) and then
'bind' to attach it to the vfio driver.

Which I think is what is currently being done. Why is that not sufficient?
The only thing I see in the URL is "That works, but it is ugly."
There is some mention of a race, but I don't see how: if you do the 'unbind'
on the original driver and then bind the BDF to VFIO, how would you get
a race?
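The objection to approach 4 raised earlier in the thread (the "bind" file does not bypass device ID checks) can be made concrete with a user-space model of the bind and new_id files. All names are hypothetical and this is a simplification, not the driver core's code. Binding vfio-pci to the unbound device fails until new_id adds the ID; once the ID is added, a later hot-plugged card of the same type can be claimed by either driver. The model lists vfio-pci first to show that outcome; on a real system which driver wins depends on probe order.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_IDS 8

struct model_id { uint16_t vendor, device; };

struct model_driver {
	const char *name;
	struct model_id ids[MODEL_MAX_IDS];  /* static plus new_id entries */
	size_t n_ids;
};

struct model_device {
	uint16_t vendor, device;
	const struct model_driver *bound;  /* NULL while unbound */
};

static bool model_match(const struct model_driver *drv,
			const struct model_device *dev)
{
	for (size_t i = 0; i < drv->n_ids; i++)
		if (drv->ids[i].vendor == dev->vendor &&
		    drv->ids[i].device == dev->device)
			return true;
	return false;
}

/* Like writing "vvvv dddd" to .../new_id: the driver now claims the ID. */
static int model_new_id(struct model_driver *drv, uint16_t vendor,
			uint16_t device)
{
	if (drv->n_ids == MODEL_MAX_IDS)
		return -ENOSPC;
	drv->ids[drv->n_ids++] = (struct model_id){ vendor, device };
	return 0;
}

/* Like writing a device name to .../bind: the ID match still applies. */
static int model_bind(const struct model_driver *drv, struct model_device *dev)
{
	if (dev->bound != NULL || !model_match(drv, dev))
		return -ENODEV;
	dev->bound = drv;
	return 0;
}

/* Hot-plug autoprobe: the first driver in the list that matches wins. */
static void model_hotplug(struct model_driver *const *drvs, size_t n,
			  struct model_device *dev)
{
	for (size_t i = 0; i < n && dev->bound == NULL; i++)
		if (model_match(drvs[i], dev))
			dev->bound = drvs[i];
}
```

In this model, unbind followed by bind alone is refused, and the new_id step that makes it succeed is exactly what lets vfio-pci also compete for every future 8086:10d3 device.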

> 
> Would like your comment on these options-- option #3 is preferred
> and is literally a 2 line patch.
> 
> Thanks,
> Stuart
> _______________________________________________
> iommu mailing list
> iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: mechanism to allow a driver to bind to any device
       [not found]                                 ` <20140326144025.GA18387-6K5HmflnPlqSPmnEAIUT9EEOCMrvLtNR@public.gmane.org>
@ 2014-03-26 15:06                                   ` Alexander Graf
       [not found]                                     ` <D45FC8F2-7807-4BBB-A253-8EFCD091D6BD-l3A5Bk7waGM@public.gmane.org>
  2014-03-26 15:32                                   ` Stuart Yoder
  1 sibling, 1 reply; 92+ messages in thread
From: Alexander Graf @ 2014-03-26 15:06 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ,
	will.deacon-5wv7dgnIgG8, Stuart Yoder,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Michal Hocko, Bjorn Helgaas,
	Varun Sethi, kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg,
	Rafael J. Wysocki, Guenter Roeck, Dmitry Kasatkin, Joe Perches,
	Scott Wood, Antonios Motakis,
	tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J, Toshi Kani, Greg KH,
	a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J,



> On 26.03.2014 at 22:40, Konrad Rzeszutek Wilk <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
> 
>> On Wed, Mar 26, 2014 at 01:40:32AM +0000, Stuart Yoder wrote:
>> Hi Greg,
>> 
>> We (Linaro, Freescale, Virtual Open Systems) are trying to get an issue
>> closed that has been percolating for a while around creating a mechanism
>> that will allow kernel drivers like vfio to bind to devices of any type.
>> 
>> This thread with you:
>> http://www.spinics.net/lists/kvm-arm/msg08370.html
>> ...seems to have died out, so am trying to get your response
>> and will summarize again.  Vfio drivers in the kernel (regardless of
>> bus type) need to bind to devices of any type.  The driver's function
>> is to simply export hardware resources of any type to user space.
>> 
>> There are several approaches that have been proposed:
> 
> You seem to have missed the one I proposed.
>> 
>>   1.  new_id -- (current approach) the user explicitly registers
>>       each new device type with the vfio driver using the new_id
>>       mechanism.
>> 
>>       Problem: multiple drivers will be resident that handle the
>>       same device type...and there is nothing user space hotplug
>>       infrastructure can do to help.
>> 
>>   2.  "any id" -- the vfio driver could specify a wildcard match
>>       of some kind in its ID match table which would allow it to
>>       match and bind to any possible device id.  However,
>>       we don't want the vfio driver grabbing _all_ devices...just the ones we
>>       explicitly want to pass to user space.
>> 
>>       The proposed patch to support this was to create a new flag
>>       "sysfs_bind_only" in struct device_driver.  When this flag
>>       is set, the driver can only bind to devices via the sysfs
>>       bind file.  This would allow the wildcard match to work.
>> 
>>       Patch is here:
>>       https://lkml.org/lkml/2013/12/3/253
>> 
>>   3.  "Driver initiated explicit bind" -- with this approach the
>>       vfio driver would create a private 'bind' sysfs object
>>       and the user would echo the requested device into it:
>> 
>>       echo 0001:03:00.0 > /sys/bus/pci/drivers/vfio-pci/vfio_bind
>> 
>>       In order to make that work, the driver would need to call
>>       driver_probe_device() and thus we need this patch:
>>       https://lkml.org/lkml/2014/2/8/175
> 
> 4). Use the 'unbind' (from the original device) and 'bind' to vfio driver.

This is approach 2, no?

> 
> Which I think is what is currently being done. Why is that not sufficient?

What would 'bind to vfio driver' look like?

> The only thing I see in the URL is " That works, but it is ugly."
> There is some mention of race but I don't see how - if you do the 'unbind'
> on the original driver and then bind the BDF to the VFIO how would you get
> a race?

Typically on PCI, you do a

  - add wildcard (pci id) match to vfio driver
  - unbind driver
  -> reprobe
  -> device attaches to vfio driver because it is the least recent match
  - remove wildcard match from vfio driver

If in between you hotplug add a card of the same type, it gets attached to vfio - even though the logical "default driver" would be the device specific driver.
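The ID-based sequence above can be sketched as a POSIX shell function over the standard PCI sysfs files (`new_id`, `remove_id`, the per-driver `unbind`, and the bus-level `drivers_probe`). This is a minimal sketch, not a tool from this thread: the function name `claim_by_id` and the `SYSFS_ROOT` override (defaulting to `/sys`, present only so the function can be exercised against a fake tree) are assumptions, and error handling is omitted. The comment marks where the hotplug race sits.

```sh
#!/bin/sh
# Sketch of the ID-based hand-over from a host driver to vfio-pci.
# Assumption: SYSFS_ROOT is a hypothetical override for testing; defaults to /sys.

claim_by_id() {
    bdf=$1                      # e.g. 0000:03:00.0
    drv=${2:-vfio-pci}          # target driver name
    pci="${SYSFS_ROOT:-/sys}/bus/pci"
    dev="$pci/devices/$bdf"

    # The vendor/device files hold values like "0x8086";
    # new_id and remove_id expect bare hex, e.g. "8086 10fb".
    read -r vendor < "$dev/vendor"
    read -r device < "$dev/device"
    id="${vendor#0x} ${device#0x}"

    # 1) add a match for this vendor/device pair to the vfio driver
    echo "$id" > "$pci/drivers/$drv/new_id"

    # 2) detach the device from its current host driver, if it has one
    if [ -e "$dev/driver" ]; then
        echo "$bdf" > "$dev/driver/unbind"
    fi

    # 3) reprobe just this device; vfio now has a matching ID
    echo "$bdf" > "$pci/drivers_probe"

    # Race window: any device with the same vendor/device that is
    # hot-added between steps 1 and 4 can also be claimed by $drv.

    # 4) drop the match again
    echo "$id" > "$pci/drivers/$drv/remove_id"
}
```

Run as root, e.g. `claim_by_id 0000:03:00.0`. The race exists because the match in step 1 is keyed on the vendor/device pair rather than on the one device being handed over.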


Alex

> 
>> 
>> Would like your comment on these options-- option #3 is preferred
>> and is literally a 2 line patch.
>> 
>> Thanks,
>> Stuart
>> _______________________________________________
>> iommu mailing list
>> iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
>> https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 92+ messages in thread

* RE: mechanism to allow a driver to bind to any device
       [not found]                                 ` <20140326144025.GA18387-6K5HmflnPlqSPmnEAIUT9EEOCMrvLtNR@public.gmane.org>
  2014-03-26 15:06                                   ` Alexander Graf
@ 2014-03-26 15:32                                   ` Stuart Yoder
  1 sibling, 0 replies; 92+ messages in thread
From: Stuart Yoder @ 2014-03-26 15:32 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ,
	will.deacon-5wv7dgnIgG8, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	Michal Hocko, Bjorn Helgaas, Varun Sethi,
	kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg, Rafael J. Wysocki,
	agraf-l3A5Bk7waGM, Guenter Roeck, Dmitry Kasatkin, Joe Perches,
	Scott Wood, Antonios Motakis,
	tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J, Toshi Kani, Greg KH,
	a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J,



> -----Original Message-----
> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org]
> Sent: Wednesday, March 26, 2014 9:40 AM
> To: Yoder Stuart-B08248
> Cc: Greg KH; kvm-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ@public.gmane.org;
> will.deacon-5wv7dgnIgG8@public.gmane.org; linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Bjorn Helgaas; Sethi
> Varun-B16395; kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg@public.gmane.org; Rafael J. Wysocki;
> agraf-l3A5Bk7waGM@public.gmane.org; Guenter Roeck; Dmitry Kasatkin; Tejun Heo; Wood Scott-
> B07421; Antonios Motakis; tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J@public.gmane.org; Michal Hocko;
> Toshi Kani; a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J@public.gmane.org; iommu-cunTk1MwBs/ROKNJybVBZg@public.gmane.org
> foundation.org; Joe Perches; christoffer.dall-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org
> Subject: Re: mechanism to allow a driver to bind to any device
> 
> On Wed, Mar 26, 2014 at 01:40:32AM +0000, Stuart Yoder wrote:
> > Hi Greg,
> >
> > We (Linaro, Freescale, Virtual Open Systems) are trying to get an issue
> > closed that has been percolating for a while around creating a mechanism
> > that will allow kernel drivers like vfio to bind to devices of any type.
> >
> > This thread with you:
> > http://www.spinics.net/lists/kvm-arm/msg08370.html
> > ...seems to have died out, so am trying to get your response
> > and will summarize again.  Vfio drivers in the kernel (regardless of
> > bus type) need to bind to devices of any type.  The driver's function
> > is to simply export hardware resources of any type to user space.
> >
> > There are several approaches that have been proposed:
> 
> You seem to have missed the one I proposed.

Sorry, I frankly had no idea of what you were talking about.  Please
explain with an example what steps a user would take to unbind
a device from the host and bind it to say vfio-pci.

> >    1.  new_id -- (current approach) the user explicitly registers
> >        each new device type with the vfio driver using the new_id
> >        mechanism.
> >
> >        Problem: multiple drivers will be resident that handle the
> >        same device type...and there is nothing user space hotplug
> >        infrastructure can do to help.
> >
> >    2.  "any id" -- the vfio driver could specify a wildcard match
> >        of some kind in its ID match table which would allow it to
> >        match and bind to any possible device id.  However,
> >        we don't want the vfio driver grabbing _all_ devices...just the ones we
> >        explicitly want to pass to user space.
> >
> >        The proposed patch to support this was to create a new flag
> >        "sysfs_bind_only" in struct device_driver.  When this flag
> >        is set, the driver can only bind to devices via the sysfs
> >        bind file.  This would allow the wildcard match to work.
> >
> >        Patch is here:
> >        https://lkml.org/lkml/2013/12/3/253
> >
> >    3.  "Driver initiated explicit bind" -- with this approach the
> >        vfio driver would create a private 'bind' sysfs object
> >        and the user would echo the requested device into it:
> >
> >        echo 0001:03:00.0 > /sys/bus/pci/drivers/vfio-pci/vfio_bind
> >
> >        In order to make that work, the driver would need to call
> >        driver_probe_device() and thus we need this patch:
> >        https://lkml.org/lkml/2014/2/8/175
> >
> 
> 4). Use the 'unbind' (from the original device) and 'bind' to vfio driver.

How can you bind a device to vfio when vfio is not aware of the
device type?  It does not work.

There is no way the vfio driver can know ahead of time what device may
be bound to it.

Stuart

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: mechanism to allow a driver to bind to any device
       [not found]                                     ` <D45FC8F2-7807-4BBB-A253-8EFCD091D6BD-l3A5Bk7waGM@public.gmane.org>
@ 2014-03-26 16:21                                       ` Alex Williamson
       [not found]                                         ` <1395850862.632.247.camel-85EaTFmN5p//9pzu0YdTqQ@public.gmane.org>
  2014-03-26 16:24                                       ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 92+ messages in thread
From: Alex Williamson @ 2014-03-26 16:21 UTC (permalink / raw)
  To: Alexander Graf
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ,
	will.deacon-5wv7dgnIgG8, Stuart Yoder,
	a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J, Michal Hocko,
	Scott Wood, Varun Sethi, kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg,
	Rafael J. Wysocki, Guenter Roeck, Dmitry Kasatkin, Tejun Heo,
	Bjorn Helgaas, Antonios Motakis,
	tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J, Toshi Kani, Greg KH,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,

On Wed, 2014-03-26 at 23:06 +0800, Alexander Graf wrote:
> 
> > On 26.03.2014 at 22:40, Konrad Rzeszutek Wilk <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
> > 
> >> On Wed, Mar 26, 2014 at 01:40:32AM +0000, Stuart Yoder wrote:
> >> Hi Greg,
> >> 
> >> We (Linaro, Freescale, Virtual Open Systems) are trying to get an issue
> >> closed that has been percolating for a while around creating a mechanism
> >> that will allow kernel drivers like vfio to bind to devices of any type.
> >> 
> >> This thread with you:
> >> http://www.spinics.net/lists/kvm-arm/msg08370.html
> >> ...seems to have died out, so am trying to get your response
> >> and will summarize again.  Vfio drivers in the kernel (regardless of
> >> bus type) need to bind to devices of any type.  The driver's function
> >> is to simply export hardware resources of any type to user space.
> >> 
> >> There are several approaches that have been proposed:
> > 
> > You seem to have missed the one I proposed.
> >> 
> >>   1.  new_id -- (current approach) the user explicitly registers
> >>       each new device type with the vfio driver using the new_id
> >>       mechanism.
> >> 
> >>       Problem: multiple drivers will be resident that handle the
> >>       same device type...and there is nothing user space hotplug
> >>       infrastructure can do to help.
> >> 
> >>   2.  "any id" -- the vfio driver could specify a wildcard match
> >>       of some kind in its ID match table which would allow it to
> >>       match and bind to any possible device id.  However,
> >>       we don't want the vfio driver grabbing _all_ devices...just the ones we
> >>       explicitly want to pass to user space.
> >> 
> >>       The proposed patch to support this was to create a new flag
> >>       "sysfs_bind_only" in struct device_driver.  When this flag
> >>       is set, the driver can only bind to devices via the sysfs
> >>       bind file.  This would allow the wildcard match to work.
> >> 
> >>       Patch is here:
> >>       https://lkml.org/lkml/2013/12/3/253
> >> 
> >>   3.  "Driver initiated explicit bind" -- with this approach the
> >>       vfio driver would create a private 'bind' sysfs object
> >>       and the user would echo the requested device into it:
> >> 
> >>       echo 0001:03:00.0 > /sys/bus/pci/drivers/vfio-pci/vfio_bind
> >> 
> >>       In order to make that work, the driver would need to call
> >>       driver_probe_device() and thus we need this patch:
> >>       https://lkml.org/lkml/2014/2/8/175
> > 
> > 4). Use the 'unbind' (from the original device) and 'bind' to vfio driver.
> 
> This is approach 2, no?
> 
> > 
> > Which I think is what is currently being done. Why is that not sufficient?
> 
> What would 'bind to vfio driver' look like?
> 
> > The only thing I see in the URL is " That works, but it is ugly."
> > There is some mention of race but I don't see how - if you do the 'unbind'
> > on the original driver and then bind the BDF to the VFIO how would you get
> > a race?
> 
> Typically on PCI, you do a
> 
>   - add wildcard (pci id) match to vfio driver
>   - unbind driver
>   -> reprobe
>   -> device attaches to vfio driver because it is the least recent match
>   - remove wildcard match from vfio driver
> 
> If in between you hotplug add a card of the same type, it gets attached to vfio - even though the logical "default driver" would be the device specific driver.

I've mentioned drivers_autoprobe in the past, but I'm not sure we're
really factoring it into the discussion.  drivers_autoprobe allows us to
toggle two points:

a) When a new device is added whether we automatically give drivers a
try at binding to it

b) When a new driver is added whether it gets to try to bind to anything
in the system

So we do have a mechanism to avoid the race, but the problem is that it
becomes the responsibility of userspace to:

1) turn off drivers_autoprobe
2) unbind/new_id/bind/remove_id
3) turn on drivers_autoprobe
4) call drivers_probe for anything added between 1) & 3)
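The four userspace steps above can be sketched the same way, using the bus-level `drivers_autoprobe` and `drivers_probe` files. This is a sketch under stated assumptions, not an existing tool: `claim_with_autoprobe_off` and the `SYSFS_ROOT` test override (default `/sys`) are illustrative names, and step 4 is approximated by offering every device that still has no driver to `drivers_probe`, rather than tracking exactly which devices arrived during the window. On PCI, writing `new_id` may itself attach the now-unbound device, so the explicit `bind` is allowed to fail.

```sh
#!/bin/sh
# Sketch of the drivers_autoprobe-guarded hand-over described above.
# Assumption: SYSFS_ROOT is a hypothetical override for testing; defaults to /sys.

claim_with_autoprobe_off() {
    bdf=$1
    drv=${2:-vfio-pci}
    pci="${SYSFS_ROOT:-/sys}/bus/pci"
    dev="$pci/devices/$bdf"

    read -r vendor < "$dev/vendor"
    read -r device < "$dev/device"
    id="${vendor#0x} ${device#0x}"

    # 1) stop the driver core from auto-probing newly added devices
    echo 0 > "$pci/drivers_autoprobe"

    # 2) unbind / new_id / bind / remove_id
    if [ -e "$dev/driver" ]; then
        echo "$bdf" > "$dev/driver/unbind"
    fi
    echo "$id" > "$pci/drivers/$drv/new_id"
    # new_id may already have attached the device; ignore a failing bind
    echo "$bdf" 2>/dev/null > "$pci/drivers/$drv/bind" || true
    echo "$id" > "$pci/drivers/$drv/remove_id"

    # 3) restore automatic probing
    echo 1 > "$pci/drivers_autoprobe"

    # 4) probe anything still without a driver (approximates
    #    "devices added between 1) and 3)")
    for d in "$pci"/devices/*; do
        [ -e "$d" ] || continue            # glob matched nothing
        [ -e "$d/driver" ] || echo "${d##*/}" > "$pci/drivers_probe"
    done
}
```

The design cost is the one raised in the question that follows: correctness depends on every userspace tool remembering to re-enable autoprobe and sweep up the devices it skipped.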

Is the question about the ugliness of the current solution whether it's
unreasonable to ask userspace to do this?

What we seem to be asking for above is more like an autoprobe flag per
driver where there's some way for this special driver to opt out of auto
probing.  Option 2. in Stuart's list does this by short-cutting ID
matching so that a "match" is only found when using the sysfs bind path,
option 3. enables a way for a driver to expose their own sysfs entry
point for binding.  The latter feels particularly chaotic since drivers
get to make up their own bind mechanism.

Another twist I'll throw in is that devices can be hot added to IOMMU
groups that are in-use by userspace.  When that happens we'd like to be
able to disable driver autoprobe of the device to avoid a host driver
automatically binding to the device.  I wonder if instead of looking at
the problem from the driver perspective, if we were to instead look at
it from the device perspective if we might find a solution that would
address both.  For instance, if devices had a driver_probe_id property
that was by default set to their bus specific ID match ("$VENDOR
$DEVICE" on PCI) could we use that to write new match IDs so that a
device could only bind to a given driver?  Effectively we could then
bind either using the current method of adding to the list of IDs a
driver will match or changing the ID that a device would match.  Does
that get us anywhere?  Thanks,

Alex

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: mechanism to allow a driver to bind to any device
       [not found]                                     ` <D45FC8F2-7807-4BBB-A253-8EFCD091D6BD-l3A5Bk7waGM@public.gmane.org>
  2014-03-26 16:21                                       ` Alex Williamson
@ 2014-03-26 16:24                                       ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 92+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-03-26 16:24 UTC (permalink / raw)
  To: Alexander Graf
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ,
	will.deacon-5wv7dgnIgG8, Stuart Yoder,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Michal Hocko, Bjorn Helgaas,
	Varun Sethi, kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg,
	Rafael J. Wysocki, Guenter Roeck, Dmitry Kasatkin, Joe Perches,
	Scott Wood, Antonios Motakis,
	tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J, Toshi Kani, Greg KH,
	a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J,

On Wed, Mar 26, 2014 at 11:06:02PM +0800, Alexander Graf wrote:
> 
> 
> > On 26.03.2014 at 22:40, Konrad Rzeszutek Wilk <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
> > 
> >> On Wed, Mar 26, 2014 at 01:40:32AM +0000, Stuart Yoder wrote:
> >> Hi Greg,
> >> 
> >> We (Linaro, Freescale, Virtual Open Systems) are trying to get an issue
> >> closed that has been percolating for a while around creating a mechanism
> >> that will allow kernel drivers like vfio to bind to devices of any type.
> >> 
> >> This thread with you:
> >> http://www.spinics.net/lists/kvm-arm/msg08370.html
> >> ...seems to have died out, so am trying to get your response
> >> and will summarize again.  Vfio drivers in the kernel (regardless of
> >> bus type) need to bind to devices of any type.  The driver's function
> >> is to simply export hardware resources of any type to user space.
> >> 
> >> There are several approaches that have been proposed:
> > 
> > You seem to have missed the one I proposed.
> >> 
> >>   1.  new_id -- (current approach) the user explicitly registers
> >>       each new device type with the vfio driver using the new_id
> >>       mechanism.
> >> 
> >>       Problem: multiple drivers will be resident that handle the
> >>       same device type...and there is nothing user space hotplug
> >>       infrastructure can do to help.
> >> 
> >>   2.  "any id" -- the vfio driver could specify a wildcard match
> >>       of some kind in its ID match table which would allow it to
> >>       match and bind to any possible device id.  However,
> >>       we don't want the vfio driver grabbing _all_ devices...just the ones we
> >>       explicitly want to pass to user space.
> >> 
> >>       The proposed patch to support this was to create a new flag
> >>       "sysfs_bind_only" in struct device_driver.  When this flag
> >>       is set, the driver can only bind to devices via the sysfs
> >>       bind file.  This would allow the wildcard match to work.
> >> 
> >>       Patch is here:
> >>       https://lkml.org/lkml/2013/12/3/253
> >> 
> >>   3.  "Driver initiated explicit bind" -- with this approach the
> >>       vfio driver would create a private 'bind' sysfs object
> >>       and the user would echo the requested device into it:
> >> 
> >>       echo 0001:03:00.0 > /sys/bus/pci/drivers/vfio-pci/vfio_bind
> >> 
> >>       In order to make that work, the driver would need to call
> >>       driver_probe_device() and thus we need this patch:
> >>       https://lkml.org/lkml/2014/2/8/175
> > 
> > 4). Use the 'unbind' (from the original device) and 'bind' to vfio driver.
> 
> This is approach 2, no?
> 
> > 
> > Which I think is what is currently being done. Why is that not sufficient?
> 
> What would 'bind to vfio driver' look like?

You echo the BDF to a 'new_slot' file to set up a PCI match entry (so that
it can look up the vendor/device ID from the BDF). Then you echo the
BDF to 'bind'.
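The difference between matching on a BDF and matching on a vendor/device ID is that an ID entry written to `new_id` covers every device carrying that ID, while a BDF names exactly one device. The hypothetical helper below (name and `SYSFS_ROOT` test override are assumptions; it reads only the standard per-device `vendor` and `device` sysfs files) lists which currently present devices an ID-based entry derived from a given BDF would also cover.

```sh
#!/bin/sh
# List every PCI device that shares vendor/device IDs with the BDF in $1.
# Assumption: SYSFS_ROOT is a hypothetical override for testing; defaults to /sys.

devices_sharing_id() {
    devs="${SYSFS_ROOT:-/sys}/bus/pci/devices"
    read -r want_v < "$devs/$1/vendor"
    read -r want_d < "$devs/$1/device"
    for d in "$devs"/*; do
        [ -e "$d/vendor" ] || continue     # skip non-device entries
        read -r v < "$d/vendor"
        read -r dd < "$d/device"
        if [ "$v" = "$want_v" ] && [ "$dd" = "$want_d" ]; then
            echo "${d##*/}"                # print the BDF (directory name)
        fi
    done
}
```

On a host with a dual-port NIC, `devices_sharing_id 0000:03:00.0` would typically print both functions, which is exactly the set an ID-based match would also grab; a hot-added card with the same ID would join that set.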

> 
> > The only thing I see in the URL is " That works, but it is ugly."
> > There is some mention of race but I don't see how - if you do the 'unbind'
> > on the original driver and then bind the BDF to the VFIO how would you get
> > a race?
> 
> Typically on PCI, you do a
> 
>   - add wildcard (pci id) match to vfio driver
>   - unbind driver
>   -> reprobe
>   -> device attaches to vfio driver because it is the least recent match
>   - remove wildcard match from vfio driver
> 
> If in between you hotplug add a card of the same type, it gets attached to vfio - even though the logical "default driver" would be the device specific driver.

But that would not happen if you match on the BDF. If you switch from
matching on vendor/device ID to matching on BDF, you don't have this problem.
> 
> 
> Alex
> 
> > 
> >> 
> >> Would like your comment on these options-- option #3 is preferred
> >> and is literally a 2 line patch.
> >> 
> >> Thanks,
> >> Stuart
> >> _______________________________________________
> >> iommu mailing list
> >> iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
> >> https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: mechanism to allow a driver to bind to any device
       [not found]                                         ` <1395850862.632.247.camel-85EaTFmN5p//9pzu0YdTqQ@public.gmane.org>
@ 2014-03-26 16:32                                             ` Konrad Rzeszutek Wilk
  2014-03-26 22:09                                           ` Alex Williamson
  2014-03-31 18:32                                             ` Stuart Yoder
  2 siblings, 0 replies; 92+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-03-26 16:32 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ,
	will.deacon-5wv7dgnIgG8, Stuart Yoder,
	a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J, Michal Hocko,
	Scott Wood, Varun Sethi, kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg,
	Rafael J. Wysocki, Alexander Graf, Guenter Roeck,
	Dmitry Kasatkin, Tejun Heo, Bjorn Helgaas, Antonios Motakis,
	tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J, Toshi Kani, Greg KH,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Wed, Mar 26, 2014 at 10:21:02AM -0600, Alex Williamson wrote:
> On Wed, 2014-03-26 at 23:06 +0800, Alexander Graf wrote:
> > 
> > > On 26.03.2014 at 22:40, Konrad Rzeszutek Wilk <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
> > > 
> > >> On Wed, Mar 26, 2014 at 01:40:32AM +0000, Stuart Yoder wrote:
> > >> Hi Greg,
> > >> 
> > >> We (Linaro, Freescale, Virtual Open Systems) are trying to get an issue
> > >> closed that has been percolating for a while around creating a mechanism
> > >> that will allow kernel drivers like vfio to bind to devices of any type.
> > >> 
> > >> This thread with you:
> > >> http://www.spinics.net/lists/kvm-arm/msg08370.html
> > >> ...seems to have died out, so am trying to get your response
> > >> and will summarize again.  Vfio drivers in the kernel (regardless of
> > >> bus type) need to bind to devices of any type.  The driver's function
> > >> is to simply export hardware resources of any type to user space.
> > >> 
> > >> There are several approaches that have been proposed:
> > > 
> > > You seem to have missed the one I proposed.
> > >> 
> > >>   1.  new_id -- (current approach) the user explicitly registers
> > >>       each new device type with the vfio driver using the new_id
> > >>       mechanism.
> > >> 
> > >>       Problem: multiple drivers will be resident that handle the
> > >>       same device type...and there is nothing user space hotplug
> > >>       infrastructure can do to help.
> > >> 
> > >>   2.  "any id" -- the vfio driver could specify a wildcard match
> > >>       of some kind in its ID match table which would allow it to
> > >>       match and bind to any possible device id.  However,
> > >>       we don't want the vfio driver grabbing _all_ devices...just the ones we
> > >>       explicitly want to pass to user space.
> > >> 
> > >>       The proposed patch to support this was to create a new flag
> > >>       "sysfs_bind_only" in struct device_driver.  When this flag
> > >>       is set, the driver can only bind to devices via the sysfs
> > >>       bind file.  This would allow the wildcard match to work.
> > >> 
> > >>       Patch is here:
> > >>       https://lkml.org/lkml/2013/12/3/253
> > >> 
> > >>   3.  "Driver initiated explicit bind" -- with this approach the
> > >>       vfio driver would create a private 'bind' sysfs object
> > >>       and the user would echo the requested device into it:
> > >> 
> > >>       echo 0001:03:00.0 > /sys/bus/pci/drivers/vfio-pci/vfio_bind
> > >> 
> > >>       In order to make that work, the driver would need to call
> > >>       driver_probe_device() and thus we need this patch:
> > >>       https://lkml.org/lkml/2014/2/8/175
> > > 
> > > 4). Use the 'unbind' (from the original device) and 'bind' to vfio driver.
> > 
> > This is approach 2, no?
> > 
> > > 
> > > Which I think is what is currently being done. Why is that not sufficient?
> > 
> > What would 'bind to vfio driver' look like?
> > 
> > > The only thing I see in the URL is " That works, but it is ugly."
> > > There is some mention of race but I don't see how - if you do the 'unbind'
> > > on the original driver and then bind the BDF to the VFIO how would you get
> > > a race?
> > 
> > Typically on PCI, you do a
> > 
> >   - add wildcard (pci id) match to vfio driver
> >   - unbind driver
> >   -> reprobe
> >   -> device attaches to vfio driver because it is the least recent match
> >   - remove wildcard match from vfio driver
> > 
> > If in between you hotplug add a card of the same type, it gets attached to vfio - even though the logical "default driver" would be the device specific driver.
> 
> I've mentioned drivers_autoprobe in the past, but I'm not sure we're
> really factoring it into the discussion.  drivers_autoprobe allows us to
> toggle two points:
> 
> a) When a new device is added whether we automatically give drivers a
> try at binding to it
> 
> b) When a new driver is added whether it gets to try to bind to anything
> in the system
> 
> So we do have a mechanism to avoid the race, but the problem is that it
> becomes the responsibility of userspace to:
> 
> 1) turn off drivers_autoprobe
> 2) unbind/new_id/bind/remove_id
> 3) turn on drivers_autoprobe
> 4) call drivers_probe for anything added between 1) & 3)
> 
> Is the question about the ugliness of the current solution whether it's
> unreasonable to ask userspace to do this?
> 
> What we seem to be asking for above is more like an autoprobe flag per
> driver where there's some way for this special driver to opt out of auto
> probing.  Option 2. in Stuart's list does this by short-cutting ID
> matching so that a "match" is only found when using the sysfs bind path,
> option 3. enables a way for a driver to expose their own sysfs entry
> point for binding.  The latter feels particularly chaotic since drivers
> get to make up their own bind mechanism.
> 
> Another twist I'll throw in is that devices can be hot added to IOMMU
> groups that are in-use by userspace.  When that happens we'd like to be
> able to disable driver autoprobe of the device to avoid a host driver
> automatically binding to the device.  I wonder if instead of looking at
> the problem from the driver perspective, if we were to instead look at
> it from the device perspective if we might find a solution that would
> address both.  For instance, if devices had a driver_probe_id property
> that was by default set to their bus specific ID match ("$VENDOR
> $DEVICE" on PCI) could we use that to write new match IDs so that a
> device could only bind to a given driver?  Effectively we could then
> bind either using the current method of adding to the list of IDs a
> driver will match or changing the ID that a device would match.  Does
> that get us anywhere?  Thanks,

The other option is to have some sort of priority in device probing
during hotplug.

That is, you could do the following:

 1) add the device vendor/model to vfio.
 2) unbind the BDF from the original driver.
 3) hotplug happens - any new device with that vendor/model gets
    owned by vfio instead of the original driver.
 4) bind the BDF to vfio.

Granted, that is a bit silly too, as the admin might want the newly
hotplugged device to be owned by the native driver.

In that case, why not just switch from using the device vendor/model
to using BDF values?
> 
> Alex
> 

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: mechanism to allow a driver to bind to any device
@ 2014-03-26 16:32                                             ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 92+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-03-26 16:32 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ,
	will.deacon-5wv7dgnIgG8, Stuart Yoder,
	a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J, Michal Hocko,
	Scott Wood, Varun Sethi, kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg,
	Rafael J. Wysocki, Alexander Graf, Guenter Roeck,
	Dmitry Kasatkin, Tejun Heo, Bjorn Helgaas, Antonios Motakis,
	tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J, Toshi Kani, Greg KH,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Wed, Mar 26, 2014 at 10:21:02AM -0600, Alex Williamson wrote:
> On Wed, 2014-03-26 at 23:06 +0800, Alexander Graf wrote:
> > 
> > > Am 26.03.2014 um 22:40 schrieb Konrad Rzeszutek Wilk <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>:
> > > 
> > >> On Wed, Mar 26, 2014 at 01:40:32AM +0000, Stuart Yoder wrote:
> > >> Hi Greg,
> > >> 
> > >> We (Linaro, Freescale, Virtual Open Systems) are trying get an issue
> > >> closed that has been perculating for a while around creating a mechanism
> > >> that will allow kernel drivers like vfio can bind to devices of any type.
> > >> 
> > >> This thread with you:
> > >> http://www.spinics.net/lists/kvm-arm/msg08370.html
> > >> ...seems to have died out, so am trying to get your response
> > >> and will summarize again.  Vfio drivers in the kernel (regardless of
> > >> bus type) need to bind to devices of any type.  The driver's function
> > >> is to simply export hardware resources of any type to user space.
> > >> 
> > >> There are several approaches that have been proposed:
> > > 
> > > You seem to have missed the one I proposed.
> > >> 
> > >>   1.  new_id -- (current approach) the user explicitly registers
> > >>       each new device type with the vfio driver using the new_id
> > >>       mechanism.
> > >> 
> > >>       Problem: multiple drivers will be resident that handle the
> > >>       same device type...and there is nothing user space hotplug
> > >>       infrastructure can do to help.
> > >> 
> > >>   2.  "any id" -- the vfio driver could specify a wildcard match
> > >>       of some kind in its ID match table which would allow it to
> > >>       match and bind to any possible device id.  However,
> > >>       we don't want the vfio driver grabbing _all_ devices...just the ones we
> > >>       explicitly want to pass to user space.
> > >> 
> > >>       The proposed patch to support this was to create a new flag
> > >>       "sysfs_bind_only" in struct device_driver.  When this flag
> > >>       is set, the driver can only bind to devices via the sysfs
> > >>       bind file.  This would allow the wildcard match to work.
> > >> 
> > >>       Patch is here:
> > >>       https://lkml.org/lkml/2013/12/3/253
> > >> 
> > >>   3.  "Driver initiated explicit bind" -- with this approach the
> > >>       vfio driver would create a private 'bind' sysfs object
> > >>       and the user would echo the requested device into it:
> > >> 
> > >>       echo 0001:03:00.0 > /sys/bus/pci/drivers/vfio-pci/vfio_bind
> > >> 
> > >>       In order to make that work, the driver would need to call
> > >>       driver_probe_device() and thus we need this patch:
> > >>       https://lkml.org/lkml/2014/2/8/175
> > > 
> > > 4). Use the 'unbind' (from the original device) and 'bind' to vfio driver.
> > 
> > This is approach 2, no?
> > 
> > > 
> > > Which I think is what is currently being done. Why is that not sufficient?
> > 
> > How would 'bind to vfio driver' look like?
> > 
> > > The only thing I see in the URL is " That works, but it is ugly."
> > > There is some mention of race but I don't see how - if you do the 'unbind'
> > > on the original driver and then bind the BDF to the VFIO how would you get
> > > a race?
> > 
> > Typically on PCI, you do a
> > 
> >   - add wildcard (pci id) match to vfio driver
> >   - unbind driver
> >   -> reprobe
> >   -> device attaches to vfio driver because it is the least recent match
> >   - remove wildcard match from vfio driver
> > 
> > If you hotplug a card of the same type in between, it gets attached to vfio, even though the logical "default driver" would be the device-specific driver.
> 
> I've mentioned drivers_autoprobe in the past, but I'm not sure we're
> really factoring it into the discussion.  drivers_autoprobe allows us to
> toggle two points:
> 
> a) When a new device is added whether we automatically give drivers a
> try at binding to it
> 
> b) When a new driver is added whether it gets to try to bind to anything
> in the system
> 
> So we do have a mechanism to avoid the race, but the problem is that it
> becomes the responsibility of userspace to:
> 
> 1) turn off drivers_autoprobe
> 2) unbind/new_id/bind/remove_id
> 3) turn on drivers_autoprobe
> 4) call drivers_probe for anything added between 1) & 3)
> 
> Is the complaint about the ugliness of the current solution really about
> whether it's unreasonable to ask userspace to do this?
> 
> What we seem to be asking for above is more like an autoprobe flag per
> driver where there's some way for this special driver to opt out of auto
> probing.  Option 2 in Stuart's list does this by short-cutting ID
> matching so that a "match" is only found when using the sysfs bind path;
> option 3 enables a way for a driver to expose its own sysfs entry
> point for binding.  The latter feels particularly chaotic since drivers
> get to make up their own bind mechanism.
> 
> Another twist I'll throw in is that devices can be hot added to IOMMU
> groups that are in-use by userspace.  When that happens we'd like to be
> able to disable driver autoprobe of the device to avoid a host driver
> automatically binding to the device.  I wonder if, instead of looking at
> the problem from the driver perspective, we might find a solution that
> addresses both by looking at it from the device perspective.  For
> instance, if devices had a driver_probe_id property that was by default
> set to their bus-specific ID match ("$VENDOR $DEVICE" on PCI), could we
> use that to write new match IDs so that a device could only bind to a
> given driver?  Effectively we could then bind either using the current
> method of adding to the list of IDs a driver will match, or by changing
> the ID that a device would match.  Does that get us anywhere?  Thanks,

The other option for this is to have some sort of priority on
device probing with hotplugging.

That is, you could do the following:

 1) add the device vendor/model to vfio
 2) unbind the BDF from the original driver
 3) hotplug happens: any new device that has the device vendor/model gets
    owned by vfio instead of the original driver
 4) bind the BDF to vfio

Granted, that is a bit silly too, as the admin might want to have the new
hotplugged device owned by the native driver.

In which case, why not switch from matching on device vendor/model
to matching on BDF values?
> 
> Alex
> 
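
Stuart's option 2 (a wildcard match that only takes effect through the sysfs bind file) can be pictured with a toy C model. This is a sketch under stated assumptions, not kernel code: `struct toy_driver`, `toy_autoprobe()` and `toy_sysfs_bind()` are hypothetical names, and the `sysfs_bind_only` field merely mirrors the flag proposed in the patch linked in the quoted text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy driver; NOT the kernel's struct device_driver. */
struct toy_driver {
	const char *name;
	bool match_any;		/* wildcard entry in the ID table */
	bool sysfs_bind_only;	/* mirrors the flag proposed for option 2 */
	unsigned id;		/* single device type matched when !match_any */
};

static bool toy_match(const struct toy_driver *drv, unsigned type)
{
	return drv->match_any || drv->id == type;
}

/* Automatic probe on device add: drivers flagged sysfs_bind_only are
 * skipped, so the wildcard never grabs a device on its own. */
static const struct toy_driver *toy_autoprobe(const struct toy_driver *const *drvs,
					      size_t n, unsigned type)
{
	for (size_t i = 0; i < n; i++)
		if (!drvs[i]->sysfs_bind_only && toy_match(drvs[i], type))
			return drvs[i];
	return NULL;
}

/* Explicit "echo <dev> > .../bind": the wildcard match is honoured here. */
static bool toy_sysfs_bind(const struct toy_driver *drv, unsigned type)
{
	return toy_match(drv, type);
}
```

Even with the vfio entry listed first, autoprobe hands a type 0x10 device to the native driver and leaves an unknown type unbound, while an explicit bind lets the wildcard driver take either one.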

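The drivers_autoprobe sequence Alex lists (autoprobe off, unbind/new_id/bind/remove_id, autoprobe on, drivers_probe) can be exercised in a toy C model. It is a sketch under stated assumptions, not kernel code: every `toy_*` name is hypothetical, vfio-pci is tried before the native driver only so that the model reproduces the hotplug race Alexander Graf describes, and `toy_new_id()` just records the ID (it does not model any probing that writing new_id may trigger in a real kernel).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy structures, far simpler than struct device/device_driver. */
enum { TOY_MAX_IDS = 4, TOY_MAX_DEVS = 4, TOY_NDRV = 2 };

struct toy_drv {
	const char *name;
	unsigned ids[TOY_MAX_IDS];	/* device types matched (incl. new_id) */
	int nids;
};

struct toy_dev {
	unsigned type;
	struct toy_drv *drv;		/* NULL while unbound */
};

struct toy_bus {
	bool autoprobe;			/* models drivers_autoprobe */
	struct toy_drv *drvs[TOY_NDRV];	/* tried in array order */
	struct toy_dev devs[TOY_MAX_DEVS];
	int ndevs;
};

static bool toy_matches(const struct toy_drv *d, unsigned type)
{
	for (int i = 0; i < d->nids; i++)
		if (d->ids[i] == type)
			return true;
	return false;
}

/* drivers_probe: give an unbound device to the first matching driver. */
static void toy_probe(struct toy_bus *b, struct toy_dev *dev)
{
	for (int i = 0; i < TOY_NDRV && !dev->drv; i++)
		if (toy_matches(b->drvs[i], dev->type))
			dev->drv = b->drvs[i];
}

/* Device add: probed automatically only while autoprobe is on. */
static struct toy_dev *toy_hotplug(struct toy_bus *b, unsigned type)
{
	struct toy_dev *dev = &b->devs[b->ndevs++];

	dev->type = type;
	dev->drv = NULL;
	if (b->autoprobe)
		toy_probe(b, dev);
	return dev;
}

/* Only records the ID; see the lead-in for what this toy leaves out. */
static void toy_new_id(struct toy_drv *d, unsigned type) { d->ids[d->nids++] = type; }

static void toy_remove_id(struct toy_drv *d, unsigned type)
{
	for (int i = 0; i < d->nids; i++)
		if (d->ids[i] == type) {
			d->ids[i] = d->ids[--d->nids];
			return;
		}
}

/* Rebind one device to vfio while a same-type card is hotplugged mid-way.
 * Returns the owner of the late card; *target_owner gets the rebound one. */
static const char *toy_rebind(bool guard, const char **target_owner)
{
	struct toy_drv vfio = { "vfio-pci", { 0 }, 0 };
	struct toy_drv native = { "native", { 0x10 }, 1 };
	struct toy_bus bus = { .autoprobe = true, .drvs = { &vfio, &native } };
	struct toy_dev *target = toy_hotplug(&bus, 0x10);	/* -> native */
	struct toy_dev *late;

	if (guard)
		bus.autoprobe = false;		/* 1) autoprobe off */
	target->drv = NULL;			/* 2) unbind, */
	toy_new_id(&vfio, 0x10);		/*    new_id, */
	if (toy_matches(&vfio, target->type))
		target->drv = &vfio;		/*    bind, */
	late = toy_hotplug(&bus, 0x10);		/*    (hotplug race) */
	toy_remove_id(&vfio, 0x10);		/*    remove_id */
	if (guard) {
		bus.autoprobe = true;		/* 3) autoprobe on */
		toy_probe(&bus, late);		/* 4) drivers_probe */
	}
	*target_owner = target->drv ? target->drv->name : "(unbound)";
	return late->drv ? late->drv->name : "(unbound)";
}
```

Without the guard, the card hotplugged mid-sequence ends up on vfio-pci; with autoprobe off it stays unbound until the final drivers_probe step hands it to the native driver. In both runs the explicitly rebound device stays with vfio-pci, which is the bookkeeping Alex says userspace has to get right.
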
^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: mechanism to allow a driver to bind to any device
       [not found]                                             ` <20140326163209.GB21368-6K5HmflnPlqSPmnEAIUT9EEOCMrvLtNR@public.gmane.org>
@ 2014-03-26 16:49                                                 ` Alex Williamson
  2014-03-26 17:51                                               ` Stuart Yoder
  1 sibling, 0 replies; 92+ messages in thread
From: Alex Williamson @ 2014-03-26 16:49 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ,
	will.deacon-5wv7dgnIgG8, Stuart Yoder,
	a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J, Michal Hocko,
	Scott Wood, Varun Sethi, kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg,
	Rafael J. Wysocki, Alexander Graf, Guenter Roeck,
	Dmitry Kasatkin, Tejun Heo, Bjorn Helgaas, Antonios Motakis,
	tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J, Toshi Kani, Greg KH,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Wed, 2014-03-26 at 12:32 -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Mar 26, 2014 at 10:21:02AM -0600, Alex Williamson wrote:
> > On Wed, 2014-03-26 at 23:06 +0800, Alexander Graf wrote:
> > > 
> > > > Am 26.03.2014 um 22:40 schrieb Konrad Rzeszutek Wilk <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>:
> > > > 
> > > >> On Wed, Mar 26, 2014 at 01:40:32AM +0000, Stuart Yoder wrote:
> > > >> Hi Greg,
> > > >> 
> > > >> We (Linaro, Freescale, Virtual Open Systems) are trying to get an issue
> > > >> closed that has been percolating for a while around creating a mechanism
> > > >> that will allow kernel drivers like vfio to bind to devices of any type.
> > > >> 
> > > >> This thread with you:
> > > >> http://www.spinics.net/lists/kvm-arm/msg08370.html
> > > >> ...seems to have died out, so I am trying to get your response
> > > >> and will summarize again.  Vfio drivers in the kernel (regardless of
> > > >> bus type) need to bind to devices of any type.  The driver's function
> > > >> is to simply export hardware resources of any type to user space.
> > > >> 
> > > >> There are several approaches that have been proposed:
> > > > 
> > > > You seem to have missed the one I proposed.
> > > >> 
> > > >>   1.  new_id -- (current approach) the user explicitly registers
> > > >>       each new device type with the vfio driver using the new_id
> > > >>       mechanism.
> > > >> 
> > > >>       Problem: multiple drivers will be resident that handle the
> > > >>       same device type...and there is nothing user space hotplug
> > > >>       infrastructure can do to help.
> > > >> 
> > > >>   2.  "any id" -- the vfio driver could specify a wildcard match
> > > >>       of some kind in its ID match table which would allow it to
> > > >>       match and bind to any possible device id.  However,
> > > >>       we don't want the vfio driver grabbing _all_ devices...just the ones we
> > > >>       explicitly want to pass to user space.
> > > >> 
> > > >>       The proposed patch to support this was to create a new flag
> > > >>       "sysfs_bind_only" in struct device_driver.  When this flag
> > > >>       is set, the driver can only bind to devices via the sysfs
> > > >>       bind file.  This would allow the wildcard match to work.
> > > >> 
> > > >>       Patch is here:
> > > >>       https://lkml.org/lkml/2013/12/3/253
> > > >> 
> > > >>   3.  "Driver initiated explicit bind" -- with this approach the
> > > >>       vfio driver would create a private 'bind' sysfs object
> > > >>       and the user would echo the requested device into it:
> > > >> 
> > > >>       echo 0001:03:00.0 > /sys/bus/pci/drivers/vfio-pci/vfio_bind
> > > >> 
> > > >>       In order to make that work, the driver would need to call
> > > >>       driver_probe_device() and thus we need this patch:
> > > >>       https://lkml.org/lkml/2014/2/8/175
> > > > 
> > > > 4) Use 'unbind' (from the original driver) and then 'bind' to the vfio driver.
> > > 
> > > This is approach 2, no?
> > > 
> > > > 
> > > > Which I think is what is currently being done. Why is that not sufficient?
> > > 
> > > What would 'bind to vfio driver' look like?
> > > 
> > > > The only thing I see in the URL is "That works, but it is ugly."
> > > > There is some mention of a race, but I don't see how: if you do the 'unbind'
> > > > on the original driver and then bind the BDF to VFIO, how would you get
> > > > a race?
> > > 
> > > Typically on PCI, you do the following:
> > > 
> > >   - add wildcard (pci id) match to vfio driver
> > >   - unbind driver
> > >   -> reprobe
> > >   -> device attaches to vfio driver because it is the least recent match
> > >   - remove wildcard match from vfio driver
> > > 
> > > If you hotplug a card of the same type in between, it gets attached to vfio, even though the logical "default driver" would be the device-specific driver.
> > 
> > I've mentioned drivers_autoprobe in the past, but I'm not sure we're
> > really factoring it into the discussion.  drivers_autoprobe allows us to
> > toggle two points:
> > 
> > a) When a new device is added whether we automatically give drivers a
> > try at binding to it
> > 
> > b) When a new driver is added whether it gets to try to bind to anything
> > in the system
> > 
> > So we do have a mechanism to avoid the race, but the problem is that it
> > becomes the responsibility of userspace to:
> > 
> > 1) turn off drivers_autoprobe
> > 2) unbind/new_id/bind/remove_id
> > 3) turn on drivers_autoprobe
> > 4) call drivers_probe for anything added between 1) & 3)
> > 
> > Is the complaint about the ugliness of the current solution really about
> > whether it's unreasonable to ask userspace to do this?
> > 
> > What we seem to be asking for above is more like an autoprobe flag per
> > driver where there's some way for this special driver to opt out of auto
> > probing.  Option 2 in Stuart's list does this by short-cutting ID
> > matching so that a "match" is only found when using the sysfs bind path;
> > option 3 enables a way for a driver to expose its own sysfs entry
> > point for binding.  The latter feels particularly chaotic since drivers
> > get to make up their own bind mechanism.
> > 
> > Another twist I'll throw in is that devices can be hot added to IOMMU
> > groups that are in-use by userspace.  When that happens we'd like to be
> > able to disable driver autoprobe of the device to avoid a host driver
> > automatically binding to the device.  I wonder if, instead of looking at
> > the problem from the driver perspective, we might find a solution that
> > addresses both by looking at it from the device perspective.  For
> > instance, if devices had a driver_probe_id property that was by default
> > set to their bus-specific ID match ("$VENDOR $DEVICE" on PCI), could we
> > use that to write new match IDs so that a device could only bind to a
> > given driver?  Effectively we could then bind either using the current
> > method of adding to the list of IDs a driver will match, or by changing
> > the ID that a device would match.  Does that get us anywhere?  Thanks,
> 
> The other option for this is to have some sort of priority on
> device probing with hotplugging.
> 
> That is, you could do the following:
> 
>  1) add the device vendor/model to vfio
>  2) unbind the BDF from the original driver
>  3) hotplug happens: any new device that has the device vendor/model gets
>     owned by vfio instead of the original driver

This doesn't help the device-added-to-in-use-group problem, though, because
we have no idea if the new device would have the same vendor/model as
other devices in the group.  By making the device probe ID modifiable,
vfio can watch the IOMMU group notifiers and change the probe ID of new
devices to either prevent the host driver from claiming them or to allow
vfio to claim them.  At the same time we change the problem from "this
driver can attach to this kind of device" to "this device can attach to
that driver", which also solves Stuart's problem.  Thanks,

Alex

>  4) bind the BDF to vfio
> 
> Granted, that is a bit silly too, as the admin might want to have the new
> hotplugged device owned by the native driver.
> 
> In which case, why not switch from matching on device vendor/model
> to matching on BDF values?
> > 
> > Alex
> > 
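
Alex's device-side idea, where vfio rewrites a new device's probe ID from an IOMMU group notifier, can be sketched the same way. Assumptions, all hypothetical: `driver_probe_id` is only the property proposed in this thread, modelled here as a plain char array; the `toy_*` helpers, the notifier hook, the 8086/10fb IDs and the literal match string "vfio" are illustrations, not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy types; not kernel structures. */
struct toy_driver {
	const char *name;
	const char *match_ids[3];	/* NULL-terminated */
};

struct toy_device {
	char driver_probe_id[32];	/* models the proposed property */
	int iommu_group;
	const struct toy_driver *drv;	/* NULL while unbound */
};

static const struct toy_driver toy_native = { "native", { "8086 10fb", NULL } };
static const struct toy_driver toy_vfio = { "vfio-pci", { "vfio", NULL } };

/* Default probe ID is the bus-specific match, "$VENDOR $DEVICE" on PCI. */
static void toy_device_init(struct toy_device *dev, unsigned vendor,
			    unsigned device, int group)
{
	snprintf(dev->driver_probe_id, sizeof(dev->driver_probe_id),
		 "%04x %04x", vendor, device);
	dev->iommu_group = group;
	dev->drv = NULL;
}

static bool toy_match(const struct toy_driver *drv, const struct toy_device *dev)
{
	for (int i = 0; drv->match_ids[i]; i++)
		if (strcmp(drv->match_ids[i], dev->driver_probe_id) == 0)
			return true;
	return false;
}

/* Hypothetical vfio hook for an IOMMU group "device added" notification:
 * a device joining a group in use by userspace is steered to vfio. */
static void toy_vfio_group_notifier(struct toy_device *dev, int inuse_group)
{
	if (dev->iommu_group == inuse_group)
		snprintf(dev->driver_probe_id, sizeof(dev->driver_probe_id), "vfio");
}

/* Hotplug: the notifier runs first, then autoprobe tries native before vfio. */
static const char *toy_hotplug(unsigned vendor, unsigned device, int group,
			       int inuse_group)
{
	const struct toy_driver *const drvs[] = { &toy_native, &toy_vfio };
	struct toy_device dev;

	toy_device_init(&dev, vendor, device, group);
	toy_vfio_group_notifier(&dev, inuse_group);
	for (size_t i = 0; i < 2 && !dev.drv; i++)
		if (toy_match(drvs[i], &dev))
			dev.drv = drvs[i];
	return dev.drv ? dev.drv->name : "(unbound)";
}
```

Two identical cards end up with different owners purely because of the group they join, even though the native driver is tried first. That is the shift from "this driver can attach to this kind of device" to "this device can attach to that driver".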

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: mechanism to allow a driver to bind to any device
       [not found]                                                 ` <1395852592.632.253.camel-85EaTFmN5p//9pzu0YdTqQ@public.gmane.org>
@ 2014-03-26 17:04                                                     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 92+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-03-26 17:04 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ,
	will.deacon-5wv7dgnIgG8, Stuart Yoder,
	a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J, Michal Hocko,
	Scott Wood, Varun Sethi, kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg,
	Rafael J. Wysocki, Alexander Graf, Guenter Roeck,
	Dmitry Kasatkin, Tejun Heo, Bjorn Helgaas, Antonios Motakis,
	tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J, Toshi Kani, Greg KH,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Wed, Mar 26, 2014 at 10:49:52AM -0600, Alex Williamson wrote:
> On Wed, 2014-03-26 at 12:32 -0400, Konrad Rzeszutek Wilk wrote:
> > On Wed, Mar 26, 2014 at 10:21:02AM -0600, Alex Williamson wrote:
> > > On Wed, 2014-03-26 at 23:06 +0800, Alexander Graf wrote:
> > > > 
> > > > > Am 26.03.2014 um 22:40 schrieb Konrad Rzeszutek Wilk <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>:
> > > > > 
> > > > >> On Wed, Mar 26, 2014 at 01:40:32AM +0000, Stuart Yoder wrote:
> > > > >> Hi Greg,
> > > > >> 
> > > > >> We (Linaro, Freescale, Virtual Open Systems) are trying to get an issue
> > > > >> closed that has been percolating for a while around creating a mechanism
> > > > >> that will allow kernel drivers like vfio to bind to devices of any type.
> > > > >> 
> > > > >> This thread with you:
> > > > >> http://www.spinics.net/lists/kvm-arm/msg08370.html
> > > > >> ...seems to have died out, so I am trying to get your response
> > > > >> and will summarize again.  Vfio drivers in the kernel (regardless of
> > > > >> bus type) need to bind to devices of any type.  The driver's function
> > > > >> is to simply export hardware resources of any type to user space.
> > > > >> 
> > > > >> There are several approaches that have been proposed:
> > > > > 
> > > > > You seem to have missed the one I proposed.
> > > > >> 
> > > > >>   1.  new_id -- (current approach) the user explicitly registers
> > > > >>       each new device type with the vfio driver using the new_id
> > > > >>       mechanism.
> > > > >> 
> > > > >>       Problem: multiple drivers will be resident that handle the
> > > > >>       same device type...and there is nothing user space hotplug
> > > > >>       infrastructure can do to help.
> > > > >> 
> > > > >>   2.  "any id" -- the vfio driver could specify a wildcard match
> > > > >>       of some kind in its ID match table which would allow it to
> > > > >>       match and bind to any possible device id.  However,
> > > > >>       we don't want the vfio driver grabbing _all_ devices...just the ones we
> > > > >>       explicitly want to pass to user space.
> > > > >> 
> > > > >>       The proposed patch to support this was to create a new flag
> > > > >>       "sysfs_bind_only" in struct device_driver.  When this flag
> > > > >>       is set, the driver can only bind to devices via the sysfs
> > > > >>       bind file.  This would allow the wildcard match to work.
> > > > >> 
> > > > >>       Patch is here:
> > > > >>       https://lkml.org/lkml/2013/12/3/253
> > > > >> 
> > > > >>   3.  "Driver initiated explicit bind" -- with this approach the
> > > > >>       vfio driver would create a private 'bind' sysfs object
> > > > >>       and the user would echo the requested device into it:
> > > > >> 
> > > > >>       echo 0001:03:00.0 > /sys/bus/pci/drivers/vfio-pci/vfio_bind
> > > > >> 
> > > > >>       In order to make that work, the driver would need to call
> > > > >>       driver_probe_device() and thus we need this patch:
> > > > >>       https://lkml.org/lkml/2014/2/8/175
> > > > > 
> > > > > 4) Use 'unbind' (from the original driver) and then 'bind' to the vfio driver.
> > > > 
> > > > This is approach 2, no?
> > > > 
> > > > > 
> > > > > Which I think is what is currently being done. Why is that not sufficient?
> > > > 
> > > > What would 'bind to vfio driver' look like?
> > > > 
> > > > > The only thing I see in the URL is "That works, but it is ugly."
> > > > > There is some mention of a race, but I don't see how: if you do the 'unbind'
> > > > > on the original driver and then bind the BDF to VFIO, how would you get
> > > > > a race?
> > > > 
> > > > Typically on PCI, you do the following:
> > > > 
> > > >   - add wildcard (pci id) match to vfio driver
> > > >   - unbind driver
> > > >   -> reprobe
> > > >   -> device attaches to vfio driver because it is the least recent match
> > > >   - remove wildcard match from vfio driver
> > > > 
> > > > If you hotplug a card of the same type in between, it gets attached to vfio, even though the logical "default driver" would be the device-specific driver.
> > > 
> > > I've mentioned drivers_autoprobe in the past, but I'm not sure we're
> > > really factoring it into the discussion.  drivers_autoprobe allows us to
> > > toggle two points:
> > > 
> > > a) When a new device is added whether we automatically give drivers a
> > > try at binding to it
> > > 
> > > b) When a new driver is added whether it gets to try to bind to anything
> > > in the system
> > > 
> > > So we do have a mechanism to avoid the race, but the problem is that it
> > > becomes the responsibility of userspace to:
> > > 
> > > 1) turn off drivers_autoprobe
> > > 2) unbind/new_id/bind/remove_id
> > > 3) turn on drivers_autoprobe
> > > 4) call drivers_probe for anything added between 1) & 3)
> > > 
> > > Is the complaint about the ugliness of the current solution really about
> > > whether it's unreasonable to ask userspace to do this?
> > > 
> > > What we seem to be asking for above is more like an autoprobe flag per
> > > driver where there's some way for this special driver to opt out of auto
> > > probing.  Option 2 in Stuart's list does this by short-cutting ID
> > > matching so that a "match" is only found when using the sysfs bind path;
> > > option 3 enables a way for a driver to expose its own sysfs entry
> > > point for binding.  The latter feels particularly chaotic since drivers
> > > get to make up their own bind mechanism.
> > > 
> > > Another twist I'll throw in is that devices can be hot added to IOMMU
> > > groups that are in-use by userspace.  When that happens we'd like to be
> > > able to disable driver autoprobe of the device to avoid a host driver
> > > automatically binding to the device.  I wonder if, instead of looking at
> > > the problem from the driver perspective, we might find a solution that
> > > addresses both by looking at it from the device perspective.  For
> > > instance, if devices had a driver_probe_id property that was by default
> > > set to their bus-specific ID match ("$VENDOR $DEVICE" on PCI), could we
> > > use that to write new match IDs so that a device could only bind to a
> > > given driver?  Effectively we could then bind either using the current
> > > method of adding to the list of IDs a driver will match, or by changing
> > > the ID that a device would match.  Does that get us anywhere?  Thanks,
> > 
> > The other option for this is to have some sort of priority on
> > device probing with hotplugging.
> > 
> > That is, you could do the following:
> > 
> >  1) add the device vendor/model to vfio
> >  2) unbind the BDF from the original driver
> >  3) hotplug happens: any new device that has the device vendor/model gets
> >     owned by vfio instead of the original driver
> 
> This doesn't help the device-added-to-in-use-group problem, though, because
> we have no idea if the new device would have the same vendor/model as
> other devices in the group.  By making the device probe ID modifiable,

Um, you add a hotplugged PCI device to a group that is in use?

> vfio can watch the IOMMU group notifiers and change the probe ID of new

Ewwww.
> devices to either prevent the host driver from claiming them or to allow
> vfio to claim them.  At the same time we change the problem from "this
> driver can attach to this kind of device" to "this device can attach to
> that driver", which also solves Stuart's problem.  Thanks,
> 
> Alex
> 
> >  4) bind the BDF to vfio
> > 
> > Granted, that is a bit silly too, as the admin might want to have the new
> > hotplugged device owned by the native driver.
> > 
> > In which case, why not switch from matching on device vendor/model
> > to matching on BDF values?

That would still solve the problem; userspace would just have to
reassign the device to the vfio group.

> > > 
> > > Alex
> > > 
> 
> 
> 
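
Konrad's suggestion of matching on BDF rather than on vendor/model can be contrasted in a few lines of toy C. This is only a sketch: the `toy_*` names, the claim list and the 8086/10fb IDs are hypothetical, and consulting vfio before the native driver is purely a modelling assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_BDFS 4

/* Hypothetical: the set of addresses an admin has handed to vfio. */
struct toy_vfio_claims {
	const char *bdfs[TOY_MAX_BDFS];
	int n;
};

struct toy_pci_dev {
	const char *bdf;		/* e.g. "0000:03:00.0" */
	unsigned vendor, device;
};

/* vfio matches on address, so only the listed slots are affected. */
static bool toy_vfio_claims_dev(const struct toy_vfio_claims *c,
				const struct toy_pci_dev *d)
{
	for (int i = 0; i < c->n; i++)
		if (strcmp(c->bdfs[i], d->bdf) == 0)
			return true;
	return false;
}

/* The native driver keeps matching on vendor/device (illustrative IDs). */
static bool toy_native_matches(const struct toy_pci_dev *d)
{
	return d->vendor == 0x8086 && d->device == 0x10fb;
}

static const char *toy_probe(const struct toy_vfio_claims *c,
			     const struct toy_pci_dev *d)
{
	if (toy_vfio_claims_dev(c, d))	/* vfio consulted first (assumption) */
		return "vfio-pci";
	if (toy_native_matches(d))
		return "native";
	return "(unbound)";
}
```

A same-type card hotplugged at a different address still goes to the native driver. As Alex points out, this alone does not cover a device hot-added to an in-use IOMMU group, since its address is not known in advance.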

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: mechanism to allow a driver to bind to any device
@ 2014-03-26 17:04                                                     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 92+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-03-26 17:04 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ,
	will.deacon-5wv7dgnIgG8, Stuart Yoder,
	a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J, Michal Hocko,
	Scott Wood, Varun Sethi, kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg,
	Rafael J. Wysocki, Alexander Graf, Guenter Roeck,
	Dmitry Kasatkin, Tejun Heo, Bjorn Helgaas, Antonios Motakis,
	tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J, Toshi Kani, Greg KH,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Wed, Mar 26, 2014 at 10:49:52AM -0600, Alex Williamson wrote:
> On Wed, 2014-03-26 at 12:32 -0400, Konrad Rzeszutek Wilk wrote:
> > On Wed, Mar 26, 2014 at 10:21:02AM -0600, Alex Williamson wrote:
> > > On Wed, 2014-03-26 at 23:06 +0800, Alexander Graf wrote:
> > > > 
> > > > > Am 26.03.2014 um 22:40 schrieb Konrad Rzeszutek Wilk <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>:
> > > > > 
> > > > >> On Wed, Mar 26, 2014 at 01:40:32AM +0000, Stuart Yoder wrote:
> > > > >> Hi Greg,
> > > > >> 
> > > > >> We (Linaro, Freescale, Virtual Open Systems) are trying to get an issue
> > > > >> closed that has been percolating for a while around creating a mechanism
> > > > >> that will allow kernel drivers like vfio to bind to devices of any type.
> > > > >> 
> > > > >> This thread with you:
> > > > >> http://www.spinics.net/lists/kvm-arm/msg08370.html
> > > > >> ...seems to have died out, so I am trying to get your response
> > > > >> and will summarize again.  Vfio drivers in the kernel (regardless of
> > > > >> bus type) need to bind to devices of any type.  The driver's function
> > > > >> is to simply export hardware resources of any type to user space.
> > > > >> 
> > > > >> There are several approaches that have been proposed:
> > > > > 
> > > > > You seem to have missed the one I proposed.
> > > > >> 
> > > > >>   1.  new_id -- (current approach) the user explicitly registers
> > > > >>       each new device type with the vfio driver using the new_id
> > > > >>       mechanism.
> > > > >> 
> > > > >>       Problem: multiple drivers will be resident that handle the
> > > > >>       same device type...and there is nothing user space hotplug
> > > > >>       infrastructure can do to help.
> > > > >> 
> > > > >>   2.  "any id" -- the vfio driver could specify a wildcard match
> > > > >>       of some kind in its ID match table which would allow it to
> > > > >>       match and bind to any possible device id.  However,
> > > > >>       we don't want the vfio driver grabbing _all_ devices...just the ones we
> > > > >>       explicitly want to pass to user space.
> > > > >> 
> > > > >>       The proposed patch to support this was to create a new flag
> > > > >>       "sysfs_bind_only" in struct device_driver.  When this flag
> > > > >>       is set, the driver can only bind to devices via the sysfs
> > > > >>       bind file.  This would allow the wildcard match to work.
> > > > >> 
> > > > >>       Patch is here:
> > > > >>       https://lkml.org/lkml/2013/12/3/253
> > > > >> 
> > > > >>   3.  "Driver initiated explicit bind" -- with this approach the
> > > > >>       vfio driver would create a private 'bind' sysfs object
> > > > >>       and the user would echo the requested device into it:
> > > > >> 
> > > > >>       echo 0001:03:00.0 > /sys/bus/pci/drivers/vfio-pci/vfio_bind
> > > > >> 
> > > > >>       In order to make that work, the driver would need to call
> > > > >>       driver_probe_device() and thus we need this patch:
> > > > >>       https://lkml.org/lkml/2014/2/8/175
> > > > > 
> > > > > 4) Use 'unbind' (from the original driver) and 'bind' to the vfio driver.
> > > > 
> > > > This is approach 2, no?
> > > > 
> > > > > 
> > > > > Which I think is what is currently being done. Why is that not sufficient?
> > > > 
> > > > What would 'bind to vfio driver' look like?
> > > > 
> > > > > The only thing I see in the URL is " That works, but it is ugly."
> > > > > There is some mention of race but I don't see how - if you do the 'unbind'
> > > > > on the original driver and then bind the BDF to the VFIO how would you get
> > > > > a race?
> > > > 
> > > > Typically on PCI, you do a
> > > > 
> > > >   - add wildcard (pci id) match to vfio driver
> > > >   - unbind driver
> > > >   -> reprobe
> > > >   -> device attaches to vfio driver because it is the least recent match
> > > >   - remove wildcard match from vfio driver
> > > > 
> > > > If in between you hotplug add a card of the same type, it gets attached to vfio - even though the logical "default driver" would be the device specific driver.
> > > 
> > > I've mentioned drivers_autoprobe in the past, but I'm not sure we're
> > > really factoring it into the discussion.  drivers_autoprobe allows us to
> > > toggle two points:
> > > 
> > > a) When a new device is added whether we automatically give drivers a
> > > try at binding to it
> > > 
> > > b) When a new driver is added whether it gets to try to bind to anything
> > > in the system
> > > 
> > > So we do have a mechanism to avoid the race, but the problem is that it
> > > becomes the responsibility of userspace to:
> > > 
> > > 1) turn off drivers_autoprobe
> > > 2) unbind/new_id/bind/remove_id
> > > 3) turn on drivers_autoprobe
> > > 4) call drivers_probe for anything added between 1) & 3)
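The four userspace steps above could look roughly like the following shell function for a single PCI device. This is a minimal sketch, not a tested tool. The function name vfio_pci_rebind and the SYSFS override (which lets the steps be dry-run against a scratch directory) are hypothetical conveniences. The files it writes are the standard PCI bus and driver attributes: drivers_autoprobe, drivers_probe, new_id, bind, remove_id, and unbind. Writing new_id can itself trigger an attach pass, so the explicit bind is only attempted if the device is still unbound.

```shell
#!/bin/sh
# Sketch only: rebind one PCI device to vfio-pci following the
# drivers_autoprobe procedure described above. SYSFS may be overridden
# (hypothetical convenience) to dry-run against a scratch tree.
SYSFS=${SYSFS:-/sys}

vfio_pci_rebind() {
    bdf=$1    # PCI address, e.g. 0000:03:00.0
    id=$2     # "VENDOR DEVICE" as new_id expects, e.g. "8086 10fb"
    bus=$SYSFS/bus/pci
    drv=$bus/drivers/vfio-pci

    # Snapshot the device list so step 4 can find late arrivals.
    before=$(ls "$bus/devices")

    # 1) Turn off automatic driver probing for newly added devices.
    echo 0 > "$bus/drivers_autoprobe"

    # 2) unbind / new_id / bind / remove_id
    if [ -e "$bus/devices/$bdf/driver" ]; then
        echo "$bdf" > "$bus/devices/$bdf/driver/unbind"
    fi
    echo "$id" > "$drv/new_id"
    # new_id may already have attached the device; bind only if it did not.
    if [ ! -e "$bus/devices/$bdf/driver" ]; then
        echo "$bdf" > "$drv/bind"
    fi
    echo "$id" > "$drv/remove_id"

    # 3) Turn automatic probing back on.
    echo 1 > "$bus/drivers_autoprobe"

    # 4) Ask the bus to probe any device that appeared in the meantime.
    for dev in $(ls "$bus/devices"); do
        if ! printf '%s\n' "$before" | grep -qxF "$dev"; then
            echo "$dev" > "$bus/drivers_probe"
        fi
    done
}
```

Usage (as root): `vfio_pci_rebind 0000:03:00.0 "8086 10fb"`. The sketch shows how much of the bookkeeping falls on userspace, which is the point being debated.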
> > > 
> > > Is the question about the ugliness of the current solution whether it's
> > > unreasonable to ask userspace to do this?
> > > 
> > > What we seem to be asking for above is more like an autoprobe flag per
> > > driver where there's some way for this special driver to opt out of auto
> > > probing.  Option 2. in Stuart's list does this by short-cutting ID
> > > matching so that a "match" is only found when using the sysfs bind path,
> > > option 3. enables a way for a driver to expose its own sysfs entry
> > > point for binding.  The latter feels particularly chaotic since drivers
> > > get to make-up their own bind mechanism.
> > > 
> > > Another twist I'll throw in is that devices can be hot added to IOMMU
> > > groups that are in-use by userspace.  When that happens we'd like to be
> > > able to disable driver autoprobe of the device to avoid a host driver
> > > automatically binding to the device.  I wonder if instead of looking at
> > > the problem from the driver perspective, if we were to instead look at
> > > it from the device perspective if we might find a solution that would
> > > address both.  For instance, if devices had a driver_probe_id property
> > > that was by default set to their bus specific ID match ("$VENDOR
> > > $DEVICE" on PCI) could we use that to write new match IDs so that a
> > > device could only bind to a given driver?  Effectively we could then
> > > bind either using the current method of adding to the list of IDs a
> > > driver will match or changing the ID that a device would match.  Does
> > > that get us anywhere?  Thanks,
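To make the device-side idea concrete, here is a toy model in shell. It is not kernel code, and the driver ID tables and the "vfio-only" placeholder ID are made up for illustration. Each device carries a probe ID that defaults to its "$VENDOR $DEVICE" pair. A driver matches when that probe ID appears in the driver's ID table. Rewriting one device's probe ID redirects only that device and leaves any hotplugged twin alone.

```shell
# Toy model, not kernel code: a device's probe ID defaults to its
# "$VENDOR $DEVICE" pair, and a driver matches when that probe ID
# appears in the driver's ID table.

# driver_ids DRIVER: print one ID per line (illustrative tables only).
driver_ids() {
    case $1 in
        ixgbe)    printf '%s\n' "8086 10fb" "8086 1528" ;;
        vfio-pci) printf '%s\n' "vfio-only" ;;   # hypothetical placeholder ID
    esac
}

# first_match PROBE_ID DRIVER...: print the first driver, in probe
# order, whose table contains PROBE_ID; fail if none does.
first_match() {
    probe_id=$1
    shift
    for drv in "$@"; do
        if driver_ids "$drv" | grep -qxF "$probe_id"; then
            echo "$drv"
            return 0
        fi
    done
    return 1
}
```

In this model, two cards of the same type both start as "8086 10fb" and match ixgbe. Rewriting only one card's probe ID to "vfio-only" sends that card to vfio-pci, while a newly hotplugged twin still matches ixgbe.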
> > 
> > The other option for this is to have some sort of priority on the 
> > device probing with hotplugging.
> > 
> > That is, you could do the following:
> > 
> >  1) add the device vendor/model in vfio
> >  2) unbind the BDF from the original driver.
> >  3) hotplug happens - any new device that has the device vendor/model gets
> >    owned by vfio instead of the original driver.
> 
> This doesn't help the device-added-to-inuse-group problem though because
> we have no idea if the new device would have the same vendor/model as
> other devices in the group.  By making the device probe ID modifiable,

Um, you add a hotplugged PCI device to a group that is in use?

> vfio can watch the IOMMU group notifiers and change the probe ID of new

Ewwww.
> devices to either prevent the host driver from claiming them or to allow
> vfio to claim them.  At the same time we change the problem from "this
> driver can attach to this kind of device" to "this device can attach to
> that driver", which also solves Stuart's problem.  Thanks,
> 
> Alex
> 
> >  4). bind the BDF to the vfio.
> > 
> > Granted that is a bit silly too - as the admin might want to have the new
> > hotplugged device be owned by the native driver.
> > 
> > In which case, why not just switch from using device vendor/model
> > to using BDF values?

Which would still solve the problem. User space would just have to
reassign the device to the vfio group.

> > > 
> > > Alex
> > > 
> 
> 
> 

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: mechanism to allow a driver to bind to any device
       [not found]                                                     ` <20140326170406.GA22902-6K5HmflnPlqSPmnEAIUT9EEOCMrvLtNR@public.gmane.org>
@ 2014-03-26 17:26                                                         ` Alex Williamson
  0 siblings, 0 replies; 92+ messages in thread
From: Alex Williamson @ 2014-03-26 17:26 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ,
	will.deacon-5wv7dgnIgG8, Stuart Yoder,
	a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J, Michal Hocko,
	Scott Wood, Varun Sethi, kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg,
	Rafael J. Wysocki, Alexander Graf, Guenter Roeck,
	Dmitry Kasatkin, Tejun Heo, Bjorn Helgaas, Antonios Motakis,
	tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J, Toshi Kani, Greg KH,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Wed, 2014-03-26 at 13:04 -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Mar 26, 2014 at 10:49:52AM -0600, Alex Williamson wrote:
> > On Wed, 2014-03-26 at 12:32 -0400, Konrad Rzeszutek Wilk wrote:
> > > On Wed, Mar 26, 2014 at 10:21:02AM -0600, Alex Williamson wrote:
> > > > On Wed, 2014-03-26 at 23:06 +0800, Alexander Graf wrote:
> > > > > 
> > > > > > On 26.03.2014 at 22:40, Konrad Rzeszutek Wilk <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
> > > > > > 
> > > > > >> On Wed, Mar 26, 2014 at 01:40:32AM +0000, Stuart Yoder wrote:
> > > > > >> Hi Greg,
> > > > > >> 
> > > > > >> We (Linaro, Freescale, Virtual Open Systems) are trying to get an issue
> > > > > >> closed that has been percolating for a while around creating a mechanism
> > > > > >> that will allow kernel drivers like vfio to bind to devices of any type.
> > > > > >> 
> > > > > >> This thread with you:
> > > > > >> http://www.spinics.net/lists/kvm-arm/msg08370.html
> > > > > >> ...seems to have died out, so I am trying to get your response
> > > > > >> and will summarize again.  Vfio drivers in the kernel (regardless of
> > > > > >> bus type) need to bind to devices of any type.  The driver's function
> > > > > >> is to simply export hardware resources of any type to user space.
> > > > > >> 
> > > > > >> There are several approaches that have been proposed:
> > > > > > 
> > > > > > You seem to have missed the one I proposed.
> > > > > >> 
> > > > > >>   1.  new_id -- (current approach) the user explicitly registers
> > > > > >>       each new device type with the vfio driver using the new_id
> > > > > >>       mechanism.
> > > > > >> 
> > > > > >>       Problem: multiple drivers will be resident that handle the
> > > > > >>       same device type...and there is nothing user space hotplug
> > > > > >>       infrastructure can do to help.
> > > > > >> 
> > > > > >>   2.  "any id" -- the vfio driver could specify a wildcard match
> > > > > >>       of some kind in its ID match table which would allow it to
> > > > > >>       match and bind to any possible device id.  However,
> > > > > >>       we don't want the vfio driver grabbing _all_ devices...just the ones we
> > > > > >>       explicitly want to pass to user space.
> > > > > >> 
> > > > > >>       The proposed patch to support this was to create a new flag
> > > > > >>       "sysfs_bind_only" in struct device_driver.  When this flag
> > > > > >>       is set, the driver can only bind to devices via the sysfs
> > > > > >>       bind file.  This would allow the wildcard match to work.
> > > > > >> 
> > > > > >>       Patch is here:
> > > > > >>       https://lkml.org/lkml/2013/12/3/253
> > > > > >> 
> > > > > >>   3.  "Driver initiated explicit bind" -- with this approach the
> > > > > >>       vfio driver would create a private 'bind' sysfs object
> > > > > >>       and the user would echo the requested device into it:
> > > > > >> 
> > > > > >>       echo 0001:03:00.0 > /sys/bus/pci/drivers/vfio-pci/vfio_bind
> > > > > >> 
> > > > > >>       In order to make that work, the driver would need to call
> > > > > >>       driver_probe_device() and thus we need this patch:
> > > > > >>       https://lkml.org/lkml/2014/2/8/175
> > > > > > 
> > > > > > 4) Use 'unbind' (from the original driver) and 'bind' to the vfio driver.
> > > > > 
> > > > > This is approach 2, no?
> > > > > 
> > > > > > 
> > > > > > Which I think is what is currently being done. Why is that not sufficient?
> > > > > 
> > > > > What would 'bind to vfio driver' look like?
> > > > > 
> > > > > > The only thing I see in the URL is " That works, but it is ugly."
> > > > > > There is some mention of race but I don't see how - if you do the 'unbind'
> > > > > > on the original driver and then bind the BDF to the VFIO how would you get
> > > > > > a race?
> > > > > 
> > > > > Typically on PCI, you do a
> > > > > 
> > > > >   - add wildcard (pci id) match to vfio driver
> > > > >   - unbind driver
> > > > >   -> reprobe
> > > > >   -> device attaches to vfio driver because it is the least recent match
> > > > >   - remove wildcard match from vfio driver
> > > > > 
> > > > > If in between you hotplug add a card of the same type, it gets attached to vfio - even though the logical "default driver" would be the device specific driver.
> > > > 
> > > > I've mentioned drivers_autoprobe in the past, but I'm not sure we're
> > > > really factoring it into the discussion.  drivers_autoprobe allows us to
> > > > toggle two points:
> > > > 
> > > > a) When a new device is added whether we automatically give drivers a
> > > > try at binding to it
> > > > 
> > > > b) When a new driver is added whether it gets to try to bind to anything
> > > > in the system
> > > > 
> > > > So we do have a mechanism to avoid the race, but the problem is that it
> > > > becomes the responsibility of userspace to:
> > > > 
> > > > 1) turn off drivers_autoprobe
> > > > 2) unbind/new_id/bind/remove_id
> > > > 3) turn on drivers_autoprobe
> > > > 4) call drivers_probe for anything added between 1) & 3)
> > > > 
> > > > Is the question about the ugliness of the current solution whether it's
> > > > unreasonable to ask userspace to do this?
> > > > 
> > > > What we seem to be asking for above is more like an autoprobe flag per
> > > > driver where there's some way for this special driver to opt out of auto
> > > > probing.  Option 2. in Stuart's list does this by short-cutting ID
> > > > matching so that a "match" is only found when using the sysfs bind path,
> > > > option 3. enables a way for a driver to expose its own sysfs entry
> > > > point for binding.  The latter feels particularly chaotic since drivers
> > > > get to make-up their own bind mechanism.
> > > > 
> > > > Another twist I'll throw in is that devices can be hot added to IOMMU
> > > > groups that are in-use by userspace.  When that happens we'd like to be
> > > > able to disable driver autoprobe of the device to avoid a host driver
> > > > automatically binding to the device.  I wonder if instead of looking at
> > > > the problem from the driver perspective, if we were to instead look at
> > > > it from the device perspective if we might find a solution that would
> > > > address both.  For instance, if devices had a driver_probe_id property
> > > > that was by default set to their bus specific ID match ("$VENDOR
> > > > $DEVICE" on PCI) could we use that to write new match IDs so that a
> > > > device could only bind to a given driver?  Effectively we could then
> > > > bind either using the current method of adding to the list of IDs a
> > > > driver will match or changing the ID that a device would match.  Does
> > > > that get us anywhere?  Thanks,
> > > 
> > > The other option for this is to have some sort of priority on the 
> > > device probing with hotplugging.
> > > 
> > > That is, you could do the following:
> > > 
> > >  1) add the device vendor/model in vfio
> > >  2) unbind the BDF from the original driver.
> > >  3) hotplug happens - any new device that has the device vendor/model gets
> > >    owned by vfio instead of the original driver.
> > 
> > This doesn't help the device-added-to-inuse-group problem though because
> > we have no idea if the new device would have the same vendor/model as
> > other devices in the group.  By making the device probe ID modifiable,
> 
> Um, you add a hotplugged PCI device to a group that is in use?

Sure. What if your IOMMU group is an entire conventional PCI
sub-hierarchy that supports hotplug?

> > vfio can watch the IOMMU group notifiers and change the probe ID of new
> 
> Ewwww.

How is that so terrible?  Is it worse than BUG()?

> > devices to either prevent the host driver from claiming them or to allow
> > vfio to claim them.  At the same time we change the problem from "this
> > driver can attach to this kind of device" to "this device can attach to
> > that driver", which also solves Stuart's problem.  Thanks,
> > 
> > Alex
> > 
> > >  4). bind the BDF to the vfio.
> > > 
> > > Granted that is a bit silly too - as the admin might want to have the new
> > > hotplugged device be owned by the native driver.
> > > 
> > > In which case, why not just switch from using device vendor/model
> > > to using BDF values?
> 
> Which would still solve the problem. User space would just have to
> reassign the device to the vfio group.

Sorry, I'm not really seeing how your proposal is different from what we
have already.  The steps above are exactly what we have today.  The
'new_slot' entry mentioned in reply to Stuart seems to be nothing more
than a PCI specific shortcut for new_id, but nothing substantive changes
about the binding path.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 92+ messages in thread

* RE: mechanism to allow a driver to bind to any device
       [not found]                                             ` <20140326163209.GB21368-6K5HmflnPlqSPmnEAIUT9EEOCMrvLtNR@public.gmane.org>
  2014-03-26 16:49                                                 ` Alex Williamson
@ 2014-03-26 17:51                                               ` Stuart Yoder
  1 sibling, 0 replies; 92+ messages in thread
From: Stuart Yoder @ 2014-03-26 17:51 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Alex Williamson
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ,
	will.deacon-5wv7dgnIgG8, a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J,
	Michal Hocko, Scott Wood, Varun Sethi,
	kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg, Rafael J. Wysocki,
	Alexander Graf, Guenter Roeck, Dmitry Kasatkin, Tejun Heo,
	Bjorn Helgaas, Antonios Motakis,
	tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J, Toshi Kani, Greg KH,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,

> The other option for this is to have some sort of priority on the
> device probing with hotplugging.
> 
> That is, you could do the following:
> 
>  1) add the device vendor/model in vfio
>  2) unbind the BDF from the original driver.
>  3) hotplug happens - any new device that has the device vendor/model gets
>     owned by vfio instead of the original driver.
>  4). bind the BDF to the vfio.
> 
> Granted that is a bit silly too - as the admin might want to have the new
> hotplugged device be owned by the native driver.
> 
> In which case, why not just switch from using device vendor/model
> to using BDF values?

Did you read option #3 in my proposal? Using the bus/dev/func number is
one of the options proposed:

>    3.  "Driver initiated explicit bind" -- with this approach the
>        vfio driver would create a private 'bind' sysfs object
>        and the user would echo the requested device into it:
>  
>        echo 0001:03:00.0 > /sys/bus/pci/drivers/vfio-pci/vfio_bind
              ^^^^^^^^^^^^
              domain:bus:dev.func
> 
>        In order to make that work, the driver would need to call
>        driver_probe_device() and thus we need this patch:
>        https://lkml.org/lkml/2014/2/8/175

But in order to do that the vfio driver needs to call 
driver_probe_device() which it can't right now without
the proposed 2 line patch...which Greg rejects.
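The string echoed in option 3 is a full PCI address. In 0001:03:00.0, 0001 is the PCI domain, 03 the bus, 00 the device and 0 the function. The shell sketch below splits such an address into those fields and rejects anything else. It shows the input that a driver-private bind file, or any BDF-based scheme, would receive. The helper name parse_pci_addr is hypothetical.

```shell
# Sketch: split a PCI address DDDD:BB:DD.F (domain, bus, device,
# function) into its fields; return non-zero for anything else.
# parse_pci_addr is a hypothetical helper name.
parse_pci_addr() {
    h='[0-9a-fA-F]'
    # Unquoted $h expands to a bracket pattern inside the case pattern.
    case $1 in
        $h$h$h$h:$h$h:$h$h.[0-7]) ;;
        *) return 1 ;;
    esac
    domain=${1%%:*}      # text before the first ':'
    rest=${1#*:}         # BB:DD.F
    bus=${rest%%:*}
    rest=${rest#*:}      # DD.F
    dev=${rest%.*}
    func=${rest#*.}
    echo "domain=$domain bus=$bus dev=$dev func=$func"
}
```

For example, `parse_pci_addr 0001:03:00.0` prints `domain=0001 bus=03 dev=00 func=0`.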

Stuart

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: mechanism to allow a driver to bind to any device
       [not found]                             ` <54cd150235ba4954becdd12f725c5ebd-ufbTtyGzTTT8GZusEWM6WuO6mTEJWrR4XA4E9RH9d+qIuWR1G4zioA@public.gmane.org>
@ 2014-03-26 21:39                                 ` Antonios Motakis
  2014-03-26 21:39                                 ` Antonios Motakis
  2014-03-26 21:42                                 ` Antonios Motakis
  2 siblings, 0 replies; 92+ messages in thread
From: Antonios Motakis @ 2014-03-26 21:39 UTC (permalink / raw)
  To: Stuart Yoder
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ,
	will.deacon-5wv7dgnIgG8, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	Bjorn Helgaas, Varun Sethi,
	kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg, Rafael J. Wysocki,
	agraf-l3A5Bk7waGM, Guenter Roeck, Dmitry Kasatkin, Tejun Heo,
	Scott Wood, tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J, Michal Hocko,
	Toshi Kani, Greg KH, a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Joe


Hello,


On Wed, Mar 26, 2014 at 2:40 AM, Stuart Yoder <stuart.yoder-KZfg59tc24xl57MIdRCFDg@public.gmane.org> wrote:

> Hi Greg,
>
> We (Linaro, Freescale, Virtual Open Systems) are trying to get an issue
> closed that has been percolating for a while around creating a mechanism
> that will allow kernel drivers like vfio to bind to devices of any type.
>
> This thread with you:
> http://www.spinics.net/lists/kvm-arm/msg08370.html
> ...seems to have died out, so I am trying to get your response
> and will summarize again.  Vfio drivers in the kernel (regardless of
> bus type) need to bind to devices of any type.  The driver's function
> is to simply export hardware resources of any type to user space.
>
> There are several approaches that have been proposed:
>
>    1.  new_id -- (current approach) the user explicitly registers
>        each new device type with the vfio driver using the new_id
>        mechanism.
>
>        Problem: multiple drivers will be resident that handle the
>        same device type...and there is nothing user space hotplug
>        infrastructure can do to help.
>
>
Of note is that new_id doesn't work particularly well for platform devices.
Before we tried either of the other two solutions, a nasty hack was applied
to the device tree used with the system in order to let vfio-platform match
a specific device, which is certainly not an acceptable solution.

Implementing wildcard matching quickly reveals the problems mentioned,
which motivates the other two approaches.
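The shell sketch below is a toy model of option 2, not the proposed kernel patch. It shows how a wildcard ID table and a sysfs_bind_only flag would combine. The wildcard lets the driver match any device, and the flag limits that match to explicit writes to the sysfs bind file. Without the flag, the wildcard driver would also win ordinary autoprobing, which is the problem described above. The function can_bind and the "autoprobe"/"sysfs_bind" labels are hypothetical.

```shell
# Toy model (not kernel code) of option 2: a wildcard ID table plus a
# sysfs_bind_only flag that restricts binding to explicit sysfs binds.

# can_bind DRIVER_ID DEVICE_ID SYSFS_BIND_ONLY ORIGIN
#   DRIVER_ID "*" stands in for the wildcard table entry.
#   ORIGIN is "autoprobe" or "sysfs_bind".
can_bind() {
    drv_id=$1 dev_id=$2 bind_only=$3 origin=$4
    # ID check: the wildcard matches everything, otherwise exact match.
    if [ "$drv_id" != "*" ] && [ "$drv_id" != "$dev_id" ]; then
        return 1
    fi
    # Flag check: a sysfs_bind_only driver ignores autoprobe attempts.
    if [ "$bind_only" = 1 ] && [ "$origin" != sysfs_bind ]; then
        return 1
    fi
    return 0
}
```

For example, `can_bind "*" "8086 10fb" 1 autoprobe` fails, while the same call with `sysfs_bind` succeeds.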


>    2.  "any id" -- the vfio driver could specify a wildcard match
>        of some kind in its ID match table which would allow it to
>        match and bind to any possible device id.  However,
>        we don't want the vfio driver grabbing _all_ devices...just the ones we
>        explicitly want to pass to user space.
>
>        The proposed patch to support this was to create a new flag
>        "sysfs_bind_only" in struct device_driver.  When this flag
>        is set, the driver can only bind to devices via the sysfs
>        bind file.  This would allow the wildcard match to work.
>
>        Patch is here:
>        https://lkml.org/lkml/2013/12/3/253
>
>    3.  "Driver initiated explicit bind" -- with this approach the
>        vfio driver would create a private 'bind' sysfs object
>        and the user would echo the requested device into it:
>
>        echo 0001:03:00.0 > /sys/bus/pci/drivers/vfio-pci/vfio_bind
>
>        In order to make that work, the driver would need to call
>        driver_probe_device() and thus we need this patch:
>        https://lkml.org/lkml/2014/2/8/175
>
>
> Would like your comment on these options-- option #3 is preferred
> and is literally a 2 line patch.
>

I would definitely agree with approach #3, for which Kim has already
provided a patch. Not having this would make using VFIO with platform
devices really inelegant and strange.


>
> Thanks,
> Stuart
>



-- 
Antonios Motakis
Virtual Open Systems

^ permalink raw reply	[flat|nested] 92+ messages in thread

>        driver_probe_device() and thus we need this patch:
>        https://lkml.org/lkml/2014/2/8/175
>
>
> Would like your comment on these options-- option #3 is preferred
> and is literally a 2 line patch.
>

I would definitely agree with approach #3, for which Kim has already
provided a patch. Not having this would make using VFIO with platform
devices really inelegant and strange.


>
> Thanks,
> Stuart
>



-- 
Antonios Motakis
Virtual Open Systems

[-- Attachment #1.2: Type: text/html, Size: 4122 bytes --]

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: mechanism to allow a driver to bind to any device
       [not found]                             ` <54cd150235ba4954becdd12f725c5ebd-ufbTtyGzTTT8GZusEWM6WuO6mTEJWrR4XA4E9RH9d+qIuWR1G4zioA@public.gmane.org>
@ 2014-03-26 21:42                                 ` Antonios Motakis
  2014-03-26 21:39                                 ` Antonios Motakis
  2014-03-26 21:42                                 ` Antonios Motakis
  2 siblings, 0 replies; 92+ messages in thread
From: Antonios Motakis @ 2014-03-26 21:42 UTC (permalink / raw)
  To: Stuart Yoder
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ,
	will.deacon-5wv7dgnIgG8, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	Bjorn Helgaas, Varun Sethi,
	kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg, Rafael J. Wysocki,
	agraf-l3A5Bk7waGM, Guenter Roeck, Dmitry Kasatkin, Tejun Heo,
	Scott Wood, tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J, Michal Hocko,
	Toshi Kani, Greg KH, a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Joe

(Resend because of html)

On Wed, Mar 26, 2014 at 2:40 AM, Stuart Yoder
<stuart.yoder-KZfg59tc24xl57MIdRCFDg@public.gmane.org> wrote:
> Hi Greg,
>
> We (Linaro, Freescale, Virtual Open Systems) are trying to get an issue
> closed that has been percolating for a while around creating a mechanism
> that will allow kernel drivers like vfio to bind to devices of any type.
>
> This thread with you:
> http://www.spinics.net/lists/kvm-arm/msg08370.html
> ...seems to have died out, so am trying to get your response
> and will summarize again.  Vfio drivers in the kernel (regardless of
> bus type) need to bind to devices of any type.  The driver's function
> is to simply export hardware resources of any type to user space.
>
> There are several approaches that have been proposed:
>
>    1.  new_id -- (current approach) the user explicitly registers
>        each new device type with the vfio driver using the new_id
>        mechanism.
>
>        Problem: multiple drivers will be resident that handle the
>        same device type...and there is nothing user space hotplug
>        infrastructure can do to help.
>

Of note is that new_id doesn't work particularly well for platform
devices. Before trying either of the other two solutions, a nasty hack
was applied to the device tree used with the system to let
vfio-platform match a specific device, which is certainly not an
acceptable solution.

Implementing wildcard matching quickly runs into the problems mentioned
above, which is what motivates the other two approaches.
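To make the wildcard problem concrete, here is a minimal userspace model in C. It is not kernel code: model_device, model_driver and try_bind() are hypothetical names, a device tree "compatible" string stands in for a platform device ID, and the sysfs_bind_only field only mimics the flag proposed in option 2 below. It shows a wildcard driver claiming a device on the automatic probe path, and the flag restricting it to explicit binds.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for a device and a driver. */
struct model_device {
    const char *compatible;   /* e.g. a device tree "compatible" string */
    const char *bound_to;     /* NULL while no driver is bound */
};

struct model_driver {
    const char *name;
    const char *match;        /* "*" models a wildcard ID table entry */
    bool sysfs_bind_only;     /* mimics the proposed flag (assumption) */
};

/* Plain ID matching: a wildcard entry matches any device. */
static bool ids_match(const struct model_driver *drv,
                      const struct model_device *dev)
{
    return strcmp(drv->match, "*") == 0 ||
           strcmp(drv->match, dev->compatible) == 0;
}

/*
 * Models one bind attempt. explicit_bind is true for a write to the
 * driver's sysfs "bind" file and false for automatic probing when a
 * device or driver is added.
 */
static bool try_bind(const struct model_driver *drv,
                     struct model_device *dev, bool explicit_bind)
{
    if (dev->bound_to != NULL)
        return false;         /* already owned by some driver */
    if (drv->sysfs_bind_only && !explicit_bind)
        return false;         /* flagged drivers never auto-bind */
    if (!ids_match(drv, dev))
        return false;
    dev->bound_to = drv->name;
    return true;
}
```

In this model a wildcard driver without the flag takes every device it is offered during automatic probing, which is exactly the behaviour that makes plain wildcard matching unacceptable.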

>    2.  "any id" -- the vfio driver could specify a wildcard match
>        of some kind in its ID match table which would allow it to
>        match and bind to any possible device id.  However,
>        we don't want the vfio driver grabbing _all_ devices...just the ones we
>        explicitly want to pass to user space.
>
>        The proposed patch to support this was to create a new flag
>        "sysfs_bind_only" in struct device_driver.  When this flag
>        is set, the driver can only bind to devices via the sysfs
>        bind file.  This would allow the wildcard match to work.
>
>        Patch is here:
>        https://lkml.org/lkml/2013/12/3/253
>
>    3.  "Driver initiated explicit bind" -- with this approach the
>        vfio driver would create a private 'bind' sysfs object
>        and the user would echo the requested device into it:
>
>        echo 0001:03:00.0 > /sys/bus/pci/drivers/vfio-pci/vfio_bind
>
>        In order to make that work, the driver would need to call
>        driver_probe_device() and thus we need this patch:
>        https://lkml.org/lkml/2014/2/8/175
>
>
> Would like your comment on these options-- option #3 is preferred
> and is literally a 2 line patch.

I would definitely agree with approach #3, for which Kim has already
provided a patch. Not having this would make using VFIO with platform
devices really inelegant and strange.
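For comparison, option 3 can be sketched the same way. This is again a hypothetical userspace model rather than real vfio code: vfio_bind_store() stands in for the store handler of the driver-private vfio_bind attribute from Stuart's example, and model_probe() stands in for driver_probe_device(), whose real locking requirements are ignored here.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a bus with a fixed set of devices. */
struct model_device {
    const char *name;        /* e.g. "0001:03:00.0" */
    const char *driver;      /* NULL while unbound */
};

/* Stand-in for driver_probe_device(); always succeeds in this model. */
static int model_probe(const char *drv_name, struct model_device *dev)
{
    dev->driver = drv_name;
    return 0;
}

/*
 * Models a driver-private "vfio_bind" store handler: ignore anything
 * from the first newline on (echo appends one), look the device up by
 * name, refuse devices that already have a driver, then probe.
 */
static int vfio_bind_store(const char *drv_name, struct model_device *devs,
                           size_t ndevs, const char *buf)
{
    size_t len = strcspn(buf, "\n");

    for (size_t i = 0; i < ndevs; i++) {
        struct model_device *dev = &devs[i];

        if (strlen(dev->name) != len || strncmp(dev->name, buf, len) != 0)
            continue;
        if (dev->driver != NULL)
            return -EBUSY;
        return model_probe(drv_name, dev);
    }
    return -ENODEV;
}
```

The sketch also makes Alex's later objection visible: every driver that wants this has to carry its own parsing and lookup code like the above.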

>
> Thanks,
> Stuart



-- 
Antonios Motakis
Virtual Open Systems

* Re: mechanism to allow a driver to bind to any device
       [not found]                                         ` <1395850862.632.247.camel-85EaTFmN5p//9pzu0YdTqQ@public.gmane.org>
  2014-03-26 16:32                                             ` Konrad Rzeszutek Wilk
@ 2014-03-26 22:09                                           ` Alex Williamson
       [not found]                                             ` <1395871761.632.316.camel-85EaTFmN5p//9pzu0YdTqQ@public.gmane.org>
  2014-03-31 18:47                                               ` Stuart Yoder
  2014-03-31 18:32                                             ` Stuart Yoder
  2 siblings, 2 replies; 92+ messages in thread
From: Alex Williamson @ 2014-03-26 22:09 UTC (permalink / raw)
  To: Alexander Graf
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ,
	will.deacon-5wv7dgnIgG8, Stuart Yoder,
	a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J, Michal Hocko,
	Bjorn Helgaas, Varun Sethi,
	kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg, Rafael J. Wysocki,
	Guenter Roeck, Dmitry Kasatkin, Tejun Heo, Scott Wood,
	Antonios Motakis, tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J,
	Toshi Kani, Greg KH, linux-kernel-u79uwXL29TY76Z2rM5mHXA,

On Wed, 2014-03-26 at 10:21 -0600, Alex Williamson wrote:
> On Wed, 2014-03-26 at 23:06 +0800, Alexander Graf wrote:
> > 
> > > On 26.03.2014 at 22:40, Konrad Rzeszutek Wilk <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
> > > 
> > >> On Wed, Mar 26, 2014 at 01:40:32AM +0000, Stuart Yoder wrote:
> > >> Hi Greg,
> > >> 
> > >> We (Linaro, Freescale, Virtual Open Systems) are trying to get an issue
> > >> closed that has been percolating for a while around creating a mechanism
> > >> that will allow kernel drivers like vfio to bind to devices of any type.
> > >> 
> > >> This thread with you:
> > >> http://www.spinics.net/lists/kvm-arm/msg08370.html
> > >> ...seems to have died out, so am trying to get your response
> > >> and will summarize again.  Vfio drivers in the kernel (regardless of
> > >> bus type) need to bind to devices of any type.  The driver's function
> > >> is to simply export hardware resources of any type to user space.
> > >> 
> > >> There are several approaches that have been proposed:
> > > 
> > > You seem to have missed the one I proposed.
> > >> 
> > >>   1.  new_id -- (current approach) the user explicitly registers
> > >>       each new device type with the vfio driver using the new_id
> > >>       mechanism.
> > >> 
> > >>       Problem: multiple drivers will be resident that handle the
> > >>       same device type...and there is nothing user space hotplug
> > >>       infrastructure can do to help.
> > >> 
> > >>   2.  "any id" -- the vfio driver could specify a wildcard match
> > >>       of some kind in its ID match table which would allow it to
> > >>       match and bind to any possible device id.  However,
> > >>       we don't want the vfio driver grabbing _all_ devices...just the ones we
> > >>       explicitly want to pass to user space.
> > >> 
> > >>       The proposed patch to support this was to create a new flag
> > >>       "sysfs_bind_only" in struct device_driver.  When this flag
> > >>       is set, the driver can only bind to devices via the sysfs
> > >>       bind file.  This would allow the wildcard match to work.
> > >> 
> > >>       Patch is here:
> > >>       https://lkml.org/lkml/2013/12/3/253
> > >> 
> > >>   3.  "Driver initiated explicit bind" -- with this approach the
> > >>       vfio driver would create a private 'bind' sysfs object
> > >>       and the user would echo the requested device into it:
> > >> 
> > >>       echo 0001:03:00.0 > /sys/bus/pci/drivers/vfio-pci/vfio_bind
> > >> 
> > >>       In order to make that work, the driver would need to call
> > >>       driver_probe_device() and thus we need this patch:
> > >>       https://lkml.org/lkml/2014/2/8/175
> > > 
> > > 4). Use the 'unbind' (from the original device) and 'bind' to vfio driver.
> > 
> > This is approach 2, no?
> > 
> > > 
> > > Which I think is what is currently being done. Why is that not sufficient?
> > 
> > What would 'bind to vfio driver' look like?
> > 
> > > The only thing I see in the URL is " That works, but it is ugly."
> > > There is some mention of race but I don't see how - if you do the 'unbind'
> > > on the original driver and then bind the BDF to the VFIO how would you get
> > > a race?
> > 
> > Typically on PCI, you do a
> > 
> >   - add wildcard (pci id) match to vfio driver
> >   - unbind driver
> >   -> reprobe
> >   -> device attaches to vfio driver because it is the least recent match
> >   - remove wildcard match from vfio driver
> > 
> > If in between you hotplug add a card of the same type, it gets attached to vfio - even though the logical "default driver" would be the device specific driver.
> 
> I've mentioned drivers_autoprobe in the past, but I'm not sure we're
> really factoring it into the discussion.  drivers_autoprobe allows us to
> toggle two points:
> 
> a) When a new device is added whether we automatically give drivers a
> try at binding to it
> 
> b) When a new driver is added whether it gets to try to bind to anything
> in the system
> 
> So we do have a mechanism to avoid the race, but the problem is that it
> becomes the responsibility of userspace to:
> 
> 1) turn off drivers_autoprobe
> 2) unbind/new_id/bind/remove_id
> 3) turn on drivers_autoprobe
> 4) call drivers_probe for anything added between 1) & 3)
> 
> Is the question about the ugliness of the current solution whether it's
> unreasonable to ask userspace to do this?
> 
> What we seem to be asking for above is more like an autoprobe flag per
> driver where there's some way for this special driver to opt out of auto
> probing.  Option 2. in Stuart's list does this by short-cutting ID
> matching so that a "match" is only found when using the sysfs bind path,
> option 3. enables a way for a driver to expose their own sysfs entry
> point for binding.  The latter feels particularly chaotic since drivers
> get to make-up their own bind mechanism.
> 
> Another twist I'll throw in is that devices can be hot added to IOMMU
> groups that are in-use by userspace.  When that happens we'd like to be
> able to disable driver autoprobe of the device to avoid a host driver
> automatically binding to the device.  I wonder if instead of looking at
> the problem from the driver perspective, if we were to instead look at
> it from the device perspective if we might find a solution that would
> address both.  For instance, if devices had a driver_probe_id property
> that was by default set to their bus specific ID match ("$VENDOR
> $DEVICE" on PCI) could we use that to write new match IDs so that a
> device could only bind to a given driver?  Effectively we could then
> bind either using the current method of adding to the list of IDs a
> driver will match of changing the ID that a device would match.  Does
> that get us anywhere?  Thanks,
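The drivers_autoprobe procedure quoted above can be modelled in a few lines of C to show why it closes the hotplug race Alexander describes. This is a hypothetical simulation, not kernel or sysfs code: the bus, the two drivers ("host-driver" with a static ID table, vfio-pci with only a dynamic new_id), and the rule that a matching vfio-pci dynamic ID wins are all assumptions made for the sketch, and the IDs are example values.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEVS 8

struct model_dev {
    const char *id;           /* "$VENDOR $DEVICE", e.g. "8086 10d3" */
    const char *driver;       /* NULL while unbound */
};

struct model_bus {
    bool drivers_autoprobe;   /* models /sys/bus/pci/drivers_autoprobe */
    const char *vfio_new_id;  /* ID currently in vfio-pci's new_id, or NULL */
    struct model_dev devs[MAX_DEVS];
    size_t ndevs;
};

/* Model assumption: a matching vfio-pci dynamic ID beats the host driver. */
static void probe_one(struct model_bus *bus, struct model_dev *dev)
{
    if (dev->driver != NULL)
        return;
    if (bus->vfio_new_id != NULL && strcmp(bus->vfio_new_id, dev->id) == 0)
        dev->driver = "vfio-pci";
    else
        dev->driver = "host-driver";
}

/* Hot-add: a new device is only probed while drivers_autoprobe is on. */
static struct model_dev *hotplug(struct model_bus *bus, const char *id)
{
    struct model_dev *dev;

    if (bus->ndevs == MAX_DEVS)
        return NULL;
    dev = &bus->devs[bus->ndevs++];
    dev->id = id;
    dev->driver = NULL;
    if (bus->drivers_autoprobe)
        probe_one(bus, dev);
    return dev;
}

/* Step 1) and the first half of 2): autoprobe off, unbind, new_id. */
static void start_vfio_handoff(struct model_bus *bus, struct model_dev *dev)
{
    bus->drivers_autoprobe = false;
    dev->driver = NULL;
    bus->vfio_new_id = dev->id;
}

/* Second half of step 2): bind, then remove_id. */
static void complete_vfio_handoff(struct model_bus *bus, struct model_dev *dev)
{
    probe_one(bus, dev);
    bus->vfio_new_id = NULL;
}

/* Steps 3) and 4): autoprobe on, drivers_probe for anything left unbound. */
static void finish_handoff(struct model_bus *bus)
{
    bus->drivers_autoprobe = true;
    for (size_t i = 0; i < bus->ndevs; i++)
        probe_one(bus, &bus->devs[i]);
}
```

If the model skipped step 1) and left drivers_autoprobe on, a card hot-added while the new_id entry is present would be probed immediately and end up on vfio-pci, which is the race described above.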

Here's one way this might work for PCI; note that we can do this
entirely in the bus driver for PCI.  Bind/unbind would go like this:

# bind device to vfio-pci
echo vfio-pci > /sys/bus/pci/devices/0000\:03\:00.0/preferred_driver
echo 0000:03:00.0 > /sys/bus/pci/devices/0000\:03\:00.0/driver/unbind
echo 0000:03:00.0 > /sys/bus/pci/drivers_probe

# bind device back to host driver
echo > /sys/bus/pci/devices/0000\:03\:00.0/preferred_driver
echo 0000:03:00.0 > /sys/bus/pci/devices/0000\:03\:00.0/driver/unbind
echo 0000:03:00.0 > /sys/bus/pci/drivers_probe

When preferred_driver is set for a device it will match and bind only to
a driver with a matching name.  This also means we can write random
strings here to avoid a device being bound to any driver if we want.
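The matching rule above can be sketched in userspace C. model_match() mirrors the preferred_driver branch added to pci_match_device() in the patch below, and model_set_preferred() mirrors the newline stripping in preferred_driver_store(); the struct names, the single static ID entry per driver, and the example IDs are hypothetical simplifications.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins; only the fields this sketch needs. */
struct model_pci_dev {
    unsigned short vendor, device;
    char *preferred_driver;          /* NULL: fall back to ID matching */
};

struct model_pci_driver {
    const char *name;
    unsigned short vendor, device;   /* a single static ID entry */
};

/* preferred_driver, when set, supersedes ID matching entirely. */
static bool model_match(const struct model_pci_driver *drv,
                        const struct model_pci_dev *dev)
{
    if (dev->preferred_driver)
        return strcmp(drv->name, dev->preferred_driver) == 0;
    return drv->vendor == dev->vendor && drv->device == dev->device;
}

/* Strip trailing newlines; an empty result clears the preference. */
static int model_set_preferred(struct model_pci_dev *dev, const char *buf)
{
    size_t len = strlen(buf);
    char *s = malloc(len + 1);

    if (!s)
        return -1;
    memcpy(s, buf, len + 1);
    while (len && s[len - 1] == '\n')
        s[--len] = '\0';

    free(dev->preferred_driver);     /* free(NULL) is a no-op */
    dev->preferred_driver = NULL;
    if (len)
        dev->preferred_driver = s;
    else
        free(s);
    return 0;
}
```

Writing a name that no driver uses (the "random string" case) leaves the device matching nothing at all.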

In the example patch below I've put the preferred_driver in the struct
pci_dev, but if this mechanism were adopted by multiple devices perhaps
we could add it to struct device.  Would something like this work for
platform devices?

Note 1, the below is just the core PCI driver change to support this,
there's some trivial collateral damage from changing an exported
function not shown here for brevity.

Note 2, PCI passes a struct pci_device_id to the driver probe function
which would be NULL in the preferred driver case of the example below.
We'd need to dynamically create one of these when calling the probe
function to make this practical for drivers that use that data.  Thanks,

Alex

Signed-off-by: Alex Williamson <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index d911e0c..9425920 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -203,17 +203,23 @@ ATTRIBUTE_GROUPS(pci_drv);
  * Deprecated, don't use this as it will not catch any dynamic ids
  * that a driver might want to check for.
  */
-const struct pci_device_id *pci_match_id(const struct pci_device_id *ids,
-					 struct pci_dev *dev)
+int pci_match_id(const struct pci_device_id *ids, struct pci_dev *dev,
+		 const struct pci_device_id **id)
 {
+	if (id)
+		*id = NULL;
+
 	if (ids) {
 		while (ids->vendor || ids->subvendor || ids->class_mask) {
-			if (pci_match_one_device(ids, dev))
-				return ids;
+			if (pci_match_one_device(ids, dev)) {
+				if (id)
+					*id = ids;
+				return 1;
+			}
 			ids++;
 		}
 	}
-	return NULL;
+	return 0;
 }
 
 /**
@@ -225,22 +231,30 @@ const struct pci_device_id *pci_match_id(const struct pci_device_id *ids,
  * system is in its list of supported devices.  Returns the matching
  * pci_device_id structure or %NULL if there is no match.
  */
-static const struct pci_device_id *pci_match_device(struct pci_driver *drv,
-						    struct pci_dev *dev)
+static int pci_match_device(struct pci_driver *drv, struct pci_dev *dev,
+			    const struct pci_device_id **id)
 {
 	struct pci_dynid *dynid;
 
+	if (id)
+		*id = NULL;
+
+	if (dev->preferred_driver)
+		return !strcmp(drv->name, dev->preferred_driver);
+
 	/* Look at the dynamic ids first, before the static ones */
 	spin_lock(&drv->dynids.lock);
 	list_for_each_entry(dynid, &drv->dynids.list, node) {
 		if (pci_match_one_device(&dynid->id, dev)) {
 			spin_unlock(&drv->dynids.lock);
-			return &dynid->id;
+			if (id)
+				*id = &dynid->id;
+			return 1;
 		}
 	}
 	spin_unlock(&drv->dynids.lock);
 
-	return pci_match_id(drv->id_table, dev);
+	return pci_match_id(drv->id_table, dev, id);
 }
 
 struct drv_dev_and_id {
@@ -342,8 +356,7 @@ __pci_device_probe(struct pci_driver *drv, struct pci_dev *pci_dev)
 	if (!pci_dev->driver && drv->probe) {
 		error = -ENODEV;
 
-		id = pci_match_device(drv, pci_dev);
-		if (id)
+		if (pci_match_device(drv, pci_dev, &id))
 			error = pci_call_probe(drv, pci_dev, id);
 		if (error >= 0)
 			error = 0;
@@ -1272,17 +1285,12 @@ static int pci_bus_match(struct device *dev, struct device_driver *drv)
 {
 	struct pci_dev *pci_dev = to_pci_dev(dev);
 	struct pci_driver *pci_drv;
-	const struct pci_device_id *found_id;
 
 	if (!pci_dev->match_driver)
 		return 0;
 
 	pci_drv = to_pci_driver(drv);
-	found_id = pci_match_device(pci_drv, pci_dev);
-	if (found_id)
-		return 1;
-
-	return 0;
+	return pci_match_device(pci_drv, pci_dev, NULL);
 }
 
 /**
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 4e0acef..d6075f8 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -222,6 +222,46 @@ static ssize_t enabled_show(struct device *dev,
 }
 static DEVICE_ATTR_RW(enabled);
 
+static ssize_t preferred_driver_store(struct device *dev,
+				      struct device_attribute *attr,
+				      const char *buf, size_t count)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+	char *preferred_driver, *old = pdev->preferred_driver;
+
+	if (count > PATH_MAX)
+		return -EINVAL;
+
+	preferred_driver = kstrndup(buf, count, GFP_KERNEL);
+	if (!preferred_driver)
+		return -ENOMEM;
+
+	while (strlen(preferred_driver) &&
+	       preferred_driver[strlen(preferred_driver) - 1] == '\n')
+		preferred_driver[strlen(preferred_driver) - 1] = '\0';
+
+	if (strlen(preferred_driver)) {
+		pdev->preferred_driver = preferred_driver;
+	} else {
+		kfree(preferred_driver);
+		pdev->preferred_driver = NULL;
+	}
+
+	if (old)
+		kfree(old);
+
+	return count;
+}
+
+static ssize_t preferred_driver_show(struct device *dev,
+				     struct device_attribute *attr, char *buf)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+
+	return sprintf(buf, "%s\n", pdev->preferred_driver ?: "");
+}
+static DEVICE_ATTR_RW(preferred_driver);
+
 #ifdef CONFIG_NUMA
 static ssize_t
 numa_node_show(struct device *dev, struct device_attribute *attr, char *buf)
@@ -521,6 +561,7 @@ static struct attribute *pci_dev_attrs[] = {
 #if defined(CONFIG_PM_RUNTIME) && defined(CONFIG_ACPI)
 	&dev_attr_d3cold_allowed.attr,
 #endif
+	&dev_attr_preferred_driver.attr,
 	NULL,
 };
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index aab57b4..6fecb0a 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -365,6 +365,7 @@ struct pci_dev {
 #endif
 	phys_addr_t rom; /* Physical address of ROM if it's not from the BAR */
 	size_t romlen; /* Length of ROM if it's not from the BAR */
+	char *preferred_driver; /* Preferred driver, supersedes ID matching */
 };
 
 static inline struct pci_dev *pci_physfn(struct pci_dev *dev)
@@ -1111,8 +1112,8 @@ int pci_add_dynid(struct pci_driver *drv,
 		  unsigned int subvendor, unsigned int subdevice,
 		  unsigned int class, unsigned int class_mask,
 		  unsigned long driver_data);
-const struct pci_device_id *pci_match_id(const struct pci_device_id *ids,
-					 struct pci_dev *dev);
+int pci_match_id(const struct pci_device_id *ids, struct pci_dev *dev,
+		 const struct pci_device_id **id);
 int pci_scan_bridge(struct pci_bus *bus, struct pci_dev *dev, int max,
 		    int pass);
 
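The patch changes pci_match_id() and pci_match_device() to return an int with an optional out-parameter because a preferred_driver match has no pci_device_id entry to point at, so a NULL pointer can no longer mean "no match". A reduced, hypothetical userspace model of the new calling convention (vendor/device only, table terminated by a zero vendor instead of the real three-field check) looks like this; the id handed back for a preferred_driver match is NULL, which is the case Note 2 above is about.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical reduced ID entry; { 0, 0 } terminates a table. */
struct model_id {
    unsigned short vendor, device;
};

/* Returns 1 on a match; *id (if non-NULL) gets the entry or NULL. */
static int model_match_id(const struct model_id *ids, unsigned short vendor,
                          unsigned short device, const struct model_id **id)
{
    if (id)
        *id = NULL;
    if (ids) {
        for (; ids->vendor; ids++) {
            if (ids->vendor == vendor && ids->device == device) {
                if (id)
                    *id = ids;
                return 1;
            }
        }
    }
    return 0;
}

/* A preferred driver matches by name and reports no ID entry. */
static int model_match_device(const char *drv_name, const struct model_id *ids,
                              const char *preferred, unsigned short vendor,
                              unsigned short device, const struct model_id **id)
{
    if (id)
        *id = NULL;
    if (preferred)
        return strcmp(drv_name, preferred) == 0;
    return model_match_id(ids, vendor, device, id);
}
```

Callers that only need a yes/no answer, like pci_bus_match() in the patch, pass NULL for the out-parameter.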

* Re: mechanism to allow a driver to bind to any device
       [not found]                                 ` <CAG8rG2xCvCGJWwZTnkia5GD3BVJZB9SmKOm79T6Q1FnhgB+urw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-03-28  6:59                                     ` Greg KH
  0 siblings, 0 replies; 92+ messages in thread
From: Greg KH @ 2014-03-28  6:59 UTC (permalink / raw)
  To: Antonios Motakis
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ,
	will.deacon-5wv7dgnIgG8, Stuart Yoder,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Bjorn Helgaas, Varun Sethi,
	kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg, Rafael J. Wysocki,
	agraf-l3A5Bk7waGM, Guenter Roeck, Dmitry Kasatkin, Tejun Heo,
	Scott Wood, tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J, Michal Hocko,
	Toshi Kani, a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Joe

On Wed, Mar 26, 2014 at 10:39:57PM +0100, Antonios Motakis wrote:
> 
> Of note is that new_id doesn't work particularly well for platform devices.

Nor should it.  Platform devices suck horribly, and "ids" mean nothing
to them, so you shouldn't even try this.  Use a "real" bus and it should
be fine.

greg k-h

* Re: mechanism to allow a driver to bind to any device
       [not found]                                             ` <1395871761.632.316.camel-85EaTFmN5p//9pzu0YdTqQ@public.gmane.org>
@ 2014-03-28 16:58                                                 ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 92+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-03-28 16:58 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ,
	will.deacon-5wv7dgnIgG8, Stuart Yoder,
	a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J, Michal Hocko,
	Scott Wood, Varun Sethi, kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg,
	Rafael J. Wysocki, Alexander Graf, Guenter Roeck,
	Dmitry Kasatkin, Tejun Heo, Bjorn Helgaas, Antonios Motakis,
	tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J, Toshi Kani, Greg KH,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Wed, Mar 26, 2014 at 04:09:21PM -0600, Alex Williamson wrote:
> On Wed, 2014-03-26 at 10:21 -0600, Alex Williamson wrote:
> > On Wed, 2014-03-26 at 23:06 +0800, Alexander Graf wrote:
> > > 
> > > > On 26.03.2014 at 22:40, Konrad Rzeszutek Wilk <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
> > > > 
> > > >> On Wed, Mar 26, 2014 at 01:40:32AM +0000, Stuart Yoder wrote:
> > > >> Hi Greg,
> > > >> 
> > > >> We (Linaro, Freescale, Virtual Open Systems) are trying to get an issue
> > > >> closed that has been percolating for a while around creating a mechanism
> > > >> that will allow kernel drivers like vfio to bind to devices of any type.
> > > >> 
> > > >> This thread with you:
> > > >> http://www.spinics.net/lists/kvm-arm/msg08370.html
> > > >> ...seems to have died out, so am trying to get your response
> > > >> and will summarize again.  Vfio drivers in the kernel (regardless of
> > > >> bus type) need to bind to devices of any type.  The driver's function
> > > >> is to simply export hardware resources of any type to user space.
> > > >> 
> > > >> There are several approaches that have been proposed:
> > > > 
> > > > You seem to have missed the one I proposed.
> > > >> 
> > > >>   1.  new_id -- (current approach) the user explicitly registers
> > > >>       each new device type with the vfio driver using the new_id
> > > >>       mechanism.
> > > >> 
> > > >>       Problem: multiple drivers will be resident that handle the
> > > >>       same device type...and there is nothing user space hotplug
> > > >>       infrastructure can do to help.
> > > >> 
> > > >>   2.  "any id" -- the vfio driver could specify a wildcard match
> > > >>       of some kind in its ID match table which would allow it to
> > > >>       match and bind to any possible device id.  However,
> > > >>       we don't want the vfio driver grabbing _all_ devices...just the ones we
> > > >>       explicitly want to pass to user space.
> > > >> 
> > > >>       The proposed patch to support this was to create a new flag
> > > >>       "sysfs_bind_only" in struct device_driver.  When this flag
> > > >>       is set, the driver can only bind to devices via the sysfs
> > > >>       bind file.  This would allow the wildcard match to work.
> > > >> 
> > > >>       Patch is here:
> > > >>       https://lkml.org/lkml/2013/12/3/253
> > > >> 
> > > >>   3.  "Driver initiated explicit bind" -- with this approach the
> > > >>       vfio driver would create a private 'bind' sysfs object
> > > >>       and the user would echo the requested device into it:
> > > >> 
> > > >>       echo 0001:03:00.0 > /sys/bus/pci/drivers/vfio-pci/vfio_bind
> > > >> 
> > > >>       In order to make that work, the driver would need to call
> > > >>       driver_probe_device() and thus we need this patch:
> > > >>       https://lkml.org/lkml/2014/2/8/175
> > > > 
> > > > 4). Use the 'unbind' (from the original device) and 'bind' to vfio driver.
> > > 
> > > This is approach 2, no?
> > > 
> > > > 
> > > > Which I think is what is currently being done. Why is that not sufficient?
> > > 
> > > What would 'bind to vfio driver' look like?
> > > 
> > > > The only thing I see in the URL is " That works, but it is ugly."
> > > > There is some mention of race but I don't see how - if you do the 'unbind'
> > > > on the original driver and then bind the BDF to the VFIO how would you get
> > > > a race?
> > > 
> > > Typically on PCI, you do a
> > > 
> > >   - add wildcard (pci id) match to vfio driver
> > >   - unbind driver
> > >   -> reprobe
> > >   -> device attaches to vfio driver because it is the least recent match
> > >   - remove wildcard match from vfio driver
> > > 
> > > If in between you hotplug add a card of the same type, it gets attached to vfio - even though the logical "default driver" would be the device specific driver.
> > 
> > I've mentioned drivers_autoprobe in the past, but I'm not sure we're
> > really factoring it into the discussion.  drivers_autoprobe allows us to
> > toggle two points:
> > 
> > a) When a new device is added whether we automatically give drivers a
> > try at binding to it
> > 
> > b) When a new driver is added whether it gets to try to bind to anything
> > in the system
> > 
> > So we do have a mechanism to avoid the race, but the problem is that it
> > becomes the responsibility of userspace to:
> > 
> > 1) turn off drivers_autoprobe
> > 2) unbind/new_id/bind/remove_id
> > 3) turn on drivers_autoprobe
> > 4) call drivers_probe for anything added between 1) & 3)
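The four userspace steps above can be modelled without a kernel. Below is a toy userspace simulation, not kernel code. A host driver and vfio-pci both match the same illustrative ID key, and a second device is hot-added while autoprobe is off. All struct and function names are invented for illustration. The model tries vfio-pci's dynamic ID before the host driver's table, which is the ordering under which the race described earlier bites.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_DEVS 8

struct toy_dev {
	const char *name;    /* e.g. "0000:03:00.0" */
	const char *id;      /* "VVVV DDDD" match key */
	const char *driver;  /* bound driver name, or NULL */
};

struct toy_bus {
	bool drivers_autoprobe;
	const char *host_id;    /* ID in the host driver's static table */
	const char *vfio_dynid; /* ID added to vfio-pci via new_id, or NULL */
	struct toy_dev devs[MAX_DEVS];
	int ndevs;
};

/* Automatic probe; assumes vfio-pci's dynamic ID is tried first. */
static void toy_probe(struct toy_bus *bus, struct toy_dev *d)
{
	if (d->driver)
		return;
	if (bus->vfio_dynid && strcmp(bus->vfio_dynid, d->id) == 0)
		d->driver = "vfio-pci";
	else if (strcmp(bus->host_id, d->id) == 0)
		d->driver = "host";
}

/* A device appears; it is only probed if autoprobe is enabled. */
static struct toy_dev *toy_hotplug(struct toy_bus *bus, const char *name,
				   const char *id)
{
	struct toy_dev *d;

	assert(bus->ndevs < MAX_DEVS);
	d = &bus->devs[bus->ndevs++];
	d->name = name;
	d->id = id;
	d->driver = NULL;
	if (bus->drivers_autoprobe)
		toy_probe(bus, d);
	return d;
}

/*
 * Steps 1-4 for one target device. If late_name is non-NULL, a device
 * with the same ID is hot-added inside the window to exercise the race.
 */
static void toy_move_to_vfio(struct toy_bus *bus, struct toy_dev *target,
			     const char *late_name)
{
	bus->drivers_autoprobe = false;   /* 1) turn off drivers_autoprobe */
	target->driver = NULL;            /* 2) unbind from the host driver */
	bus->vfio_dynid = target->id;     /*    new_id */
	target->driver = "vfio-pci";      /*    bind via vfio-pci's bind file */
	if (late_name)                    /*    late arrival stays unbound */
		toy_hotplug(bus, late_name, target->id);
	bus->vfio_dynid = NULL;           /*    remove_id */
	bus->drivers_autoprobe = true;    /* 3) turn on drivers_autoprobe */
	for (int i = 0; i < bus->ndevs; i++)
		toy_probe(bus, &bus->devs[i]);  /* 4) probe leftovers */
}
```

After toy_move_to_vfio(), the target device is bound to vfio-pci. The late arrival ends up with the host driver because vfio-pci's dynamic ID is gone by the time step 4 runs. Without step 1, toy_hotplug() would have handed it to vfio-pci.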
> > 
> > Is the question about the ugliness of the current solution whether it's
> > unreasonable to ask userspace to do this?
> > 
> > What we seem to be asking for above is more like an autoprobe flag per
> > driver where there's some way for this special driver to opt out of auto
> > probing.  Option 2. in Stuart's list does this by short-cutting ID
> > matching so that a "match" is only found when using the sysfs bind path,
> > option 3. enables a way for a driver to expose their own sysfs entry
> > point for binding.  The latter feels particularly chaotic since drivers
> > get to make up their own bind mechanism.
> > 
> > Another twist I'll throw in is that devices can be hot added to IOMMU
> > groups that are in-use by userspace.  When that happens we'd like to be
> > able to disable driver autoprobe of the device to avoid a host driver
> > automatically binding to the device.  I wonder if instead of looking at
> > the problem from the driver perspective, if we were to instead look at
> > it from the device perspective if we might find a solution that would
> > address both.  For instance, if devices had a driver_probe_id property
> > that was by default set to their bus specific ID match ("$VENDOR
> > $DEVICE" on PCI) could we use that to write new match IDs so that a
> > device could only bind to a given driver?  Effectively we could then
> > bind either using the current method of adding to the list of IDs a
> > driver will match or changing the ID that a device would match.  Does
> > that get us anywhere?  Thanks,
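The device-side idea can be sketched as follows. The driver's ID list stays the same, and the device carries a writable match key that defaults to its bus ID; rewriting that key steers which driver can match. This is toy userspace code. Here driver_probe_id is a plain struct field standing in for the proposed sysfs property, which does not exist. The driver tables and the "vfio vfio" key are illustrative values only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct toy_driver {
	const char *name;
	const char *ids[4];   /* NULL-terminated list of "VVVV DDDD" keys */
};

struct toy_device {
	unsigned int vendor, device;
	char driver_probe_id[32]; /* defaults to "VVVV DDDD" */
};

static void toy_device_init(struct toy_device *d, unsigned int vendor,
			    unsigned int device)
{
	d->vendor = vendor;
	d->device = device;
	snprintf(d->driver_probe_id, sizeof(d->driver_probe_id),
		 "%04x %04x", vendor, device);
}

/* Match on the device's (possibly rewritten) key, not on vendor/device. */
static int toy_match(const struct toy_driver *drv, const struct toy_device *d)
{
	assert(drv && d);
	for (int i = 0; i < 4 && drv->ids[i]; i++)
		if (strcmp(drv->ids[i], d->driver_probe_id) == 0)
			return 1;
	return 0;
}
```

Rewriting driver_probe_id to a key that only one driver lists restricts the device to that driver. Restoring the default key hands it back to normal ID matching.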
> 
> Here's one way this might work for PCI; note that we can do this
> entirely in the bus driver for PCI.  Bind/unbind would go like this:
> 
> # bind device to vfio-pci
> echo vfio-pci > /sys/bus/pci/devices/0000\:03\:00.0/preferred_driver
> echo 0000:03:00.0 > /sys/bus/pci/devices/0000\:03\:00.0/driver/unbind
> echo 0000:03:00.0 > /sys/bus/pci/drivers_probe
> 
> # bind device back to host driver
> echo > /sys/bus/pci/devices/0000\:03\:00.0/preferred_driver
> echo 0000:03:00.0 > /sys/bus/pci/devices/0000\:03\:00.0/driver/unbind
> echo 0000:03:00.0 > /sys/bus/pci/drivers_probe
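The same three writes can be driven from a C program. This is a minimal sketch that assumes the preferred_driver attribute from the patch below is present; it is a proposal here, not an existing attribute. The helper names are hypothetical. The backslashes in the shell commands only escape the colons for the shell, so the real sysfs path contains plain colons. Each value gets a trailing newline, as echo adds one. This also means the "clear" case still performs a one-byte write instead of an empty one that would never reach the store handler.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct sysfs_op {
	char path[128];
	char value[32];
};

/*
 * Fill ops[] with the three writes from the example above. An empty
 * driver name gives the "bind device back to host driver" variant.
 */
static int build_rebind_ops(const char *bdf, const char *driver,
			    struct sysfs_op ops[3])
{
	assert(bdf && driver);
	snprintf(ops[0].path, sizeof(ops[0].path),
		 "/sys/bus/pci/devices/%s/preferred_driver", bdf);
	snprintf(ops[0].value, sizeof(ops[0].value), "%s\n", driver);
	snprintf(ops[1].path, sizeof(ops[1].path),
		 "/sys/bus/pci/devices/%s/driver/unbind", bdf);
	snprintf(ops[1].value, sizeof(ops[1].value), "%s\n", bdf);
	snprintf(ops[2].path, sizeof(ops[2].path), "/sys/bus/pci/drivers_probe");
	snprintf(ops[2].value, sizeof(ops[2].value), "%s\n", bdf);
	return 3;
}

/* Perform one write; returns 0 on success, -1 on failure (e.g. not root). */
static int sysfs_write(const struct sysfs_op *op)
{
	FILE *f = fopen(op->path, "w");
	int ret = 0;

	if (!f)
		return -1;
	if (fputs(op->value, f) == EOF)
		ret = -1;
	if (fclose(f) == EOF)   /* store errors surface when the buffer is flushed */
		ret = -1;
	return ret;
}
```

Building the operations separately from sysfs_write() means the paths can be checked without root or real hardware. A caller would loop over the three entries and stop at the first write that fails.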
> 
> When preferred_driver is set for a device it will match and bind only to
> a driver with a matching name.  This also means we can write random
> strings here to avoid a device being bound to any driver if we want.
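The rule above can be modelled in userspace: a set preferred_driver short-circuits ID matching, and only the driver names are compared. This mirrors the `if (dev->preferred_driver)` branch that the patch adds to pci_match_device(), using simplified stand-in structs. It is not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_id { unsigned int vendor, device; };

struct toy_pci_driver {
	const char *name;
	const struct toy_id *id_table;  /* terminated by a {0, 0} entry */
};

struct toy_pci_dev {
	unsigned int vendor, device;
	const char *preferred_driver;   /* NULL when unset */
};

/* Returns 1 on match, 0 otherwise, like the reworked pci_match_device(). */
static int toy_match_device(const struct toy_pci_driver *drv,
			    const struct toy_pci_dev *dev)
{
	const struct toy_id *id;

	assert(drv && dev);
	/* Override set: ignore IDs entirely and compare driver names. */
	if (dev->preferred_driver)
		return strcmp(drv->name, dev->preferred_driver) == 0;

	for (id = drv->id_table; id && id->vendor; id++)
		if (id->vendor == dev->vendor && id->device == dev->device)
			return 1;
	return 0;
}
```

Writing a name that no driver uses, such as "none-such", makes every match fail. That is the "keep the device unbound" case mentioned above.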
> 
> In the example patch below I've put the preferred_driver in the struct
> pci_dev, but if this mechanism were adopted by multiple devices perhaps
> we could add it to struct device.  Would something like this work for
> platform devices?
> 
> Note 1, the below is just the core PCI driver change to support this,
> there's some trivial collateral damage from changing an exported
> function not shown here for brevity.
> 
> Note 2, PCI passes a struct pci_device_id to the driver probe function
> which would be NULL in the preferred driver case of the example below.
> We'd need to dynamically create one of these when calling the probe
> function to make this practical for drivers that use that data.  Thanks,

That is, I think, a much easier way. Though I would just call
it 'override' instead of preferred_driver since, well, that is its
intent.

Thank you for prototyping it!
> 
> Alex
> 
> Signed-off-by: Alex Williamson <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> 
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index d911e0c..9425920 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -203,17 +203,23 @@ ATTRIBUTE_GROUPS(pci_drv);
>   * Deprecated, don't use this as it will not catch any dynamic ids
>   * that a driver might want to check for.
>   */
> -const struct pci_device_id *pci_match_id(const struct pci_device_id *ids,
> -					 struct pci_dev *dev)
> +int pci_match_id(const struct pci_device_id *ids, struct pci_dev *dev,
> +		 const struct pci_device_id **id)
>  {
> +	if (id)
> +		*id = NULL;
> +
>  	if (ids) {
>  		while (ids->vendor || ids->subvendor || ids->class_mask) {
> -			if (pci_match_one_device(ids, dev))
> -				return ids;
> +			if (pci_match_one_device(ids, dev)) {
> +				if (id)
> +					*id = ids;
> +				return 1;
> +			}
>  			ids++;
>  		}
>  	}
> -	return NULL;
> +	return 0;
>  }
>  
>  /**
> @@ -225,22 +231,30 @@ const struct pci_device_id *pci_match_id(const struct pci_device_id *ids,
>   * system is in its list of supported devices.  Returns the matching
>   * pci_device_id structure or %NULL if there is no match.
>   */
> -static const struct pci_device_id *pci_match_device(struct pci_driver *drv,
> -						    struct pci_dev *dev)
> +static int pci_match_device(struct pci_driver *drv, struct pci_dev *dev,
> +			    const struct pci_device_id **id)
>  {
>  	struct pci_dynid *dynid;
>  
> +	if (id)
> +		*id = NULL;
> +
> +	if (dev->preferred_driver)
> +		return !strcmp(drv->name, dev->preferred_driver);
> +
>  	/* Look at the dynamic ids first, before the static ones */
>  	spin_lock(&drv->dynids.lock);
>  	list_for_each_entry(dynid, &drv->dynids.list, node) {
>  		if (pci_match_one_device(&dynid->id, dev)) {
>  			spin_unlock(&drv->dynids.lock);
> -			return &dynid->id;
> +			if (id)
> +				*id = &dynid->id;
> +			return 1;
>  		}
>  	}
>  	spin_unlock(&drv->dynids.lock);
>  
> -	return pci_match_id(drv->id_table, dev);
> +	return pci_match_id(drv->id_table, dev, id);
>  }
>  
>  struct drv_dev_and_id {
> @@ -342,8 +356,7 @@ __pci_device_probe(struct pci_driver *drv, struct pci_dev *pci_dev)
>  	if (!pci_dev->driver && drv->probe) {
>  		error = -ENODEV;
>  
> -		id = pci_match_device(drv, pci_dev);
> -		if (id)
> +		if (pci_match_device(drv, pci_dev, &id))
>  			error = pci_call_probe(drv, pci_dev, id);
>  		if (error >= 0)
>  			error = 0;
> @@ -1272,17 +1285,12 @@ static int pci_bus_match(struct device *dev, struct device_driver *drv)
>  {
>  	struct pci_dev *pci_dev = to_pci_dev(dev);
>  	struct pci_driver *pci_drv;
> -	const struct pci_device_id *found_id;
>  
>  	if (!pci_dev->match_driver)
>  		return 0;
>  
>  	pci_drv = to_pci_driver(drv);
> -	found_id = pci_match_device(pci_drv, pci_dev);
> -	if (found_id)
> -		return 1;
> -
> -	return 0;
> +	return pci_match_device(pci_drv, pci_dev, NULL);
>  }
>  
>  /**
> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> index 4e0acef..d6075f8 100644
> --- a/drivers/pci/pci-sysfs.c
> +++ b/drivers/pci/pci-sysfs.c
> @@ -222,6 +222,46 @@ static ssize_t enabled_show(struct device *dev,
>  }
>  static DEVICE_ATTR_RW(enabled);
>  
> +static ssize_t preferred_driver_store(struct device *dev,
> +				      struct device_attribute *attr,
> +				      const char *buf, size_t count)
> +{
> +	struct pci_dev *pdev = to_pci_dev(dev);
> +	char *preferred_driver, *old = pdev->preferred_driver;
> +
> +	if (count > PATH_MAX)
> +		return -EINVAL;
> +
> +	preferred_driver = kstrndup(buf, count, GFP_KERNEL);
> +	if (!preferred_driver)
> +		return -ENOMEM;
> +
> +	while (strlen(preferred_driver) &&
> +	       preferred_driver[strlen(preferred_driver) - 1] == '\n')
> +		preferred_driver[strlen(preferred_driver) - 1] = '\0';
> +
> +	if (strlen(preferred_driver)) {
> +		pdev->preferred_driver = preferred_driver;
> +	} else {
> +		kfree(preferred_driver);
> +		pdev->preferred_driver = NULL;
> +	}
> +
> +	if (old)
> +		kfree(old);
> +
> +	return count;
> +}
> +
> +static ssize_t preferred_driver_show(struct device *dev,
> +				     struct device_attribute *attr, char *buf)
> +{
> +	struct pci_dev *pdev = to_pci_dev(dev);
> +
> +	return sprintf(buf, "%s\n", pdev->preferred_driver);
> +}
> +static DEVICE_ATTR_RW(preferred_driver);
> +
>  #ifdef CONFIG_NUMA
>  static ssize_t
>  numa_node_show(struct device *dev, struct device_attribute *attr, char *buf)
> @@ -521,6 +561,7 @@ static struct attribute *pci_dev_attrs[] = {
>  #if defined(CONFIG_PM_RUNTIME) && defined(CONFIG_ACPI)
>  	&dev_attr_d3cold_allowed.attr,
>  #endif
> +	&dev_attr_preferred_driver.attr,
>  	NULL,
>  };
>  
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index aab57b4..6fecb0a 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -365,6 +365,7 @@ struct pci_dev {
>  #endif
>  	phys_addr_t rom; /* Physical address of ROM if it's not from the BAR */
>  	size_t romlen; /* Length of ROM if it's not from the BAR */
> +	char *preferred_driver; /* Preferred driver, supersedes ID matching */
>  };
>  
>  static inline struct pci_dev *pci_physfn(struct pci_dev *dev)
> @@ -1111,8 +1112,8 @@ int pci_add_dynid(struct pci_driver *drv,
>  		  unsigned int subvendor, unsigned int subdevice,
>  		  unsigned int class, unsigned int class_mask,
>  		  unsigned long driver_data);
> -const struct pci_device_id *pci_match_id(const struct pci_device_id *ids,
> -					 struct pci_dev *dev);
> +int pci_match_id(const struct pci_device_id *ids, struct pci_dev *dev,
> +		 const struct pci_device_id **id);
>  int pci_scan_bridge(struct pci_bus *bus, struct pci_dev *dev, int max,
>  		    int pass);
>  
> 
> 
> _______________________________________________
> iommu mailing list
> iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
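The string handling in preferred_driver_store() can be exercised in userspace. The handler copies the buffer, strips trailing newlines, and treats an empty result as "clear". This is a minimal sketch with malloc()/free() standing in for kstrndup()/kfree(); parse_override() and store_override() are hypothetical names. The length is computed once and updated in place instead of calling strlen() on every loop iteration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Userspace analogue of preferred_driver_store()'s string handling.
 * Copies at most count bytes (stopping at a NUL, like kstrndup()),
 * strips trailing newlines and returns the copy, or NULL when nothing
 * is left, meaning "clear the override". *err is -1 on allocation failure.
 */
static char *parse_override(const char *buf, size_t count, int *err)
{
	size_t len = 0;
	char *s;

	assert(buf && err);
	*err = 0;
	while (len < count && buf[len] != '\0')
		len++;

	s = malloc(len + 1);
	if (!s) {
		*err = -1;
		return NULL;
	}
	memcpy(s, buf, len);
	s[len] = '\0';

	while (len > 0 && s[len - 1] == '\n')   /* trim trailing newlines */
		s[--len] = '\0';

	if (len == 0) {
		free(s);
		return NULL;
	}
	return s;
}

/* Swap in the new value and release the old one. */
static int store_override(char **slot, const char *buf, size_t count)
{
	int err;
	char *next = parse_override(buf, count, &err);

	if (err)
		return err;
	free(*slot);   /* free(NULL), like kfree(NULL), is a no-op */
	*slot = next;
	return 0;
}
```

Because free() and kfree() both accept NULL, the `if (old)` guard in the patch is not strictly needed.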

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: mechanism to allow a driver to bind to any device
@ 2014-03-28 16:58                                                 ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 92+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-03-28 16:58 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ,
	will.deacon-5wv7dgnIgG8, Stuart Yoder,
	a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J, Michal Hocko,
	Scott Wood, Varun Sethi, kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg,
	Rafael J. Wysocki, Alexander Graf, Guenter Roeck,
	Dmitry Kasatkin, Tejun Heo, Bjorn Helgaas, Antonios Motakis,
	tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J, Toshi Kani, Greg KH,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Wed, Mar 26, 2014 at 04:09:21PM -0600, Alex Williamson wrote:
> On Wed, 2014-03-26 at 10:21 -0600, Alex Williamson wrote:
> > On Wed, 2014-03-26 at 23:06 +0800, Alexander Graf wrote:
> > > 
> > > > On 26.03.2014 at 22:40, Konrad Rzeszutek Wilk <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
> > > > 
> > > >> On Wed, Mar 26, 2014 at 01:40:32AM +0000, Stuart Yoder wrote:
> > > >> Hi Greg,
> > > >> 
> > > >> We (Linaro, Freescale, Virtual Open Systems) are trying to get an issue
> > > >> closed that has been percolating for a while around creating a mechanism
> > > >> that will allow kernel drivers like vfio to bind to devices of any type.
> > > >> 
> > > >> This thread with you:
> > > >> http://www.spinics.net/lists/kvm-arm/msg08370.html
> > > >> ...seems to have died out, so am trying to get your response
> > > >> and will summarize again.  Vfio drivers in the kernel (regardless of
> > > >> bus type) need to bind to devices of any type.  The driver's function
> > > >> is to simply export hardware resources of any type to user space.
> > > >> 
> > > >> There are several approaches that have been proposed:
> > > > 
> > > > You seem to have missed the one I proposed.
> > > >> 
> > > >>   1.  new_id -- (current approach) the user explicitly registers
> > > >>       each new device type with the vfio driver using the new_id
> > > >>       mechanism.
> > > >> 
> > > >>       Problem: multiple drivers will be resident that handle the
> > > >>       same device type...and there is nothing user space hotplug
> > > >>       infrastructure can do to help.
> > > >> 
> > > >>   2.  "any id" -- the vfio driver could specify a wildcard match
> > > >>       of some kind in its ID match table which would allow it to
> > > >>       match and bind to any possible device id.  However,
> > > >>       we don't want the vfio driver grabbing _all_ devices...just the ones we
> > > >>       explicitly want to pass to user space.
> > > >> 
> > > >>       The proposed patch to support this was to create a new flag
> > > >>       "sysfs_bind_only" in struct device_driver.  When this flag
> > > >>       is set, the driver can only bind to devices via the sysfs
> > > >>       bind file.  This would allow the wildcard match to work.
> > > >> 
> > > >>       Patch is here:
> > > >>       https://lkml.org/lkml/2013/12/3/253
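Option 2 can be pictured as a match that honours a wildcard only when the bind was requested explicitly through sysfs. Below is a toy userspace model that assumes a sysfs_bind_only flag as described in the summary above. The explicit_bind parameter is invented for illustration. This sketch does not show how the linked patch actually passes that information into the match path.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_driver {
	const char *name;
	bool wildcard;          /* ID table matches any device */
	bool sysfs_bind_only;   /* only bind via the sysfs 'bind' file */
};

/*
 * explicit_bind is true when the request came from a write to the driver's
 * sysfs 'bind' file, false for automatic probing (hotplug, driver load).
 */
static bool toy_may_bind(const struct toy_driver *drv, bool id_matches,
			 bool explicit_bind)
{
	assert(drv);
	if (drv->sysfs_bind_only && !explicit_bind)
		return false;   /* never picked up by autoprobe */
	return drv->wildcard || id_matches;
}
```

A wildcard vfio driver with the flag set never wins an automatic probe. It still accepts any device written to its bind file.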
> > > >> 
> > > >>   3.  "Driver initiated explicit bind" -- with this approach the
> > > >>       vfio driver would create a private 'bind' sysfs object
> > > >>       and the user would echo the requested device into it:
> > > >> 
> > > >>       echo 0001:03:00.0 > /sys/bus/pci/drivers/vfio-pci/vfio_bind
> > > >> 
> > > >>       In order to make that work, the driver would need to call
> > > >>       driver_probe_device() and thus we need this patch:
> > > >>       https://lkml.org/lkml/2014/2/8/175
> > > > 
> > > > 4). Use the 'unbind' (from the original device) and 'bind' to vfio driver.
> > > 
> > > This is approach 2, no?
> > > 
> > > > 
> > > > Which I think is what is currently being done. Why is that not sufficient?
> > > 
> > > What would 'bind to vfio driver' look like?
> > > 
> > > > The only thing I see in the URL is "That works, but it is ugly."
> > > > There is some mention of race but I don't see how - if you do the 'unbind'
> > > > on the original driver and then bind the BDF to the VFIO how would you get
> > > > a race?
> > > 
> > > Typically on PCI, you do a
> > > 
> > >   - add wildcard (pci id) match to vfio driver
> > >   - unbind driver
> > >   -> reprobe
> > >   -> device attaches to vfio driver because it is the least recent match
> > >   - remove wildcard match from vfio driver
> > > 
> > > If in between you hotplug add a card of the same type, it gets attached to vfio - even though the logical "default driver" would be the device specific driver.
> > 
> > I've mentioned drivers_autoprobe in the past, but I'm not sure we're
> > really factoring it into the discussion.  drivers_autoprobe allows us to
> > toggle two points:
> > 
> > a) When a new device is added whether we automatically give drivers a
> > try at binding to it
> > 
> > b) When a new driver is added whether it gets to try to bind to anything
> > in the system
> > 
> > So we do have a mechanism to avoid the race, but the problem is that it
> > becomes the responsibility of userspace to:
> > 
> > 1) turn off drivers_autoprobe
> > 2) unbind/new_id/bind/remove_id
> > 3) turn on drivers_autoprobe
> > 4) call drivers_probe for anything added between 1) & 3)
> > 
> > Is the question about the ugliness of the current solution whether it's
> > unreasonable to ask userspace to do this?
> > 
> > What we seem to be asking for above is more like an autoprobe flag per
> > driver where there's some way for this special driver to opt out of auto
> > probing.  Option 2. in Stuart's list does this by short-cutting ID
> > matching so that a "match" is only found when using the sysfs bind path,
> > option 3. enables a way for a driver to expose their own sysfs entry
> > point for binding.  The latter feels particularly chaotic since drivers
> > get to make up their own bind mechanism.
> > 
> > Another twist I'll throw in is that devices can be hot added to IOMMU
> > groups that are in-use by userspace.  When that happens we'd like to be
> > able to disable driver autoprobe of the device to avoid a host driver
> > automatically binding to the device.  I wonder if instead of looking at
> > the problem from the driver perspective, if we were to instead look at
> > it from the device perspective if we might find a solution that would
> > address both.  For instance, if devices had a driver_probe_id property
> > that was by default set to their bus specific ID match ("$VENDOR
> > $DEVICE" on PCI) could we use that to write new match IDs so that a
> > device could only bind to a given driver?  Effectively we could then
> > bind either using the current method of adding to the list of IDs a
> > driver will match or changing the ID that a device would match.  Does
> > that get us anywhere?  Thanks,
> 
> Here's one way this might work for PCI; note that we can do this
> entirely in the bus driver for PCI.  Bind/unbind would go like this:
> 
> # bind device to vfio-pci
> echo vfio-pci > /sys/bus/pci/devices/0000\:03\:00.0/preferred_driver
> echo 0000:03:00.0 > /sys/bus/pci/devices/0000\:03\:00.0/driver/unbind
> echo 0000:03:00.0 > /sys/bus/pci/drivers_probe
> 
> # bind device back to host driver
> echo > /sys/bus/pci/devices/0000\:03\:00.0/preferred_driver
> echo 0000:03:00.0 > /sys/bus/pci/devices/0000\:03\:00.0/driver/unbind
> echo 0000:03:00.0 > /sys/bus/pci/drivers_probe
> 
> When preferred_driver is set for a device it will match and bind only to
> a driver with a matching name.  This also means we can write random
> strings here to avoid a device being bound to any driver if we want.
> 
> In the example patch below I've put the preferred_driver in the struct
> pci_dev, but if this mechanism were adopted by multiple devices perhaps
> we could add it to struct device.  Would something like this work for
> platform devices?
> 
> Note 1, the below is just the core PCI driver change to support this,
> there's some trivial collateral damage from changing an exported
> function not shown here for brevity.
> 
> Note 2, PCI passes a struct pci_device_id to the driver probe function
> which would be NULL in the preferred driver case of the example below.
> We'd need to dynamically create one of these when calling the probe
> function to make this practical for drivers that use that data.  Thanks,

That is, I think, a much easier way. Though I would just call
it 'override' instead of preferred_driver since, well, that is its
intent.

Thank you for prototyping it!
> 
> Alex
> 
> Signed-off-by: Alex Williamson <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> 
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index d911e0c..9425920 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -203,17 +203,23 @@ ATTRIBUTE_GROUPS(pci_drv);
>   * Deprecated, don't use this as it will not catch any dynamic ids
>   * that a driver might want to check for.
>   */
> -const struct pci_device_id *pci_match_id(const struct pci_device_id *ids,
> -					 struct pci_dev *dev)
> +int pci_match_id(const struct pci_device_id *ids, struct pci_dev *dev,
> +		 const struct pci_device_id **id)
>  {
> +	if (id)
> +		*id = NULL;
> +
>  	if (ids) {
>  		while (ids->vendor || ids->subvendor || ids->class_mask) {
> -			if (pci_match_one_device(ids, dev))
> -				return ids;
> +			if (pci_match_one_device(ids, dev)) {
> +				if (id)
> +					*id = ids;
> +				return 1;
> +			}
>  			ids++;
>  		}
>  	}
> -	return NULL;
> +	return 0;
>  }
>  
>  /**
> @@ -225,22 +231,30 @@ const struct pci_device_id *pci_match_id(const struct pci_device_id *ids,
>   * system is in its list of supported devices.  Returns the matching
>   * pci_device_id structure or %NULL if there is no match.
>   */
> -static const struct pci_device_id *pci_match_device(struct pci_driver *drv,
> -						    struct pci_dev *dev)
> +static int pci_match_device(struct pci_driver *drv, struct pci_dev *dev,
> +			    const struct pci_device_id **id)
>  {
>  	struct pci_dynid *dynid;
>  
> +	if (id)
> +		*id = NULL;
> +
> +	if (dev->preferred_driver)
> +		return !strcmp(drv->name, dev->preferred_driver);
> +
>  	/* Look at the dynamic ids first, before the static ones */
>  	spin_lock(&drv->dynids.lock);
>  	list_for_each_entry(dynid, &drv->dynids.list, node) {
>  		if (pci_match_one_device(&dynid->id, dev)) {
>  			spin_unlock(&drv->dynids.lock);
> -			return &dynid->id;
> +			if (id)
> +				*id = &dynid->id;
> +			return 1;
>  		}
>  	}
>  	spin_unlock(&drv->dynids.lock);
>  
> -	return pci_match_id(drv->id_table, dev);
> +	return pci_match_id(drv->id_table, dev, id);
>  }
>  
>  struct drv_dev_and_id {
> @@ -342,8 +356,7 @@ __pci_device_probe(struct pci_driver *drv, struct pci_dev *pci_dev)
>  	if (!pci_dev->driver && drv->probe) {
>  		error = -ENODEV;
>  
> -		id = pci_match_device(drv, pci_dev);
> -		if (id)
> +		if (pci_match_device(drv, pci_dev, &id))
>  			error = pci_call_probe(drv, pci_dev, id);
>  		if (error >= 0)
>  			error = 0;
> @@ -1272,17 +1285,12 @@ static int pci_bus_match(struct device *dev, struct device_driver *drv)
>  {
>  	struct pci_dev *pci_dev = to_pci_dev(dev);
>  	struct pci_driver *pci_drv;
> -	const struct pci_device_id *found_id;
>  
>  	if (!pci_dev->match_driver)
>  		return 0;
>  
>  	pci_drv = to_pci_driver(drv);
> -	found_id = pci_match_device(pci_drv, pci_dev);
> -	if (found_id)
> -		return 1;
> -
> -	return 0;
> +	return pci_match_device(pci_drv, pci_dev, NULL);
>  }
>  
>  /**
> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> index 4e0acef..d6075f8 100644
> --- a/drivers/pci/pci-sysfs.c
> +++ b/drivers/pci/pci-sysfs.c
> @@ -222,6 +222,46 @@ static ssize_t enabled_show(struct device *dev,
>  }
>  static DEVICE_ATTR_RW(enabled);
>  
> +static ssize_t preferred_driver_store(struct device *dev,
> +				      struct device_attribute *attr,
> +				      const char *buf, size_t count)
> +{
> +	struct pci_dev *pdev = to_pci_dev(dev);
> +	char *preferred_driver, *old = pdev->preferred_driver;
> +
> +	if (count > PATH_MAX)
> +		return -EINVAL;
> +
> +	preferred_driver = kstrndup(buf, count, GFP_KERNEL);
> +	if (!preferred_driver)
> +		return -ENOMEM;
> +
> +	while (strlen(preferred_driver) &&
> +	       preferred_driver[strlen(preferred_driver) - 1] == '\n')
> +		preferred_driver[strlen(preferred_driver) - 1] = '\0';
> +
> +	if (strlen(preferred_driver)) {
> +		pdev->preferred_driver = preferred_driver;
> +	} else {
> +		kfree(preferred_driver);
> +		pdev->preferred_driver = NULL;
> +	}
> +
> +	if (old)
> +		kfree(old);
> +
> +	return count;
> +}
> +
> +static ssize_t preferred_driver_show(struct device *dev,
> +				     struct device_attribute *attr, char *buf)
> +{
> +	struct pci_dev *pdev = to_pci_dev(dev);
> +
> +	return sprintf(buf, "%s\n", pdev->preferred_driver);
> +}
> +static DEVICE_ATTR_RW(preferred_driver);
> +
>  #ifdef CONFIG_NUMA
>  static ssize_t
>  numa_node_show(struct device *dev, struct device_attribute *attr, char *buf)
> @@ -521,6 +561,7 @@ static struct attribute *pci_dev_attrs[] = {
>  #if defined(CONFIG_PM_RUNTIME) && defined(CONFIG_ACPI)
>  	&dev_attr_d3cold_allowed.attr,
>  #endif
> +	&dev_attr_preferred_driver.attr,
>  	NULL,
>  };
>  
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index aab57b4..6fecb0a 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -365,6 +365,7 @@ struct pci_dev {
>  #endif
>  	phys_addr_t rom; /* Physical address of ROM if it's not from the BAR */
>  	size_t romlen; /* Length of ROM if it's not from the BAR */
> +	char *preferred_driver; /* Preferred driver, supersedes ID matching */
>  };
>  
>  static inline struct pci_dev *pci_physfn(struct pci_dev *dev)
> @@ -1111,8 +1112,8 @@ int pci_add_dynid(struct pci_driver *drv,
>  		  unsigned int subvendor, unsigned int subdevice,
>  		  unsigned int class, unsigned int class_mask,
>  		  unsigned long driver_data);
> -const struct pci_device_id *pci_match_id(const struct pci_device_id *ids,
> -					 struct pci_dev *dev);
> +int pci_match_id(const struct pci_device_id *ids, struct pci_dev *dev,
> +		 const struct pci_device_id **id);
>  int pci_scan_bridge(struct pci_bus *bus, struct pci_dev *dev, int max,
>  		    int pass);
>  
> 
> 

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: mechanism to allow a driver to bind to any device
       [not found]                                                 ` <20140328165809.GA12659-6K5HmflnPlqSPmnEAIUT9EEOCMrvLtNR@public.gmane.org>
@ 2014-03-28 17:10                                                     ` Alex Williamson
  0 siblings, 0 replies; 92+ messages in thread
From: Alex Williamson @ 2014-03-28 17:10 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ,
	will.deacon-5wv7dgnIgG8, Stuart Yoder,
	a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J, Michal Hocko,
	Scott Wood, Varun Sethi, kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg,
	Rafael J. Wysocki, Alexander Graf, Guenter Roeck,
	Dmitry Kasatkin, Tejun Heo, Bjorn Helgaas, Antonios Motakis,
	tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J, Toshi Kani, Greg KH,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Fri, 2014-03-28 at 12:58 -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Mar 26, 2014 at 04:09:21PM -0600, Alex Williamson wrote:
> > On Wed, 2014-03-26 at 10:21 -0600, Alex Williamson wrote:
> > > On Wed, 2014-03-26 at 23:06 +0800, Alexander Graf wrote:
> > > > 
> > > > > On 26.03.2014 at 22:40, Konrad Rzeszutek Wilk <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
> > > > > 
> > > > >> On Wed, Mar 26, 2014 at 01:40:32AM +0000, Stuart Yoder wrote:
> > > > >> Hi Greg,
> > > > >> 
> > > > >> We (Linaro, Freescale, Virtual Open Systems) are trying to get an issue
> > > > >> closed that has been percolating for a while around creating a mechanism
> > > > >> that will allow kernel drivers like vfio to bind to devices of any type.
> > > > >> 
> > > > >> This thread with you:
> > > > >> http://www.spinics.net/lists/kvm-arm/msg08370.html
> > > > >> ...seems to have died out, so am trying to get your response
> > > > >> and will summarize again.  Vfio drivers in the kernel (regardless of
> > > > >> bus type) need to bind to devices of any type.  The driver's function
> > > > >> is to simply export hardware resources of any type to user space.
> > > > >> 
> > > > >> There are several approaches that have been proposed:
> > > > > 
> > > > > You seem to have missed the one I proposed.
> > > > >> 
> > > > >>   1.  new_id -- (current approach) the user explicitly registers
> > > > >>       each new device type with the vfio driver using the new_id
> > > > >>       mechanism.
> > > > >> 
> > > > >>       Problem: multiple drivers will be resident that handle the
> > > > >>       same device type...and there is nothing user space hotplug
> > > > >>       infrastructure can do to help.
> > > > >> 
> > > > >>   2.  "any id" -- the vfio driver could specify a wildcard match
> > > > >>       of some kind in its ID match table which would allow it to
> > > > >>       match and bind to any possible device id.  However,
> > > > >>       we don't want the vfio driver grabbing _all_ devices...just the ones we
> > > > >>       explicitly want to pass to user space.
> > > > >> 
> > > > >>       The proposed patch to support this was to create a new flag
> > > > >>       "sysfs_bind_only" in struct device_driver.  When this flag
> > > > >>       is set, the driver can only bind to devices via the sysfs
> > > > >>       bind file.  This would allow the wildcard match to work.
> > > > >> 
> > > > >>       Patch is here:
> > > > >>       https://lkml.org/lkml/2013/12/3/253
> > > > >> 
> > > > >>   3.  "Driver initiated explicit bind" -- with this approach the
> > > > >>       vfio driver would create a private 'bind' sysfs object
> > > > >>       and the user would echo the requested device into it:
> > > > >> 
> > > > >>       echo 0001:03:00.0 > /sys/bus/pci/drivers/vfio-pci/vfio_bind
> > > > >> 
> > > > >>       In order to make that work, the driver would need to call
> > > > >>       driver_probe_device() and thus we need this patch:
> > > > >>       https://lkml.org/lkml/2014/2/8/175
> > > > > 
> > > > > 4). Use the 'unbind' (from the original device) and 'bind' to vfio driver.
> > > > 
> > > > This is approach 2, no?
> > > > 
> > > > > 
> > > > > Which I think is what is currently being done. Why is that not sufficient?
> > > > 
> > > > What would 'bind to vfio driver' look like?
> > > > 
> > > > > The only thing I see in the URL is "That works, but it is ugly."
> > > > > There is some mention of race but I don't see how - if you do the 'unbind'
> > > > > on the original driver and then bind the BDF to the VFIO how would you get
> > > > > a race?
> > > > 
> > > > Typically on PCI, you do a
> > > > 
> > > >   - add wildcard (pci id) match to vfio driver
> > > >   - unbind driver
> > > >   -> reprobe
> > > >   -> device attaches to vfio driver because it is the least recent match
> > > >   - remove wildcard match from vfio driver
> > > > 
> > > > If in between you hotplug add a card of the same type, it gets attached to vfio - even though the logical "default driver" would be the device specific driver.
> > > 
> > > I've mentioned drivers_autoprobe in the past, but I'm not sure we're
> > > really factoring it into the discussion.  drivers_autoprobe allows us to
> > > toggle two points:
> > > 
> > > a) When a new device is added whether we automatically give drivers a
> > > try at binding to it
> > > 
> > > b) When a new driver is added whether it gets to try to bind to anything
> > > in the system
> > > 
> > > So we do have a mechanism to avoid the race, but the problem is that it
> > > becomes the responsibility of userspace to:
> > > 
> > > 1) turn off drivers_autoprobe
> > > 2) unbind/new_id/bind/remove_id
> > > 3) turn on drivers_autoprobe
> > > 4) call drivers_probe for anything added between 1) & 3)
> > > 
> > > Is the question about the ugliness of the current solution whether it's
> > > unreasonable to ask userspace to do this?
> > > 
> > > What we seem to be asking for above is more like an autoprobe flag per
> > > driver where there's some way for this special driver to opt out of auto
> > > probing.  Option 2. in Stuart's list does this by short-cutting ID
> > > matching so that a "match" is only found when using the sysfs bind path,
> > > option 3. enables a way for a driver to expose their own sysfs entry
> > > point for binding.  The latter feels particularly chaotic since drivers
> > > get to make up their own bind mechanism.
> > > 
> > > Another twist I'll throw in is that devices can be hot added to IOMMU
> > > groups that are in-use by userspace.  When that happens we'd like to be
> > > able to disable driver autoprobe of the device to avoid a host driver
> > > automatically binding to the device.  I wonder if instead of looking at
> > > the problem from the driver perspective, if we were to instead look at
> > > it from the device perspective if we might find a solution that would
> > > address both.  For instance, if devices had a driver_probe_id property
> > > that was by default set to their bus specific ID match ("$VENDOR
> > > $DEVICE" on PCI) could we use that to write new match IDs so that a
> > > device could only bind to a given driver?  Effectively we could then
> > > bind either using the current method of adding to the list of IDs a
> > > driver will match or changing the ID that a device would match.  Does
> > > that get us anywhere?  Thanks,
> > 
> > Here's one way this might work for PCI; note that we can do this
> > entirely in the bus driver for PCI.  Bind/unbind would go like this:
> > 
> > # bind device to vfio-pci
> > echo vfio-pci > /sys/bus/pci/devices/0000\:03\:00.0/preferred_driver
> > echo 0000:03:00.0 > /sys/bus/pci/devices/0000\:03\:00.0/driver/unbind
> > echo 0000:03:00.0 > /sys/bus/pci/drivers_probe
> > 
> > # bind device back to host driver
> > echo > /sys/bus/pci/devices/0000\:03\:00.0/preferred_driver
> > echo 0000:03:00.0 > /sys/bus/pci/devices/0000\:03\:00.0/driver/unbind
> > echo 0000:03:00.0 > /sys/bus/pci/drivers_probe
> > 
> > When preferred_driver is set for a device it will match and bind only to
> > a driver with a matching name.  This also means we can write random
> > strings here to avoid a device being bound to any driver if we want.
> > 
> > In the example patch below I've put the preferred_driver in the struct
> > pci_dev, but if this mechanism were adopted by multiple devices perhaps
> > we could add it to struct device.  Would something like this work for
> > platform devices?
> > 
> > Note 1, the below is just the core PCI driver change to support this,
> > there's some trivial collateral damage from changing an exported
> > function not shown here for brevity.
> > 
> > Note 2, PCI passes a struct pci_device_id to the driver probe function
> > which would be NULL in the preferred driver case of the example below.
> > We'd need to dynamically create one of these when calling the probe
> > function to make this practical for drivers that use that data.  Thanks,
> 
> That is, I think, a much easier way. Though I would just call
> it 'override' instead of preferred_driver since, well, that is its
> intent.
> 
> Thank you for prototyping it!

I've realized since this first draft that returning NULL for the
pci_device_id would be unexpected for a number of drivers and probably
cause null pointer dereferences.  This is an implementation detail
though, we probably want a static "any ID" pci_device_id to return in
the case that there are no static table or dynid matches yet we still
want the override to match.  This should result in a smaller patch.
I'll wait for feasibility feedback from the platform folks before I do another
revision though.  Thanks,

Alex
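One way to avoid handing probe a NULL `id` is the static "any ID" entry suggested above. The following userspace model assumes a cut-down `struct model_device_id` and a hypothetical `model_probe_id()` helper: table matches are returned as before, and an override that names this driver but matches no table entry returns a pointer to a static all-zero entry instead of NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Cut-down stand-in for struct pci_device_id (userspace model only). */
struct model_device_id {
	unsigned int vendor, device;
	unsigned long driver_data;
};

/* Hypothetical all-zero "any ID" entry used for override matches. */
static const struct model_device_id model_any_id = { 0, 0, 0 };

/*
 * Return the entry to pass to the driver's probe routine, or NULL if
 * this driver must not bind.  An override naming another driver wins;
 * otherwise table hits are returned as usual, and an override naming
 * this driver with no table hit yields &model_any_id rather than NULL.
 */
const struct model_device_id *
model_probe_id(const struct model_device_id *table, size_t n,
	       unsigned int vendor, unsigned int device,
	       const char *drv_name, const char *override)
{
	size_t i;

	assert(drv_name != NULL);
	if (override && strcmp(drv_name, override) != 0)
		return NULL;

	for (i = 0; i < n; i++)
		if (table[i].vendor == vendor && table[i].device == device)
			return &table[i];

	return override ? &model_any_id : NULL;
}
```

Drivers that read `id->driver_data` would then see 0 for override binds, so a driver relying on per-device data would presumably still need its own fallback.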
 
> > Signed-off-by: Alex Williamson <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > 
> > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> > index d911e0c..9425920 100644
> > --- a/drivers/pci/pci-driver.c
> > +++ b/drivers/pci/pci-driver.c
> > @@ -203,17 +203,23 @@ ATTRIBUTE_GROUPS(pci_drv);
> >   * Deprecated, don't use this as it will not catch any dynamic ids
> >   * that a driver might want to check for.
> >   */
> > -const struct pci_device_id *pci_match_id(const struct pci_device_id *ids,
> > -					 struct pci_dev *dev)
> > +int pci_match_id(const struct pci_device_id *ids, struct pci_dev *dev,
> > +		 const struct pci_device_id **id)
> >  {
> > +	if (id)
> > +		*id = NULL;
> > +
> >  	if (ids) {
> >  		while (ids->vendor || ids->subvendor || ids->class_mask) {
> > -			if (pci_match_one_device(ids, dev))
> > -				return ids;
> > +			if (pci_match_one_device(ids, dev)) {
> > +				if (id)
> > +					*id = ids;
> > +				return 1;
> > +			}
> >  			ids++;
> >  		}
> >  	}
> > -	return NULL;
> > +	return 0;
> >  }
> >  
> >  /**
> > @@ -225,22 +231,30 @@ const struct pci_device_id *pci_match_id(const struct pci_device_id *ids,
> >   * system is in its list of supported devices.  Returns the matching
> >   * pci_device_id structure or %NULL if there is no match.
> >   */
> > -static const struct pci_device_id *pci_match_device(struct pci_driver *drv,
> > -						    struct pci_dev *dev)
> > +static int pci_match_device(struct pci_driver *drv, struct pci_dev *dev,
> > +			    const struct pci_device_id **id)
> >  {
> >  	struct pci_dynid *dynid;
> >  
> > +	if (id)
> > +		*id = NULL;
> > +
> > +	if (dev->preferred_driver)
> > +		return !strcmp(drv->name, dev->preferred_driver);
> > +
> >  	/* Look at the dynamic ids first, before the static ones */
> >  	spin_lock(&drv->dynids.lock);
> >  	list_for_each_entry(dynid, &drv->dynids.list, node) {
> >  		if (pci_match_one_device(&dynid->id, dev)) {
> >  			spin_unlock(&drv->dynids.lock);
> > -			return &dynid->id;
> > +			if (id)
> > +				*id = &dynid->id;
> > +			return 1;
> >  		}
> >  	}
> >  	spin_unlock(&drv->dynids.lock);
> >  
> > -	return pci_match_id(drv->id_table, dev);
> > +	return pci_match_id(drv->id_table, dev, id);
> >  }
> >  
> >  struct drv_dev_and_id {
> > @@ -342,8 +356,7 @@ __pci_device_probe(struct pci_driver *drv, struct pci_dev *pci_dev)
> >  	if (!pci_dev->driver && drv->probe) {
> >  		error = -ENODEV;
> >  
> > -		id = pci_match_device(drv, pci_dev);
> > -		if (id)
> > +		if (pci_match_device(drv, pci_dev, &id))
> >  			error = pci_call_probe(drv, pci_dev, id);
> >  		if (error >= 0)
> >  			error = 0;
> > @@ -1272,17 +1285,12 @@ static int pci_bus_match(struct device *dev, struct device_driver *drv)
> >  {
> >  	struct pci_dev *pci_dev = to_pci_dev(dev);
> >  	struct pci_driver *pci_drv;
> > -	const struct pci_device_id *found_id;
> >  
> >  	if (!pci_dev->match_driver)
> >  		return 0;
> >  
> >  	pci_drv = to_pci_driver(drv);
> > -	found_id = pci_match_device(pci_drv, pci_dev);
> > -	if (found_id)
> > -		return 1;
> > -
> > -	return 0;
> > +	return pci_match_device(pci_drv, pci_dev, NULL);
> >  }
> >  
> >  /**
> > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> > index 4e0acef..d6075f8 100644
> > --- a/drivers/pci/pci-sysfs.c
> > +++ b/drivers/pci/pci-sysfs.c
> > @@ -222,6 +222,46 @@ static ssize_t enabled_show(struct device *dev,
> >  }
> >  static DEVICE_ATTR_RW(enabled);
> >  
> > +static ssize_t preferred_driver_store(struct device *dev,
> > +				      struct device_attribute *attr,
> > +				      const char *buf, size_t count)
> > +{
> > +	struct pci_dev *pdev = to_pci_dev(dev);
> > +	char *preferred_driver, *old = pdev->preferred_driver;
> > +
> > +	if (count > PATH_MAX)
> > +		return -EINVAL;
> > +
> > +	preferred_driver = kstrndup(buf, count, GFP_KERNEL);
> > +	if (!preferred_driver)
> > +		return -ENOMEM;
> > +
> > +	while (strlen(preferred_driver) &&
> > +	       preferred_driver[strlen(preferred_driver) - 1] == '\n')
> > +		preferred_driver[strlen(preferred_driver) - 1] = '\0';
> > +
> > +	if (strlen(preferred_driver)) {
> > +		pdev->preferred_driver = preferred_driver;
> > +	} else {
> > +		kfree(preferred_driver);
> > +		pdev->preferred_driver = NULL;
> > +	}
> > +
> > +	if (old)
> > +		kfree(old);
> > +
> > +	return count;
> > +}
> > +
> > +static ssize_t preferred_driver_show(struct device *dev,
> > +				     struct device_attribute *attr, char *buf)
> > +{
> > +	struct pci_dev *pdev = to_pci_dev(dev);
> > +
> > +	return sprintf(buf, "%s\n", pdev->preferred_driver ?: "");
> > +}
> > +static DEVICE_ATTR_RW(preferred_driver);
> > +
> >  #ifdef CONFIG_NUMA
> >  static ssize_t
> >  numa_node_show(struct device *dev, struct device_attribute *attr, char *buf)
> > @@ -521,6 +561,7 @@ static struct attribute *pci_dev_attrs[] = {
> >  #if defined(CONFIG_PM_RUNTIME) && defined(CONFIG_ACPI)
> >  	&dev_attr_d3cold_allowed.attr,
> >  #endif
> > +	&dev_attr_preferred_driver.attr,
> >  	NULL,
> >  };
> >  
> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > index aab57b4..6fecb0a 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -365,6 +365,7 @@ struct pci_dev {
> >  #endif
> >  	phys_addr_t rom; /* Physical address of ROM if it's not from the BAR */
> >  	size_t romlen; /* Length of ROM if it's not from the BAR */
> > +	char *preferred_driver; /* Preferred driver, supersedes ID matching */
> >  };
> >  
> >  static inline struct pci_dev *pci_physfn(struct pci_dev *dev)
> > @@ -1111,8 +1112,8 @@ int pci_add_dynid(struct pci_driver *drv,
> >  		  unsigned int subvendor, unsigned int subdevice,
> >  		  unsigned int class, unsigned int class_mask,
> >  		  unsigned long driver_data);
> > -const struct pci_device_id *pci_match_id(const struct pci_device_id *ids,
> > -					 struct pci_dev *dev);
> > +int pci_match_id(const struct pci_device_id *ids, struct pci_dev *dev,
> > +		 const struct pci_device_id **id);
> >  int pci_scan_bridge(struct pci_bus *bus, struct pci_dev *dev, int max,
> >  		    int pass);
> >  
> > 
> > 
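The patch changes `pci_match_id()` and `pci_match_device()` from returning the matched entry to returning a yes/no result plus an optional out-parameter. A simplified userspace model of that convention follows; `struct model_id` keeps only vendor and device, and its all-zero sentinel stands in for the real `vendor || subvendor || class_mask` loop condition.

```c
#include <assert.h>
#include <stddef.h>

struct model_id {
	unsigned int vendor, device;
};

/*
 * Return 1 on a match and 0 otherwise.  Only when the caller passes a
 * non-NULL id is the matching entry (or NULL) reported through it.
 * The table ends with an all-zero sentinel, as PCI ID tables do.
 */
int model_match_id(const struct model_id *ids, unsigned int vendor,
		   unsigned int device, const struct model_id **id)
{
	if (id)
		*id = NULL;

	if (ids) {
		for (; ids->vendor || ids->device; ids++) {
			if (ids->vendor == vendor && ids->device == device) {
				if (id)
					*id = ids;
				return 1;
			}
		}
	}
	return 0;
}
```

Separating "did it match?" from "which entry matched?" is what lets the override path report a match with no table entry, and lets `pci_bus_match()` pass NULL when it only needs the yes/no answer.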
> > _______________________________________________
> > iommu mailing list
> > iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
> > https://lists.linuxfoundation.org/mailman/listinfo/iommu
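The newline handling in `preferred_driver_store()` is easy to get wrong, so here is a userspace model of just that part. It assumes `malloc`/`free` in place of `kstrndup`/`kfree`, omits the PATH_MAX check, and uses the hypothetical name `model_parse_override()`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse a sysfs write into an override name.  On success returns 0 and
 * sets *out to a malloc'd string, or to NULL when the write was empty
 * (which clears the override).  Returns -1 on allocation failure.
 */
int model_parse_override(const char *buf, size_t count, char **out)
{
	size_t len = 0;
	char *s;

	assert(buf != NULL && out != NULL);
	*out = NULL;

	/* Copy at most count bytes, like kstrndup(). */
	while (len < count && buf[len] != '\0')
		len++;
	s = malloc(len + 1);
	if (!s)
		return -1;	/* the patch returns -ENOMEM here */
	memcpy(s, buf, len);
	s[len] = '\0';

	/* Strip trailing newlines; echo appends one. */
	while (len && s[len - 1] == '\n')
		s[--len] = '\0';

	if (len == 0) {
		free(s);	/* empty write: clear the override */
		return 0;
	}
	*out = s;
	return 0;
}
```

As posted, the store routine swaps the pointer and frees the old string without any locking, while `pci_match_device()` may be reading it; a real implementation would presumably need to serialize the two, for example under the device lock.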

^ permalink raw reply	[flat|nested] 92+ messages in thread

* RE: mechanism to allow a driver to bind to any device
       [not found]                                     ` <20140328065942.GB14619-U8xfFu+wG4EAvxtiuMwx3w@public.gmane.org>
@ 2014-03-31 18:21                                       ` Stuart Yoder
  0 siblings, 0 replies; 92+ messages in thread
From: Stuart Yoder @ 2014-03-31 18:21 UTC (permalink / raw)
  To: Greg KH, Antonios Motakis
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ,
	will.deacon-5wv7dgnIgG8, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	Michal Hocko, Bjorn Helgaas, Varun Sethi,
	kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg, Rafael J. Wysocki,
	agraf-l3A5Bk7waGM, Guenter Roeck, Dmitry Kasatkin, Tejun Heo,
	Scott Wood, tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J, Toshi Kani,
	a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Joe Perches,



> -----Original Message-----
> From: Greg KH [mailto:gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org]
> Sent: Friday, March 28, 2014 2:00 AM
> To: Antonios Motakis
> Cc: Yoder Stuart-B08248; alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org;
> kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg@public.gmane.org; iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org; linux-
> kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J@public.gmane.org;
> a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J@public.gmane.org; kim.phillips-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org;
> jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ@public.gmane.org; kvm-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Bhushan Bharat-R65777; Wood
> Scott-B07421; christoffer.dall-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org; agraf-l3A5Bk7waGM@public.gmane.org; Sethi Varun-
> B16395; will.deacon-5wv7dgnIgG8@public.gmane.org; Tejun Heo; Rafael J. Wysocki; Guenter Roeck;
> Toshi Kani; Joe Perches; Dmitry Kasatkin; Michal Hocko; Bjorn Helgaas
> Subject: Re: mechanism to allow a driver to bind to any device
> 
> On Wed, Mar 26, 2014 at 10:39:57PM +0100, Antonios Motakis wrote:
> >
> > Of note is that new_id doesn't work particularly well for platform
> devices.
> 
> Nor should it.  Platform devices suck horribly, and "ids" mean nothing
> to them, so you shouldn't even try this.  Use a "real" bus and it should
> be fine.

We have the hardware we have and have to live with it.  A system-on-a-chip
typically has many memory-mapped I/O devices that are not on a bus that
supports probing.  These devices don't have any discoverable device
IDs.

So, the Linux platform bus works fine for making Linux aware of these
devices and associating drivers with them.  Not sure why it is horrible.
I don't see a reasonable alternative for dealing with these types of
devices.
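To illustrate why there are no IDs to feed to `new_id`, here is a simplified userspace model of platform-style matching. It loosely follows the order the platform bus uses (device-tree compatible strings first, then the device name) and leaves out ACPI and `platform_device_id` tables; the structures and `model_platform_match()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* What a memory-mapped SoC device typically carries: names, not IDs. */
struct model_pdev {
	const char *name;	/* e.g. from board code */
	const char *compatible;	/* e.g. from the device tree; may be NULL */
};

struct model_pdrv {
	const char *name;
	const char *const *of_compatible;	/* NULL-terminated; may be NULL */
};

/* Simplified: compatible strings first, then fall back to the name. */
int model_platform_match(const struct model_pdev *dev,
			 const struct model_pdrv *drv)
{
	const char *const *c;

	assert(dev != NULL && drv != NULL);
	if (dev->compatible && drv->of_compatible)
		for (c = drv->of_compatible; *c; c++)
			if (strcmp(*c, dev->compatible) == 0)
				return 1;

	return strcmp(dev->name, drv->name) == 0;
}
```

Every match here is keyed on strings describing the device, so there is no vendor/device pair that a generic driver such as vfio could register.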

But this is all beside the point of this thread, which encompasses PCI,
platform bus, or any other kind of bus.  This thread is about how specific
devices can be bound to a vfio driver that wants to expose that device
to user space.

Stuart

^ permalink raw reply	[flat|nested] 92+ messages in thread

* RE: mechanism to allow a driver to bind to any device
       [not found]                                         ` <1395850862.632.247.camel-85EaTFmN5p//9pzu0YdTqQ@public.gmane.org>
@ 2014-03-31 18:32                                             ` Stuart Yoder
  2014-03-26 22:09                                           ` Alex Williamson
  2014-03-31 18:32                                             ` Stuart Yoder
  2 siblings, 0 replies; 92+ messages in thread
From: Stuart Yoder @ 2014-03-31 18:32 UTC (permalink / raw)
  To: Alex Williamson, Alexander Graf
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ,
	will.deacon-5wv7dgnIgG8, a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J,
	Michal Hocko, Bjorn Helgaas, Varun Sethi,
	kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg, Rafael J. Wysocki,
	Guenter Roeck, Dmitry Kasatkin, Tejun Heo, Scott Wood,
	Antonios Motakis, tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J,
	Toshi Kani, Greg KH, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org



> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org]
> Sent: Wednesday, March 26, 2014 11:21 AM
> To: Alexander Graf
> Cc: Konrad Rzeszutek Wilk; kvm-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ@public.gmane.org;
> will.deacon-5wv7dgnIgG8@public.gmane.org; Yoder Stuart-B08248; linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org;
> Michal Hocko; Bjorn Helgaas; Sethi Varun-B16395;
> kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg@public.gmane.org; Rafael J. Wysocki; Guenter Roeck; Dmitry
> Kasatkin; Joe Perches; Wood Scott-B07421; Antonios Motakis;
> tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J@public.gmane.org; Toshi Kani; Greg KH;
> a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J@public.gmane.org; iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org; Tejun
> Heo; christoffer.dall-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org
> Subject: Re: mechanism to allow a driver to bind to any device
> 
> On Wed, 2014-03-26 at 23:06 +0800, Alexander Graf wrote:
> >
> > > On 26.03.2014 at 22:40, Konrad Rzeszutek Wilk
> > > <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
> > >
> > >> On Wed, Mar 26, 2014 at 01:40:32AM +0000, Stuart Yoder wrote:
> > >> Hi Greg,
> > >>
> > >> We (Linaro, Freescale, Virtual Open Systems) are trying to get an issue
> > >> closed that has been percolating for a while around creating a mechanism
> > >> that will allow kernel drivers like vfio to bind to devices of any type.
> > >>
> > >> This thread with you:
> > >> http://www.spinics.net/lists/kvm-arm/msg08370.html
> > >> ...seems to have died out, so am trying to get your response
> > >> and will summarize again.  Vfio drivers in the kernel (regardless of
> > >> bus type) need to bind to devices of any type.  The driver's function
> > >> is to simply export hardware resources of any type to user space.
> > >>
> > >> There are several approaches that have been proposed:
> > >
> > > You seem to have missed the one I proposed.
> > >>
> > >>   1.  new_id -- (current approach) the user explicitly registers
> > >>       each new device type with the vfio driver using the new_id
> > >>       mechanism.
> > >>
> > >>       Problem: multiple drivers will be resident that handle the
> > >>       same device type...and there is nothing user space hotplug
> > >>       infrastructure can do to help.
> > >>
> > >>   2.  "any id" -- the vfio driver could specify a wildcard match
> > >>       of some kind in its ID match table which would allow it to
> > >>       match and bind to any possible device id.  However,
> > >>       we don't want the vfio driver grabbing _all_ devices...just the ones we
> > >>       explicitly want to pass to user space.
> > >>
> > >>       The proposed patch to support this was to create a new flag
> > >>       "sysfs_bind_only" in struct device_driver.  When this flag
> > >>       is set, the driver can only bind to devices via the sysfs
> > >>       bind file.  This would allow the wildcard match to work.
> > >>
> > >>       Patch is here:
> > >>       https://lkml.org/lkml/2013/12/3/253
> > >>
> > >>   3.  "Driver initiated explicit bind" -- with this approach the
> > >>       vfio driver would create a private 'bind' sysfs object
> > >>       and the user would echo the requested device into it:
> > >>
> > >>       echo 0001:03:00.0 > /sys/bus/pci/drivers/vfio-pci/vfio_bind
> > >>
> > >>       In order to make that work, the driver would need to call
> > >>       driver_probe_device() and thus we need this patch:
> > >>       https://lkml.org/lkml/2014/2/8/175
> > >
> > > 4). Use the 'unbind' (from the original device) and 'bind' to vfio driver.
> >
> > This is approach 2, no?
> >
> > >
> > > Which I think is what is currently being done. Why is that not sufficient?
> >
> > How would 'bind to vfio driver' look like?
> >
> > > The only thing I see in the URL is " That works, but it is ugly."
> > > There is some mention of race but I don't see how - if you do the 'unbind'
> > > on the original driver and then bind the BDF to the VFIO how would you get
> > > a race?
> >
> > Typically on PCI, you do a
> >
> >   - add wildcard (pci id) match to vfio driver
> >   - unbind driver
> >   -> reprobe
> >   -> device attaches to vfio driver because it is the least recent match
> >   - remove wildcard match from vfio driver
> >
> > If in between you hotplug add a card of the same type, it gets attached
> > to vfio - even though the logical "default driver" would be the device
> > specific driver.
> 
> I've mentioned drivers_autoprobe in the past, but I'm not sure we're
> really factoring it into the discussion.  drivers_autoprobe allows us to
> toggle two points:
> 
> a) When a new device is added whether we automatically give drivers a
> try at binding to it
> 
> b) When a new driver is added whether it gets to try to bind to anything
> in the system
> 
> So we do have a mechanism to avoid the race, but the problem is that it
> becomes the responsibility of userspace to:
> 
> 1) turn off drivers_autoprobe
> 2) unbind/new_id/bind/remove_id
> 3) turn on drivers_autoprobe
> 4) call drivers_probe for anything added between 1) & 3)
> 
> Is the question about the ugliness of the current solution whether it's
> unreasonable to ask userspace to do this?

It's probably not unreasonable... I did not understand the
drivers_autoprobe mechanism until now...didn't realize we had that.
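
A minimal sketch of the four-step drivers_autoprobe sequence described above, written as a userspace C helper that first builds the list of sysfs writes and then performs them. It assumes the PCI sysfs layout used elsewhere in this thread (/sys/bus/pci/drivers_autoprobe, /sys/bus/pci/drivers_probe, and the vfio-pci driver's new_id, bind and remove_id files). struct sysfs_write, build_vfio_rebind_plan() and apply_plan() are hypothetical names, and userspace is assumed to have tracked on its own (for example from uevents) which devices were hot-added while autoprobe was off.

```c
#include <stdio.h>
#include <string.h>

/* One step of the sequence: the equivalent of `echo VALUE > PATH`. */
struct sysfs_write {
    char path[128];
    char value[64];
};

/* Append one write to plan (silently dropped if plan is full). */
static int add_write(struct sysfs_write *plan, int n, int max,
                     const char *path, const char *value)
{
    if (n >= max)
        return n;
    snprintf(plan[n].path, sizeof(plan[n].path), "%s", path);
    snprintf(plan[n].value, sizeof(plan[n].value), "%s", value);
    return n + 1;
}

/*
 * Build the rebind sequence for one PCI device (bdf, e.g. "0000:03:00.0";
 * vendor_device, e.g. "8086 10fb"):
 *   1) disable autoprobe for the whole PCI bus
 *   2) unbind / new_id / bind / remove_id
 *   3) re-enable autoprobe
 *   4) drivers_probe each device hot-added while autoprobe was off
 * Returns the number of entries written to plan.
 */
int build_vfio_rebind_plan(struct sysfs_write *plan, int max,
                           const char *bdf, const char *vendor_device,
                           const char *const *hotplugged, int n_hotplugged)
{
    char unbind[128];
    int n = 0;

    n = add_write(plan, n, max, "/sys/bus/pci/drivers_autoprobe", "0");

    snprintf(unbind, sizeof(unbind), "/sys/bus/pci/devices/%s/driver/unbind", bdf);
    n = add_write(plan, n, max, unbind, bdf);
    n = add_write(plan, n, max, "/sys/bus/pci/drivers/vfio-pci/new_id", vendor_device);
    n = add_write(plan, n, max, "/sys/bus/pci/drivers/vfio-pci/bind", bdf);
    n = add_write(plan, n, max, "/sys/bus/pci/drivers/vfio-pci/remove_id", vendor_device);

    n = add_write(plan, n, max, "/sys/bus/pci/drivers_autoprobe", "1");

    for (int i = 0; i < n_hotplugged; i++)
        n = add_write(plan, n, max, "/sys/bus/pci/drivers_probe", hotplugged[i]);

    return n;
}

/*
 * Perform the writes in order.  Every step is attempted even if an earlier
 * one fails, so autoprobe is not left disabled; returns the failure count.
 */
int apply_plan(const struct sysfs_write *plan, int n)
{
    int failures = 0;

    for (int i = 0; i < n; i++) {
        FILE *f = fopen(plan[i].path, "w");
        if (!f) {
            failures++;
            continue;
        }
        /* sysfs reports errors when the buffered data is flushed at fclose(). */
        int write_ok = fputs(plan[i].value, f) >= 0;
        if (fclose(f) != 0 || !write_ok)
            failures++;
    }
    return failures;
}
```

Keeping plan construction separate from execution makes the sequence easy to inspect or log before anything touches sysfs. apply_plan() deliberately keeps going after a failed write so that a failure in the middle (for instance a bind that is rejected because the device is already attached) does not leave autoprobe disabled for the whole bus.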

> What we seem to be asking for above is more like an autoprobe flag per
> driver where there's some way for this special driver to opt out of auto
> probing.

Yes, that is basically it.  In fact perhaps using 'autoprobe' in
the name of the sysfs object would have been better and more clear
than 'sysfs_bind_only'.
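
A minimal sketch of the per-driver opt-out of autoprobing discussed above, i.e. the effect option 2's proposed sysfs_bind_only flag was meant to have. It is a userspace model, not kernel code: model_driver, model_device, model_match(), model_autoprobe() and the probe_origin enum are hypothetical stand-ins, and matching is reduced to a single ID string per driver plus a wildcard.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins for device / device_driver. */
struct model_driver {
    const char *name;
    const char *id;        /* the one ID an ordinary driver matches; may be NULL */
    bool match_any;        /* wildcard ID match, as a vfio driver would use */
    bool sysfs_bind_only;  /* opt out of autoprobe: match only on explicit bind */
};

struct model_device {
    const char *id;
};

/* Why a probe is happening: automatic (hotplug/driver load) or the sysfs bind file. */
enum probe_origin { PROBE_AUTO, PROBE_SYSFS_BIND };

bool model_match(const struct model_driver *drv, const struct model_device *dev,
                 enum probe_origin origin)
{
    /* An opted-out driver is invisible to automatic probing. */
    if (drv->sysfs_bind_only && origin != PROBE_SYSFS_BIND)
        return false;
    if (drv->match_any)
        return true;
    return drv->id != NULL && strcmp(drv->id, dev->id) == 0;
}

/* Automatic probe: the first driver (in list order) that matches wins. */
const struct model_driver *model_autoprobe(const struct model_driver *drvs, int n,
                                           const struct model_device *dev)
{
    for (int i = 0; i < n; i++)
        if (model_match(&drvs[i], dev, PROBE_AUTO))
            return &drvs[i];
    return NULL;
}
```

The key property is that a wildcard driver can never win an automatic probe, so a hot-added device still reaches its ordinary driver, while an explicit bind through sysfs can still hand any device to the wildcard driver.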

> Option 2. in Stuart's list does this by short-cutting ID
> matching so that a "match" is only found when using the sysfs bind path,
> option 3. enables a way for a driver to expose their own sysfs entry
> point for binding.  The latter feels particularly chaotic since drivers
> get to make-up their own bind mechanism.
> 
> Another twist I'll throw in is that devices can be hot added to IOMMU
> groups that are in-use by userspace.  When that happens we'd like to be
> able to disable driver autoprobe of the device to avoid a host driver
> automatically binding to the device.  I wonder if instead of looking at
> the problem from the driver perspective, if we were to instead look at
> it from the device perspective if we might find a solution that would
> address both.  For instance, if devices had a driver_probe_id property
> that was by default set to their bus specific ID match ("$VENDOR
> $DEVICE" on PCI) could we use that to write new match IDs so that a
> device could only bind to a given driver?  Effectively we could then
> bind either using the current method of adding to the list of IDs a
> driver will match or changing the ID that a device would match.  Does
> that get us anywhere?  Thanks,

[Saw your follow-on post on the above and will comment there...]

Stuart

^ permalink raw reply	[flat|nested] 92+ messages in thread

* RE: mechanism to allow a driver to bind to any device
       [not found]                                             ` <1395871761.632.316.camel-85EaTFmN5p//9pzu0YdTqQ@public.gmane.org>
@ 2014-03-31 18:47                                               ` Stuart Yoder
  0 siblings, 0 replies; 92+ messages in thread
From: Stuart Yoder @ 2014-03-31 18:47 UTC (permalink / raw)
  To: Alex Williamson, Alexander Graf
  Cc: kvm, jan.kiszka, will.deacon, a.rigo, Michal Hocko, Scott Wood,
	Varun Sethi, kvmarm, Rafael J. Wysocki, Guenter Roeck,
	Dmitry Kasatkin, Tejun Heo, Bjorn Helgaas, Antonios Motakis,
	tech, Toshi Kani, Greg KH, linux-kernel, iommu, Joe Perches,
	christoffer.dall




> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Wednesday, March 26, 2014 5:09 PM
> To: Alexander Graf
> Cc: kvm@vger.kernel.org; jan.kiszka@siemens.com; will.deacon@arm.com;
> Yoder Stuart-B08248; a.rigo@virtualopensystems.com; Michal Hocko; Wood
> Scott-B07421; Sethi Varun-B16395; kvmarm@lists.cs.columbia.edu; Rafael J.
> Wysocki; Guenter Roeck; Dmitry Kasatkin; Tejun Heo; Bjorn Helgaas;
> Antonios Motakis; tech@virtualopensystems.com; Toshi Kani; Greg KH;
> linux-kernel@vger.kernel.org; iommu@lists.linux-foundation.org; Joe
> Perches; christoffer.dall@linaro.org
> Subject: Re: mechanism to allow a driver to bind to any device
> 
> On Wed, 2014-03-26 at 10:21 -0600, Alex Williamson wrote:
> > On Wed, 2014-03-26 at 23:06 +0800, Alexander Graf wrote:
> > >
> > > > Am 26.03.2014 um 22:40 schrieb Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com>:
> > > >
> > > >> On Wed, Mar 26, 2014 at 01:40:32AM +0000, Stuart Yoder wrote:
> > > >> Hi Greg,
> > > >>
> > > >> We (Linaro, Freescale, Virtual Open Systems) are trying to get an issue
> > > >> closed that has been percolating for a while around creating a mechanism
> > > >> that will allow kernel drivers like vfio to bind to devices of any type.
> > > >>
> > > >> This thread with you:
> > > >> http://www.spinics.net/lists/kvm-arm/msg08370.html
> > > >> ...seems to have died out, so I am trying to get your response
> > > >> and will summarize again.  Vfio drivers in the kernel (regardless of
> > > >> bus type) need to bind to devices of any type.  The driver's function
> > > >> is to simply export hardware resources of any type to user space.
> > > >>
> > > >> There are several approaches that have been proposed:
> > > >
> > > > You seem to have missed the one I proposed.
> > > >>
> > > >>   1.  new_id -- (current approach) the user explicitly registers
> > > >>       each new device type with the vfio driver using the new_id
> > > >>       mechanism.
> > > >>
> > > >>       Problem: multiple drivers will be resident that handle the
> > > >>       same device type...and there is nothing user space hotplug
> > > >>       infrastructure can do to help.
> > > >>
> > > >>   2.  "any id" -- the vfio driver could specify a wildcard match
> > > >>       of some kind in its ID match table which would allow it to
> > > >>       match and bind to any possible device id.  However,
> > > >>       we don't want the vfio driver grabbing _all_ devices...just the ones we
> > > >>       explicitly want to pass to user space.
> > > >>
> > > >>       The proposed patch to support this was to create a new flag
> > > >>       "sysfs_bind_only" in struct device_driver.  When this flag
> > > >>       is set, the driver can only bind to devices via the sysfs
> > > >>       bind file.  This would allow the wildcard match to work.
> > > >>
> > > >>       Patch is here:
> > > >>       https://lkml.org/lkml/2013/12/3/253
> > > >>
> > > >>   3.  "Driver initiated explicit bind" -- with this approach the
> > > >>       vfio driver would create a private 'bind' sysfs object
> > > >>       and the user would echo the requested device into it:
> > > >>
> > > >>       echo 0001:03:00.0 > /sys/bus/pci/drivers/vfio-pci/vfio_bind
> > > >>
> > > >>       In order to make that work, the driver would need to call
> > > >>       driver_probe_device() and thus we need this patch:
> > > >>       https://lkml.org/lkml/2014/2/8/175
> > > >
> > > > 4). Use the 'unbind' (from the original device) and 'bind' to vfio driver.
> > >
> > > This is approach 2, no?
> > >
> > > >
> > > > Which I think is what is currently being done. Why is that not sufficient?
> > >
> > > What would 'bind to vfio driver' look like?
> > >
> > > > The only thing I see in the URL is "That works, but it is ugly."
> > > > There is some mention of race but I don't see how - if you do the 'unbind'
> > > > on the original driver and then bind the BDF to the VFIO how would you get
> > > > a race?
> > >
> > > Typically on PCI, you do a
> > >
> > >   - add wildcard (pci id) match to vfio driver
> > >   - unbind driver
> > >   -> reprobe
> > >   -> device attaches to vfio driver because it is the least recent match
> > >   - remove wildcard match from vfio driver
> > >
> > > If in between you hotplug add a card of the same type, it gets attached
> > > to vfio - even though the logical "default driver" would be the device
> > > specific driver.
> >
> > I've mentioned drivers_autoprobe in the past, but I'm not sure we're
> > really factoring it into the discussion.  drivers_autoprobe allows us to
> > toggle two points:
> >
> > a) When a new device is added whether we automatically give drivers a
> > try at binding to it
> >
> > b) When a new driver is added whether it gets to try to bind to anything
> > in the system
> >
> > So we do have a mechanism to avoid the race, but the problem is that it
> > becomes the responsibility of userspace to:
> >
> > 1) turn off drivers_autoprobe
> > 2) unbind/new_id/bind/remove_id
> > 3) turn on drivers_autoprobe
> > 4) call drivers_probe for anything added between 1) & 3)
> >
> > Is the question about the ugliness of the current solution whether it's
> > unreasonable to ask userspace to do this?
> >
> > What we seem to be asking for above is more like an autoprobe flag per
> > driver where there's some way for this special driver to opt out of auto
> > probing.  Option 2. in Stuart's list does this by short-cutting ID
> > matching so that a "match" is only found when using the sysfs bind path,
> > option 3. enables a way for a driver to expose their own sysfs entry
> > point for binding.  The latter feels particularly chaotic since drivers
> > get to make-up their own bind mechanism.
> >
> > Another twist I'll throw in is that devices can be hot added to IOMMU
> > groups that are in-use by userspace.  When that happens we'd like to be
> > able to disable driver autoprobe of the device to avoid a host driver
> > automatically binding to the device.  I wonder if instead of looking at
> > the problem from the driver perspective, if we were to instead look at
> > it from the device perspective if we might find a solution that would
> > address both.  For instance, if devices had a driver_probe_id property
> > that was by default set to their bus specific ID match ("$VENDOR
> > $DEVICE" on PCI) could we use that to write new match IDs so that a
> > device could only bind to a given driver?  Effectively we could then
> > bind either using the current method of adding to the list of IDs a
> > driver will match or changing the ID that a device would match.  Does
> > that get us anywhere?  Thanks,
> 
> Here's one way this might work for PCI; note that we can do this
> entirely in the bus driver for PCI.  Bind/unbind would go like this:
> 
> # bind device to vfio-pci
> echo vfio-pci > /sys/bus/pci/devices/0000\:03\:00.0/preferred_driver
> echo 0000:03:00.0 > /sys/bus/pci/devices/0000\:03\:00.0/driver/unbind
> echo 0000:03:00.0 > /sys/bus/pci/drivers_probe
> 
> # bind device back to host driver
> echo > /sys/bus/pci/devices/0000\:03\:00.0/preferred_driver
> echo 0000:03:00.0 > /sys/bus/pci/devices/0000\:03\:00.0/driver/unbind
> echo 0000:03:00.0 > /sys/bus/pci/drivers_probe
> 
> When preferred_driver is set for a device it will match and bind only to
> a driver with a matching name.  This also means we can write random
> strings here to avoid a device being bound to any driver if we want.
> 
> In the example patch below I've put the preferred_driver in the struct
> pci_dev, but if this mechanism were adopted by multiple devices perhaps
> we could add it to struct device.  Would something like this work for
> platform devices?
> 
> Note 1, the below is just the core PCI driver change to support this,
> there's some trivial collateral damage from changing an exported
> function not shown here for brevity.
> 
> Note 2, PCI passes a struct pci_device_id to the driver probe function
> which would be NULL in the preferred driver case of the example below.
> We'd need to dynamically create one of these when calling the probe
> function to make this practical for drivers that use that data.  Thanks,
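
A minimal userspace model of the preferred_driver matching rule quoted above, including the Note 2 problem: when a device matches only because of its preferred_driver, there is no ID table entry to pass to probe, so one is synthesized from the device's own vendor and device IDs. model_pci_id, model_pci_dev, model_pci_driver, model_pci_match() and model_set_preferred_driver() are hypothetical stand-ins; they do not reproduce Alex's example patch.

```c
#include <string.h>

/* Hypothetical, simplified stand-ins for struct pci_device_id / pci_dev / pci_driver. */
struct model_pci_id {
    unsigned short vendor, device;
};

struct model_pci_dev {
    unsigned short vendor, device;
    char preferred_driver[32];  /* empty string means "no preference" */
};

struct model_pci_driver {
    const char *name;
    const struct model_pci_id *ids;  /* terminated by a {0, 0} entry; may be NULL */
};

/* Like a sysfs store: copy the string, dropping the newline echo appends. */
void model_set_preferred_driver(struct model_pci_dev *dev, const char *buf)
{
    size_t len = strcspn(buf, "\n");

    if (len >= sizeof(dev->preferred_driver))
        len = sizeof(dev->preferred_driver) - 1;
    memcpy(dev->preferred_driver, buf, len);
    dev->preferred_driver[len] = '\0';
}

/*
 * Returns 1 and fills *id_out if drv may bind dev.  With preferred_driver
 * set, only the driver of that name matches and, because no table entry
 * was used, the ID handed to probe is synthesized from the device (Note 2).
 * Without it, ordinary ID-table matching applies.
 */
int model_pci_match(const struct model_pci_driver *drv,
                    const struct model_pci_dev *dev,
                    struct model_pci_id *id_out)
{
    if (dev->preferred_driver[0] != '\0') {
        if (strcmp(dev->preferred_driver, drv->name) != 0)
            return 0;
        id_out->vendor = dev->vendor;
        id_out->device = dev->device;
        return 1;
    }

    for (const struct model_pci_id *id = drv->ids; id && (id->vendor || id->device); id++) {
        if (id->vendor == dev->vendor && id->device == dev->device) {
            *id_out = *id;
            return 1;
        }
    }
    return 0;
}
```

Writing an empty value (`echo > .../preferred_driver`) clears the preference and restores ordinary ID matching, while any string that names no driver leaves the device unbound, as described in the quoted proposal.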

The paradigm of telling the device what the preferred driver is feels
more awkward to me than a sysfs flag for the driver to opt out of
auto-probing...but at this point if there is consensus that the
preferred_driver approach will be accepted upstream, I'm ok with it.
I think it works.

However, I am concerned about getting 'preferred driver' accepted
into the kernel and it's not immediately obvious to me how it is more
palatable than the 'opt out of auto-probe' approaches that were
proposed previously.

I also was at the point where I thought we should perhaps just
go with current mechanisms and implement new_id for the platform
bus...but Greg's recent response is 'platform devices suck' and it sounds
like he would reject a new_id patch for the platform bus.  So it kind
of feels like we are stuck.

Thanks,
Stuart


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: mechanism to allow a driver to bind to any device
       [not found]                                               ` <7d1b495cdb6a415e8d3b7f60f409991c-ufbTtyGzTTT8GZusEWM6WuO6mTEJWrR4XA4E9RH9d+qIuWR1G4zioA@public.gmane.org>
@ 2014-03-31 19:47                                                 ` Greg KH
       [not found]                                                   ` <20140331194705.GA13014-U8xfFu+wG4EAvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 92+ messages in thread
From: Greg KH @ 2014-03-31 19:47 UTC (permalink / raw)
  To: Stuart Yoder
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ,
	will.deacon-5wv7dgnIgG8, a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J,
	Michal Hocko, Bjorn Helgaas, Varun Sethi,
	kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg, Rafael J. Wysocki,
	Alexander Graf, Guenter Roeck, Dmitry Kasatkin, Tejun Heo,
	Scott Wood, Antonios Motakis,
	tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J, Toshi Kani,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Mon, Mar 31, 2014 at 06:47:51PM +0000, Stuart Yoder wrote:
> I also was at the point where I thought we should perhaps just
> go with current mechanisms and implement new_id for the platform
> bus...but Greg's recent response is 'platform devices suck' and it sounds
> like he would reject a new_id patch for the platform bus.  So it kind
> of feels like we are stuck.

ids mean nothing in the platform device model, so having a new_id file
for them makes no sense.

greg k-h

^ permalink raw reply	[flat|nested] 92+ messages in thread

* RE: mechanism to allow a driver to bind to any device
       [not found]                                                   ` <20140331194705.GA13014-U8xfFu+wG4EAvxtiuMwx3w@public.gmane.org>
@ 2014-03-31 20:23                                                     ` Stuart Yoder
  2014-03-31 22:32                                                         ` Kim Phillips
  0 siblings, 1 reply; 92+ messages in thread
From: Stuart Yoder @ 2014-03-31 20:23 UTC (permalink / raw)
  To: Greg KH
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ,
	will.deacon-5wv7dgnIgG8, a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J,
	Michal Hocko, Bjorn Helgaas, Varun Sethi,
	kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg, Rafael J. Wysocki,
	Alexander Graf, Guenter Roeck, Dmitry Kasatkin, Tejun Heo,
	Scott Wood, Antonios Motakis,
	tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J, Toshi Kani,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA



> -----Original Message-----
> From: Greg KH [mailto:gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org]
> Sent: Monday, March 31, 2014 2:47 PM
> To: Yoder Stuart-B08248
> Cc: Alex Williamson; Alexander Graf; kvm-u79uwXL29TY76Z2rM5mHXA@public.gmane.org;
> jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ@public.gmane.org; will.deacon-5wv7dgnIgG8@public.gmane.org;
> a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J@public.gmane.org; Michal Hocko; Wood Scott-B07421; Sethi
> Varun-B16395; kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg@public.gmane.org; Rafael J. Wysocki; Guenter
> Roeck; Dmitry Kasatkin; Tejun Heo; Bjorn Helgaas; Antonios Motakis;
> tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J@public.gmane.org; Toshi Kani; linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org;
> iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org; Joe Perches;
> christoffer.dall-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org
> Subject: Re: mechanism to allow a driver to bind to any device
> 
> On Mon, Mar 31, 2014 at 06:47:51PM +0000, Stuart Yoder wrote:
> > I was also at the point where I thought we should perhaps just
> > go with current mechanisms and implement new_id for the platform
> > bus...but Greg's recent response is 'platform devices suck' and it sounds
> > like he would reject a new_id patch for the platform bus.  So it kind
> > of feels like we are stuck.
> 
> ids mean nothing in the platform device model, so having a new_id file
> for them makes no sense.

They don't have IDs like PCI, but platform drivers have to match on
something.  Platform device match tables are based on compatible strings.

Example from Freescale DMA driver:
  static const struct of_device_id fsldma_of_ids[] = {
        { .compatible = "fsl,elo3-dma", },
        { .compatible = "fsl,eloplus-dma", },
        { .compatible = "fsl,elo-dma", },
        {}
  };

The process of unbinding, setting a new_id, and binding to vfio would work
just like PCI:

   echo ffe101300.dma > /sys/bus/platform/devices/ffe101300.dma/driver/unbind
   echo fsl,eloplus-dma > /sys/bus/platform/drivers/vfio-platform/new_id
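To make the matching argument concrete, here is a small userspace model (not kernel code) of compatible-string matching against a table shaped like fsldma_of_ids, plus a dynamic ID list standing in for the platform new_id file proposed above. The names model_of_id, model_driver, model_match() and model_new_id() are hypothetical, each device carries a single compatible string here, and the real platform_match() path does more than this (for example, it can fall back to matching the device name against the driver name).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct of_device_id: only the compatible
 * string is modeled. A NULL compatible ends the table, playing the
 * role of the {} sentinel in fsldma_of_ids. */
struct model_of_id {
    const char *compatible;
};

#define MODEL_MAX_DYN_IDS 8

/* Hypothetical driver: a static match table plus the dynamic IDs that a
 * platform-bus new_id file would append (the platform bus has no such
 * file; that is the proposal under discussion). */
struct model_driver {
    const char *name;
    const struct model_of_id *table;   /* may be NULL */
    const char *dyn_ids[MODEL_MAX_DYN_IDS];
    size_t n_dyn;
};

/* Walk a sentinel-terminated table looking for an exact match. */
static bool model_table_match(const struct model_of_id *t, const char *compat)
{
    for (; t != NULL && t->compatible != NULL; t++)
        if (strcmp(t->compatible, compat) == 0)
            return true;
    return false;
}

/* Would drv match a device whose compatible property is compat? */
bool model_match(const struct model_driver *drv, const char *compat)
{
    for (size_t i = 0; i < drv->n_dyn; i++)
        if (strcmp(drv->dyn_ids[i], compat) == 0)
            return true;
    return model_table_match(drv->table, compat);
}

/* Models "echo <compatible> > .../drivers/<drv>/new_id": the driver now
 * claims a device *type*, i.e. every device with that compatible. */
bool model_new_id(struct model_driver *drv, const char *compat)
{
    if (drv->n_dyn >= MODEL_MAX_DYN_IDS)
        return false;
    drv->dyn_ids[drv->n_dyn++] = compat;
    return true;
}
```

After model_new_id(&vfio, "fsl,eloplus-dma"), both the fsldma model and vfio-platform match every "fsl,eloplus-dma" device, including one hot-added later. That is the type-level granularity Kim's reply objects to, and the multiple-resident-driver problem listed as option 1.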

Thanks,
Stuart

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: mechanism to allow a driver to bind to any device
@ 2014-03-31 22:32                                                         ` Kim Phillips
  0 siblings, 0 replies; 92+ messages in thread
From: Kim Phillips @ 2014-03-31 22:32 UTC (permalink / raw)
  To: stuart.yoder
  Cc: gregkh, kvm, jan.kiszka, will.deacon, mhocko, bhelgaas,
	Varun.Sethi, kvmarm, rafael.j.wysocki, linux, d.kasatkin, tj,
	alex.williamson, scottwood, tech, toshi.kani, linux-kernel,
	iommu, joe, kim.phillips

On Mon, 31 Mar 2014 20:23:36 +0000
Stuart Yoder <stuart.yoder@freescale.com> wrote:

> > From: Greg KH [mailto:gregkh@linuxfoundation.org]
> > Sent: Monday, March 31, 2014 2:47 PM
> > 
> > On Mon, Mar 31, 2014 at 06:47:51PM +0000, Stuart Yoder wrote:
> > > I was also at the point where I thought we should perhaps just
> > > go with current mechanisms and implement new_id for the platform
> > > bus...but Greg's recent response is 'platform devices suck' and it sounds
> > > like he would reject a new_id patch for the platform bus.  So it kind
> > > of feels like we are stuck.
> > 
> > ids mean nothing in the platform device model, so having a new_id file
> > for them makes no sense.
> 
> They don't have IDs like PCI, but platform drivers have to match on
> something.  Platform device match tables are based on compatible strings.
> 
> Example from Freescale DMA driver:
>   static const struct of_device_id fsldma_of_ids[] = {
>         { .compatible = "fsl,elo3-dma", },
>         { .compatible = "fsl,eloplus-dma", },
>         { .compatible = "fsl,elo-dma", },
>         {}
>   };
> 
> The process of unbinding, setting a new_id, and binding to vfio would work
> just like PCI:
> 
>    echo ffe101300.dma > /sys/bus/platform/devices/ffe101300.dma/driver/unbind
>    echo fsl,eloplus-dma > /sys/bus/platform/drivers/vfio-platform/new_id

In platform device land, we don't want to pursue the
new_id/match-by-compatible methodology: we know exactly which specific
device (not device types) we want bound to which driver, so we just
want to be able to simply:

echo fff51000.ethernet | sudo tee -a /sys/bus/platform/devices/fff51000.ethernet/driver/unbind
echo fff51000.ethernet | sudo tee -a /sys/bus/platform/drivers/vfio-platform/bind

and not get involved with how PCI "doesn't simply do that," independent
of autoprobe/hotplug.
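The same two writes can be issued from a program, for example from a VMM that wants to hand one specific device to vfio-platform. This is a minimal sketch, not part of any posted patch: write_attr() and rebind_platform_device() are hypothetical names, and the root parameter exists only so the sequence can be pointed at a scratch directory instead of /sys. As far as I understand the driver core of this period, the bind write only succeeds if driver_match_device() accepts the pair, so against vfio-platform it would fail until one of the mechanisms debated in this thread is in place.

```c
#include <stdio.h>
#include <string.h>

/* Write one string to a sysfs attribute, as "echo ... > attr" would.
 * Returns 0 on success, -1 on any failure. */
static int write_attr(const char *path, const char *val)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fputs(val, f) >= 0;
    /* sysfs store() errors surface when the buffer is flushed,
     * so the result of fclose() matters too. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical helper: detach dev from its current platform driver and
 * bind that one device explicitly to drv. root is normally "/sys". */
int rebind_platform_device(const char *root, const char *dev, const char *drv)
{
    char path[512];
    int n;

    /* echo <dev> > /sys/bus/platform/devices/<dev>/driver/unbind */
    n = snprintf(path, sizeof(path),
                 "%s/bus/platform/devices/%s/driver/unbind", root, dev);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    if (write_attr(path, dev) != 0)
        return -1;

    /* echo <dev> > /sys/bus/platform/drivers/<drv>/bind */
    n = snprintf(path, sizeof(path),
                 "%s/bus/platform/drivers/%s/bind", root, drv);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    return write_attr(path, dev);
}
```

Usage: rebind_platform_device("/sys", "fff51000.ethernet", "vfio-platform") mirrors the two tee commands: the device is named explicitly, and no device type is registered with vfio-platform.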

Kim

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: mechanism to allow a driver to bind to any device
@ 2014-03-31 22:36                                                       ` Kim Phillips
  0 siblings, 0 replies; 92+ messages in thread
From: Kim Phillips @ 2014-03-31 22:36 UTC (permalink / raw)
  To: alex.williamson
  Cc: konrad.wilk, kvm, jan.kiszka, will.deacon, stuart.yoder, a.rigo,
	mhocko, scottwood, Varun.Sethi, kvmarm, rafael.j.wysocki, agraf,
	linux, d.kasatkin, tj, bhelgaas, a.motakis, tech, toshi.kani,
	gregkh, linux-kernel, iommu, joe, christoffer.dall, kim.phillips

On Fri, 28 Mar 2014 11:10:23 -0600
Alex Williamson <alex.williamson@redhat.com> wrote:

> On Fri, 2014-03-28 at 12:58 -0400, Konrad Rzeszutek Wilk wrote:
> > On Wed, Mar 26, 2014 at 04:09:21PM -0600, Alex Williamson wrote:
> > > On Wed, 2014-03-26 at 10:21 -0600, Alex Williamson wrote:
> > > > On Wed, 2014-03-26 at 23:06 +0800, Alexander Graf wrote:
> > > > > 
> > > > > > Am 26.03.2014 um 22:40 schrieb Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>:
> > > > > > 
> > > > > >> On Wed, Mar 26, 2014 at 01:40:32AM +0000, Stuart Yoder wrote:
> > > > > >> Hi Greg,
> > > > > >> 
> > > > > >> We (Linaro, Freescale, Virtual Open Systems) are trying to get an issue
> > > > > >> closed that has been percolating for a while around creating a mechanism
> > > > > >> that will allow kernel drivers like vfio to bind to devices of any type.
> > > > > >> 
> > > > > >> This thread with you:
> > > > > >> http://www.spinics.net/lists/kvm-arm/msg08370.html
> > > > > >> ...seems to have died out, so I am trying to get your response
> > > > > >> and will summarize again.  VFIO drivers in the kernel (regardless of
> > > > > >> bus type) need to bind to devices of any type.  The driver's function
> > > > > >> is to simply export hardware resources of any type to user space.
> > > > > >> 
> > > > > >> There are several approaches that have been proposed:
> > > > > > 
> > > > > > You seem to have missed the one I proposed.
> > > > > >> 
> > > > > >>   1.  new_id -- (current approach) the user explicitly registers
> > > > > >>       each new device type with the vfio driver using the new_id
> > > > > >>       mechanism.
> > > > > >> 
> > > > > >>       Problem: multiple drivers will be resident that handle the
> > > > > >>       same device type...and there is nothing user space hotplug
> > > > > >>       infrastructure can do to help.
> > > > > >> 
> > > > > >>   2.  "any id" -- the vfio driver could specify a wildcard match
> > > > > >>       of some kind in its ID match table which would allow it to
> > > > > >>       match and bind to any possible device id.  However,
> > > > > >>       we don't want the vfio driver grabbing _all_ devices...just the ones we
> > > > > >>       explicitly want to pass to user space.
> > > > > >> 
> > > > > >>       The proposed patch to support this was to create a new flag
> > > > > >>       "sysfs_bind_only" in struct device_driver.  When this flag
> > > > > >>       is set, the driver can only bind to devices via the sysfs
> > > > > >>       bind file.  This would allow the wildcard match to work.
> > > > > >> 
> > > > > >>       Patch is here:
> > > > > >>       https://lkml.org/lkml/2013/12/3/253
> > > > > >> 
> > > > > >>   3.  "Driver initiated explicit bind" -- with this approach the
> > > > > >>       vfio driver would create a private 'bind' sysfs object
> > > > > >>       and the user would echo the requested device into it:
> > > > > >> 
> > > > > >>       echo 0001:03:00.0 > /sys/bus/pci/drivers/vfio-pci/vfio_bind
> > > > > >> 
> > > > > >>       In order to make that work, the driver would need to call
> > > > > >>       driver_probe_device() and thus we need this patch:
> > > > > >>       https://lkml.org/lkml/2014/2/8/175
> > > > > > 
> > > > > > 4). Use the 'unbind' (from the original device) and 'bind' to vfio driver.
> > > > > 
> > > > > This is approach 2, no?
> > > > > 
> > > > > > 
> > > > > > Which I think is what is currently being done. Why is that not sufficient?
> > > > > 
> > > > > What would 'bind to vfio driver' look like?
> > > > > 
> > > > > > The only thing I see in the URL is " That works, but it is ugly."
> > > > > > There is some mention of race but I don't see how - if you do the 'unbind'
> > > > > > on the original driver and then bind the BDF to the VFIO how would you get
> > > > > > a race?
> > > > > 
> > > > > Typically on PCI, you do a
> > > > > 
> > > > >   - add wildcard (pci id) match to vfio driver
> > > > >   - unbind driver
> > > > >   -> reprobe
> > > > >   -> device attaches to vfio driver because it is the least recent match
> > > > >   - remove wildcard match from vfio driver
> > > > > 
> > > > > If in between you hotplug add a card of the same type, it gets attached to vfio - even though the logical "default driver" would be the device specific driver.
> > > > 
> > > > I've mentioned drivers_autoprobe in the past, but I'm not sure we're
> > > > really factoring it into the discussion.  drivers_autoprobe allows us to
> > > > toggle two points:
> > > > 
> > > > a) When a new device is added whether we automatically give drivers a
> > > > try at binding to it
> > > > 
> > > > b) When a new driver is added whether it gets to try to bind to anything
> > > > in the system
> > > > 
> > > > So we do have a mechanism to avoid the race, but the problem is that it
> > > > becomes the responsibility of userspace to:
> > > > 
> > > > 1) turn off drivers_autoprobe
> > > > 2) unbind/new_id/bind/remove_id
> > > > 3) turn on drivers_autoprobe
> > > > 4) call drivers_probe for anything added between 1) & 3)
> > > > 
> > > > Is the question about the ugliness of the current solution whether it's
> > > > unreasonable to ask userspace to do this?
> > > > What we seem to be asking for above is more like an autoprobe flag per
> > > > driver where there's some way for this special driver to opt out of auto
> > > > probing.  Option 2. in Stuart's list does this by short-cutting ID
> > > > matching so that a "match" is only found when using the sysfs bind path,
> > > > option 3. enables a way for a driver to expose their own sysfs entry
> > > > point for binding.  The latter feels particularly chaotic since drivers
> > > > get to make-up their own bind mechanism.

agreed - so far, option 2 looks the most sane.

> > > > Another twist I'll throw in is that devices can be hot added to IOMMU
> > > > groups that are in-use by userspace.  When that happens we'd like to be
> > > > able to disable driver autoprobe of the device to avoid a host driver
> > > > automatically binding to the device.  I wonder if instead of looking at
> > > > the problem from the driver perspective, if we were to instead look at
> > > > it from the device perspective if we might find a solution that would
> > > > address both.  For instance, if devices had a driver_probe_id property
> > > > that was by default set to their bus specific ID match ("$VENDOR
> > > > $DEVICE" on PCI) could we use that to write new match IDs so that a
> > > > device could only bind to a given driver?  Effectively we could then
> > > > bind either using the current method of adding to the list of IDs a
> > > > driver will match or changing the ID that a device would match.  Does
> > > > that get us anywhere?  Thanks,

How does this compare to Scott's proposed device->sysfs_bind_only flag,
in addition to option 2's driver->sysfs_bind_only?

"What it looks like we do still want from the driver core is the ability
for a driver to say that it should not be bound to a device except via
explicit sysfs bind, and the ability for a user to say that a device
should not be bound to a driver except via explicit sysfs bind.  This is
a separate issue from making driver_match_device() happy (in some
earlier e-mails in the thread these two issues were not properly
separated)." [1]
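A toy model may help compare the two flags. It is a userspace sketch under stated assumptions, not the patch linked in option 2: a sysfs_bind_only flag on either the driver or the device removes the pair from autoprobe (device add, drivers_probe), while an explicit sysfs bind still requires an ID match, which a wildcard driver always satisfies. struct model_drv, struct model_dev, model_autoprobe() and model_sysfs_bind() are hypothetical; only the field name sysfs_bind_only comes from the thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct model_drv {
    const char *name;
    bool match_any;          /* wildcard ID match, as in option 2 */
    const char *compatible;  /* must be non-NULL when match_any is false */
    bool sysfs_bind_only;    /* driver: bind only via explicit sysfs bind */
};

struct model_dev {
    const char *name;
    const char *compatible;
    bool sysfs_bind_only;    /* device: bind only via explicit sysfs bind */
    const struct model_drv *driver;  /* NULL while unbound */
};

static bool ids_match(const struct model_drv *drv, const struct model_dev *dev)
{
    return drv->match_any || strcmp(drv->compatible, dev->compatible) == 0;
}

/* Autoprobe: the first eligible driver in list order wins. Either side's
 * sysfs_bind_only flag takes the pair out of the running. */
bool model_autoprobe(struct model_dev *dev, const struct model_drv *drvs, size_t n)
{
    if (dev->driver != NULL || dev->sysfs_bind_only)
        return false;
    for (size_t i = 0; i < n; i++) {
        if (drvs[i].sysfs_bind_only || !ids_match(&drvs[i], dev))
            continue;
        dev->driver = &drvs[i];
        return true;
    }
    return false;
}

/* Explicit "echo <dev> > .../drivers/<drv>/bind": the flags do not block
 * this path, but the device must be unbound and the IDs must match. */
bool model_sysfs_bind(struct model_dev *dev, const struct model_drv *drv)
{
    if (dev->driver != NULL || !ids_match(drv, dev))
        return false;
    dev->driver = drv;
    return true;
}
```

In this model, vfio-platform can sit first in the driver list and still never be picked by autoprobe, and a device flagged sysfs_bind_only (say, one hot-added to an IOMMU group already in use by userspace) stays unbound until someone binds it explicitly.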

> > > Here's one way this might work for PCI; note that we can do this
> > > entirely in the bus driver for PCI.  Bind/unbind would go like this:
> > > 
> > > # bind device to vfio-pci
> > > echo vfio-pci > /sys/bus/pci/devices/0000\:03\:00.0/preferred_driver
> > > echo 0000:03:00.0 > /sys/bus/pci/devices/0000\:03\:00.0/driver/unbind
> > > echo 0000:03:00.0 > /sys/bus/pci/drivers_probe
> > > 
> > > # bind device back to host driver
> > > echo > /sys/bus/pci/devices/0000\:03\:00.0/preferred_driver
> > > echo 0000:03:00.0 > /sys/bus/pci/devices/0000\:03\:00.0/driver/unbind
> > > echo 0000:03:00.0 > /sys/bus/pci/drivers_probe

With the null-write to preferred_driver, it's not crystal clear (to
me at least) what would happen in the above command sequence, given
that multiple drivers may match.  It seems like there would be more
control over binding in a multiple-driver-match environment using
{device,driver}->sysfs_bind_only.
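The question can be made concrete with a toy model of the preferred_driver sequence. The semantics here are assumed, not taken from a posted patch: a non-empty preferred_driver lets only the named driver bind (regardless of IDs), and an empty one falls back to first-match probing in driver list order. struct model_pci_driver, model_drivers_probe(), the driver names host-drv-a/host-drv-b and the IDs 1234:5678 are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

struct model_pci_driver {
    const char *name;
    int has_id;                     /* 0: no static ID table (vfio-pci) */
    unsigned short vendor, device;  /* single static ID when has_id != 0 */
};

/* Models "echo <BDF> > /sys/bus/pci/drivers_probe" for one device.
 * Returns the driver that would bind, or NULL if none would. */
const struct model_pci_driver *
model_drivers_probe(unsigned short vendor, unsigned short device,
                    const char *preferred_driver,
                    const struct model_pci_driver *drvs, size_t n)
{
    int have_pref = preferred_driver != NULL && preferred_driver[0] != '\0';

    for (size_t i = 0; i < n; i++) {
        const struct model_pci_driver *d = &drvs[i];
        if (have_pref) {
            /* Assumed: only the named driver may bind, IDs ignored. */
            if (strcmp(d->name, preferred_driver) == 0)
                return d;
        } else if (d->has_id && d->vendor == vendor && d->device == device) {
            /* Assumed: plain first match in list order. */
            return d;
        }
    }
    return NULL;
}
```

Under those assumptions, clearing preferred_driver hands the device to whichever matching host driver comes first in the list, not necessarily the driver that owned it before vfio-pci did, which is the ambiguity raised above.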

Kim

[1] last paragraph:
http://www.spinics.net/lists/kvm/msg96906.html

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: mechanism to allow a driver to bind to any device
@ 2014-03-31 23:52                                                         ` Alex Williamson
  0 siblings, 0 replies; 92+ messages in thread
From: Alex Williamson @ 2014-03-31 23:52 UTC (permalink / raw)
  To: Kim Phillips
  Cc: konrad.wilk, kvm, jan.kiszka, will.deacon, stuart.yoder, a.rigo,
	mhocko, scottwood, Varun.Sethi, kvmarm, rafael.j.wysocki, agraf,
	linux, d.kasatkin, tj, bhelgaas, a.motakis, tech, toshi.kani,
	gregkh, linux-kernel, iommu, joe, christoffer.dall, kim.phillips

On Mon, 2014-03-31 at 17:36 -0500, Kim Phillips wrote:
> On Fri, 28 Mar 2014 11:10:23 -0600
> Alex Williamson <alex.williamson@redhat.com> wrote:
> 
> > On Fri, 2014-03-28 at 12:58 -0400, Konrad Rzeszutek Wilk wrote:
> > > On Wed, Mar 26, 2014 at 04:09:21PM -0600, Alex Williamson wrote:
> > > > On Wed, 2014-03-26 at 10:21 -0600, Alex Williamson wrote:
> > > > > On Wed, 2014-03-26 at 23:06 +0800, Alexander Graf wrote:
> > > > > > 
> > > > > > > Am 26.03.2014 um 22:40 schrieb Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>:
> > > > > > > 
> > > > > > >> On Wed, Mar 26, 2014 at 01:40:32AM +0000, Stuart Yoder wrote:
> > > > > > >> Hi Greg,
> > > > > > >> 
> > > > > > >> We (Linaro, Freescale, Virtual Open Systems) are trying to get an issue
> > > > > > >> closed that has been percolating for a while around creating a mechanism
> > > > > > >> that will allow kernel drivers like vfio to bind to devices of any type.
> > > > > > >> 
> > > > > > >> This thread with you:
> > > > > > >> http://www.spinics.net/lists/kvm-arm/msg08370.html
> > > > > > >> ...seems to have died out, so I am trying to get your response
> > > > > > >> and will summarize again.  VFIO drivers in the kernel (regardless of
> > > > > > >> bus type) need to bind to devices of any type.  The driver's function
> > > > > > >> is to simply export hardware resources of any type to user space.
> > > > > > >> 
> > > > > > >> There are several approaches that have been proposed:
> > > > > > > 
> > > > > > > You seem to have missed the one I proposed.
> > > > > > >> 
> > > > > > >>   1.  new_id -- (current approach) the user explicitly registers
> > > > > > >>       each new device type with the vfio driver using the new_id
> > > > > > >>       mechanism.
> > > > > > >> 
> > > > > > >>       Problem: multiple drivers will be resident that handle the
> > > > > > >>       same device type...and there is nothing user space hotplug
> > > > > > >>       infrastructure can do to help.
> > > > > > >> 
> > > > > > >>   2.  "any id" -- the vfio driver could specify a wildcard match
> > > > > > >>       of some kind in its ID match table which would allow it to
> > > > > > >>       match and bind to any possible device id.  However,
> > > > > > >>       we don't want the vfio driver grabbing _all_ devices...just the ones we
> > > > > > >>       explicitly want to pass to user space.
> > > > > > >> 
> > > > > > >>       The proposed patch to support this was to create a new flag
> > > > > > >>       "sysfs_bind_only" in struct device_driver.  When this flag
> > > > > > >>       is set, the driver can only bind to devices via the sysfs
> > > > > > >>       bind file.  This would allow the wildcard match to work.
> > > > > > >> 
> > > > > > >>       Patch is here:
> > > > > > >>       https://lkml.org/lkml/2013/12/3/253
> > > > > > >> 
> > > > > > >>   3.  "Driver initiated explicit bind" -- with this approach the
> > > > > > >>       vfio driver would create a private 'bind' sysfs object
> > > > > > >>       and the user would echo the requested device into it:
> > > > > > >> 
> > > > > > >>       echo 0001:03:00.0 > /sys/bus/pci/drivers/vfio-pci/vfio_bind
> > > > > > >> 
> > > > > > >>       In order to make that work, the driver would need to call
> > > > > > >>       driver_probe_device() and thus we need this patch:
> > > > > > >>       https://lkml.org/lkml/2014/2/8/175
> > > > > > > 
> > > > > > > 4). Use the 'unbind' (from the original device) and 'bind' to vfio driver.
> > > > > > 
> > > > > > This is approach 2, no?
> > > > > > 
> > > > > > > 
> > > > > > > Which I think is what is currently being done. Why is that not sufficient?
> > > > > > 
> > > > > > What would 'bind to vfio driver' look like?
> > > > > > 
> > > > > > > The only thing I see in the URL is " That works, but it is ugly."
> > > > > > > There is some mention of race but I don't see how - if you do the 'unbind'
> > > > > > > on the original driver and then bind the BDF to the VFIO how would you get
> > > > > > > a race?
> > > > > > 
> > > > > > Typically on PCI, you do a
> > > > > > 
> > > > > >   - add wildcard (pci id) match to vfio driver
> > > > > >   - unbind driver
> > > > > >   -> reprobe
> > > > > >   -> device attaches to vfio driver because it is the least recent match
> > > > > >   - remove wildcard match from vfio driver
> > > > > > 
> > > > > > If in between you hotplug add a card of the same type, it gets attached to vfio - even though the logical "default driver" would be the device specific driver.
> > > > > 
> > > > > I've mentioned drivers_autoprobe in the past, but I'm not sure we're
> > > > > really factoring it into the discussion.  drivers_autoprobe allows us to
> > > > > toggle two points:
> > > > > 
> > > > > a) When a new device is added whether we automatically give drivers a
> > > > > try at binding to it
> > > > > 
> > > > > b) When a new driver is added whether it gets to try to bind to anything
> > > > > in the system
> > > > > 
> > > > > So we do have a mechanism to avoid the race, but the problem is that it
> > > > > becomes the responsibility of userspace to:
> > > > > 
> > > > > 1) turn off drivers_autoprobe
> > > > > 2) unbind/new_id/bind/remove_id
> > > > > 3) turn on drivers_autoprobe
> > > > > 4) call drivers_probe for anything added between 1) & 3)
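Alex's four steps can be scripted. The following is a minimal sketch, not an existing tool. The function name, its arguments and the `SYSFS_ROOT` override (which allows a dry run against a scratch directory) are hypothetical, and it assumes a shell that supports `local`, such as bash or dash. It only writes the standard PCI sysfs files `drivers_autoprobe`, `unbind`, `new_id`, `bind`, `remove_id` and `drivers_probe`. Writing `new_id` may by itself attach the freshly unbound device, so a failed explicit `bind` is tolerated. Step 4 probes only devices that were not present before step 1.

```shell
#!/bin/sh
# Hypothetical helper: hand one PCI device to another driver while the
# driver core's autoprobing is switched off (steps 1-4 above).
# SYSFS_ROOT defaults to /sys; point it at a scratch tree for a dry run.

rebind_with_autoprobe_off() {
    local bdf="$1"   # device, e.g. 0000:03:00.0
    local ids="$2"   # "vendor device" in the form new_id expects, e.g. "8086 10d3"
    local drv="$3"   # target driver, e.g. vfio-pci
    local bus="${SYSFS_ROOT:-/sys}/bus/pci"
    local before rc dev

    # Remember which devices exist now, so step 4 only touches newcomers.
    before=$(ls "$bus/devices")

    # 1) no automatic binding of newly added devices or drivers
    echo 0 > "$bus/drivers_autoprobe" || return 1

    # 2) unbind / new_id / bind / remove_id
    rc=0
    {
        if [ -e "$bus/devices/$bdf/driver" ]; then
            echo "$bdf" > "$bus/devices/$bdf/driver/unbind"
        fi &&
        echo "$ids" > "$bus/drivers/$drv/new_id" &&
        # new_id may already have attached the device, so a failed
        # explicit bind is not treated as an error
        { echo "$bdf" > "$bus/drivers/$drv/bind"; true; } 2>/dev/null &&
        echo "$ids" > "$bus/drivers/$drv/remove_id"
    } || rc=1

    # 3) re-enable autoprobing, even if step 2 failed
    echo 1 > "$bus/drivers_autoprobe" || rc=1

    # 4) probe only the devices that appeared after the snapshot
    for dev in $(ls "$bus/devices"); do
        printf '%s\n' "$before" | grep -qxF "$dev" ||
            echo "$dev" > "$bus/drivers_probe" || rc=1
    done
    return "$rc"
}
```

Run as root, for example `rebind_with_autoprobe_off 0000:03:00.0 "8086 10d3" vfio-pci`. The snapshot keeps step 4 from re-probing devices that an administrator deliberately left without a driver.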
> > > > > 
> > > > > Is the question about the ugliness of the current solution whether it's
> > > > > unreasonable to ask userspace to do this?
> > > > > What we seem to be asking for above is more like an autoprobe flag per
> > > > > driver where there's some way for this special driver to opt out of auto
> > > > > probing.  Option 2. in Stuart's list does this by short-cutting ID
> > > > > matching so that a "match" is only found when using the sysfs bind path,
> > > > > option 3. enables a way for a driver to expose their own sysfs entry
> > > > > point for binding.  The latter feels particularly chaotic since drivers
> > > > > get to make up their own bind mechanism.
> 
> agreed - so far, option 2 looks the most sane.
> 
> > > > > Another twist I'll throw in is that devices can be hot added to IOMMU
> > > > > groups that are in-use by userspace.  When that happens we'd like to be
> > > > > able to disable driver autoprobe of the device to avoid a host driver
> > > > > automatically binding to the device.  I wonder if instead of looking at
> > > > > the problem from the driver perspective, if we were to instead look at
> > > > > it from the device perspective if we might find a solution that would
> > > > > address both.  For instance, if devices had a driver_probe_id property
> > > > > that was by default set to their bus specific ID match ("$VENDOR
> > > > > $DEVICE" on PCI) could we use that to write new match IDs so that a
> > > > > device could only bind to a given driver?  Effectively we could then
> > > > > bind either using the current method of adding to the list of IDs a
> > > > > driver will match or changing the ID that a device would match.  Does
> > > > > that get us anywhere?  Thanks,
> 
> How does this compare to Scott's device->sysfs_bind_only, in addition
> to option 2 above's driver->sysfs_bind_only?:
> 
> "What it looks like we do still want from the driver core is the ability
> for a driver to say that it should not be bound to a device except via
> explicit sysfs bind, and the ability for a user to say that a device
> should not be bound to a driver except via explicit sysfs bind.  This is
> a separate issue from making driver_match_device() happy (in some
> earlier e-mails in the thread these two issues were not properly
> separated)." [1]

Sorry, I can't find a reference to how device->sysfs_bind_only works in
conjunction with driver->sysfs_bind_only.  Can you provide some example
use cases?

As it stands, the driver->sysfs_bind_only patch of option 2 forces a
driver to operate in either the existing mode or a mode where there is
no automatic binding.  That breaks existing vfio-pci users today who are
able to build the driver statically into their kernel and use vfio-pci.ids=
$VENDOR:$DEVICE on the kernel command line so that vfio-pci grabs their
devices before any loadable module drivers.  It also doesn't address the
issue above where a device is hot-added to an IOMMU group and we may
want to have the device auto-bind to the vfio driver, or at the very
least not auto-bind to a host driver.  Maybe the device portion of
sysfs_bind_only addresses that.
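The boot-time route mentioned above can be prepared from sysfs. A rough sketch, with a hypothetical helper name and `SYSFS_ROOT` override: it reads the standard PCI `vendor` and `device` attributes of the given devices and prints a `vfio-pci.ids=` parameter in the comma-separated `vendor:device` form, dropping duplicates. Like new_id, such an entry is per ID, so it claims every device carrying that ID, not only the listed slots.

```shell
#!/bin/sh
# Hypothetical helper: print a vfio-pci.ids= kernel parameter covering the
# vendor:device IDs of the given PCI devices (duplicates removed).
# SYSFS_ROOT defaults to /sys; point it at a scratch tree for a dry run.

vfio_pci_ids_param() {
    local root="${SYSFS_ROOT:-/sys}/bus/pci/devices"
    local bdf v d list=""
    for bdf in "$@"; do
        # sysfs reports IDs as e.g. "0x8086"; strip the 0x prefix
        v=$(cat "$root/$bdf/vendor") || return 1
        d=$(cat "$root/$bdf/device") || return 1
        v=${v#0x}
        d=${d#0x}
        case ",$list," in
            *",$v:$d,"*) ;;                       # ID already listed
            *) list="${list:+$list,}$v:$d" ;;    # append with a comma
        esac
    done
    [ -n "$list" ] && echo "vfio-pci.ids=$list"
}
```

For the e1000 example used elsewhere in this thread, `vfio_pci_ids_param 0000:03:00.0` would be expected to print `vfio-pci.ids=8086:10d3`.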

All of the original proposals above are working on the premise that we
add an id or enable an "any id" for a driver, but then we need to
prevent the driver from binding to others of that id, which just makes a
mess.  So why not reverse it and allow a device to specify an id that
matches a driver?  That automatically solves the many-to-one problem of
device-to-driver since we're only setting a property on a single device.
We also no longer care if the device gets bound automatically or via
sysfs because it can only go to the correct driver.  So we don't need a
restriction like sysfs-only.  There's no modal operation of the driver,
it's just a new match rule.  It can also be implemented at the bus
driver, completely independent of the driver core.

> > > > Here's one way this might work for PCI; note that we can do this
> > > > entirely in the bus driver for PCI.  Bind/unbind would go like this:
> > > > 
> > > > # bind device to vfio-pci
> > > > echo vfio-pci > /sys/bus/pci/devices/0000\:03\:00.0/preferred_driver
> > > > echo 0000:03:00.0 > /sys/bus/pci/devices/0000\:03\:00.0/driver/unbind
> > > > echo 0000:03:00.0 > /sys/bus/pci/drivers_probe
> > > > 
> > > > # bind device back to host driver
> > > > echo > /sys/bus/pci/devices/0000\:03\:00.0/preferred_driver
> > > > echo 0000:03:00.0 > /sys/bus/pci/devices/0000\:03\:00.0/driver/unbind
> > > > echo 0000:03:00.0 > /sys/bus/pci/drivers_probe
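To make the proposed semantics concrete, the match rule can be modelled as a small shell predicate. This is only a model of the proposal: `preferred_driver` is not an existing PCI attribute, and the function and argument names are made up. A non-empty preference matches exactly the named driver, regardless of its ID table; an empty preference (the `echo >` write, with its newline stripped) falls back to ordinary ID matching.

```shell
#!/bin/sh
# Model of the proposed PCI match rule with a per-device preferred_driver.
# Arguments: <preferred_driver value> <candidate driver> <id_match: 0|1>
# Returns 0 if the candidate driver may bind to the device.

preferred_driver_match() {
    local pref="$1" drv="$2" id_match="$3"
    if [ -n "$pref" ]; then
        # A preference overrides ID tables: only the named driver matches,
        # even if it has no ID entry for this device (the vfio-pci case).
        [ "$drv" = "$pref" ]
    else
        # Empty preference: existing ID matching applies unchanged.
        [ "$id_match" = 1 ]
    fi
}
```

When several drivers match by ID and the preference is empty, the model leaves the choice to whatever the bus does today; it only adds a rule for the non-empty case.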
> 
> With the null-write to preferred_driver, it's not crystal clear (to
> me at least) what would happen in the above command sequence, given
> multiple drivers may match.  It seems like there'd be more control
> over binding in a multiple driver-match environment using
> {device,driver}->sysfs_bind_only.

The null-write says there is no preferred driver and existing matching
rules apply.  The device starts out with no preferred_driver.  Notice
that we never added any new ids to drivers, we made the device match the
driver, then we cleared it.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 92+ messages in thread

* RE: [RFC PATCH v4 01/10] driver core: export driver_probe_device()
@ 2014-03-06 22:31 Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 92+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-03-06 22:31 UTC (permalink / raw)
  To: Stuart Yoder
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ,
	will.deacon-5wv7dgnIgG8, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	Michal Hocko, Bjorn Helgaas, Varun Sethi,
	kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg, Rafael J. Wysocki,
	agraf-l3A5Bk7waGM, Guenter Roeck, Dmitry Kasatkin, Tejun Heo,
	Scott Wood, Antonios Motakis,
	tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J, Toshi Kani, Greg KH,
	a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J,


On Mar 6, 2014 5:25 PM, Stuart Yoder <stuart.yoder@freescale.com> wrote:
>
>
>
> > -----Original Message----- 
> > From: Greg KH [mailto:gregkh@linuxfoundation.org] 
> > Sent: Thursday, February 20, 2014 4:44 PM 
> > To: Yoder Stuart-B08248 
> > Cc: Antonios Motakis; alex.williamson@redhat.com;
> > kvmarm@lists.cs.columbia.edu; iommu@lists.linux-foundation.org;
> > linux-kernel@vger.kernel.org; tech@virtualopensystems.com;
> > a.rigo@virtualopensystems.com; kim.phillips@linaro.org;
> > jan.kiszka@siemens.com; kvm@vger.kernel.org; Bhushan Bharat-R65777;
> > Wood Scott-B07421; christoffer.dall@linaro.org; agraf@suse.de;
> > Sethi Varun-B16395; will.deacon@arm.com; Tejun Heo; Rafael J. Wysocki;
> > Guenter Roeck; Toshi Kani; Joe Perches; Dmitry Kasatkin; Michal Hocko;
> > Bjorn Helgaas
> > Subject: Re: [RFC PATCH v4 01/10] driver core: export 
> > driver_probe_device() 
> > 
> > On Thu, Feb 20, 2014 at 10:34:35PM +0000, Stuart Yoder wrote: 
> > > 
> > > 
> > > > -----Original Message-----
> > > > From: Yoder Stuart-B08248
> > > > Sent: Saturday, February 15, 2014 12:19 PM
> > > > To: 'Greg KH'
> > > > Cc: Antonios Motakis; alex.williamson@redhat.com;
> > > > kvmarm@lists.cs.columbia.edu; iommu@lists.linux-foundation.org;
> > > > linux-kernel@vger.kernel.org; tech@virtualopensystems.com;
> > > > a.rigo@virtualopensystems.com; kim.phillips@linaro.org;
> > > > jan.kiszka@siemens.com; kvm@vger.kernel.org; Bhushan Bharat-R65777;
> > > > Wood Scott-B07421; christoffer.dall@linaro.org; agraf@suse.de;
> > > > Sethi Varun-B16395; will.deacon@arm.com; Tejun Heo; Rafael J. Wysocki;
> > > > Guenter Roeck; Toshi Kani; Joe Perches; Dmitry Kasatkin; Michal Hocko;
> > > > Bjorn Helgaas
> > > > Subject: RE: [RFC PATCH v4 01/10] driver core: export
> > > > driver_probe_device()
> > > > 
> > > > 
> > > > 
> > > > > -----Original Message-----
> > > > > From: Greg KH [mailto:gregkh@linuxfoundation.org]
> > > > > Sent: Saturday, February 15, 2014 11:34 AM
> > > > > To: Yoder Stuart-B08248
> > > > > Cc: Antonios Motakis; alex.williamson@redhat.com;
> > > > > kvmarm@lists.cs.columbia.edu; iommu@lists.linux-foundation.org;
> > > > > linux-kernel@vger.kernel.org; tech@virtualopensystems.com;
> > > > > a.rigo@virtualopensystems.com; kim.phillips@linaro.org;
> > > > > jan.kiszka@siemens.com; kvm@vger.kernel.org; Bhushan Bharat-R65777;
> > > > > Wood Scott-B07421; christoffer.dall@linaro.org; agraf@suse.de;
> > > > > Sethi Varun-B16395; will.deacon@arm.com; Tejun Heo; Rafael J. Wysocki;
> > > > > Guenter Roeck; Toshi Kani; Joe Perches; Dmitry Kasatkin; Michal Hocko;
> > > > > Bjorn Helgaas
> > > > > Subject: Re: [RFC PATCH v4 01/10] driver core: export
> > > > > driver_probe_device()
> > > > > 
> > > > > On Sat, Feb 15, 2014 at 04:33:44PM +0000, Stuart Yoder wrote: 
> > > > > > > > Why?  driver_probe_device() allows a driver to explicitly bind
> > > > > > > > to a specific device.   What is conceptually wrong with allowing
> > > > > > > > that?
> > > > > > > 
> > > > > > > Because that's not how a bus should work, and the fact that no other
> > > > > > > subsystem in the kernel does that might be a hint you are trying to do
> > > > > > > something a bit "wrong" here.
> > > > > > 
> > > > > > Let me try to describe as succinctly as I can the problem we are trying
> > > > > > to solve here...
> > > > > > 
> > > > > > The vfio mechanism in the kernel (e.g. vfio-pci) allows devices to be
> > > > > > exposed to user space (via file descriptors), enabling user space
> > > > > > drivers.  So, for example, to export an e1000 card to user space, I do
> > > > > > this:
> > > > > > 
> > > > > >    echo 0001:03:00.0 > /sys/bus/pci/devices/0001:03:00.0/driver/unbind
> > > > > >    echo 8086 10d3 > /sys/bus/pci/drivers/vfio-pci/new_id 
> > > > > 
> > > > > What's wrong with using the "bind" file instead?  That picks a specific
> > > > > device and binds it to a specific driver.  Or have we been down this
> > > > > path before?  :)
> > > > 
> > > > Yes we have :)  The "bind" file does not bypass device ID checks, so 
> > > > it wouldn't work without new_id or a wildcard match of some kind. 
> > > > 
> > > > > And that is for a PCI "driver" not a totally separate bus, which it 
> > > > > looks like you are wanting to do here. 
> > > > 
> > > > vfio-pci is a PCI driver, not a bus (drivers/vfio/pci/vfio_pci.c). 
> > > > 
> > > > > > The first step unbinds the target device (0001:03:00.0) from the normal
> > > > > > e1000 driver.
> > > > > > 
> > > > > > The second step causes the vfio-pci driver to bind to device 0001:03:00.0.
> > > > > > This second step tells vfio-pci that it now handles e1000 device IDs,
> > > > > > and the vfio-pci driver registers with the PCI bus to handle '8086 10d3'.
> > > > > > 
> > > > > > That works, but it is ugly.  We now have 2 active drivers handling
> > > > > > the same device type...which introduces various possible race conditions.
> > > > > > 
> > > > > > We never want vfio-pci to auto-bind to any new device that shows up
> > > > > > on the PCI bus.  Binding a device to vfio-pci must be an explicit
> > > > > > action by an administrator.
> > > > > 
> > > > > Then use the "bind" file. 
> > > > 
> > > > See above. 
> > > > 
> > > > > > You mentioned previously that user space can sort out the problem 
> > > > > > of multiple drivers registered for handling the same device type. 
> > > > > > That is true, but doesn't help here.   We don't want vfio-pci 
> > > > > > to handle _all_ e1000 cards, just explicitly selected e1000 cards.
> > > > > > 
> > > > > > We want the normal e1000 driver to be loaded and to bind to new 
> > > > > > devices that may be hot-plugged. 
> > > > > 
> > > > > I want a pony too... 
> > > > 
> > > > It's not that difficult...this patch accomplishes it by 
> > > > simply allowing drivers to call driver_probe_device(). 
> > > > 
> > > > > > There are 2 proposed mechanisms that have been put forth, both of 
> > > > > > which you have now rejected: 
> > > > > > 
> > > > > >    1.  sysfs_bind_only flag was proposed which would allow a vfio 
> > > > > >        driver (like vfio-pci) to only bind by explicit request through
> > > > > >        the sysfs 'bind' file.
> > > > > 
> > > > > Why did I reject this?  What did the patch look like? 
> > > > 
> > > > https://lkml.org/lkml/2013/12/3/253 
> > > > 
> > > > 
> > > > > >    2.  Have the vfio driver call driver_probe_device() to explicitly bind
> > > > > >        a particular device instance to the driver.  The only change we
> > > > > >        need here is the EXPORT_SYMBOL.
> > > > > 
> > > > > How are you going to prevent the driver from being bound to the device
> > > > > in the core with this change?  How are you going to call this function?
> > > > > When?  On what action of the user?
> > > > 
> > > > The vfio-pci driver would create a sysfs object "vfio_bind". 
> > > > 
> > > > User would do: 
> > > >    echo 0001:03:00.0 > /sys/bus/pci/drivers/vfio-pci/vfio_bind 
> > > > 
> > > > vfio-pci would call driver_probe_device() which binds 
> > > > the specific device to the vfio-pci driver...and there is 
> > > > no ambiguity. 
> > > > 
> > > > > > Are you in principle opposed to any mechanism that would allow 2 drivers
> > > > > > to be resident/active and allow a sysadmin to explicitly bind a
> > > > > > particular device instance to the driver of their choice?
> > > > > 
> > > > > No, that works today with the bind/unbind/new_id files, it's just that
> > > > > you don't like it :)
> > > > 
> > > > We don't like it because of the ambiguities/race-conditions with 
> > > > the current situation. 
> > > > 
> > > > A vfio driver, like vfio-pci, certainly is a bit different than a normal
> > > > driver, in that it really is not device ID aware.  It simply passes
> > > > through device resources (mappable regions, IRQs) to user space without
> > > > interpreting or understanding them.  It is kind of a "meta" driver, but
> > > > it is not a bus.  Every bus type would need its own vfio driver to
> > > > do this type of device pass through.
> > > 
> > > Hi Greg, 
> > > 
> > > Any further thoughts on this? 
> > 
> > Sorry, been swamped with other patches and stable stuff and not had
> > time to look at it.  Give me a few days...
>
> Hi Greg, wanted to ping you on this again... 
>
> I know some days have gone by, so let me summarize the issue-- vfio 
> drivers in the kernel (regardless of bus type) need to bind to 
> devices of any type.   There seem to be 3 approaches: 
>
>    1.  new_id -- (current approach) the user explicitly registers 
>        each new device type with the vfio driver using the new_id 
>        mechanism. 
>
>        Problem: multiple drivers will be resident that handle the 
>        same device type...and there is nothing user space hotplug 
>        infrastructure can do to help. 
>
>    2.  "any id" -- the vfio driver could specify a wildcard match of 
>        some kind so that it can bind to any possible device id.  However, 
>        we don't want vfio grabbing all devices...just the ones we 
>        explicitly want to pass to user space. 
>
>        Proposed patch to support this was to create a new flag 
>        "sysfs_bind_only" in struct device_driver.  When this flag 
>        is set, the driver can only bind to devices via the sysfs 
>        bind file.  This would allow the wildcard match to work. 
>
>        Patch is here: 
>        https://lkml.org/lkml/2013/12/3/253 
>
>    3.  Driver initiated explicit bind -- with this approach the 
>        vfio driver would create a private 'bind' sysfs object 
>        and the user would echo the requested device into it: 
>
>        echo 0001:03:00.0 > /sys/bus/pci/drivers/vfio-pci/vfio_bind 
>
>        In order to make that work, the driver would need to call 
>        driver_probe_device() and thus we need this patch: 
>        https://lkml.org/lkml/2014/2/8/175 

There is a fourth way. You can bind to the devices using the BDF. And if you want to do it at startup, you can have a 'hide' parameter so that the driver assigns the devices to itself and becomes the owner of said devices.
>
>
> Thanks, 
> Stuart 
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 92+ messages in thread

* RE: [RFC PATCH v4 01/10] driver core: export driver_probe_device()
@ 2014-03-06 22:31 Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 92+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-03-06 22:31 UTC (permalink / raw)
  To: Stuart Yoder
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, jan.kiszka-kv7WeFo6aLtBDgjK7y7TUQ,
	will.deacon-5wv7dgnIgG8, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	Michal Hocko, Bjorn Helgaas, Varun Sethi,
	kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg, Rafael J. Wysocki,
	agraf-l3A5Bk7waGM, Guenter Roeck, Dmitry Kasatkin, Tejun Heo,
	Scott Wood, Antonios Motakis,
	tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J, Toshi Kani, Greg KH,
	a.rigo-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J,


On Mar 6, 2014 5:25 PM, Stuart Yoder <stuart.yoder@freescale.com> wrote:
>
>
>
> > -----Original Message----- 
> > From: Greg KH [mailto:gregkh@linuxfoundation.org] 
> > Sent: Thursday, February 20, 2014 4:44 PM 
> > To: Yoder Stuart-B08248 
> > Cc: Antonios Motakis; alex.williamson@redhat.com; 
> > kvmarm@lists.cs.columbia.edu; iommu@lists.linux-foundation.org; linux- 
> > kernel@vger.kernel.org; tech@virtualopensystems.com; 
> > a.rigo@virtualopensystems.com; kim.phillips@linaro.org; 
> > jan.kiszka@siemens.com; kvm@vger.kernel.org; Bhushan Bharat-R65777; Wood 
> > Scott-B07421; christoffer.dall@linaro.org; agraf@suse.de; Sethi Varun- 
> > B16395; will.deacon@arm.com; Tejun Heo; Rafael J. Wysocki; Guenter Roeck; 
> > Toshi Kani; Joe Perches; Dmitry Kasatkin; Michal Hocko; Bjorn Helgaas 
> > Subject: Re: [RFC PATCH v4 01/10] driver core: export 
> > driver_probe_device() 
> > 
> > On Thu, Feb 20, 2014 at 10:34:35PM +0000, Stuart Yoder wrote: 
> > > 
> > > 
> > > > -----Original Message----- 
> > > > From: Yoder Stuart-B08248 
> > > > Sent: Saturday, February 15, 2014 12:19 PM 
> > > > To: 'Greg KH' 
> > > > Cc: Antonios Motakis; alex.williamson@redhat.com; 
> > > > kvmarm@lists.cs.columbia.edu; iommu@lists.linux-foundation.org; 
> > linux- 
> > > > kernel@vger.kernel.org; tech@virtualopensystems.com; 
> > > > a.rigo@virtualopensystems.com; kim.phillips@linaro.org; 
> > > > jan.kiszka@siemens.com; kvm@vger.kernel.org; Bhushan Bharat-R65777; 
> > Wood 
> > > > Scott-B07421; christoffer.dall@linaro.org; agraf@suse.de; Sethi 
> > Varun- 
> > > > B16395; will.deacon@arm.com; Tejun Heo; Rafael J. Wysocki; Guenter 
> > Roeck; 
> > > > Toshi Kani; Joe Perches; Dmitry Kasatkin; Michal Hocko; Bjorn Helgaas 
> > > > Subject: RE: [RFC PATCH v4 01/10] driver core: export 
> > > > driver_probe_device() 
> > > > 
> > > > 
> > > > 
> > > > > -----Original Message----- 
> > > > > From: Greg KH [mailto:gregkh@linuxfoundation.org] 
> > > > > Sent: Saturday, February 15, 2014 11:34 AM 
> > > > > To: Yoder Stuart-B08248 
> > > > > Cc: Antonios Motakis; alex.williamson@redhat.com;
> > > > > kvmarm@lists.cs.columbia.edu; iommu@lists.linux-foundation.org;
> > > > > linux-kernel@vger.kernel.org; tech@virtualopensystems.com;
> > > > > a.rigo@virtualopensystems.com; kim.phillips@linaro.org;
> > > > > jan.kiszka@siemens.com; kvm@vger.kernel.org; Bhushan Bharat-R65777;
> > > > > Wood Scott-B07421; christoffer.dall@linaro.org; agraf@suse.de;
> > > > > Sethi Varun-B16395; will.deacon@arm.com; Tejun Heo; Rafael J. Wysocki;
> > > > > Guenter Roeck; Toshi Kani; Joe Perches; Dmitry Kasatkin; Michal Hocko;
> > > > > Bjorn Helgaas
> > > > > Subject: Re: [RFC PATCH v4 01/10] driver core: export 
> > > > > driver_probe_device() 
> > > > > 
> > > > > On Sat, Feb 15, 2014 at 04:33:44PM +0000, Stuart Yoder wrote: 
> > > > > > > > Why?  driver_probe_device() allows a driver to explicitly bind
> > > > > > > > to a specific device.   What is conceptually wrong with allowing
> > > > > > > > that?
> > > > > > > 
> > > > > > > Because that's not how a bus should work, and the fact that no other
> > > > > > > subsystem in the kernel does that might be a hint you are trying to
> > > > > > > do something a bit "wrong" here.
> > > > > > 
> > > > > > Let me try to describe, as succinctly as I can, the problem we are
> > > > > > trying to solve here...
> > > > > > 
> > > > > > The vfio mechanism in the kernel (e.g. vfio-pci) allows devices to be
> > > > > > exposed to user space (via file descriptors), enabling user space
> > > > > > drivers.  So, for example, to export an e1000 card to user space, I do
> > > > > > this:
> > > > > > 
> > > > > >    echo 0001:03:00.0 > /sys/bus/pci/devices/0001:03:00.0/driver/unbind
> > > > > >    echo 8086 10d3 > /sys/bus/pci/drivers/vfio-pci/new_id
> > > > > 
> > > > > What's wrong with using the "bind" file instead?  That picks a specific
> > > > > device and binds it to a specific driver.  Or have we been down this
> > > > > path before?  :)
> > > > 
> > > > Yes we have :)  The "bind" file does not bypass device ID checks, so 
> > > > it wouldn't work without new_id or a wildcard match of some kind. 
> > > > 
> > > > > And that is for a PCI "driver" not a totally separate bus, which it 
> > > > > looks like you are wanting to do here. 
> > > > 
> > > > vfio-pci is a PCI driver, not a bus (drivers/vfio/pci/vfio_pci.c). 
> > > > 
> > > > > > The first step unbinds the target device (0001:03:00.0) from the normal
> > > > > > e1000 driver.
> > > > > > 
> > > > > > The second step causes the vfio-pci driver to bind to device
> > > > > > 0001:03:00.0.  This second step tells vfio-pci that it now handles
> > > > > > e1000 device IDs, and the vfio-pci driver registers with the PCI bus
> > > > > > to handle '8086 10d3'.
> > > > > > 
> > > > > > That works, but it is ugly.  We now have 2 active drivers handling
> > > > > > the same device type...which introduces various possible race
> > > > > > conditions.
> > > > > > 
> > > > > > We never want vfio-pci to auto-bind to any new device that shows up
> > > > > > on the PCI bus.  Binding a device to vfio-pci must be an explicit
> > > > > > action by an administrator.
> > > > > 
> > > > > Then use the "bind" file. 
> > > > 
> > > > See above. 
> > > > 
> > > > > > You mentioned previously that user space can sort out the problem
> > > > > > of multiple drivers registered for handling the same device type.
> > > > > > That is true, but doesn't help here.   We don't want vfio-pci
> > > > > > to handle _all_ e1000 cards, just explicitly selected e1000 cards.
> > > > > > 
> > > > > > We want the normal e1000 driver to be loaded and to bind to new 
> > > > > > devices that may be hot-plugged. 
> > > > > 
> > > > > I want a pony too... 
> > > > 
> > > > It's not that difficult...this patch accomplishes it by 
> > > > simply allowing drivers to call driver_probe_device(). 
> > > > 
> > > > > > There are 2 proposed mechanisms that have been put forth, both of 
> > > > > > which you have now rejected: 
> > > > > > 
> > > > > >    1.  A sysfs_bind_only flag was proposed, which would allow a vfio
> > > > > >        driver (like vfio-pci) to only bind by explicit request through
> > > > > >        the sysfs 'bind' file.
> > > > > 
> > > > > Why did I reject this?  What did the patch look like? 
> > > > 
> > > > https://lkml.org/lkml/2013/12/3/253 
> > > > 
> > > > 
> > > > > >    2.  Have the vfio driver call driver_probe_device() to explicitly bind
> > > > > >        a particular device instance to the driver.  The only change we
> > > > > >        need here is the EXPORT_SYMBOL.
> > > > > 
> > > > > How are you going to prevent the driver from being bound to the device
> > > > > in the core with this change?  How are you going to call this function?
> > > > > When?  On what action of the user?
> > > > 
> > > > The vfio-pci driver would create a sysfs object "vfio_bind". 
> > > > 
> > > > User would do: 
> > > >    echo 0001:03:00.0 > /sys/bus/pci/drivers/vfio-pci/vfio_bind 
> > > > 
> > > > vfio-pci would call driver_probe_device() which binds 
> > > > the specific device to the vfio-pci driver...and there is 
> > > > no ambiguity. 
> > > > 
> > > > > > Are you in principle opposed to any mechanism that would allow 2 drivers
> > > > > > to be resident/active and allow a sysadmin to explicitly bind a
> > > > > > particular device instance to the driver of their choice?
> > > > > 
> > > > > No, that works today with the bind/unbind/new_id files, it's just that
> > > > > you don't like it :)
> > > > 
> > > > We don't like it because of the ambiguities/race-conditions with 
> > > > the current situation. 
> > > > 
> > > > A vfio driver, like vfio-pci, certainly is a bit different than a normal
> > > > driver, in that it really is not device ID aware.  It simply passes
> > > > through device resources (mappable regions, IRQs) to user space without
> > > > interpreting or understanding them.  It is kind of a "meta" driver, but
> > > > it is not a bus.  Every bus type would need its own vfio driver to
> > > > do this type of device pass through.
> > > 
> > > Hi Greg, 
> > > 
> > > Any further thoughts on this? 
> > 
> > Sorry, been swamped with other patches and stable stuff and not had
> > time to look at it.  Give me a few days...
>
> Hi Greg, wanted to ping you on this again... 
>
> I know some days have gone by, so let me summarize the issue-- vfio 
> drivers in the kernel (regardless of bus type) need to bind to 
> devices of any type.   There seem to be 3 approaches: 
>
>    1.  new_id -- (current approach) the user explicitly registers 
>        each new device type with the vfio driver using the new_id 
>        mechanism. 
>
>        Problem: multiple drivers will be resident that handle the 
>        same device type...and there is nothing user space hotplug 
>        infrastructure can do to help. 
>
>    2.  "any id" -- the vfio driver could specify a wildcard match of 
>        some kind so that it can bind to any possible device id.  However, 
>        we don't want vfio grabbing all devices...just the ones we 
>        explicitly want to pass to user space. 
>
>        Proposed patch to support this was to create a new flag 
>        "sysfs_bind_only" in struct device_driver.  When this flag 
>        is set, the driver can only bind to devices via the sysfs 
>        bind file.  This would allow the wildcard match to work. 
>
>        Patch is here: 
>        https://lkml.org/lkml/2013/12/3/253 
>
>    3.  Driver initiated explicit bind -- with this approach the 
>        vfio driver would create a private 'bind' sysfs object 
>        and the user would echo the requested device into it: 
>
>        echo 0001:03:00.0 > /sys/bus/pci/drivers/vfio-pci/vfio_bind 
>
>        In order to make that work, the driver would need to call 
>        driver_probe_device() and thus we need this patch: 
>        https://lkml.org/lkml/2014/2/8/175 
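The three approaches differ mainly in who decides which driver gets a
device. The following C program is a minimal user-space model of that
decision, not driver-core code: the types and functions (model_device,
model_driver, auto_probe(), explicit_bind()) are hypothetical, each driver
carries a single ID entry, and the model assumes, as described in this
thread, that the generic sysfs "bind" file still requires an ID match while
a driver calling driver_probe_device() directly bypasses the driver-core
match (a bus's own probe routine may still apply its own checks).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of the binding policies discussed in
 * this thread. Names are illustrative; this is not the driver-core API. */
struct model_device {
	const char *bdf;          /* e.g. "0001:03:00.0" */
	unsigned vendor, device;  /* PCI IDs, e.g. 0x8086, 0x10d3 */
	const char *bound_to;     /* driver name, NULL when unbound */
};

struct model_driver {
	const char *name;
	unsigned vendor, device;  /* single ID entry (approach 1: new_id) */
	bool match_any;           /* approach 2: wildcard ID match */
	bool sysfs_bind_only;     /* approach 2: never bind automatically */
};

static bool ids_match(const struct model_driver *drv,
		      const struct model_device *dev)
{
	return drv->match_any ||
	       (drv->vendor == dev->vendor && drv->device == dev->device);
}

/* Hotplug-style probing: the first registered driver whose IDs match
 * and that allows automatic binding gets the device. */
static void auto_probe(struct model_device *dev,
		       const struct model_driver *drvs, size_t n)
{
	for (size_t i = 0; i < n && !dev->bound_to; i++)
		if (!drvs[i].sysfs_bind_only && ids_match(&drvs[i], dev))
			dev->bound_to = drvs[i].name;
}

/* Explicit, administrator-requested bind of one device instance.
 * check_ids = true models the generic sysfs "bind" file, which still
 * requires an ID match; check_ids = false models a driver-private
 * "vfio_bind" file that calls driver_probe_device() directly, which
 * in this model skips the driver-core ID match. */
static bool explicit_bind(struct model_device *dev,
			  const struct model_driver *drv, bool check_ids)
{
	assert(dev && drv);
	if (dev->bound_to)
		return false;      /* must be unbound first */
	if (check_ids && !ids_match(drv, dev))
		return false;
	dev->bound_to = drv->name;
	return true;
}
```

In this model, approach 1 lets a hot-plugged e1000 card end up on vfio-pci
simply because vfio-pci was registered first, which is the ambiguity
described above; with approaches 2 and 3, vfio-pci never takes part in
automatic probing and only gets the devices an administrator names
explicitly.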

There is a fourth way. You can bind to the devices using their BDF. And if
you want to do it at startup, you can have a 'hide' parameter that makes the
driver assign the listed devices to itself and then be the owner of said
devices.
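The startup variant amounts to the driver carrying a list of device
addresses and claiming only those. A minimal sketch of the matching step
follows. bdf_is_hidden() is a hypothetical helper, not an existing kernel
function, and the "(DDDD:BB:DD.F)(...)" list format is an assumption
modeled on xen-pciback's hide= parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return true if `bdf` (for example "0001:03:00.0")
 * is listed in `hide`, a string of parenthesized entries such as
 * "(0001:03:00.0)(0000:04:00.1)". The format is an assumption; this
 * sketch requires entries to match character for character. A NULL or
 * empty list hides nothing, and an unterminated trailing entry is ignored.
 */
static bool bdf_is_hidden(const char *hide, const char *bdf)
{
	size_t len;

	assert(bdf != NULL);
	len = strlen(bdf);

	while (hide != NULL && *hide != '\0') {
		const char *open = strchr(hide, '(');
		const char *close;

		if (open == NULL)
			break;
		close = strchr(open + 1, ')');
		if (close == NULL)
			break;          /* unterminated entry: ignore it */
		if ((size_t)(close - open - 1) == len &&
		    strncmp(open + 1, bdf, len) == 0)
			return true;
		hide = close + 1;   /* continue after this entry */
	}
	return false;
}
```

A driver with a wildcard ID table could, for instance, call such a helper
from its probe routine and return -ENODEV for any device not on the list,
so that only the listed devices are claimed at boot.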
>
>
> Thanks, 
> Stuart 
>
>
>
>
> _______________________________________________ 
> iommu mailing list 
> iommu@lists.linux-foundation.org 
> https://lists.linuxfoundation.org/mailman/listinfo/iommu 


end of thread, latest: 2014-03-31 23:52 UTC

Thread overview: 92+ messages
2014-02-08 17:29 [RFC PATCH v4 00/10] VFIO support for platform devices Antonios Motakis
2014-02-08 17:29 ` [RFC PATCH v4 01/10] driver core: export driver_probe_device() Antonios Motakis
2014-02-14 22:27   ` Greg KH
2014-02-14 23:00       ` Stuart Yoder
2014-02-15  2:47           ` Greg KH
2014-02-15 16:33               ` Stuart Yoder
2014-02-15 17:33                   ` Greg KH
2014-02-15 18:19                       ` Stuart Yoder
2014-02-18  0:38                           ` Scott Wood
2014-02-20 22:34                     ` Stuart Yoder
2014-02-20 22:43                         ` Greg KH
2014-03-06 22:25                             ` Stuart Yoder
2014-03-26  1:40                           ` mechanism to allow a driver to bind to any device Stuart Yoder
2014-03-26 14:40                               ` Konrad Rzeszutek Wilk
2014-03-26 15:06                                   ` Alexander Graf
2014-03-26 16:21                                       ` Alex Williamson
2014-03-26 16:32                                           ` Konrad Rzeszutek Wilk
2014-03-26 16:49                                               ` Alex Williamson
2014-03-26 17:04                                                   ` Konrad Rzeszutek Wilk
2014-03-26 17:26                                                       ` Alex Williamson
2014-03-26 17:51                                               ` Stuart Yoder
2014-03-26 22:09                                           ` Alex Williamson
2014-03-28 16:58                                               ` Konrad Rzeszutek Wilk
2014-03-28 17:10                                                   ` Alex Williamson
2014-03-31 22:36                                                     ` Kim Phillips
2014-03-31 23:52                                                       ` Alex Williamson
2014-03-31 18:47                                             ` Stuart Yoder
2014-03-31 19:47                                                 ` Greg KH
2014-03-31 20:23                                                     ` Stuart Yoder
2014-03-31 22:32                                                       ` Kim Phillips
2014-03-31 18:32                                           ` Stuart Yoder
2014-03-26 16:24                                       ` Konrad Rzeszutek Wilk
2014-03-26 15:32                                   ` Stuart Yoder
2014-03-26 21:39                               ` Antonios Motakis
2014-03-28  6:59                                   ` Greg KH
2014-03-31 18:21                                       ` Stuart Yoder
2014-03-26 21:42                               ` Antonios Motakis
2014-02-08 17:29 ` [RFC PATCH v4 02/10] VFIO_IOMMU_TYPE1: Introduce the VFIO_DMA_MAP_FLAG_EXEC flag Antonios Motakis
2014-02-10 20:04   ` Alex Williamson
2014-02-08 17:29 ` [RFC PATCH v4 03/10] VFIO_IOMMU_TYPE1: workaround to build for platform devices Antonios Motakis
2014-02-08 17:29 ` [RFC PATCH v4 04/10] VFIO_PLATFORM: Initial skeleton of VFIO support " Antonios Motakis
2014-02-08 17:29 ` [RFC PATCH v4 05/10] VFIO_PLATFORM: Return info for device and its memory mapped IO regions Antonios Motakis
2014-02-10 22:32   ` Alex Williamson
2014-02-08 17:29 ` [RFC PATCH v4 06/10] VFIO_PLATFORM: Read and write support for the device fd Antonios Motakis
2014-02-10 22:45   ` Alex Williamson
2014-02-10 23:12     ` Scott Wood
2014-02-10 23:20       ` Alex Williamson
2014-02-08 17:29 ` [RFC PATCH v4 07/10] VFIO_PLATFORM: Support MMAP of MMIO regions Antonios Motakis
2014-02-08 17:29 ` [RFC PATCH v4 08/10] VFIO_PLATFORM: Return IRQ info Antonios Motakis
2014-02-08 17:29 ` [RFC PATCH v4 09/10] VFIO_PLATFORM: Initial interrupts support Antonios Motakis
2014-02-08 17:29 ` [RFC PATCH v4 10/10] VFIO_PLATFORM: Support for maskable and automasked interrupts Antonios Motakis
2014-03-06 22:31 [RFC PATCH v4 01/10] driver core: export driver_probe_device() Konrad Rzeszutek Wilk
