All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/3] VFIO: Report IOMMU fault event to userspace
@ 2017-02-19 14:47 Lan Tianyu
  2017-02-19 14:47 ` [RFC PATCH 1/3] VFIO: Add new cmd to receive eventfd from userspace to notify IOMMU fault event Lan Tianyu
                   ` (3 more replies)
  0 siblings, 4 replies; 27+ messages in thread
From: Lan Tianyu @ 2017-02-19 14:47 UTC (permalink / raw)
  To: kvm
  Cc: Lan Tianyu, kevin.tian, mst, jan.kiszka, jasowang, peterx, david,
	alex.williamson, yi.l.liu

This patchset proposes a solution to report hardware IOMMU fault
event to userspace. Motivation is to make vIOMMU in Qemu get physical
fault event and info from IOMMU hardware. vIOMMU is in charge of transforming
fault info and inject to guest. The feature is also very important to
support first level translation(Translation request with PASID) in VM
which requires vIOMMU to inject device page request to VM.   

Introduce two new VFIO cmd for VFIO IOMMU type1 driver
- VFIO_IOMMU_SET_FAULT_EVENT_FD
  Receive eventfd from user space and trigger eventfd when get fault event
from IOMMU.  

- VFIO_IOMMU_GET_FAULT_INFO 
  Return IOMMU fault info to userspace.

VFIO IOMMU Type1 driver needs to register IOMMU fault event notifier
to IOMMU driver for assigned devices when they are attached to VFIO
container. IOMMU driver will run VFIO fault event callback when hardware
IOMMU triggers fault events for devices assigned to VFIO container. Then,
type1 driver signals eventfd provided by userspace and usrspace gets fault
info via new cmd.

IOMMU interface are still in design stage and so we still can't register
IOMMU fault event notifier to IOMMU driver. This patchset is prototype
code to check whether VFIO IOMMU type1 driver is suitable place to deal
with IOMMU fault event and just passes build test.

Very appreciate for comments.

Lan Tianyu (3):
  VFIO: Add new cmd to receive eventfd from userspace to notify IOMMU
    fault event
  VFIO: Add IOMMU fault notifier callback
  VFIO: Add new cmd for user space to get IOMMU fault info

 drivers/vfio/vfio_iommu_type1.c | 106 ++++++++++++++++++++++++++++++++++++++++
 include/linux/iommu.h           |   7 +++
 include/uapi/linux/vfio.h       |  37 ++++++++++++++
 3 files changed, 150 insertions(+)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [RFC PATCH 1/3] VFIO: Add new cmd to receive eventfd from userspace to notify IOMMU fault event
  2017-02-19 14:47 [RFC PATCH 0/3] VFIO: Report IOMMU fault event to userspace Lan Tianyu
@ 2017-02-19 14:47 ` Lan Tianyu
  2017-02-20 20:53   ` Alex Williamson
  2017-02-21  5:48   ` Michael S. Tsirkin
  2017-02-19 14:47 ` [RFC PATCH 2/3] VFIO: Add IOMMU fault notifier callback Lan Tianyu
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 27+ messages in thread
From: Lan Tianyu @ 2017-02-19 14:47 UTC (permalink / raw)
  To: kvm
  Cc: Lan Tianyu, kevin.tian, mst, jan.kiszka, jasowang, peterx, david,
	alex.williamson, yi.l.liu

This patch is to receive IOMMU fault eventfd and IOMMU fault event flag
from userspace. VFIO should register IOMMU fault event handler according
fault event flag which designates what kind of fault events need to be
reported.  When VFIO get IOMMU fault event from IOMMU driver, it should
notify userspace via received fd.

Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
---
 drivers/vfio/vfio_iommu_type1.c | 45 +++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/vfio.h       | 15 ++++++++++++++
 2 files changed, 60 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index b3cc33f..46674ea 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -38,6 +38,7 @@
 #include <linux/workqueue.h>
 #include <linux/mdev.h>
 #include <linux/notifier.h>
+#include <linux/eventfd.h>
 
 #define DRIVER_VERSION  "0.2"
 #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
@@ -61,6 +62,8 @@ struct vfio_iommu {
 	struct mutex		lock;
 	struct rb_root		dma_list;
 	struct blocking_notifier_head notifier;
+	struct eventfd_ctx	*iommu_fault_fd;
+	struct mutex            fault_lock;
 	bool			v2;
 	bool			nesting;
 };
@@ -1452,6 +1455,7 @@ static void *vfio_iommu_type1_open(unsigned long arg)
 	INIT_LIST_HEAD(&iommu->domain_list);
 	iommu->dma_list = RB_ROOT;
 	mutex_init(&iommu->lock);
+	mutex_init(&iommu->fault_lock);
 	BLOCKING_INIT_NOTIFIER_HEAD(&iommu->notifier);
 
 	return iommu;
@@ -1582,6 +1586,47 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 
 		return copy_to_user((void __user *)arg, &unmap, minsz) ?
 			-EFAULT : 0;
+	} else if (cmd == VFIO_IOMMU_SET_FAULT_EVENTFD) {
+		struct vfio_iommu_type1_set_fault_eventfd eventfd;
+		int fd;
+		int ret;
+
+		minsz = offsetofend(struct vfio_iommu_type1_set_fault_eventfd,
+				    fd);
+
+		if (copy_from_user(&eventfd, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (eventfd.argsz < minsz)
+			return -EINVAL;
+
+
+		mutex_lock(&iommu->fault_lock);
+
+		fd = eventfd.fd;
+		if (fd == -1) {
+			if (iommu->iommu_fault_fd)
+				eventfd_ctx_put(iommu->iommu_fault_fd);
+			iommu->iommu_fault_fd = NULL;
+			ret = 0;
+		} else if (fd >= 0) {
+			struct eventfd_ctx *ctx;
+
+			ctx = eventfd_ctx_fdget(fd);
+			if (IS_ERR(ctx))
+				return PTR_ERR(ctx);
+
+			if (ctx)
+				eventfd_ctx_put(ctx);
+
+			iommu->iommu_fault_fd = ctx;
+			ret = 0;
+		} else
+			ret = -EINVAL;
+
+		mutex_unlock(&iommu->fault_lock);
+
+		return ret;
 	}
 
 	return -ENOTTY;
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 519eff3..8616334 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -547,6 +547,21 @@ struct vfio_iommu_type1_dma_unmap {
 #define VFIO_IOMMU_ENABLE	_IO(VFIO_TYPE, VFIO_BASE + 15)
 #define VFIO_IOMMU_DISABLE	_IO(VFIO_TYPE, VFIO_BASE + 16)
 
+/*
+ * VFIO_IOMMU_SET_FAULT_EVENT_FD	_IO(VFIO_TYPE, VFIO_BASE + 17)
+ *
+ * Receive eventfd from userspace to notify fault event from IOMMU.
+ */
+struct vfio_iommu_type1_set_fault_eventfd {
+	__u32	argsz;
+	__u32   flags;
+/* What IOMMU Fault events should be reported. */
+#define VFIO_IOMMU_UR_FAULT_WITHOUT_PASID (1 << 0)
+	__s32	fd;			/* Eventfd from user space */
+};
+
+#define VFIO_IOMMU_SET_FAULT_EVENTFD	_IO(VFIO_TYPE, VFIO_BASE + 17)
+
 /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
 
 /*
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [RFC PATCH 2/3] VFIO: Add IOMMU fault notifier callback
  2017-02-19 14:47 [RFC PATCH 0/3] VFIO: Report IOMMU fault event to userspace Lan Tianyu
  2017-02-19 14:47 ` [RFC PATCH 1/3] VFIO: Add new cmd to receive eventfd from userspace to notify IOMMU fault event Lan Tianyu
@ 2017-02-19 14:47 ` Lan Tianyu
  2017-02-20  2:58   ` Liu, Yi L
                     ` (2 more replies)
  2017-02-19 14:47 ` [RFC PATCH 3/3] VFIO: Add new cmd for user space to get IOMMU fault info Lan Tianyu
  2017-02-20 20:53 ` [RFC PATCH 0/3] VFIO: Report IOMMU fault event to userspace Alex Williamson
  3 siblings, 3 replies; 27+ messages in thread
From: Lan Tianyu @ 2017-02-19 14:47 UTC (permalink / raw)
  To: kvm
  Cc: Lan Tianyu, kevin.tian, mst, jan.kiszka, jasowang, peterx, david,
	alex.williamson, yi.l.liu

This patch is to add callback to handle fault event reported by
IOMMU driver. Callback stores fault into an array and notify userspace
via eventfd to read fault info.

Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
---
 drivers/vfio/vfio_iommu_type1.c | 30 ++++++++++++++++++++++++++++++
 include/linux/iommu.h           |  7 +++++++
 include/uapi/linux/vfio.h       |  7 +++++++
 3 files changed, 44 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 46674ea..dc434a3 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -56,6 +56,8 @@
 MODULE_PARM_DESC(disable_hugepages,
 		 "Disable VFIO IOMMU support for IOMMU hugepages.");
 
+#define NR_IOMMU_FAULT_INFO	10
+
 struct vfio_iommu {
 	struct list_head	domain_list;
 	struct vfio_domain	*external_domain; /* domain for external user */
@@ -64,6 +66,9 @@ struct vfio_iommu {
 	struct blocking_notifier_head notifier;
 	struct eventfd_ctx	*iommu_fault_fd;
 	struct mutex            fault_lock;
+	struct vfio_iommu_fault_info fault_info[NR_IOMMU_FAULT_INFO];
+	struct blocking_notifier_head iommu_fault_notifier;
+	u8			fault_count;
 	bool			v2;
 	bool			nesting;
 };
@@ -1456,6 +1461,7 @@ static void *vfio_iommu_type1_open(unsigned long arg)
 	iommu->dma_list = RB_ROOT;
 	mutex_init(&iommu->lock);
 	mutex_init(&iommu->fault_lock);
+	iommu->fault_count = 0;
 	BLOCKING_INIT_NOTIFIER_HEAD(&iommu->notifier);
 
 	return iommu;
@@ -1516,6 +1522,30 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
 	return ret;
 }
 
+static int vfio_iommu_fault_event_notifier(struct notifier_block *nb,
+					   struct iommu_fault_info *fault_info,
+					   void *data)
+{
+	struct vfio_iommu *iommu = data;
+	struct vfio_iommu_fault_info *info;
+
+	mutex_lock(&iommu->fault_lock);
+
+	info = &iommu->fault_info[iommu->fault_count];
+	info->addr = fault_info->addr;
+	info->sid = fault_info->sid;
+	info->fault_reason = fault_info->fault_reason;
+	info->is_write = fault_info->is_write;
+
+	iommu->fault_count++;
+
+	if (iommu->iommu_fault_fd)
+		eventfd_signal(iommu->iommu_fault_fd, 1);
+
+	mutex_unlock(&iommu->fault_lock);
+	return 0;
+}
+
 static long vfio_iommu_type1_ioctl(void *iommu_data,
 				   unsigned int cmd, unsigned long arg)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 0ff5111..b6a7bdb 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -86,6 +86,13 @@ struct iommu_domain {
 	void *iova_cookie;
 };
 
+struct iommu_fault_info {
+	__u64	addr;
+	__u16   sid;
+	__u8    fault_reason;
+	__u8	is_write:1;
+};
+
 enum iommu_cap {
 	IOMMU_CAP_CACHE_COHERENCY,	/* IOMMU can enforce cache coherent DMA
 					   transactions */
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 8616334..da359dd 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -562,6 +562,13 @@ struct vfio_iommu_type1_set_fault_eventfd {
 
 #define VFIO_IOMMU_SET_FAULT_EVENTFD	_IO(VFIO_TYPE, VFIO_BASE + 17)
 
+struct vfio_iommu_fault_info {
+	__u64	addr;
+	__u16   sid;
+	__u8    fault_reason;
+	__u8	is_write:1;
+};
+
 /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
 
 /*
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [RFC PATCH 3/3] VFIO: Add new cmd for user space to get IOMMU fault info
  2017-02-19 14:47 [RFC PATCH 0/3] VFIO: Report IOMMU fault event to userspace Lan Tianyu
  2017-02-19 14:47 ` [RFC PATCH 1/3] VFIO: Add new cmd to receive eventfd from userspace to notify IOMMU fault event Lan Tianyu
  2017-02-19 14:47 ` [RFC PATCH 2/3] VFIO: Add IOMMU fault notifier callback Lan Tianyu
@ 2017-02-19 14:47 ` Lan Tianyu
  2017-02-20 20:53   ` Alex Williamson
  2017-02-20 20:53 ` [RFC PATCH 0/3] VFIO: Report IOMMU fault event to userspace Alex Williamson
  3 siblings, 1 reply; 27+ messages in thread
From: Lan Tianyu @ 2017-02-19 14:47 UTC (permalink / raw)
  To: kvm
  Cc: Lan Tianyu, kevin.tian, mst, jan.kiszka, jasowang, peterx, david,
	alex.williamson, yi.l.liu

This patch is to introduce cmd VFIO_IOMMU_GET_FAULT_INFO cmd to return
fault info reported by IOMMU driver.

Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
---
 drivers/vfio/vfio_iommu_type1.c | 31 +++++++++++++++++++++++++++++++
 include/uapi/linux/vfio.h       | 15 +++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index dc434a3..58e6689 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1657,6 +1657,37 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 		mutex_unlock(&iommu->fault_lock);
 
 		return ret;
+	} else if (cmd == VFIO_IOMMU_GET_FAULT_INFO) {
+		struct vfio_iommu_type1_get_fault_info info;
+		int fault_size = sizeof(struct vfio_iommu_fault_info);
+		int ret;
+
+		minsz = offsetofend(struct vfio_iommu_type1_get_fault_info,
+				    count);
+		if (copy_from_user(&info, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (info.argsz < minsz)
+			return -EINVAL;
+
+		mutex_lock(&iommu->fault_lock);
+		info.count = iommu->fault_count;
+
+		if (info.argsz < sizeof(info) +
+		    iommu->fault_count * fault_size)
+			ret = -ENOSPC;
+
+		if (copy_to_user((void __user *)arg, &info, minsz))
+			ret = -EFAULT;
+
+		if (!ret & !copy_to_user((void __user *)(arg + minsz),
+		    iommu->fault_info, info.count * fault_size))
+			iommu->fault_count = 0;
+		else if (ret != ENOSPC)
+			ret = -EFAULT;
+
+		mutex_unlock(&iommu->fault_lock);
+		return ret;
 	}
 
 	return -ENOTTY;
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index da359dd..e6fd86f 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -562,6 +562,11 @@ struct vfio_iommu_type1_set_fault_eventfd {
 
 #define VFIO_IOMMU_SET_FAULT_EVENTFD	_IO(VFIO_TYPE, VFIO_BASE + 17)
 
+/*
+ * VFIO_IOMMU_GET_FAULT_INFO		_IO(VFIO_TYPE, VFIO_BASE + 18)
+ *
+ * Return IOMMU fault info to userspace.
+ */
 struct vfio_iommu_fault_info {
 	__u64	addr;
 	__u16   sid;
@@ -569,6 +574,16 @@ struct vfio_iommu_fault_info {
 	__u8	is_write:1;
 };
 
+struct vfio_iommu_type1_get_fault_info {
+	__u32	argsz;
+	__u32   flags;
+	__u32	count;
+	struct vfio_iommu_fault_info fault_info[];
+};
+
+#define VFIO_IOMMU_GET_FAULT_INFO	_IO(VFIO_TYPE, VFIO_BASE + 18)
+
+
 /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
 
 /*
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* RE: [RFC PATCH 2/3] VFIO: Add IOMMU fault notifier callback
  2017-02-19 14:47 ` [RFC PATCH 2/3] VFIO: Add IOMMU fault notifier callback Lan Tianyu
@ 2017-02-20  2:58   ` Liu, Yi L
  2017-02-20 20:53   ` Alex Williamson
  2017-02-21  5:55   ` Michael S. Tsirkin
  2 siblings, 0 replies; 27+ messages in thread
From: Liu, Yi L @ 2017-02-20  2:58 UTC (permalink / raw)
  To: Lan, Tianyu, kvm
  Cc: Tian, Kevin, mst, jan.kiszka, jasowang, peterx, david,
	alex.williamson, Pan, Jacob jun, Raj, Ashok, Liu, Yi L

Loop Jacob and Ashok.

> -----Original Message-----
> From: Lan, Tianyu
> Sent: Sunday, February 19, 2017 10:47 PM
> To: kvm@vger.kernel.org
> Cc: Lan, Tianyu <tianyu.lan@intel.com>; Tian, Kevin <kevin.tian@intel.com>;
> mst@redhat.com; jan.kiszka@siemens.com; jasowang@redhat.com;
> peterx@redhat.com; david@gibson.dropbear.id.au;
> alex.williamson@redhat.com; Liu, Yi L <yi.l.liu@intel.com>
> Subject: [RFC PATCH 2/3] VFIO: Add IOMMU fault notifier callback
> 
> This patch is to add callback to handle fault event reported by IOMMU driver.
> Callback stores fault into an array and notify userspace via eventfd to read fault
> info.
> 
> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
> ---
>  drivers/vfio/vfio_iommu_type1.c | 30 ++++++++++++++++++++++++++++++
>  include/linux/iommu.h           |  7 +++++++
>  include/uapi/linux/vfio.h       |  7 +++++++
>  3 files changed, 44 insertions(+)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c
> b/drivers/vfio/vfio_iommu_type1.c index 46674ea..dc434a3 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -56,6 +56,8 @@
>  MODULE_PARM_DESC(disable_hugepages,
>  		 "Disable VFIO IOMMU support for IOMMU hugepages.");
> 
> +#define NR_IOMMU_FAULT_INFO	10
> +
>  struct vfio_iommu {
>  	struct list_head	domain_list;
>  	struct vfio_domain	*external_domain; /* domain for external user
> */
> @@ -64,6 +66,9 @@ struct vfio_iommu {
>  	struct blocking_notifier_head notifier;
>  	struct eventfd_ctx	*iommu_fault_fd;
>  	struct mutex            fault_lock;
> +	struct vfio_iommu_fault_info fault_info[NR_IOMMU_FAULT_INFO];
> +	struct blocking_notifier_head iommu_fault_notifier;
> +	u8			fault_count;
>  	bool			v2;
>  	bool			nesting;
>  };
> @@ -1456,6 +1461,7 @@ static void *vfio_iommu_type1_open(unsigned long
> arg)
>  	iommu->dma_list = RB_ROOT;
>  	mutex_init(&iommu->lock);
>  	mutex_init(&iommu->fault_lock);
> +	iommu->fault_count = 0;
>  	BLOCKING_INIT_NOTIFIER_HEAD(&iommu->notifier);
> 
>  	return iommu;
> @@ -1516,6 +1522,30 @@ static int vfio_domains_have_iommu_cache(struct
> vfio_iommu *iommu)
>  	return ret;
>  }
> 
> +static int vfio_iommu_fault_event_notifier(struct notifier_block *nb,
> +					   struct iommu_fault_info *fault_info,
> +					   void *data)
> +{
> +	struct vfio_iommu *iommu = data;
> +	struct vfio_iommu_fault_info *info;
> +
> +	mutex_lock(&iommu->fault_lock);
> +
> +	info = &iommu->fault_info[iommu->fault_count];
> +	info->addr = fault_info->addr;
> +	info->sid = fault_info->sid;
> +	info->fault_reason = fault_info->fault_reason;
> +	info->is_write = fault_info->is_write;
> +
> +	iommu->fault_count++;
> +
> +	if (iommu->iommu_fault_fd)
> +		eventfd_signal(iommu->iommu_fault_fd, 1);
> +
> +	mutex_unlock(&iommu->fault_lock);
> +	return 0;
> +}
> +
>  static long vfio_iommu_type1_ioctl(void *iommu_data,
>  				   unsigned int cmd, unsigned long arg)  { diff --git
> a/include/linux/iommu.h b/include/linux/iommu.h index 0ff5111..b6a7bdb
> 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -86,6 +86,13 @@ struct iommu_domain {
>  	void *iova_cookie;
>  };
> 
> +struct iommu_fault_info {
> +	__u64	addr;
> +	__u16   sid;
> +	__u8    fault_reason;
> +	__u8	is_write:1;
> +};
> +
>  enum iommu_cap {
>  	IOMMU_CAP_CACHE_COHERENCY,	/* IOMMU can enforce cache
> coherent DMA
>  					   transactions */
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h index
> 8616334..da359dd 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -562,6 +562,13 @@ struct vfio_iommu_type1_set_fault_eventfd {
> 
>  #define VFIO_IOMMU_SET_FAULT_EVENTFD	_IO(VFIO_TYPE, VFIO_BASE + 17)
> 
> +struct vfio_iommu_fault_info {
> +	__u64	addr;
> +	__u16   sid;
> +	__u8    fault_reason;
> +	__u8	is_write:1;
> +};
> +
>  /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
> 
>  /*
> --
> 1.8.3.1

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 0/3] VFIO: Report IOMMU fault event to userspace
  2017-02-19 14:47 [RFC PATCH 0/3] VFIO: Report IOMMU fault event to userspace Lan Tianyu
                   ` (2 preceding siblings ...)
  2017-02-19 14:47 ` [RFC PATCH 3/3] VFIO: Add new cmd for user space to get IOMMU fault info Lan Tianyu
@ 2017-02-20 20:53 ` Alex Williamson
  2017-02-21  4:49   ` Lan Tianyu
  3 siblings, 1 reply; 27+ messages in thread
From: Alex Williamson @ 2017-02-20 20:53 UTC (permalink / raw)
  To: Lan Tianyu
  Cc: kvm, kevin.tian, mst, jan.kiszka, jasowang, peterx, david, yi.l.liu

On Sun, 19 Feb 2017 22:47:06 +0800
Lan Tianyu <tianyu.lan@intel.com> wrote:

> This patchset proposes a solution to report hardware IOMMU fault
> event to userspace. Motivation is to make vIOMMU in Qemu get physical
> fault event and info from IOMMU hardware. vIOMMU is in charge of transforming
> fault info and inject to guest. The feature is also very important to
> support first level translation(Translation request with PASID) in VM
> which requires vIOMMU to inject device page request to VM.   

Is the type1 IOMMU backend even a valid target here?  vIOMMU support
with type1 is functional, but not really usable except for very
specific cases.  Type1 is not designed for and not well suited to
dynamic DMA mapping in a VM, which means that anything other than
iommu=pt or assigning the device to a VM user or l2 guest is going to
have horrible performance.  Faulting in mappings, as seems to be
enabled here, sort of implies dynmaic mapping, right?  Our container
based accounting falls apart for type1 backing a vIOMMU as well.  Each
device lives in its own address space and therefore vfio container,
which means that management of vIOMMU assigned device VMs needs to
account for locked memory limits per device.  This quickly opens a
large vulnerability if a VM can lock multiples of the VM memory size.

So is what you're developing here just a proof of concept using type1,
or is this something that you feel actually has value long term?
 
> Introduce two new VFIO cmd for VFIO IOMMU type1 driver
> - VFIO_IOMMU_SET_FAULT_EVENT_FD
>   Receive eventfd from user space and trigger eventfd when get fault event
> from IOMMU.  
> 
> - VFIO_IOMMU_GET_FAULT_INFO 
>   Return IOMMU fault info to userspace.
> 
> VFIO IOMMU Type1 driver needs to register IOMMU fault event notifier
> to IOMMU driver for assigned devices when they are attached to VFIO
> container. IOMMU driver will run VFIO fault event callback when hardware
> IOMMU triggers fault events for devices assigned to VFIO container. Then,
> type1 driver signals eventfd provided by userspace and usrspace gets fault
> info via new cmd.
> 
> IOMMU interface are still in design stage and so we still can't register
> IOMMU fault event notifier to IOMMU driver. This patchset is prototype
> code to check whether VFIO IOMMU type1 driver is suitable place to deal
> with IOMMU fault event and just passes build test.
> 
> Very appreciate for comments.
> 
> Lan Tianyu (3):
>   VFIO: Add new cmd to receive eventfd from userspace to notify IOMMU
>     fault event
>   VFIO: Add IOMMU fault notifier callback
>   VFIO: Add new cmd for user space to get IOMMU fault info
> 
>  drivers/vfio/vfio_iommu_type1.c | 106 ++++++++++++++++++++++++++++++++++++++++
>  include/linux/iommu.h           |   7 +++
>  include/uapi/linux/vfio.h       |  37 ++++++++++++++
>  3 files changed, 150 insertions(+)
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 1/3] VFIO: Add new cmd to receive eventfd from userspace to notify IOMMU fault event
  2017-02-19 14:47 ` [RFC PATCH 1/3] VFIO: Add new cmd to receive eventfd from userspace to notify IOMMU fault event Lan Tianyu
@ 2017-02-20 20:53   ` Alex Williamson
  2017-02-21  5:29     ` Lan Tianyu
  2017-02-21  5:48   ` Michael S. Tsirkin
  1 sibling, 1 reply; 27+ messages in thread
From: Alex Williamson @ 2017-02-20 20:53 UTC (permalink / raw)
  To: Lan Tianyu
  Cc: kvm, kevin.tian, mst, jan.kiszka, jasowang, peterx, david, yi.l.liu

On Sun, 19 Feb 2017 22:47:07 +0800
Lan Tianyu <tianyu.lan@intel.com> wrote:

> This patch is to receive IOMMU fault eventfd and IOMMU fault event flag
> from userspace. VFIO should register IOMMU fault event handler according
> fault event flag which designates what kind of fault events need to be
> reported.  When VFIO get IOMMU fault event from IOMMU driver, it should
> notify userspace via received fd.
> 
> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
> ---
>  drivers/vfio/vfio_iommu_type1.c | 45 +++++++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/vfio.h       | 15 ++++++++++++++
>  2 files changed, 60 insertions(+)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index b3cc33f..46674ea 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -38,6 +38,7 @@
>  #include <linux/workqueue.h>
>  #include <linux/mdev.h>
>  #include <linux/notifier.h>
> +#include <linux/eventfd.h>
>  
>  #define DRIVER_VERSION  "0.2"
>  #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
> @@ -61,6 +62,8 @@ struct vfio_iommu {
>  	struct mutex		lock;
>  	struct rb_root		dma_list;
>  	struct blocking_notifier_head notifier;
> +	struct eventfd_ctx	*iommu_fault_fd;
> +	struct mutex            fault_lock;
>  	bool			v2;
>  	bool			nesting;
>  };
> @@ -1452,6 +1455,7 @@ static void *vfio_iommu_type1_open(unsigned long arg)
>  	INIT_LIST_HEAD(&iommu->domain_list);
>  	iommu->dma_list = RB_ROOT;
>  	mutex_init(&iommu->lock);
> +	mutex_init(&iommu->fault_lock);
>  	BLOCKING_INIT_NOTIFIER_HEAD(&iommu->notifier);
>  
>  	return iommu;
> @@ -1582,6 +1586,47 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>  
>  		return copy_to_user((void __user *)arg, &unmap, minsz) ?
>  			-EFAULT : 0;
> +	} else if (cmd == VFIO_IOMMU_SET_FAULT_EVENTFD) {
> +		struct vfio_iommu_type1_set_fault_eventfd eventfd;
> +		int fd;
> +		int ret;
> +
> +		minsz = offsetofend(struct vfio_iommu_type1_set_fault_eventfd,
> +				    fd);
> +
> +		if (copy_from_user(&eventfd, (void __user *)arg, minsz))
> +			return -EFAULT;
> +
> +		if (eventfd.argsz < minsz)
> +			return -EINVAL;
> +
> +
> +		mutex_lock(&iommu->fault_lock);
> +
> +		fd = eventfd.fd;
> +		if (fd == -1) {
> +			if (iommu->iommu_fault_fd)
> +				eventfd_ctx_put(iommu->iommu_fault_fd);
> +			iommu->iommu_fault_fd = NULL;
> +			ret = 0;
> +		} else if (fd >= 0) {
> +			struct eventfd_ctx *ctx;
> +
> +			ctx = eventfd_ctx_fdget(fd);
> +			if (IS_ERR(ctx))
> +				return PTR_ERR(ctx);
> +
> +			if (ctx)
> +				eventfd_ctx_put(ctx);
> +
> +			iommu->iommu_fault_fd = ctx;
> +			ret = 0;
> +		} else
> +			ret = -EINVAL;
> +
> +		mutex_unlock(&iommu->fault_lock);
> +
> +		return ret;
>  	}
>  
>  	return -ENOTTY;
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 519eff3..8616334 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -547,6 +547,21 @@ struct vfio_iommu_type1_dma_unmap {
>  #define VFIO_IOMMU_ENABLE	_IO(VFIO_TYPE, VFIO_BASE + 15)
>  #define VFIO_IOMMU_DISABLE	_IO(VFIO_TYPE, VFIO_BASE + 16)
>  
> +/*
> + * VFIO_IOMMU_SET_FAULT_EVENT_FD	_IO(VFIO_TYPE, VFIO_BASE + 17)
> + *
> + * Receive eventfd from userspace to notify fault event from IOMMU.
> + */
> +struct vfio_iommu_type1_set_fault_eventfd {
> +	__u32	argsz;
> +	__u32   flags;
> +/* What IOMMU Fault events should be reported. */
> +#define VFIO_IOMMU_UR_FAULT_WITHOUT_PASID (1 << 0)

Is this a filter?  Would multiple filters be allowed using the same
eventfd or would each type require a separate eventfd?  How would a
user be able to probe what flags/filters are available?  Should this
simply use the IRQ_INFO and SET_IRQ ioctls from the device file
descriptor for consistency?

> +	__s32	fd;			/* Eventfd from user space */
> +};
> +
> +#define VFIO_IOMMU_SET_FAULT_EVENTFD	_IO(VFIO_TYPE, VFIO_BASE + 17)
> +
>  /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
>  
>  /*

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 2/3] VFIO: Add IOMMU fault notifier callback
  2017-02-19 14:47 ` [RFC PATCH 2/3] VFIO: Add IOMMU fault notifier callback Lan Tianyu
  2017-02-20  2:58   ` Liu, Yi L
@ 2017-02-20 20:53   ` Alex Williamson
  2017-02-21  6:05     ` Lan Tianyu
  2017-02-21  5:55   ` Michael S. Tsirkin
  2 siblings, 1 reply; 27+ messages in thread
From: Alex Williamson @ 2017-02-20 20:53 UTC (permalink / raw)
  To: Lan Tianyu
  Cc: kvm, kevin.tian, mst, jan.kiszka, jasowang, peterx, david, yi.l.liu

On Sun, 19 Feb 2017 22:47:08 +0800
Lan Tianyu <tianyu.lan@intel.com> wrote:

> This patch is to add callback to handle fault event reported by
> IOMMU driver. Callback stores fault into an array and notify userspace
> via eventfd to read fault info.
> 
> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
> ---
>  drivers/vfio/vfio_iommu_type1.c | 30 ++++++++++++++++++++++++++++++
>  include/linux/iommu.h           |  7 +++++++
>  include/uapi/linux/vfio.h       |  7 +++++++
>  3 files changed, 44 insertions(+)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 46674ea..dc434a3 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -56,6 +56,8 @@
>  MODULE_PARM_DESC(disable_hugepages,
>  		 "Disable VFIO IOMMU support for IOMMU hugepages.");
>  
> +#define NR_IOMMU_FAULT_INFO	10

How is this value determined?

> +
>  struct vfio_iommu {
>  	struct list_head	domain_list;
>  	struct vfio_domain	*external_domain; /* domain for external user */
> @@ -64,6 +66,9 @@ struct vfio_iommu {
>  	struct blocking_notifier_head notifier;
>  	struct eventfd_ctx	*iommu_fault_fd;
>  	struct mutex            fault_lock;
> +	struct vfio_iommu_fault_info fault_info[NR_IOMMU_FAULT_INFO];
> +	struct blocking_notifier_head iommu_fault_notifier;
> +	u8			fault_count;
>  	bool			v2;
>  	bool			nesting;
>  };
> @@ -1456,6 +1461,7 @@ static void *vfio_iommu_type1_open(unsigned long arg)
>  	iommu->dma_list = RB_ROOT;
>  	mutex_init(&iommu->lock);
>  	mutex_init(&iommu->fault_lock);
> +	iommu->fault_count = 0;
>  	BLOCKING_INIT_NOTIFIER_HEAD(&iommu->notifier);
>  
>  	return iommu;
> @@ -1516,6 +1522,30 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
>  	return ret;
>  }
>  
> +static int vfio_iommu_fault_event_notifier(struct notifier_block *nb,
> +					   struct iommu_fault_info *fault_info,
> +					   void *data)
> +{
> +	struct vfio_iommu *iommu = data;
> +	struct vfio_iommu_fault_info *info;
> +
> +	mutex_lock(&iommu->fault_lock);
> +
> +	info = &iommu->fault_info[iommu->fault_count];
> +	info->addr = fault_info->addr;
> +	info->sid = fault_info->sid;
> +	info->fault_reason = fault_info->fault_reason;
> +	info->is_write = fault_info->is_write;
> +
> +	iommu->fault_count++;
> +
> +	if (iommu->iommu_fault_fd)
> +		eventfd_signal(iommu->iommu_fault_fd, 1);
> +
> +	mutex_unlock(&iommu->fault_lock);
> +	return 0;
> +}


This notifier is never registered as any sort of IOMMU fault reporting
infrastructure is currently imaginary, and the iommu development list
isn't cc'd.

> +
>  static long vfio_iommu_type1_ioctl(void *iommu_data,
>  				   unsigned int cmd, unsigned long arg)
>  {
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 0ff5111..b6a7bdb 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -86,6 +86,13 @@ struct iommu_domain {
>  	void *iova_cookie;
>  };
>  
> +struct iommu_fault_info {
> +	__u64	addr;
> +	__u16   sid;
> +	__u8    fault_reason;
> +	__u8	is_write:1;
> +};


A common fault reporting structure would need to be agreed upon by all
parties, ie. everyone that current supporting the IOMMU API.  As used
in QEMU, the fault_reason here appears to be very VT-d specific.

> +
>  enum iommu_cap {
>  	IOMMU_CAP_CACHE_COHERENCY,	/* IOMMU can enforce cache coherent DMA
>  					   transactions */
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 8616334..da359dd 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -562,6 +562,13 @@ struct vfio_iommu_type1_set_fault_eventfd {
>  
>  #define VFIO_IOMMU_SET_FAULT_EVENTFD	_IO(VFIO_TYPE, VFIO_BASE + 17)
>  
> +struct vfio_iommu_fault_info {
> +	__u64	addr;
> +	__u16   sid;
> +	__u8    fault_reason;
> +	__u8	is_write:1;
> +};
> +
>  /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
>  
>  /*

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 3/3] VFIO: Add new cmd for user space to get IOMMU fault info
  2017-02-19 14:47 ` [RFC PATCH 3/3] VFIO: Add new cmd for user space to get IOMMU fault info Lan Tianyu
@ 2017-02-20 20:53   ` Alex Williamson
  0 siblings, 0 replies; 27+ messages in thread
From: Alex Williamson @ 2017-02-20 20:53 UTC (permalink / raw)
  To: Lan Tianyu
  Cc: kvm, kevin.tian, mst, jan.kiszka, jasowang, peterx, david, yi.l.liu

On Sun, 19 Feb 2017 22:47:09 +0800
Lan Tianyu <tianyu.lan@intel.com> wrote:

> This patch is to introduce cmd VFIO_IOMMU_GET_FAULT_INFO cmd to return
> fault info reported by IOMMU driver.
> 
> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
> ---
>  drivers/vfio/vfio_iommu_type1.c | 31 +++++++++++++++++++++++++++++++
>  include/uapi/linux/vfio.h       | 15 +++++++++++++++
>  2 files changed, 46 insertions(+)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index dc434a3..58e6689 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -1657,6 +1657,37 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>  		mutex_unlock(&iommu->fault_lock);
>  
>  		return ret;
> +	} else if (cmd == VFIO_IOMMU_GET_FAULT_INFO) {
> +		struct vfio_iommu_type1_get_fault_info info;
> +		int fault_size = sizeof(struct vfio_iommu_fault_info);
> +		int ret;
> +
> +		minsz = offsetofend(struct vfio_iommu_type1_get_fault_info,
> +				    count);
> +		if (copy_from_user(&info, (void __user *)arg, minsz))
> +			return -EFAULT;
> +
> +		if (info.argsz < minsz)
> +			return -EINVAL;
> +
> +		mutex_lock(&iommu->fault_lock);
> +		info.count = iommu->fault_count;
> +
> +		if (info.argsz < sizeof(info) +
> +		    iommu->fault_count * fault_size)
> +			ret = -ENOSPC;
> +
> +		if (copy_to_user((void __user *)arg, &info, minsz))
> +			ret = -EFAULT;
> +
> +		if (!ret & !copy_to_user((void __user *)(arg + minsz),
> +		    iommu->fault_info, info.count * fault_size))
> +			iommu->fault_count = 0;
> +		else if (ret != ENOSPC)
> +			ret = -EFAULT;
> +
> +		mutex_unlock(&iommu->fault_lock);
> +		return ret;
>  	}
>  
>  	return -ENOTTY;
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index da359dd..e6fd86f 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -562,6 +562,11 @@ struct vfio_iommu_type1_set_fault_eventfd {
>  
>  #define VFIO_IOMMU_SET_FAULT_EVENTFD	_IO(VFIO_TYPE, VFIO_BASE + 17)
>  
> +/*
> + * VFIO_IOMMU_GET_FAULT_INFO		_IO(VFIO_TYPE, VFIO_BASE + 18)
> + *
> + * Return IOMMU fault info to userspace.
> + */
>  struct vfio_iommu_fault_info {
>  	__u64	addr;
>  	__u16   sid;
> @@ -569,6 +574,16 @@ struct vfio_iommu_fault_info {
>  	__u8	is_write:1;
>  };
>  
> +struct vfio_iommu_type1_get_fault_info {
> +	__u32	argsz;
> +	__u32   flags;
> +	__u32	count;
> +	struct vfio_iommu_fault_info fault_info[];
> +};

I wonder if adding a region to the container would be better for
reporting this.  It could at least act more like hardware reports the
errors, ie. head and tail pointers at known offsets with split
ownership between the kernel and user, possibly the ability to mmap
the region address space directly into QEMU memory.

> +
> +#define VFIO_IOMMU_GET_FAULT_INFO	_IO(VFIO_TYPE, VFIO_BASE + 18)
> +
> +
>  /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
>  
>  /*

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 0/3] VFIO: Report IOMMU fault event to userspace
  2017-02-20 20:53 ` [RFC PATCH 0/3] VFIO: Report IOMMU fault event to userspace Alex Williamson
@ 2017-02-21  4:49   ` Lan Tianyu
  2017-02-21  5:29     ` Alex Williamson
  0 siblings, 1 reply; 27+ messages in thread
From: Lan Tianyu @ 2017-02-21  4:49 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvm, kevin.tian, mst, jan.kiszka, jasowang, peterx, david, yi.l.liu

On 2017年02月21日 04:53, Alex Williamson wrote:
> On Sun, 19 Feb 2017 22:47:06 +0800
> Lan Tianyu <tianyu.lan@intel.com> wrote:
> 
>> This patchset proposes a solution to report hardware IOMMU fault
>> event to userspace. Motivation is to make vIOMMU in Qemu get physical
>> fault event and info from IOMMU hardware. vIOMMU is in charge of transforming
>> fault info and inject to guest. The feature is also very important to
>> support first level translation(Translation request with PASID) in VM
>> which requires vIOMMU to inject device page request to VM.   
> 
> Is the type1 IOMMU backend even a valid target here? vIOMMU support
> with type1 is functional, but not really usable except for very
> specific cases.  Type1 is not designed for and not well suited to
> dynamic DMA mapping in a VM, which means that anything other than
> iommu=pt or assigning the device to a VM user or l2 guest is going to
> have horrible performance. Faulting in mappings, as seems to be
> enabled here, sort of implies dynmaic mapping, right?

Peter's patches have enabled vIOMMU IOVA with assigned device and guest
may change vIOMMU's IOVA page table dynamically. For example, if we
assign NIC and enable DMA protection in VM, NIC driver will dynamically
map/unmap DMA memory when receive/send packages and change vIOMMU IOVA
page table.

> Our container
> based accounting falls apart for type1 backing a vIOMMU as well.  Each
> device lives in its own address space and therefore vfio container,
> which means that management of vIOMMU assigned device VMs needs to
> account for locked memory limits per device.  This quickly opens a
> large vulnerability if a VM can lock multiples of the VM memory size.

This seems we have already this issue if we use assigned device with
vIOMMU regardless of whether we report pIOMMU fault event to vIOMMU or
not, right?

> 
> So is what you're developing here just a proof of concept using type1,
> or is this something that you feel actually has value long term?

We want to pass physical IOMMU fault event to vIOMMU and then inject
virtual fault event to VM. It's necessary to create the channel between
pIOMMU driver and vIOMMU. Currently, IOMMU map/umap requests go through
VFIO IOMMU type1 driver and so I supposed VFIO IOMMU type1 was better
place. This patchset is to show the idea. Another option is to go though
VFIO-PCI driver we have considered. It's very helpful if you can give
some guides here :)

>  
>> Introduce two new VFIO cmd for VFIO IOMMU type1 driver
>> - VFIO_IOMMU_SET_FAULT_EVENT_FD
>>   Receive eventfd from user space and trigger eventfd when get fault event
>> from IOMMU.  
>>
>> - VFIO_IOMMU_GET_FAULT_INFO 
>>   Return IOMMU fault info to userspace.
>>
>> VFIO IOMMU Type1 driver needs to register IOMMU fault event notifier
>> to IOMMU driver for assigned devices when they are attached to VFIO
>> container. IOMMU driver will run VFIO fault event callback when hardware
>> IOMMU triggers fault events for devices assigned to VFIO container. Then,
>> type1 driver signals eventfd provided by userspace and usrspace gets fault
>> info via new cmd.
>>
>> IOMMU interface are still in design stage and so we still can't register
>> IOMMU fault event notifier to IOMMU driver. This patchset is prototype
>> code to check whether VFIO IOMMU type1 driver is suitable place to deal
>> with IOMMU fault event and just passes build test.
>>
>> Very appreciate for comments.
>>
>> Lan Tianyu (3):
>>   VFIO: Add new cmd to receive eventfd from userspace to notify IOMMU
>>     fault event
>>   VFIO: Add IOMMU fault notifier callback
>>   VFIO: Add new cmd for user space to get IOMMU fault info
>>
>>  drivers/vfio/vfio_iommu_type1.c | 106 ++++++++++++++++++++++++++++++++++++++++
>>  include/linux/iommu.h           |   7 +++
>>  include/uapi/linux/vfio.h       |  37 ++++++++++++++
>>  3 files changed, 150 insertions(+)
>>
> 


-- 
Best regards
Tianyu Lan

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 1/3] VFIO: Add new cmd to receive eventfd from userspace to notify IOMMU fault event
  2017-02-20 20:53   ` Alex Williamson
@ 2017-02-21  5:29     ` Lan Tianyu
  0 siblings, 0 replies; 27+ messages in thread
From: Lan Tianyu @ 2017-02-21  5:29 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvm, kevin.tian, mst, jan.kiszka, jasowang, peterx, david, yi.l.liu

On 2017年02月21日 04:53, Alex Williamson wrote:
> On Sun, 19 Feb 2017 22:47:07 +0800
> Lan Tianyu <tianyu.lan@intel.com> wrote:
>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>> index 519eff3..8616334 100644
>> --- a/include/uapi/linux/vfio.h
>> +++ b/include/uapi/linux/vfio.h
>> @@ -547,6 +547,21 @@ struct vfio_iommu_type1_dma_unmap {
>>  #define VFIO_IOMMU_ENABLE	_IO(VFIO_TYPE, VFIO_BASE + 15)
>>  #define VFIO_IOMMU_DISABLE	_IO(VFIO_TYPE, VFIO_BASE + 16)
>>  
>> +/*
>> + * VFIO_IOMMU_SET_FAULT_EVENT_FD	_IO(VFIO_TYPE, VFIO_BASE + 17)
>> + *
>> + * Receive eventfd from userspace to notify fault event from IOMMU.
>> + */
>> +struct vfio_iommu_type1_set_fault_eventfd {
>> +	__u32	argsz;
>> +	__u32   flags;
>> +/* What IOMMU Fault events should be reported. */
>> +#define VFIO_IOMMU_UR_FAULT_WITHOUT_PASID (1 << 0)
> 
> Is this a filter?  Would multiple filters be allowed using the same
> eventfd or would each type require a separate eventfd?  How would a
> user be able to probe what flags/filters are available?  Should this
> simply use the IRQ_INFO and SET_IRQ ioctls from the device file
> descriptor for consistency?

The filer should be in the IOMMU driver. VFIO driver needs to pass the
flag to IOMMU driver. The flag indicates what kind of fault events IOMMU
driver should report and is provided by userspace. Qemu has info what
kind of fault event it can handle. The new IOMMU driver API of
registering IOMMU fault notifier is still under design stage and so I
didn't use the flag in this patchset. The new VFIO fault event channel
also will affect the parameters of new IOMMU API and so I sent the
patchset first to get some conclusion before finalizing new IOMMU API.

Just like you mentioned "such an event might logically be connected to
the vfio container rather than the device" about IOMMU fault event in
the AER ERR mail(the discussion with Kevin), I put the IOMMU fault
evenfd with VFIO container rather than vfio-pci device.

-- 
Best regards
Tianyu Lan

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 0/3] VFIO: Report IOMMU fault event to userspace
  2017-02-21  4:49   ` Lan Tianyu
@ 2017-02-21  5:29     ` Alex Williamson
  2017-02-21 15:18       ` Lan Tianyu
  2017-02-28 15:58       ` Lan, Tianyu
  0 siblings, 2 replies; 27+ messages in thread
From: Alex Williamson @ 2017-02-21  5:29 UTC (permalink / raw)
  To: Lan Tianyu
  Cc: kvm, kevin.tian, mst, jan.kiszka, jasowang, peterx, david, yi.l.liu

On Tue, 21 Feb 2017 12:49:18 +0800
Lan Tianyu <tianyu.lan@intel.com> wrote:

> On 2017年02月21日 04:53, Alex Williamson wrote:
> > On Sun, 19 Feb 2017 22:47:06 +0800
> > Lan Tianyu <tianyu.lan@intel.com> wrote:
> >   
> >> This patchset proposes a solution to report hardware IOMMU fault
> >> event to userspace. Motivation is to make vIOMMU in Qemu get physical
> >> fault event and info from IOMMU hardware. vIOMMU is in charge of transforming
> >> fault info and inject to guest. The feature is also very important to
> >> support first level translation(Translation request with PASID) in VM
> >> which requires vIOMMU to inject device page request to VM.     
> > 
> > Is the type1 IOMMU backend even a valid target here? vIOMMU support
> > with type1 is functional, but not really usable except for very
> > specific cases.  Type1 is not designed for and not well suited to
> > dynamic DMA mapping in a VM, which means that anything other than
> > iommu=pt or assigning the device to a VM user or l2 guest is going to
> > have horrible performance. Faulting in mappings, as seems to be
> > enabled here, sort of implies dynmaic mapping, right?  
> 
> Peter's patches have enabled vIOMMU IOVA with assigned device and guest
> may change vIOMMU's IOVA page table dynamically. For example, if we
> assign NIC and enable DMA protection in VM, NIC driver will dynamically
> map/unmap DMA memory when receive/send packages and change vIOMMU IOVA
> page table.

Yes, it's functional, but does it have sufficient performance to bother
to further extend it for the vIOMMU use case?  The benefit of the
current level of vfio/viommu integration is that we can a) make use of
DPDK with assigned devices in a secure mode within the guest, and b)
use nested device assignment (though the usefulness of this this
versus simple novelty is is questionable).  Both of these make use of
relatively static mappings through vfio and therefore should not
experience much mapping overhead aside from setup and teardown.

As I understand your use case, particularly with PASID, you're trying
to enable the device to fault in mappings, which implies a dynamic
mapping use case.  Run a 10GBps NIC (or 40 or 100) in a viommu guest
_without_ using iommu=pt.  How close do you get to native performance?
How close can a virtio device get in the same configuration?  What is
the usefulness in extending type1 to support piommu faults if it
doesn't have worthwhile performance?
 
> > Our container
> > based accounting falls apart for type1 backing a vIOMMU as well.  Each
> > device lives in its own address space and therefore vfio container,
> > which means that management of vIOMMU assigned device VMs needs to
> > account for locked memory limits per device.  This quickly opens a
> > large vulnerability if a VM can lock multiples of the VM memory size.  
> 
> This seems we have already this issue if we use assigned device with
> vIOMMU regardless of whether we report pIOMMU fault event to vIOMMU or
> not, right?

Yes, so why continue to push viommu features into an iommu model that
has these limitations?  Perhaps type1 can be extended to avoid them,
perhaps a new "type2" iommu backend is a better solution.  Either way I
don't see how adding iommu fault handling to type1 as it exists today
is more than just a proof of concept exercise.

> > 
> > So is what you're developing here just a proof of concept using type1,
> > or is this something that you feel actually has value long term?  
> 
> We want to pass physical IOMMU fault event to vIOMMU and then inject
> virtual fault event to VM. It's necessary to create the channel between
> pIOMMU driver and vIOMMU. Currently, IOMMU map/umap requests go through
> VFIO IOMMU type1 driver and so I supposed VFIO IOMMU type1 was better
> place. This patchset is to show the idea. Another option is to go though
> VFIO-PCI driver we have considered. It's very helpful if you can give
> some guides here :)

I understand, but is this any more than a functional proof of concept?
What are your performance requirements?  Does type1 meet them?  Have
you tested?  My expectation is that dynamic DMA mapping performance
through type1 with a viommu configured for dynamic mappings is abysmal
and it's therefore a poor target on which to base additional features
such as fault handling.  Do you have data to suggest otherwise?  Going
through the device rather than the container doesn't solve my assertion
that type1 is merely functional and does not provide worthwhile
performance when used in conjunction with dynamic mappings.  Thanks,

Alex

> >> Introduce two new VFIO cmd for VFIO IOMMU type1 driver
> >> - VFIO_IOMMU_SET_FAULT_EVENT_FD
> >>   Receive eventfd from user space and trigger eventfd when get fault event
> >> from IOMMU.  
> >>
> >> - VFIO_IOMMU_GET_FAULT_INFO 
> >>   Return IOMMU fault info to userspace.
> >>
> >> VFIO IOMMU Type1 driver needs to register IOMMU fault event notifier
> >> to IOMMU driver for assigned devices when they are attached to VFIO
> >> container. IOMMU driver will run VFIO fault event callback when hardware
> >> IOMMU triggers fault events for devices assigned to VFIO container. Then,
> >> type1 driver signals eventfd provided by userspace and usrspace gets fault
> >> info via new cmd.
> >>
> >> IOMMU interface are still in design stage and so we still can't register
> >> IOMMU fault event notifier to IOMMU driver. This patchset is prototype
> >> code to check whether VFIO IOMMU type1 driver is suitable place to deal
> >> with IOMMU fault event and just passes build test.
> >>
> >> Very appreciate for comments.
> >>
> >> Lan Tianyu (3):
> >>   VFIO: Add new cmd to receive eventfd from userspace to notify IOMMU
> >>     fault event
> >>   VFIO: Add IOMMU fault notifier callback
> >>   VFIO: Add new cmd for user space to get IOMMU fault info
> >>
> >>  drivers/vfio/vfio_iommu_type1.c | 106 ++++++++++++++++++++++++++++++++++++++++
> >>  include/linux/iommu.h           |   7 +++
> >>  include/uapi/linux/vfio.h       |  37 ++++++++++++++
> >>  3 files changed, 150 insertions(+)
> >>  
> >   
> 
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 1/3] VFIO: Add new cmd to receive eventfd from userspace to notify IOMMU fault event
  2017-02-19 14:47 ` [RFC PATCH 1/3] VFIO: Add new cmd to receive eventfd from userspace to notify IOMMU fault event Lan Tianyu
  2017-02-20 20:53   ` Alex Williamson
@ 2017-02-21  5:48   ` Michael S. Tsirkin
  2017-02-21  6:05     ` Alex Williamson
  2017-02-21  6:11     ` Liu, Yi L
  1 sibling, 2 replies; 27+ messages in thread
From: Michael S. Tsirkin @ 2017-02-21  5:48 UTC (permalink / raw)
  To: Lan Tianyu
  Cc: kvm, kevin.tian, jan.kiszka, jasowang, peterx, david,
	alex.williamson, yi.l.liu

On Sun, Feb 19, 2017 at 10:47:07PM +0800, Lan Tianyu wrote:
> This patch is to receive IOMMU fault eventfd and IOMMU fault event flag
> from userspace. VFIO should register IOMMU fault event handler according
> fault event flag which designates what kind of fault events need to be
> reported.  When VFIO get IOMMU fault event from IOMMU driver, it should
> notify userspace via received fd.
> 
> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
> ---
>  drivers/vfio/vfio_iommu_type1.c | 45 +++++++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/vfio.h       | 15 ++++++++++++++
>  2 files changed, 60 insertions(+)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index b3cc33f..46674ea 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -38,6 +38,7 @@
>  #include <linux/workqueue.h>
>  #include <linux/mdev.h>
>  #include <linux/notifier.h>
> +#include <linux/eventfd.h>
>  
>  #define DRIVER_VERSION  "0.2"
>  #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
> @@ -61,6 +62,8 @@ struct vfio_iommu {
>  	struct mutex		lock;
>  	struct rb_root		dma_list;
>  	struct blocking_notifier_head notifier;
> +	struct eventfd_ctx	*iommu_fault_fd;
> +	struct mutex            fault_lock;
>  	bool			v2;
>  	bool			nesting;
>  };
> @@ -1452,6 +1455,7 @@ static void *vfio_iommu_type1_open(unsigned long arg)
>  	INIT_LIST_HEAD(&iommu->domain_list);
>  	iommu->dma_list = RB_ROOT;
>  	mutex_init(&iommu->lock);
> +	mutex_init(&iommu->fault_lock);
>  	BLOCKING_INIT_NOTIFIER_HEAD(&iommu->notifier);
>  
>  	return iommu;
> @@ -1582,6 +1586,47 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>  
>  		return copy_to_user((void __user *)arg, &unmap, minsz) ?
>  			-EFAULT : 0;
> +	} else if (cmd == VFIO_IOMMU_SET_FAULT_EVENTFD) {
> +		struct vfio_iommu_type1_set_fault_eventfd eventfd;
> +		int fd;
> +		int ret;
> +
> +		minsz = offsetofend(struct vfio_iommu_type1_set_fault_eventfd,
> +				    fd);
> +
> +		if (copy_from_user(&eventfd, (void __user *)arg, minsz))
> +			return -EFAULT;
> +
> +		if (eventfd.argsz < minsz)
> +			return -EINVAL;
> +
> +
> +		mutex_lock(&iommu->fault_lock);
> +
> +		fd = eventfd.fd;
> +		if (fd == -1) {
> +			if (iommu->iommu_fault_fd)
> +				eventfd_ctx_put(iommu->iommu_fault_fd);
> +			iommu->iommu_fault_fd = NULL;
> +			ret = 0;
> +		} else if (fd >= 0) {
> +			struct eventfd_ctx *ctx;
> +
> +			ctx = eventfd_ctx_fdget(fd);
> +			if (IS_ERR(ctx))
> +				return PTR_ERR(ctx);
> +
> +			if (ctx)
> +				eventfd_ctx_put(ctx);
> +
> +			iommu->iommu_fault_fd = ctx;
> +			ret = 0;
> +		} else
> +			ret = -EINVAL;
> +
> +		mutex_unlock(&iommu->fault_lock);
> +
> +		return ret;
>  	}
>  
>  	return -ENOTTY;
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 519eff3..8616334 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -547,6 +547,21 @@ struct vfio_iommu_type1_dma_unmap {
>  #define VFIO_IOMMU_ENABLE	_IO(VFIO_TYPE, VFIO_BASE + 15)
>  #define VFIO_IOMMU_DISABLE	_IO(VFIO_TYPE, VFIO_BASE + 16)
>  
> +/*
> + * VFIO_IOMMU_SET_FAULT_EVENT_FD	_IO(VFIO_TYPE, VFIO_BASE + 17)
> + *
> + * Receive eventfd from userspace to notify fault event from IOMMU.
> + */
> +struct vfio_iommu_type1_set_fault_eventfd {
> +	__u32	argsz;
> +	__u32   flags;
> +/* What IOMMU Fault events should be reported. */
> +#define VFIO_IOMMU_UR_FAULT_WITHOUT_PASID (1 << 0)
> +	__s32	fd;			/* Eventfd from user space */
> +};
> +
> +#define VFIO_IOMMU_SET_FAULT_EVENTFD	_IO(VFIO_TYPE, VFIO_BASE + 17)
> +
>  /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */

How about simply doing

        VFIO_PCI_MSIX_IRQ_INDEX,
        VFIO_PCI_ERR_IRQ_INDEX,
        VFIO_PCI_REQ_IRQ_INDEX,
+       VFIO_PCI_IOMMU_UR_FAULT_WITHOUT_PASID_IRQ_INDEX,
        VFIO_PCI_NUM_IRQS


instead?

This way you don't need to write a whole new kernel/userspace
interface.

>  /*
> -- 
> 1.8.3.1

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 2/3] VFIO: Add IOMMU fault notifier callback
  2017-02-19 14:47 ` [RFC PATCH 2/3] VFIO: Add IOMMU fault notifier callback Lan Tianyu
  2017-02-20  2:58   ` Liu, Yi L
  2017-02-20 20:53   ` Alex Williamson
@ 2017-02-21  5:55   ` Michael S. Tsirkin
  2017-02-21  6:13     ` Lan Tianyu
  2 siblings, 1 reply; 27+ messages in thread
From: Michael S. Tsirkin @ 2017-02-21  5:55 UTC (permalink / raw)
  To: Lan Tianyu
  Cc: kvm, kevin.tian, jan.kiszka, jasowang, peterx, david,
	alex.williamson, yi.l.liu

On Sun, Feb 19, 2017 at 10:47:08PM +0800, Lan Tianyu wrote:
> This patch is to add callback to handle fault event reported by
> IOMMU driver. Callback stores fault into an array and notify userspace
> via eventfd to read fault info.
> 
> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
> ---
>  drivers/vfio/vfio_iommu_type1.c | 30 ++++++++++++++++++++++++++++++
>  include/linux/iommu.h           |  7 +++++++
>  include/uapi/linux/vfio.h       |  7 +++++++
>  3 files changed, 44 insertions(+)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 46674ea..dc434a3 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -56,6 +56,8 @@
>  MODULE_PARM_DESC(disable_hugepages,
>  		 "Disable VFIO IOMMU support for IOMMU hugepages.");
>  
> +#define NR_IOMMU_FAULT_INFO	10
> +
>  struct vfio_iommu {
>  	struct list_head	domain_list;
>  	struct vfio_domain	*external_domain; /* domain for external user */
> @@ -64,6 +66,9 @@ struct vfio_iommu {
>  	struct blocking_notifier_head notifier;
>  	struct eventfd_ctx	*iommu_fault_fd;
>  	struct mutex            fault_lock;
> +	struct vfio_iommu_fault_info fault_info[NR_IOMMU_FAULT_INFO];

What if you run out of this space? Userspace will not
see any more faults. What will cause progress to happen?


> +	struct blocking_notifier_head iommu_fault_notifier;
> +	u8			fault_count;
>  	bool			v2;
>  	bool			nesting;
>  };
> @@ -1456,6 +1461,7 @@ static void *vfio_iommu_type1_open(unsigned long arg)
>  	iommu->dma_list = RB_ROOT;
>  	mutex_init(&iommu->lock);
>  	mutex_init(&iommu->fault_lock);
> +	iommu->fault_count = 0;
>  	BLOCKING_INIT_NOTIFIER_HEAD(&iommu->notifier);
>  
>  	return iommu;
> @@ -1516,6 +1522,30 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
>  	return ret;
>  }
>  
> +static int vfio_iommu_fault_event_notifier(struct notifier_block *nb,
> +					   struct iommu_fault_info *fault_info,
> +					   void *data)
> +{
> +	struct vfio_iommu *iommu = data;
> +	struct vfio_iommu_fault_info *info;
> +
> +	mutex_lock(&iommu->fault_lock);
> +
> +	info = &iommu->fault_info[iommu->fault_count];
> +	info->addr = fault_info->addr;
> +	info->sid = fault_info->sid;
> +	info->fault_reason = fault_info->fault_reason;
> +	info->is_write = fault_info->is_write;
> +
> +	iommu->fault_count++;

Will corrupt memory once array overflows NR_IOMMU_FAULT_INFO.


> +
> +	if (iommu->iommu_fault_fd)
> +		eventfd_signal(iommu->iommu_fault_fd, 1);
> +
> +	mutex_unlock(&iommu->fault_lock);
> +	return 0;
> +}
> +
>  static long vfio_iommu_type1_ioctl(void *iommu_data,
>  				   unsigned int cmd, unsigned long arg)
>  {
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 0ff5111..b6a7bdb 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -86,6 +86,13 @@ struct iommu_domain {
>  	void *iova_cookie;
>  };
>  
> +struct iommu_fault_info {
> +	__u64	addr;
> +	__u16   sid;
> +	__u8    fault_reason;
> +	__u8	is_write:1;
> +};
> +
>  enum iommu_cap {
>  	IOMMU_CAP_CACHE_COHERENCY,	/* IOMMU can enforce cache coherent DMA
>  					   transactions */
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 8616334..da359dd 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -562,6 +562,13 @@ struct vfio_iommu_type1_set_fault_eventfd {
>  
>  #define VFIO_IOMMU_SET_FAULT_EVENTFD	_IO(VFIO_TYPE, VFIO_BASE + 17)
>  
> +struct vfio_iommu_fault_info {
> +	__u64	addr;
> +	__u16   sid;

It's not clear it's userspace's business to know the sid.  It normally
does not care once management has bound vfio to a device. You should use
a device identifier that makes sense.


> +	__u8    fault_reason;
> +	__u8	is_write:1;
> +};
> +
>  /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
>  
>  /*
> -- 
> 1.8.3.1

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 2/3] VFIO: Add IOMMU fault notifier callback
  2017-02-20 20:53   ` Alex Williamson
@ 2017-02-21  6:05     ` Lan Tianyu
  0 siblings, 0 replies; 27+ messages in thread
From: Lan Tianyu @ 2017-02-21  6:05 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvm, kevin.tian, mst, jan.kiszka, jasowang, peterx, david, yi.l.liu

On 2017年02月21日 04:53, Alex Williamson wrote:
> On Sun, 19 Feb 2017 22:47:08 +0800
> Lan Tianyu <tianyu.lan@intel.com> wrote:
> 
>> This patch is to add callback to handle fault event reported by
>> IOMMU driver. Callback stores fault into an array and notify userspace
>> via eventfd to read fault info.
>>
>> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 30 ++++++++++++++++++++++++++++++
>>  include/linux/iommu.h           |  7 +++++++
>>  include/uapi/linux/vfio.h       |  7 +++++++
>>  3 files changed, 44 insertions(+)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 46674ea..dc434a3 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -56,6 +56,8 @@
>>  MODULE_PARM_DESC(disable_hugepages,
>>  		 "Disable VFIO IOMMU support for IOMMU hugepages.");
>>  
>> +#define NR_IOMMU_FAULT_INFO	10
> 
> How is this value determined?

The length of fault info array is defined by manually.
It's hard to estimate how many fault info entry will be queued in the
array.

> 
>> +
>>  struct vfio_iommu {
>>  	struct list_head	domain_list;
>>  	struct vfio_domain	*external_domain; /* domain for external user */
>> @@ -64,6 +66,9 @@ struct vfio_iommu {
>>  	struct blocking_notifier_head notifier;
>>  	struct eventfd_ctx	*iommu_fault_fd;
>>  	struct mutex            fault_lock;
>> +	struct vfio_iommu_fault_info fault_info[NR_IOMMU_FAULT_INFO];
>> +	struct blocking_notifier_head iommu_fault_notifier;
>> +	u8			fault_count;
>>  	bool			v2;
>>  	bool			nesting;
>>  };
>> @@ -1456,6 +1461,7 @@ static void *vfio_iommu_type1_open(unsigned long arg)
>>  	iommu->dma_list = RB_ROOT;
>>  	mutex_init(&iommu->lock);
>>  	mutex_init(&iommu->fault_lock);
>> +	iommu->fault_count = 0;
>>  	BLOCKING_INIT_NOTIFIER_HEAD(&iommu->notifier);
>>  
>>  	return iommu;
>> @@ -1516,6 +1522,30 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
>>  	return ret;
>>  }
>>  
>> +static int vfio_iommu_fault_event_notifier(struct notifier_block *nb,
>> +					   struct iommu_fault_info *fault_info,
>> +					   void *data)
>> +{
>> +	struct vfio_iommu *iommu = data;
>> +	struct vfio_iommu_fault_info *info;
>> +
>> +	mutex_lock(&iommu->fault_lock);
>> +
>> +	info = &iommu->fault_info[iommu->fault_count];
>> +	info->addr = fault_info->addr;
>> +	info->sid = fault_info->sid;
>> +	info->fault_reason = fault_info->fault_reason;
>> +	info->is_write = fault_info->is_write;
>> +
>> +	iommu->fault_count++;
>> +
>> +	if (iommu->iommu_fault_fd)
>> +		eventfd_signal(iommu->iommu_fault_fd, 1);
>> +
>> +	mutex_unlock(&iommu->fault_lock);
>> +	return 0;
>> +}
> 
> 
> This notifier is never registered as any sort of IOMMU fault reporting
> infrastructure is currently imaginary, and the iommu development list
> isn't cc'd.

Yes, I should cc IOMM maillist. I thought this patchset was to determine
new VFIO API and so...

> 
>> +
>>  static long vfio_iommu_type1_ioctl(void *iommu_data,
>>  				   unsigned int cmd, unsigned long arg)
>>  {
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index 0ff5111..b6a7bdb 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -86,6 +86,13 @@ struct iommu_domain {
>>  	void *iova_cookie;
>>  };
>>  
>> +struct iommu_fault_info {
>> +	__u64	addr;
>> +	__u16   sid;
>> +	__u8    fault_reason;
>> +	__u8	is_write:1;
>> +};
> 
> 
> A common fault reporting structure would need to be agreed upon by all
> parties, ie. everyone that current supporting the IOMMU API.  As used
> in QEMU, the fault_reason here appears to be very VT-d specific.

Fault reason number will be platform specific and it'd hard to use a
common definition to show fault reason number of all platforms. I wonder
whether we can define a vendor field in the fault info and
consumer can use the vendor field to deal with platform specific info.
-- 
Best regards
Tianyu Lan

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 1/3] VFIO: Add new cmd to receive eventfd from userspace to notify IOMMU fault event
  2017-02-21  5:48   ` Michael S. Tsirkin
@ 2017-02-21  6:05     ` Alex Williamson
  2017-02-21  6:11     ` Liu, Yi L
  1 sibling, 0 replies; 27+ messages in thread
From: Alex Williamson @ 2017-02-21  6:05 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Lan Tianyu, kvm, kevin.tian, jan.kiszka, jasowang, peterx, david,
	yi.l.liu

On Tue, 21 Feb 2017 07:48:27 +0200
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Sun, Feb 19, 2017 at 10:47:07PM +0800, Lan Tianyu wrote:
> > This patch is to receive IOMMU fault eventfd and IOMMU fault event flag
> > from userspace. VFIO should register IOMMU fault event handler according
> > fault event flag which designates what kind of fault events need to be
> > reported.  When VFIO get IOMMU fault event from IOMMU driver, it should
> > notify userspace via received fd.
> > 
> > Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
> > ---
> >  drivers/vfio/vfio_iommu_type1.c | 45 +++++++++++++++++++++++++++++++++++++++++
> >  include/uapi/linux/vfio.h       | 15 ++++++++++++++
> >  2 files changed, 60 insertions(+)
> > 
> > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > index b3cc33f..46674ea 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -38,6 +38,7 @@
> >  #include <linux/workqueue.h>
> >  #include <linux/mdev.h>
> >  #include <linux/notifier.h>
> > +#include <linux/eventfd.h>
> >  
> >  #define DRIVER_VERSION  "0.2"
> >  #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
> > @@ -61,6 +62,8 @@ struct vfio_iommu {
> >  	struct mutex		lock;
> >  	struct rb_root		dma_list;
> >  	struct blocking_notifier_head notifier;
> > +	struct eventfd_ctx	*iommu_fault_fd;
> > +	struct mutex            fault_lock;
> >  	bool			v2;
> >  	bool			nesting;
> >  };
> > @@ -1452,6 +1455,7 @@ static void *vfio_iommu_type1_open(unsigned long arg)
> >  	INIT_LIST_HEAD(&iommu->domain_list);
> >  	iommu->dma_list = RB_ROOT;
> >  	mutex_init(&iommu->lock);
> > +	mutex_init(&iommu->fault_lock);
> >  	BLOCKING_INIT_NOTIFIER_HEAD(&iommu->notifier);
> >  
> >  	return iommu;
> > @@ -1582,6 +1586,47 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
> >  
> >  		return copy_to_user((void __user *)arg, &unmap, minsz) ?
> >  			-EFAULT : 0;
> > +	} else if (cmd == VFIO_IOMMU_SET_FAULT_EVENTFD) {
> > +		struct vfio_iommu_type1_set_fault_eventfd eventfd;
> > +		int fd;
> > +		int ret;
> > +
> > +		minsz = offsetofend(struct vfio_iommu_type1_set_fault_eventfd,
> > +				    fd);
> > +
> > +		if (copy_from_user(&eventfd, (void __user *)arg, minsz))
> > +			return -EFAULT;
> > +
> > +		if (eventfd.argsz < minsz)
> > +			return -EINVAL;
> > +
> > +
> > +		mutex_lock(&iommu->fault_lock);
> > +
> > +		fd = eventfd.fd;
> > +		if (fd == -1) {
> > +			if (iommu->iommu_fault_fd)
> > +				eventfd_ctx_put(iommu->iommu_fault_fd);
> > +			iommu->iommu_fault_fd = NULL;
> > +			ret = 0;
> > +		} else if (fd >= 0) {
> > +			struct eventfd_ctx *ctx;
> > +
> > +			ctx = eventfd_ctx_fdget(fd);
> > +			if (IS_ERR(ctx))
> > +				return PTR_ERR(ctx);
> > +
> > +			if (ctx)
> > +				eventfd_ctx_put(ctx);
> > +
> > +			iommu->iommu_fault_fd = ctx;
> > +			ret = 0;
> > +		} else
> > +			ret = -EINVAL;
> > +
> > +		mutex_unlock(&iommu->fault_lock);
> > +
> > +		return ret;
> >  	}
> >  
> >  	return -ENOTTY;
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 519eff3..8616334 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -547,6 +547,21 @@ struct vfio_iommu_type1_dma_unmap {
> >  #define VFIO_IOMMU_ENABLE	_IO(VFIO_TYPE, VFIO_BASE + 15)
> >  #define VFIO_IOMMU_DISABLE	_IO(VFIO_TYPE, VFIO_BASE + 16)
> >  
> > +/*
> > + * VFIO_IOMMU_SET_FAULT_EVENT_FD	_IO(VFIO_TYPE, VFIO_BASE + 17)
> > + *
> > + * Receive eventfd from userspace to notify fault event from IOMMU.
> > + */
> > +struct vfio_iommu_type1_set_fault_eventfd {
> > +	__u32	argsz;
> > +	__u32   flags;
> > +/* What IOMMU Fault events should be reported. */
> > +#define VFIO_IOMMU_UR_FAULT_WITHOUT_PASID (1 << 0)
> > +	__s32	fd;			/* Eventfd from user space */
> > +};
> > +
> > +#define VFIO_IOMMU_SET_FAULT_EVENTFD	_IO(VFIO_TYPE, VFIO_BASE + 17)
> > +
> >  /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */  
> 
> How about simply doing
> 
>         VFIO_PCI_MSIX_IRQ_INDEX,
>         VFIO_PCI_ERR_IRQ_INDEX,
>         VFIO_PCI_REQ_IRQ_INDEX,
> +       VFIO_PCI_IOMMU_UR_FAULT_WITHOUT_PASID_IRQ_INDEX,
>         VFIO_PCI_NUM_IRQS
> 
> 
> instead?
> 
> This way you don't need to write a whole new kernel/userspace
> interface.

It seems there are pros and cons either way, but this is fundamentally
different, making the iommu fault exposed on the device rather than the
container.  The container represents the iommu state, so it makes some
sense to report the fault for the container.  On the other hand if
it's reported per device there's no matching of source ID to a
device.  It certainly affects how the iommu fault reporting is designed,
whether it's a callback added to the device driver or a component of
the IOMMU API.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [RFC PATCH 1/3] VFIO: Add new cmd to receive eventfd from userspace to notify IOMMU fault event
  2017-02-21  5:48   ` Michael S. Tsirkin
  2017-02-21  6:05     ` Alex Williamson
@ 2017-02-21  6:11     ` Liu, Yi L
  1 sibling, 0 replies; 27+ messages in thread
From: Liu, Yi L @ 2017-02-21  6:11 UTC (permalink / raw)
  To: Michael S. Tsirkin, Lan, Tianyu
  Cc: kvm, Tian, Kevin, jan.kiszka, jasowang, peterx, david, alex.williamson

> -----Original Message-----
> From: Michael S. Tsirkin [mailto:mst@redhat.com]
> Sent: Tuesday, February 21, 2017 1:48 PM
> To: Lan, Tianyu <tianyu.lan@intel.com>
> Cc: kvm@vger.kernel.org; Tian, Kevin <kevin.tian@intel.com>;
> jan.kiszka@siemens.com; jasowang@redhat.com; peterx@redhat.com;
> david@gibson.dropbear.id.au; alex.williamson@redhat.com; Liu, Yi L
> <yi.l.liu@intel.com>
> Subject: Re: [RFC PATCH 1/3] VFIO: Add new cmd to receive eventfd from
> userspace to notify IOMMU fault event
> 
> On Sun, Feb 19, 2017 at 10:47:07PM +0800, Lan Tianyu wrote:
> > This patch is to receive IOMMU fault eventfd and IOMMU fault event
> > flag from userspace. VFIO should register IOMMU fault event handler
> > according fault event flag which designates what kind of fault events
> > need to be reported.  When VFIO get IOMMU fault event from IOMMU
> > driver, it should notify userspace via received fd.
> >
> > Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
> > ---
> >  drivers/vfio/vfio_iommu_type1.c | 45
> +++++++++++++++++++++++++++++++++++++++++
> >  include/uapi/linux/vfio.h       | 15 ++++++++++++++
> >  2 files changed, 60 insertions(+)
> >
> > diff --git a/drivers/vfio/vfio_iommu_type1.c
> > b/drivers/vfio/vfio_iommu_type1.c index b3cc33f..46674ea 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -38,6 +38,7 @@
> >  #include <linux/workqueue.h>
> >  #include <linux/mdev.h>
> >  #include <linux/notifier.h>
> > +#include <linux/eventfd.h>
> >
> >  #define DRIVER_VERSION  "0.2"
> >  #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
> > @@ -61,6 +62,8 @@ struct vfio_iommu {
> >  	struct mutex		lock;
> >  	struct rb_root		dma_list;
> >  	struct blocking_notifier_head notifier;
> > +	struct eventfd_ctx	*iommu_fault_fd;
> > +	struct mutex            fault_lock;
> >  	bool			v2;
> >  	bool			nesting;
> >  };
> > @@ -1452,6 +1455,7 @@ static void *vfio_iommu_type1_open(unsigned
> long arg)
> >  	INIT_LIST_HEAD(&iommu->domain_list);
> >  	iommu->dma_list = RB_ROOT;
> >  	mutex_init(&iommu->lock);
> > +	mutex_init(&iommu->fault_lock);
> >  	BLOCKING_INIT_NOTIFIER_HEAD(&iommu->notifier);
> >
> >  	return iommu;
> > @@ -1582,6 +1586,47 @@ static long vfio_iommu_type1_ioctl(void
> > *iommu_data,
> >
> >  		return copy_to_user((void __user *)arg, &unmap, minsz) ?
> >  			-EFAULT : 0;
> > +	} else if (cmd == VFIO_IOMMU_SET_FAULT_EVENTFD) {
> > +		struct vfio_iommu_type1_set_fault_eventfd eventfd;
> > +		int fd;
> > +		int ret;
> > +
> > +		minsz = offsetofend(struct
> vfio_iommu_type1_set_fault_eventfd,
> > +				    fd);
> > +
> > +		if (copy_from_user(&eventfd, (void __user *)arg, minsz))
> > +			return -EFAULT;
> > +
> > +		if (eventfd.argsz < minsz)
> > +			return -EINVAL;
> > +
> > +
> > +		mutex_lock(&iommu->fault_lock);
> > +
> > +		fd = eventfd.fd;
> > +		if (fd == -1) {
> > +			if (iommu->iommu_fault_fd)
> > +				eventfd_ctx_put(iommu->iommu_fault_fd);
> > +			iommu->iommu_fault_fd = NULL;
> > +			ret = 0;
> > +		} else if (fd >= 0) {
> > +			struct eventfd_ctx *ctx;
> > +
> > +			ctx = eventfd_ctx_fdget(fd);
> > +			if (IS_ERR(ctx))
> > +				return PTR_ERR(ctx);
> > +
> > +			if (ctx)
> > +				eventfd_ctx_put(ctx);
> > +
> > +			iommu->iommu_fault_fd = ctx;
> > +			ret = 0;
> > +		} else
> > +			ret = -EINVAL;
> > +
> > +		mutex_unlock(&iommu->fault_lock);
> > +
> > +		return ret;
> >  	}
> >
> >  	return -ENOTTY;
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 519eff3..8616334 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -547,6 +547,21 @@ struct vfio_iommu_type1_dma_unmap {
> >  #define VFIO_IOMMU_ENABLE	_IO(VFIO_TYPE, VFIO_BASE + 15)
> >  #define VFIO_IOMMU_DISABLE	_IO(VFIO_TYPE, VFIO_BASE + 16)
> >
> > +/*
> > + * VFIO_IOMMU_SET_FAULT_EVENT_FD	_IO(VFIO_TYPE, VFIO_BASE + 17)
> > + *
> > + * Receive eventfd from userspace to notify fault event from IOMMU.
> > + */
> > +struct vfio_iommu_type1_set_fault_eventfd {
> > +	__u32	argsz;
> > +	__u32   flags;
> > +/* What IOMMU Fault events should be reported. */ #define
> > +VFIO_IOMMU_UR_FAULT_WITHOUT_PASID (1 << 0)
> > +	__s32	fd;			/* Eventfd from user space */
> > +};
> > +
> > +#define VFIO_IOMMU_SET_FAULT_EVENTFD	_IO(VFIO_TYPE,
> VFIO_BASE + 17)
> > +
> >  /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU
> > -------- */
> 
> How about simply doing
> 
>         VFIO_PCI_MSIX_IRQ_INDEX,
>         VFIO_PCI_ERR_IRQ_INDEX,
>         VFIO_PCI_REQ_IRQ_INDEX,
> +       VFIO_PCI_IOMMU_UR_FAULT_WITHOUT_PASID_IRQ_INDEX,
>         VFIO_PCI_NUM_IRQS
> 
> 
> instead?
> 
> This way you don't need to write a whole new kernel/userspace interface.

hmmm, it's a practical way. However, the IRQ framework is designed to be working
per-device. If we have multi-assigned device, it will introduce multi fault irq. Howevr,
the fault from pIOMMU should be per-IOMMU or at least per-domain. So I think we'd
better to have separate kernel/userspace interface although I admit the new interface
is similar with the one you suggested.

Thanks,
Yi L

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 2/3] VFIO: Add IOMMU fault notifier callback
  2017-02-21  5:55   ` Michael S. Tsirkin
@ 2017-02-21  6:13     ` Lan Tianyu
  0 siblings, 0 replies; 27+ messages in thread
From: Lan Tianyu @ 2017-02-21  6:13 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: kvm, kevin.tian, jan.kiszka, jasowang, peterx, david,
	alex.williamson, yi.l.liu

On 2017年02月21日 13:55, Michael S. Tsirkin wrote:
> On Sun, Feb 19, 2017 at 10:47:08PM +0800, Lan Tianyu wrote:
>> This patch is to add callback to handle fault event reported by
>> IOMMU driver. Callback stores fault into an array and notify userspace
>> via eventfd to read fault info.
>>
>> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 30 ++++++++++++++++++++++++++++++
>>  include/linux/iommu.h           |  7 +++++++
>>  include/uapi/linux/vfio.h       |  7 +++++++
>>  3 files changed, 44 insertions(+)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 46674ea..dc434a3 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -56,6 +56,8 @@
>>  MODULE_PARM_DESC(disable_hugepages,
>>  		 "Disable VFIO IOMMU support for IOMMU hugepages.");
>>  
>> +#define NR_IOMMU_FAULT_INFO	10
>> +
>>  struct vfio_iommu {
>>  	struct list_head	domain_list;
>>  	struct vfio_domain	*external_domain; /* domain for external user */
>> @@ -64,6 +66,9 @@ struct vfio_iommu {
>>  	struct blocking_notifier_head notifier;
>>  	struct eventfd_ctx	*iommu_fault_fd;
>>  	struct mutex            fault_lock;
>> +	struct vfio_iommu_fault_info fault_info[NR_IOMMU_FAULT_INFO];
> 
> What if you run out of this space? Userspace will not
> see any more faults. What will cause progress to happen?


When userspace gets fault info via new VFIO cmd, the fault_info in arry
will be clear. If userspace doesn't get fault info after triggering
fault event fd, the surplus fault info will be ignored.

> 
> 
>> +	struct blocking_notifier_head iommu_fault_notifier;
>> +	u8			fault_count;
>>  	bool			v2;
>>  	bool			nesting;
>>  };
>> @@ -1456,6 +1461,7 @@ static void *vfio_iommu_type1_open(unsigned long arg)
>>  	iommu->dma_list = RB_ROOT;
>>  	mutex_init(&iommu->lock);
>>  	mutex_init(&iommu->fault_lock);
>> +	iommu->fault_count = 0;
>>  	BLOCKING_INIT_NOTIFIER_HEAD(&iommu->notifier);
>>  
>>  	return iommu;
>> @@ -1516,6 +1522,30 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
>>  	return ret;
>>  }
>>  
>> +static int vfio_iommu_fault_event_notifier(struct notifier_block *nb,
>> +					   struct iommu_fault_info *fault_info,
>> +					   void *data)
>> +{
>> +	struct vfio_iommu *iommu = data;
>> +	struct vfio_iommu_fault_info *info;
>> +
>> +	mutex_lock(&iommu->fault_lock);
>> +
>> +	info = &iommu->fault_info[iommu->fault_count];
>> +	info->addr = fault_info->addr;
>> +	info->sid = fault_info->sid;
>> +	info->fault_reason = fault_info->fault_reason;
>> +	info->is_write = fault_info->is_write;
>> +
>> +	iommu->fault_count++;
> 
> Will corrupt memory once array overflows NR_IOMMU_FAULT_INFO.


Yes, I miss check of NR_IOMMU_FAULT_INFO Here. Thanks.

> 
> 
>> +
>> +	if (iommu->iommu_fault_fd)
>> +		eventfd_signal(iommu->iommu_fault_fd, 1);
>> +
>> +	mutex_unlock(&iommu->fault_lock);
>> +	return 0;
>> +}
>> +
>>  static long vfio_iommu_type1_ioctl(void *iommu_data,
>>  				   unsigned int cmd, unsigned long arg)
>>  {
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index 0ff5111..b6a7bdb 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -86,6 +86,13 @@ struct iommu_domain {
>>  	void *iova_cookie;
>>  };
>>  
>> +struct iommu_fault_info {
>> +	__u64	addr;
>> +	__u16   sid;
>> +	__u8    fault_reason;
>> +	__u8	is_write:1;
>> +};
>> +
>>  enum iommu_cap {
>>  	IOMMU_CAP_CACHE_COHERENCY,	/* IOMMU can enforce cache coherent DMA
>>  					   transactions */
>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>> index 8616334..da359dd 100644
>> --- a/include/uapi/linux/vfio.h
>> +++ b/include/uapi/linux/vfio.h
>> @@ -562,6 +562,13 @@ struct vfio_iommu_type1_set_fault_eventfd {
>>  
>>  #define VFIO_IOMMU_SET_FAULT_EVENTFD	_IO(VFIO_TYPE, VFIO_BASE + 17)
>>  
>> +struct vfio_iommu_fault_info {
>> +	__u64	addr;
>> +	__u16   sid;
> 
> It's not clear it's userspace's business to know the sid.  It normally
> does not care once management has bound vfio to a device. You should use
> a device identifier that makes sense.

Yes, How about "bdf"?

> 
> 
>> +	__u8    fault_reason;
>> +	__u8	is_write:1;
>> +};
>> +
>>  /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
>>  
>>  /*
>> -- 
>> 1.8.3.1


-- 
Best regards
Tianyu Lan

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 0/3] VFIO: Report IOMMU fault event to userspace
  2017-02-21  5:29     ` Alex Williamson
@ 2017-02-21 15:18       ` Lan Tianyu
  2017-02-21 15:21         ` Lan, Tianyu
  2017-02-28 15:58       ` Lan, Tianyu
  1 sibling, 1 reply; 27+ messages in thread
From: Lan Tianyu @ 2017-02-21 15:18 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvm, kevin.tian, mst, jan.kiszka, jasowang, peterx, david, yi.l.liu

On 2017年02月21日 13:29, Alex Williamson wrote:
> On Tue, 21 Feb 2017 12:49:18 +0800
> Lan Tianyu <tianyu.lan@intel.com> wrote:
>
>> On 2017年02月21日 04:53, Alex Williamson wrote:
>>> On Sun, 19 Feb 2017 22:47:06 +0800
>>> Lan Tianyu <tianyu.lan@intel.com> wrote:
>>>
>>>> This patchset proposes a solution to report hardware IOMMU fault
>>>> event to userspace. Motivation is to make vIOMMU in Qemu get physical
>>>> fault event and info from IOMMU hardware. vIOMMU is in charge of transforming
>>>> fault info and inject to guest. The feature is also very important to
>>>> support first level translation(Translation request with PASID) in VM
>>>> which requires vIOMMU to inject device page request to VM.
>>>
>>> Is the type1 IOMMU backend even a valid target here? vIOMMU support
>>> with type1 is functional, but not really usable except for very
>>> specific cases.  Type1 is not designed for and not well suited to
>>> dynamic DMA mapping in a VM, which means that anything other than
>>> iommu=pt or assigning the device to a VM user or l2 guest is going to
>>> have horrible performance. Faulting in mappings, as seems to be
>>> enabled here, sort of implies dynmaic mapping, right?
>>
>> Peter's patches have enabled vIOMMU IOVA with assigned device and guest
>> may change vIOMMU's IOVA page table dynamically. For example, if we
>> assign NIC and enable DMA protection in VM, NIC driver will dynamically
>> map/unmap DMA memory when receive/send packages and change vIOMMU IOVA
>> page table.
>
> Yes, it's functional, but does it have sufficient performance to bother
> to further extend it for the vIOMMU use case?  The benefit of the
> current level of vfio/viommu integration is that we can a) make use of
> DPDK with assigned devices in a secure mode within the guest, and b)
> use nested device assignment (though the usefulness of this this
> versus simple novelty is is questionable).  Both of these make use of
> relatively static mappings through vfio and therefore should not
> experience much mapping overhead aside from setup and teardown.


The fault event introduced by this patchset only works for vIOMMU's
IOVA(request without PASID) and it targets use cases just like you
mentioned(static mapping). IOVA doesn't support device page request and
so it doesn't introduce more dynamic mapping. These fault events are
triggered by misoperation of IO page table in VM.

>
> As I understand your use case, particularly with PASID, you're trying
> to enable the device to fault in mappings, which implies a dynamic
> mapping use case.  Run a 10GBps NIC (or 40 or 100) in a viommu guest
> _without_ using iommu=pt.  How close do you get to native performance?
> How close can a virtio device get in the same configuration?  What is
> the usefulness in extending type1 to support piommu faults if it
> doesn't have worthwhile performance?

For SVM(called first level translation in VTD spec), it doesn't require
to shadow all first level page table. We just need to shadow gPASID
table pointer and put into extend context entry of pIOMMU. VTD hardware
can read guest first level page table directly because it supports
nested translation for request with PASID which helps to translate GPA
to HPA during reading guest page table. So for SVM, we don't need to go 
through typ1 map/umap and dynamic map/umap will just happen in the guest.

For detail, please see Yi's [RFC Design Doc v2] Enable Shared Virtual
Memory feature
https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg03617.html


>
>>> Our container
>>> based accounting falls apart for type1 backing a vIOMMU as well.  Each
>>> device lives in its own address space and therefore vfio container,
>>> which means that management of vIOMMU assigned device VMs needs to
>>> account for locked memory limits per device.  This quickly opens a
>>> large vulnerability if a VM can lock multiples of the VM memory size.
>>
>> This seems we have already this issue if we use assigned device with
>> vIOMMU regardless of whether we report pIOMMU fault event to vIOMMU or
>> not, right?
>
> Yes, so why continue to push viommu features into an iommu model that
> has these limitations?  Perhaps type1 can be extended to avoid them,
> perhaps a new "type2" iommu backend is a better solution.  Either way I
> don't see how adding iommu fault handling to type1 as it exists today
> is more than just a proof of concept exercise.

As mentioned above, these fault events will make the situation worse.
I am not sure whether we still need to extend type1 or introduce type2?
If yes, could you give more guides how to do that. Thanks.

>
>>>
>>> So is what you're developing here just a proof of concept using type1,
>>> or is this something that you feel actually has value long term?
>>
>> We want to pass physical IOMMU fault event to vIOMMU and then inject
>> virtual fault event to VM. It's necessary to create the channel between
>> pIOMMU driver and vIOMMU. Currently, IOMMU map/umap requests go through
>> VFIO IOMMU type1 driver and so I supposed VFIO IOMMU type1 was better
>> place. This patchset is to show the idea. Another option is to go though
>> VFIO-PCI driver we have considered. It's very helpful if you can give
>> some guides here :)
>
> I understand, but is this any more than a functional proof of concept?
> What are your performance requirements?  Does type1 meet them?  Have
> you tested?  My expectation is that dynamic DMA mapping performance
> through type1 with a viommu configured for dynamic mappings is abysmal
> and it's therefore a poor target on which to base additional features
> such as fault handling.  Do you have data to suggest otherwise?  Going
> through the device rather than the container doesn't solve my assertion
> that type1 is merely functional and does not provide worthwhile
> performance when used in conjunction with dynamic mappings.  Thanks,

Above.

>
> Alex
>
>>>> Introduce two new VFIO cmd for VFIO IOMMU type1 driver
>>>> - VFIO_IOMMU_SET_FAULT_EVENT_FD
>>>>   Receive eventfd from user space and trigger eventfd when get fault event
>>>> from IOMMU.
>>>>
>>>> - VFIO_IOMMU_GET_FAULT_INFO
>>>>   Return IOMMU fault info to userspace.
>>>>
>>>> VFIO IOMMU Type1 driver needs to register IOMMU fault event notifier
>>>> to IOMMU driver for assigned devices when they are attached to VFIO
>>>> container. IOMMU driver will run VFIO fault event callback when hardware
>>>> IOMMU triggers fault events for devices assigned to VFIO container. Then,
>>>> type1 driver signals eventfd provided by userspace and usrspace gets fault
>>>> info via new cmd.
>>>>
>>>> IOMMU interface are still in design stage and so we still can't register
>>>> IOMMU fault event notifier to IOMMU driver. This patchset is prototype
>>>> code to check whether VFIO IOMMU type1 driver is suitable place to deal
>>>> with IOMMU fault event and just passes build test.
>>>>
>>>> Very appreciate for comments.
>>>>
>>>> Lan Tianyu (3):
>>>>   VFIO: Add new cmd to receive eventfd from userspace to notify IOMMU
>>>>     fault event
>>>>   VFIO: Add IOMMU fault notifier callback
>>>>   VFIO: Add new cmd for user space to get IOMMU fault info
>>>>
>>>>  drivers/vfio/vfio_iommu_type1.c | 106 ++++++++++++++++++++++++++++++++++++++++
>>>>  include/linux/iommu.h           |   7 +++
>>>>  include/uapi/linux/vfio.h       |  37 ++++++++++++++
>>>>  3 files changed, 150 insertions(+)
>>>>
>>>
>>
>>
>


-- 
Best regards
Tianyu Lan

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 0/3] VFIO: Report IOMMU fault event to userspace
  2017-02-21 15:18       ` Lan Tianyu
@ 2017-02-21 15:21         ` Lan, Tianyu
  0 siblings, 0 replies; 27+ messages in thread
From: Lan, Tianyu @ 2017-02-21 15:21 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvm, kevin.tian, mst, jan.kiszka, jasowang, peterx, david, yi.l.liu

On 2/21/2017 11:18 PM, Lan Tianyu wrote:
>
>>
>> As I understand your use case, particularly with PASID, you're trying
>> to enable the device to fault in mappings, which implies a dynamic
>> mapping use case.  Run a 10GBps NIC (or 40 or 100) in a viommu guest
>> _without_ using iommu=pt.  How close do you get to native performance?
>> How close can a virtio device get in the same configuration?  What is
>> the usefulness in extending type1 to support piommu faults if it
>> doesn't have worthwhile performance?
>
> For SVM(called first level translation in VTD spec), it doesn't require
> to shadow all first level page table. We just need to shadow gPASID
> table pointer and put into extend context entry of pIOMMU. VTD hardware
> can read guest first level page table directly because it supports
> nested translation for request with PASID which helps to translate GPA
> to HPA during reading guest page table. So for SVM, we don't need to go
> through typ1 map/umap and dynamic map/umap will just happen in the guest.
>
> For detail, please see Yi's [RFC Design Doc v2] Enable Shared Virtual
> Memory feature
> https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg03617.html

BTW, current only Intel GPU supports request-with-PASID and device page 
request as far as I know,

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 0/3] VFIO: Report IOMMU fault event to userspace
  2017-02-21  5:29     ` Alex Williamson
  2017-02-21 15:18       ` Lan Tianyu
@ 2017-02-28 15:58       ` Lan, Tianyu
  2017-03-15  6:17         ` Liu, Yi L
  1 sibling, 1 reply; 27+ messages in thread
From: Lan, Tianyu @ 2017-02-28 15:58 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvm, kevin.tian, mst, jan.kiszka, jasowang, peterx, david,
	yi.l.liu, Jean-Philippe.Brucker

Hi Alex:
Does following comments make sense to you? In the previous discussion,
the type1 IOMMU driver isn't suitable for dynamic map/umap and we should
extend type1 or introduce type2. For fault event reporting or future
IOMMU related function, we need to figure out they should be in the
vfio-pci, vfio-IOMMU driver or something else. SVM support in VM also
will face such kind of choice. As Jean-Philippe posted SVM support for
ARM, I think most platforms have such requirement. Thanks.



On 2017年02月21日 13:29, Alex Williamson wrote:
> On Tue, 21 Feb 2017 12:49:18 +0800
> Lan Tianyu <tianyu.lan@intel.com> wrote:
>
>> On 2017年02月21日 04:53, Alex Williamson wrote:
>>> On Sun, 19 Feb 2017 22:47:06 +0800
>>> Lan Tianyu <tianyu.lan@intel.com> wrote:
>>>
>>>> This patchset proposes a solution to report hardware IOMMU fault
>>>> event to userspace. Motivation is to make vIOMMU in Qemu get physical
>>>> fault event and info from IOMMU hardware. vIOMMU is in charge of transforming
>>>> fault info and inject to guest. The feature is also very important to
>>>> support first level translation(Translation request with PASID) in VM
>>>> which requires vIOMMU to inject device page request to VM.
>>>
>>> Is the type1 IOMMU backend even a valid target here? vIOMMU support
>>> with type1 is functional, but not really usable except for very
>>> specific cases.  Type1 is not designed for and not well suited to
>>> dynamic DMA mapping in a VM, which means that anything other than
>>> iommu=pt or assigning the device to a VM user or l2 guest is going to
>>> have horrible performance. Faulting in mappings, as seems to be
>>> enabled here, sort of implies dynmaic mapping, right?
>>
>> Peter's patches have enabled vIOMMU IOVA with assigned device and guest
>> may change vIOMMU's IOVA page table dynamically. For example, if we
>> assign NIC and enable DMA protection in VM, NIC driver will dynamically
>> map/unmap DMA memory when receive/send packages and change vIOMMU IOVA
>> page table.
>
> Yes, it's functional, but does it have sufficient performance to bother
> to further extend it for the vIOMMU use case?  The benefit of the
> current level of vfio/viommu integration is that we can a) make use of
> DPDK with assigned devices in a secure mode within the guest, and b)
> use nested device assignment (though the usefulness of this this
> versus simple novelty is is questionable).  Both of these make use of
> relatively static mappings through vfio and therefore should not
> experience much mapping overhead aside from setup and teardown.


The fault event introduced by this patchset only works for vIOMMU's
IOVA(request without PASID) and it targets use cases just like you
mentioned(static mapping). IOVA doesn't support device page request and
so it doesn't introduce more dynamic mapping. These fault events are
triggered by misoperation of IO page table in VM.

>
> As I understand your use case, particularly with PASID, you're trying
> to enable the device to fault in mappings, which implies a dynamic
> mapping use case.  Run a 10GBps NIC (or 40 or 100) in a viommu guest
> _without_ using iommu=pt.  How close do you get to native performance?
> How close can a virtio device get in the same configuration?  What is
> the usefulness in extending type1 to support piommu faults if it
> doesn't have worthwhile performance?

For SVM(called first level translation in VTD spec), it doesn't require
to shadow all first level page table. We just need to shadow gPASID
table pointer and put into extend context entry of pIOMMU. VTD hardware
can read guest first level page table directly because it supports
nested translation for request with PASID which helps to translate GPA
to HPA during reading guest page table. So for SVM, we don't need to go 
through typ1 map/umap and dynamic map/umap will just happen in the guest.

For detail, please see Yi's [RFC Design Doc v2] Enable Shared Virtual
Memory feature
https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg03617.html


>
>>> Our container
>>> based accounting falls apart for type1 backing a vIOMMU as well.  Each
>>> device lives in its own address space and therefore vfio container,
>>> which means that management of vIOMMU assigned device VMs needs to
>>> account for locked memory limits per device.  This quickly opens a
>>> large vulnerability if a VM can lock multiples of the VM memory size.
>>
>> This seems we have already this issue if we use assigned device with
>> vIOMMU regardless of whether we report pIOMMU fault event to vIOMMU or
>> not, right?
>
> Yes, so why continue to push viommu features into an iommu model that
> has these limitations?  Perhaps type1 can be extended to avoid them,
> perhaps a new "type2" iommu backend is a better solution.  Either way I
> don't see how adding iommu fault handling to type1 as it exists today
> is more than just a proof of concept exercise.

As mentioned above, these fault events won't make the situation worse.
I am not sure whether we still need to extend type1 or introduce type2?
If yes, could you give more guides how to do that. Thanks.

>
>>>
>>> So is what you're developing here just a proof of concept using type1,
>>> or is this something that you feel actually has value long term?
>>
>> We want to pass physical IOMMU fault event to vIOMMU and then inject
>> virtual fault event to VM. It's necessary to create the channel between
>> pIOMMU driver and vIOMMU. Currently, IOMMU map/umap requests go through
>> VFIO IOMMU type1 driver and so I supposed VFIO IOMMU type1 was better
>> place. This patchset is to show the idea. Another option is to go though
>> VFIO-PCI driver we have considered. It's very helpful if you can give
>> some guides here :)
>
> I understand, but is this any more than a functional proof of concept?
> What are your performance requirements?  Does type1 meet them?  Have
> you tested?  My expectation is that dynamic DMA mapping performance
> through type1 with a viommu configured for dynamic mappings is abysmal
> and it's therefore a poor target on which to base additional features
> such as fault handling.  Do you have data to suggest otherwise?  Going
> through the device rather than the container doesn't solve my assertion
> that type1 is merely functional and does not provide worthwhile
> performance when used in conjunction with dynamic mappings.  Thanks,

Above.

>
> Alex
>
>>>> Introduce two new VFIO cmd for VFIO IOMMU type1 driver
>>>> - VFIO_IOMMU_SET_FAULT_EVENT_FD
>>>>   Receive eventfd from user space and trigger eventfd when get fault event
>>>> from IOMMU.
>>>>
>>>> - VFIO_IOMMU_GET_FAULT_INFO
>>>>   Return IOMMU fault info to userspace.
>>>>
>>>> VFIO IOMMU Type1 driver needs to register IOMMU fault event notifier
>>>> to IOMMU driver for assigned devices when they are attached to VFIO
>>>> container. IOMMU driver will run VFIO fault event callback when hardware
>>>> IOMMU triggers fault events for devices assigned to VFIO container. Then,
>>>> type1 driver signals eventfd provided by userspace and usrspace gets fault
>>>> info via new cmd.
>>>>
>>>> IOMMU interface are still in design stage and so we still can't register
>>>> IOMMU fault event notifier to IOMMU driver. This patchset is prototype
>>>> code to check whether VFIO IOMMU type1 driver is suitable place to deal
>>>> with IOMMU fault event and just passes build test.
>>>>
>>>> Very appreciate for comments.
>>>>
>>>> Lan Tianyu (3):
>>>>   VFIO: Add new cmd to receive eventfd from userspace to notify IOMMU
>>>>     fault event
>>>>   VFIO: Add IOMMU fault notifier callback
>>>>   VFIO: Add new cmd for user space to get IOMMU fault info
>>>>
>>>>  drivers/vfio/vfio_iommu_type1.c | 106 ++++++++++++++++++++++++++++++++++++++++
>>>>  include/linux/iommu.h           |   7 +++
>>>>  include/uapi/linux/vfio.h       |  37 ++++++++++++++
>>>>  3 files changed, 150 insertions(+)
>>>>
>>>
>>
>>
>


-- 
Best regards
Tianyu Lan

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [RFC PATCH 0/3] VFIO: Report IOMMU fault event to userspace
  2017-02-28 15:58       ` Lan, Tianyu
@ 2017-03-15  6:17         ` Liu, Yi L
  2017-03-15 19:52           ` Alex Williamson
  0 siblings, 1 reply; 27+ messages in thread
From: Liu, Yi L @ 2017-03-15  6:17 UTC (permalink / raw)
  To: Lan, Tianyu, Alex Williamson
  Cc: kvm, Tian, Kevin, mst, jan.kiszka, jasowang, peterx, david,
	Jean-Philippe.Brucker

> -----Original Message-----
> From: Lan, Tianyu
> Sent: Tuesday, February 28, 2017 11:58 PM
> To: Alex Williamson <alex.williamson@redhat.com>
> Cc: kvm@vger.kernel.org; Tian, Kevin <kevin.tian@intel.com>; mst@redhat.com;
> jan.kiszka@siemens.com; jasowang@redhat.com; peterx@redhat.com;
> david@gibson.dropbear.id.au; Liu, Yi L <yi.l.liu@intel.com>; Jean-
> Philippe.Brucker@arm.com
> Subject: Re: [RFC PATCH 0/3] VFIO: Report IOMMU fault event to userspace
> 
> Hi Alex:
> Does following comments make sense to you? In the previous discussion, the type1
> IOMMU driver isn't suitable for dynamic map/umap and we should extend type1 or
> introduce type2. For fault event reporting or future IOMMU related function, we
> need to figure out they should be in the vfio-pci, vfio-IOMMU driver or something
> else. SVM support in VM also will face such kind of choice. As Jean-Philippe posted
> SVM support for ARM, I think most platforms have such requirement. Thanks.

Hello Alex,

Do you have any further suggestion on where to place the reporting channel in VFIO?
Seems like we have options includes: vfio-pci, vfio-IOMMU driver.

Thanks,
Yi L

> 
> On 2017年02月21日 13:29, Alex Williamson wrote:
> > On Tue, 21 Feb 2017 12:49:18 +0800
> > Lan Tianyu <tianyu.lan@intel.com> wrote:
> >
> >> On 2017年02月21日 04:53, Alex Williamson wrote:
> >>> On Sun, 19 Feb 2017 22:47:06 +0800
> >>> Lan Tianyu <tianyu.lan@intel.com> wrote:
> >>>
> >>>> This patchset proposes a solution to report hardware IOMMU fault
> >>>> event to userspace. Motivation is to make vIOMMU in Qemu get
> >>>> physical fault event and info from IOMMU hardware. vIOMMU is in
> >>>> charge of transforming fault info and inject to guest. The feature
> >>>> is also very important to support first level
> >>>> translation(Translation request with PASID) in VM which requires vIOMMU to
> inject device page request to VM.
> >>>
> >>> Is the type1 IOMMU backend even a valid target here? vIOMMU support
> >>> with type1 is functional, but not really usable except for very
> >>> specific cases.  Type1 is not designed for and not well suited to
> >>> dynamic DMA mapping in a VM, which means that anything other than
> >>> iommu=pt or assigning the device to a VM user or l2 guest is going
> >>> to have horrible performance. Faulting in mappings, as seems to be
> >>> enabled here, sort of implies dynmaic mapping, right?
> >>
> >> Peter's patches have enabled vIOMMU IOVA with assigned device and
> >> guest may change vIOMMU's IOVA page table dynamically. For example,
> >> if we assign NIC and enable DMA protection in VM, NIC driver will
> >> dynamically map/unmap DMA memory when receive/send packages and
> >> change vIOMMU IOVA page table.
> >
> > Yes, it's functional, but does it have sufficient performance to
> > bother to further extend it for the vIOMMU use case?  The benefit of
> > the current level of vfio/viommu integration is that we can a) make
> > use of DPDK with assigned devices in a secure mode within the guest,
> > and b) use nested device assignment (though the usefulness of this
> > this versus simple novelty is is questionable).  Both of these make
> > use of relatively static mappings through vfio and therefore should
> > not experience much mapping overhead aside from setup and teardown.
> 
> 
> The fault event introduced by this patchset only works for vIOMMU's IOVA(request
> without PASID) and it targets use cases just like you mentioned(static mapping). IOVA
> doesn't support device page request and so it doesn't introduce more dynamic
> mapping. These fault events are triggered by misoperation of IO page table in VM.
> 
> >
> > As I understand your use case, particularly with PASID, you're trying
> > to enable the device to fault in mappings, which implies a dynamic
> > mapping use case.  Run a 10GBps NIC (or 40 or 100) in a viommu guest
> > _without_ using iommu=pt.  How close do you get to native performance?
> > How close can a virtio device get in the same configuration?  What is
> > the usefulness in extending type1 to support piommu faults if it
> > doesn't have worthwhile performance?
> 
> For SVM(called first level translation in VTD spec), it doesn't require to shadow all
> first level page table. We just need to shadow gPASID table pointer and put into
> extend context entry of pIOMMU. VTD hardware can read guest first level page
> table directly because it supports nested translation for request with PASID which
> helps to translate GPA to HPA during reading guest page table. So for SVM, we don't
> need to go through typ1 map/umap and dynamic map/umap will just happen in the
> guest.
> 
> For detail, please see Yi's [RFC Design Doc v2] Enable Shared Virtual Memory feature
> https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg03617.html
> 
> 
> >
> >>> Our container
> >>> based accounting falls apart for type1 backing a vIOMMU as well.
> >>> Each device lives in its own address space and therefore vfio
> >>> container, which means that management of vIOMMU assigned device VMs
> >>> needs to account for locked memory limits per device.  This quickly
> >>> opens a large vulnerability if a VM can lock multiples of the VM memory size.
> >>
> >> This seems we have already this issue if we use assigned device with
> >> vIOMMU regardless of whether we report pIOMMU fault event to vIOMMU
> >> or not, right?
> >
> > Yes, so why continue to push viommu features into an iommu model that
> > has these limitations?  Perhaps type1 can be extended to avoid them,
> > perhaps a new "type2" iommu backend is a better solution.  Either way
> > I don't see how adding iommu fault handling to type1 as it exists
> > today is more than just a proof of concept exercise.
> 
> As mentioned above, these fault events won't make the situation worse.
> I am not sure whether we still need to extend type1 or introduce type2?
> If yes, could you give more guides how to do that. Thanks.
> 
> >
> >>>
> >>> So is what you're developing here just a proof of concept using
> >>> type1, or is this something that you feel actually has value long term?
> >>
> >> We want to pass physical IOMMU fault event to vIOMMU and then inject
> >> virtual fault event to VM. It's necessary to create the channel
> >> between pIOMMU driver and vIOMMU. Currently, IOMMU map/umap requests
> >> go through VFIO IOMMU type1 driver and so I supposed VFIO IOMMU type1
> >> was better place. This patchset is to show the idea. Another option
> >> is to go though VFIO-PCI driver we have considered. It's very helpful
> >> if you can give some guides here :)
> >
> > I understand, but is this any more than a functional proof of concept?
> > What are your performance requirements?  Does type1 meet them?  Have
> > you tested?  My expectation is that dynamic DMA mapping performance
> > through type1 with a viommu configured for dynamic mappings is abysmal
> > and it's therefore a poor target on which to base additional features
> > such as fault handling.  Do you have data to suggest otherwise?  Going
> > through the device rather than the container doesn't solve my
> > assertion that type1 is merely functional and does not provide
> > worthwhile performance when used in conjunction with dynamic mappings.
> > Thanks,
> 
> Above.
> 
> >
> > Alex
> >
> >>>> Introduce two new VFIO cmd for VFIO IOMMU type1 driver
> >>>> - VFIO_IOMMU_SET_FAULT_EVENT_FD
> >>>>   Receive eventfd from user space and trigger eventfd when get
> >>>> fault event from IOMMU.
> >>>>
> >>>> - VFIO_IOMMU_GET_FAULT_INFO
> >>>>   Return IOMMU fault info to userspace.
> >>>>
> >>>> VFIO IOMMU Type1 driver needs to register IOMMU fault event
> >>>> notifier to IOMMU driver for assigned devices when they are
> >>>> attached to VFIO container. IOMMU driver will run VFIO fault event
> >>>> callback when hardware IOMMU triggers fault events for devices
> >>>> assigned to VFIO container. Then,
> >>>> type1 driver signals eventfd provided by userspace and usrspace
> >>>> gets fault info via new cmd.
> >>>>
> >>>> IOMMU interface are still in design stage and so we still can't
> >>>> register IOMMU fault event notifier to IOMMU driver. This patchset
> >>>> is prototype code to check whether VFIO IOMMU type1 driver is
> >>>> suitable place to deal with IOMMU fault event and just passes build test.
> >>>>
> >>>> Very appreciate for comments.
> >>>>
> >>>> Lan Tianyu (3):
> >>>>   VFIO: Add new cmd to receive eventfd from userspace to notify IOMMU
> >>>>     fault event
> >>>>   VFIO: Add IOMMU fault notifier callback
> >>>>   VFIO: Add new cmd for user space to get IOMMU fault info
> >>>>
> >>>>  drivers/vfio/vfio_iommu_type1.c | 106
> ++++++++++++++++++++++++++++++++++++++++
> >>>>  include/linux/iommu.h           |   7 +++
> >>>>  include/uapi/linux/vfio.h       |  37 ++++++++++++++
> >>>>  3 files changed, 150 insertions(+)
> >>>>
> >>>
> >>
> >>
> >
> 
> 
> --
> Best regards
> Tianyu Lan

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 0/3] VFIO: Report IOMMU fault event to userspace
  2017-03-15  6:17         ` Liu, Yi L
@ 2017-03-15 19:52           ` Alex Williamson
  2017-03-16  1:42             ` Lan Tianyu
  0 siblings, 1 reply; 27+ messages in thread
From: Alex Williamson @ 2017-03-15 19:52 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Lan, Tianyu, kvm, Tian, Kevin, mst, jan.kiszka, jasowang, peterx,
	david, Jean-Philippe.Brucker

On Wed, 15 Mar 2017 06:17:06 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> > -----Original Message-----
> > From: Lan, Tianyu
> > Sent: Tuesday, February 28, 2017 11:58 PM
> > To: Alex Williamson <alex.williamson@redhat.com>
> > Cc: kvm@vger.kernel.org; Tian, Kevin <kevin.tian@intel.com>; mst@redhat.com;
> > jan.kiszka@siemens.com; jasowang@redhat.com; peterx@redhat.com;
> > david@gibson.dropbear.id.au; Liu, Yi L <yi.l.liu@intel.com>; Jean-
> > Philippe.Brucker@arm.com
> > Subject: Re: [RFC PATCH 0/3] VFIO: Report IOMMU fault event to userspace
> > 
> > Hi Alex:
> > Does following comments make sense to you? In the previous discussion, the type1
> > IOMMU driver isn't suitable for dynamic map/umap and we should extend type1 or
> > introduce type2. For fault event reporting or future IOMMU related function, we
> > need to figure out they should be in the vfio-pci, vfio-IOMMU driver or something
> > else. SVM support in VM also will face such kind of choice. As Jean-Philippe posted
> > SVM support for ARM, I think most platforms have such requirement. Thanks.  
> 
> Hello Alex,
> 
> Do you have any further suggestion on where to place the reporting channel in VFIO?
> Seems like we have options includes: vfio-pci, vfio-IOMMU driver.

Here's my thought process, I start out leaning towards vfio-pci because
the vfio container can actually handle multiple IOMMU domains, each of
which is theoretically hosted on different physical IOMMUs, possibly by
different vendors.  So we can't even guarantee that we have a single
vendor error format per container.  A device however only maps through
a single IOMMU and therefore only has a single error format. Devices
already support various interrupt and error signaling mechanisms and we
already have device specific regions which could be used to expose some
form of error log.  It also removes any sort of source ID from the
error report.

Also I presume that this vIOMMU use case is not the only case where a
driver would want to be notified of IOMMU faults, in-kernel drivers
might want this too.  Drivers making use of the DMA API don't really
have any visibility to the IOMMU domain in use, so the framework we use
to connect drivers with the IOMMU faults probably needs to abstract
that.  Here's the problem though, in-kernel drivers are not going to
accept IOMMU vendor specific fault reporting.  So while we could
have maybe used device specific regions in vfio to report vendor
specific faults, that abstraction problem needs to be solved for any
in-kernel user anyway.

Now, if we go back and start from the premise that we have in-kernel
infrastructure to report IOMMU faults to drivers in a common,
non-vendor specific way, does that change my conclusion in the first
paragraph since not having a consistent error format was a contributing
factor.  It seems like a common error format is not the only problem
with a container hosting multiple domains though.  What if we have a
container where some domains are capable of reporting faults and others
are not.  Could a user positively determine that a device is capable of
reporting IOMMU faults in that case?  It seems not.  So perhaps the
vfio device is still the proper place to host that reporting and we can
simply leverage the common error reporting in the host layer to expose
similar common reporting to the user, which also provides the benefit
that the solution isn't locked to matching physical IOMMU and vIOMMU
from the same vendor.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 0/3] VFIO: Report IOMMU fault event to userspace
  2017-03-15 19:52           ` Alex Williamson
@ 2017-03-16  1:42             ` Lan Tianyu
  2017-03-16  3:32               ` Jason Wang
  2017-03-21 23:57               ` Liu, Yi L
  0 siblings, 2 replies; 27+ messages in thread
From: Lan Tianyu @ 2017-03-16  1:42 UTC (permalink / raw)
  To: Alex Williamson, Liu, Yi L
  Cc: kvm, Tian, Kevin, mst, jan.kiszka, jasowang, peterx, david,
	Jean-Philippe.Brucker

On 2017年03月16日 03:52, Alex Williamson wrote:
> On Wed, 15 Mar 2017 06:17:06 +0000
> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> 
>>> -----Original Message-----
>>> From: Lan, Tianyu
>>> Sent: Tuesday, February 28, 2017 11:58 PM
>>> To: Alex Williamson <alex.williamson@redhat.com>
>>> Cc: kvm@vger.kernel.org; Tian, Kevin <kevin.tian@intel.com>; mst@redhat.com;
>>> jan.kiszka@siemens.com; jasowang@redhat.com; peterx@redhat.com;
>>> david@gibson.dropbear.id.au; Liu, Yi L <yi.l.liu@intel.com>; Jean-
>>> Philippe.Brucker@arm.com
>>> Subject: Re: [RFC PATCH 0/3] VFIO: Report IOMMU fault event to userspace
>>>
>>> Hi Alex:
>>> Does following comments make sense to you? In the previous discussion, the type1
>>> IOMMU driver isn't suitable for dynamic map/umap and we should extend type1 or
>>> introduce type2. For fault event reporting or future IOMMU related function, we
>>> need to figure out they should be in the vfio-pci, vfio-IOMMU driver or something
>>> else. SVM support in VM also will face such kind of choice. As Jean-Philippe posted
>>> SVM support for ARM, I think most platforms have such requirement. Thanks.  
>>
>> Hello Alex,
>>
>> Do you have any further suggestion on where to place the reporting channel in VFIO?
>> Seems like we have options includes: vfio-pci, vfio-IOMMU driver.
> 
> Here's my thought process, I start out leaning towards vfio-pci because
> the vfio container can actually handle multiple IOMMU domains, each of
> which is theoretically hosted on different physical IOMMUs, possibly by
> different vendors.  So we can't even guarantee that we have a single
> vendor error format per container.  A device however only maps through
> a single IOMMU and therefore only has a single error format. Devices
> already support various interrupt and error signaling mechanisms and we
> already have device specific regions which could be used to expose some
> form of error log.  It also removes any sort of source ID from the
> error report.

Agree.

> 
> Also I presume that this vIOMMU use case is not the only case where a
> driver would want to be notified of IOMMU faults, in-kernel drivers
> might want this too.  Drivers making use of the DMA API don't really
> have any visibility to the IOMMU domain in use, so the framework we use
> to connect drivers with the IOMMU faults probably needs to abstract
> that.  

Yes, device page request(part of SVM support) on native also requires
that device driver to receive IOMMU fault event(page request event) from
IOMMU driver. So it's necessary to add such abstract layer between IOMMU
driver and device driver(include VFIO-PCI driver).

First in my mind, IOMMU core needs to provide fault event reporting
notifier and a fault event

> Here's the problem though, in-kernel drivers are not going to
> accept IOMMU vendor specific fault reporting.  So while we could
> have maybe used device specific regions in vfio to report vendor
> specific faults, that abstraction problem needs to be solved for any
> in-kernel user anyway.

It looks we still need a common fault format to pass fault event between
IOMMU and VFIO-PCI. One device also maybe used on different platforms
and so device driver should not have such platform specific code to
handle fault event. If we already have such common structure, fault
event reporting from VFIO-PCI to Qemu also can reuse such structure.
Otherwise, we have to check the fault event before passing it to vIOMMU
since vIOMMU maybe belong to different platforms if we don't have
limitation that vIOMMU should match with platform we are running.(E,G
virtual vtd only can be used on the intel platform).

> 
> Now, if we go back and start from the premise that we have in-kernel
> infrastructure to report IOMMU faults to drivers in a common,
> non-vendor specific way, does that change my conclusion in the first
> paragraph since not having a consistent error format was a contributing
> factor.  It seems like a common error format is not the only problem
> with a container hosting multiple domains though.  What if we have a
> container where some domains are capable of reporting faults and others
> are not.  Could a user positively determine that a device is capable of
> reporting IOMMU faults in that case?  It seems not.  So perhaps the
> vfio device is still the proper place to host that reporting and we can
> simply leverage the common error reporting in the host layer to expose
> similar common reporting to the user, which also provides the benefit
> that the solution isn't locked to matching physical IOMMU and vIOMMU
> from the same vendor.  Thanks,
> 
> Alex
> 


-- 
Best regards
Tianyu Lan

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 0/3] VFIO: Report IOMMU fault event to userspace
  2017-03-16  1:42             ` Lan Tianyu
@ 2017-03-16  3:32               ` Jason Wang
  2017-03-16  5:22                 ` Lan Tianyu
  2017-03-21 23:57               ` Liu, Yi L
  1 sibling, 1 reply; 27+ messages in thread
From: Jason Wang @ 2017-03-16  3:32 UTC (permalink / raw)
  To: Lan Tianyu, Alex Williamson, Liu, Yi L
  Cc: kvm, Tian, Kevin, mst, jan.kiszka, peterx, david, Jean-Philippe.Brucker



On 2017年03月16日 09:42, Lan Tianyu wrote:
>> Also I presume that this vIOMMU use case is not the only case where a
>> driver would want to be notified of IOMMU faults, in-kernel drivers
>> might want this too.  Drivers making use of the DMA API don't really
>> have any visibility to the IOMMU domain in use, so the framework we use
>> to connect drivers with the IOMMU faults probably needs to abstract
>> that.
> Yes, device page request(part of SVM support) on native also requires
> that device driver to receive IOMMU fault event(page request event) from
> IOMMU driver. So it's necessary to add such abstract layer between IOMMU
> driver and device driver(include VFIO-PCI driver).

Hi:

Do you have a real case that requires PRS for kernel drivers?

Thanks

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 0/3] VFIO: Report IOMMU fault event to userspace
  2017-03-16  3:32               ` Jason Wang
@ 2017-03-16  5:22                 ` Lan Tianyu
  0 siblings, 0 replies; 27+ messages in thread
From: Lan Tianyu @ 2017-03-16  5:22 UTC (permalink / raw)
  To: Jason Wang, Alex Williamson, Liu, Yi L
  Cc: kvm, Tian, Kevin, mst, jan.kiszka, peterx, david, Jean-Philippe.Brucker

On 2017年03月16日 11:32, Jason Wang wrote:
> 
> 
> On 2017年03月16日 09:42, Lan Tianyu wrote:
>>> Also I presume that this vIOMMU use case is not the only case where a
>>> driver would want to be notified of IOMMU faults, in-kernel drivers
>>> might want this too.  Drivers making use of the DMA API don't really
>>> have any visibility to the IOMMU domain in use, so the framework we use
>>> to connect drivers with the IOMMU faults probably needs to abstract
>>> that.
>> Yes, device page request(part of SVM support) on native also requires
>> that device driver to receive IOMMU fault event(page request event) from
>> IOMMU driver. So it's necessary to add such abstract layer between IOMMU
>> driver and device driver(include VFIO-PCI driver).
> 
> Hi:
> 
> Do you have a real case that requires PRS for kernel drivers?
> 

Yes, Intel GPU supports this feature and patches is upstreaming.

https://lists.freedesktop.org/archives/intel-gfx/2017-January/116292.html

-- 
Best regards
Tianyu Lan

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [RFC PATCH 0/3] VFIO: Report IOMMU fault event to userspace
  2017-03-16  1:42             ` Lan Tianyu
  2017-03-16  3:32               ` Jason Wang
@ 2017-03-21 23:57               ` Liu, Yi L
  1 sibling, 0 replies; 27+ messages in thread
From: Liu, Yi L @ 2017-03-21 23:57 UTC (permalink / raw)
  To: Lan, Tianyu, Alex Williamson
  Cc: kvm, Tian, Kevin, mst, jan.kiszka, jasowang, peterx, david,
	Jean-Philippe.Brucker, iommu-bounces, Pan, Jacob jun, Raj, Ashok

Loop IOMMU mailing list and add Ashok and Jacob.

> -----Original Message-----
> From: Lan, Tianyu
> Sent: Thursday, March 16, 2017 9:43 AM
> To: Alex Williamson <alex.williamson@redhat.com>; Liu, Yi L <yi.l.liu@intel.com>
> Cc: kvm@vger.kernel.org; Tian, Kevin <kevin.tian@intel.com>; mst@redhat.com;
> jan.kiszka@siemens.com; jasowang@redhat.com; peterx@redhat.com;
> david@gibson.dropbear.id.au; Jean-Philippe.Brucker@arm.com
> Subject: Re: [RFC PATCH 0/3] VFIO: Report IOMMU fault event to userspace
> 
> On 2017年03月16日 03:52, Alex Williamson wrote:
> > On Wed, 15 Mar 2017 06:17:06 +0000
> > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> >
> >>> -----Original Message-----
> >>> From: Lan, Tianyu
> >>> Sent: Tuesday, February 28, 2017 11:58 PM
> >>> To: Alex Williamson <alex.williamson@redhat.com>
> >>> Cc: kvm@vger.kernel.org; Tian, Kevin <kevin.tian@intel.com>;
> >>> mst@redhat.com; jan.kiszka@siemens.com; jasowang@redhat.com;
> >>> peterx@redhat.com; david@gibson.dropbear.id.au; Liu, Yi L
> >>> <yi.l.liu@intel.com>; Jean- Philippe.Brucker@arm.com
> >>> Subject: Re: [RFC PATCH 0/3] VFIO: Report IOMMU fault event to
> >>> userspace
> >>>
> >>> Hi Alex:
> >>> Does following comments make sense to you? In the previous
> >>> discussion, the type1 IOMMU driver isn't suitable for dynamic
> >>> map/umap and we should extend type1 or introduce type2. For fault
> >>> event reporting or future IOMMU related function, we need to figure
> >>> out they should be in the vfio-pci, vfio-IOMMU driver or something
> >>> else. SVM support in VM also will face such kind of choice. As Jean-Philippe
> posted SVM support for ARM, I think most platforms have such requirement. Thanks.
> >>
> >> Hello Alex,
> >>
> >> Do you have any further suggestion on where to place the reporting channel in
> VFIO?
> >> Seems like we have options includes: vfio-pci, vfio-IOMMU driver.
> >
> > Here's my thought process, I start out leaning towards vfio-pci
> > because the vfio container can actually handle multiple IOMMU domains,
> > each of which is theoretically hosted on different physical IOMMUs,
> > possibly by different vendors.  So we can't even guarantee that we
> > have a single vendor error format per container.  A device however
> > only maps through a single IOMMU and therefore only has a single error
> > format. Devices already support various interrupt and error signaling
> > mechanisms and we already have device specific regions which could be
> > used to expose some form of error log.  It also removes any sort of
> > source ID from the error report.
> 
> Agree.
> 
> >
> > Also I presume that this vIOMMU use case is not the only case where a
> > driver would want to be notified of IOMMU faults, in-kernel drivers
> > might want this too.  Drivers making use of the DMA API don't really
> > have any visibility to the IOMMU domain in use, so the framework we
> > use to connect drivers with the IOMMU faults probably needs to
> > abstract that.
> 
> Yes, device page request(part of SVM support) on native also requires that device
> driver to receive IOMMU fault event(page request event) from IOMMU driver. So it's
> necessary to add such abstract layer between IOMMU driver and device
> driver(include VFIO-PCI driver).
> 
> First in my mind, IOMMU core needs to provide fault event reporting notifier and a
> fault event
> 
> > Here's the problem though, in-kernel drivers are not going to accept
> > IOMMU vendor specific fault reporting.  So while we could have maybe
> > used device specific regions in vfio to report vendor specific faults,
> > that abstraction problem needs to be solved for any in-kernel user
> > anyway.
> 
> It looks we still need a common fault format to pass fault event between IOMMU
> and VFIO-PCI. One device also maybe used on different platforms and so device
> driver should not have such platform specific code to handle fault event. If we
> already have such common structure, fault event reporting from VFIO-PCI to Qemu
> also can reuse such structure.
> Otherwise, we have to check the fault event before passing it to vIOMMU since
> vIOMMU maybe belong to different platforms if we don't have limitation that
> vIOMMU should match with platform we are running.(E,G virtual vtd only can be
> used on the intel platform).
> 
> >
> > Now, if we go back and start from the premise that we have in-kernel
> > infrastructure to report IOMMU faults to drivers in a common,
> > non-vendor specific way, does that change my conclusion in the first
> > paragraph since not having a consistent error format was a
> > contributing factor.  It seems like a common error format is not the
> > only problem with a container hosting multiple domains though.  What
> > if we have a container where some domains are capable of reporting
> > faults and others are not.  Could a user positively determine that a
> > device is capable of reporting IOMMU faults in that case?  It seems
> > not.  So perhaps the vfio device is still the proper place to host
> > that reporting and we can simply leverage the common error reporting
> > in the host layer to expose similar common reporting to the user,
> > which also provides the benefit that the solution isn't locked to
> > matching physical IOMMU and vIOMMU from the same vendor.  Thanks,
> >
> > Alex
> >
> 
> 
> --
> Best regards
> Tianyu Lan

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2017-03-21 23:57 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-19 14:47 [RFC PATCH 0/3] VFIO: Report IOMMU fault event to userspace Lan Tianyu
2017-02-19 14:47 ` [RFC PATCH 1/3] VFIO: Add new cmd to receive eventfd from userspace to notify IOMMU fault event Lan Tianyu
2017-02-20 20:53   ` Alex Williamson
2017-02-21  5:29     ` Lan Tianyu
2017-02-21  5:48   ` Michael S. Tsirkin
2017-02-21  6:05     ` Alex Williamson
2017-02-21  6:11     ` Liu, Yi L
2017-02-19 14:47 ` [RFC PATCH 2/3] VFIO: Add IOMMU fault notifier callback Lan Tianyu
2017-02-20  2:58   ` Liu, Yi L
2017-02-20 20:53   ` Alex Williamson
2017-02-21  6:05     ` Lan Tianyu
2017-02-21  5:55   ` Michael S. Tsirkin
2017-02-21  6:13     ` Lan Tianyu
2017-02-19 14:47 ` [RFC PATCH 3/3] VFIO: Add new cmd for user space to get IOMMU fault info Lan Tianyu
2017-02-20 20:53   ` Alex Williamson
2017-02-20 20:53 ` [RFC PATCH 0/3] VFIO: Report IOMMU fault event to userspace Alex Williamson
2017-02-21  4:49   ` Lan Tianyu
2017-02-21  5:29     ` Alex Williamson
2017-02-21 15:18       ` Lan Tianyu
2017-02-21 15:21         ` Lan, Tianyu
2017-02-28 15:58       ` Lan, Tianyu
2017-03-15  6:17         ` Liu, Yi L
2017-03-15 19:52           ` Alex Williamson
2017-03-16  1:42             ` Lan Tianyu
2017-03-16  3:32               ` Jason Wang
2017-03-16  5:22                 ` Lan Tianyu
2017-03-21 23:57               ` Liu, Yi L

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.