linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [patch 0/4] Intel IOMMU Suspend/Resume Support
       [not found] <20090327212241.234500000@intel.com>
@ 2009-03-28 14:24 ` Andrew Lutomirski
  2009-03-30 23:01 ` David Woodhouse
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 61+ messages in thread
From: Andrew Lutomirski @ 2009-03-28 14:24 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Ingo Molnar, David Woodhouse, Suresh Siddha, Jesse Barnes,
	Kyle McMartin, Yinghai Lu, LKML, IOMMU

Works for me (applied to current -tip).  I suspended and resumed just
fine, with no errors logged and all hardware apparently still working.

Tested-by: Andy Lutomirski <amluto@gmail.com>

Thanks for the patch!
Andy

On Fri, Mar 27, 2009 at 5:22 PM, Fenghua Yu <fenghua.yu@intel.com> wrote:
> The Intel IOMMU currently does not support suspend and resume. On an S3
> event, the kernel crashes while DMAR or interrupt remapping is active.
>
> The attached patch set implements suspend and resume support for the Intel
> IOMMU. It hooks into the kernel suspend and resume interface. On suspend, it
> saves the necessary hardware registers; on resume, it restores the registers
> and restarts the IOMMU.
>
> This patch set can be applied on top of the tip tree.
>
> --
>
>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 0/4] Intel IOMMU Suspend/Resume Support
       [not found] <20090327212241.234500000@intel.com>
  2009-03-28 14:24 ` [patch 0/4] Intel IOMMU Suspend/Resume Support Andrew Lutomirski
@ 2009-03-30 23:01 ` David Woodhouse
       [not found] ` <20090327212321.520992000@intel.com>
       [not found] ` <20090327212321.070229000@intel.com>
  3 siblings, 0 replies; 61+ messages in thread
From: David Woodhouse @ 2009-03-30 23:01 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Ingo Molnar, Suresh Siddha, Andrew Lutomirski, Jesse Barnes,
	Kyle McMartin, Yinghai Lu, LKML, IOMMU

On Fri, 2009-03-27 at 14:22 -0700, Fenghua Yu wrote:
> The Intel IOMMU currently does not support suspend and resume. On an S3
> event, the kernel crashes while DMAR or interrupt remapping is active.
>
> The attached patch set implements suspend and resume support for the Intel
> IOMMU. It hooks into the kernel suspend and resume interface. On suspend, it
> saves the necessary hardware registers; on resume, it restores the registers
> and restarts the IOMMU.
> 
> This patch set can be applied on top of the tip tree.

Looks good to me, apart from the fact that your #4 patch actually needs
to come before #1. Shall I apply it to my iommu-2.6.git tree or let Ingo
take it? If the latter, 

Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>

(if you look carefully, you'll see those are in order 4, 1, 2, 3)

Have we tested this on IA64?

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 4/4] Intel IOMMU Suspend/Resume Support - Code Clean Up
       [not found] ` <20090327212321.520992000@intel.com>
@ 2009-04-03 12:37   ` David Woodhouse
  0 siblings, 0 replies; 61+ messages in thread
From: David Woodhouse @ 2009-04-03 12:37 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Ingo Molnar, Suresh Siddha, Andrew Lutomirski, Jesse Barnes,
	Kyle McMartin, Yinghai Lu, LKML, IOMMU

On Fri, 2009-03-27 at 14:22 -0700, Fenghua Yu wrote:
> plain text document attachment (iommu_sr_4.patch)
> A new macro, list_for_each_entry_selected_do(), is defined for linked
> lists. The iommu code uses this macro for more concise and readable code.
> 
> It's defined as generic code. Hopefully it could be used in other places.
> 
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>

This is overkill, surely? It would be easier to do...

#define for_each_active_iommu(i, drhd)					\
	list_for_each_entry(drhd, &dmar_drhd_units, list)		\
		if (i=drhd->iommu, !drhd->ignored)

#define for_each_iommu(i, drhd)						\
	list_for_each_entry(drhd, &dmar_drhd_units, list)		\
		if (i=drhd->iommu, 1)

Or if you really want to avoid it eating a stray 'else' then

#define for_each_active_iommu(i, drhd)					\
	list_for_each_entry(drhd, &dmar_drhd_units, list)		\
		if (i=drhd->iommu, drhd->ignored) {} else

#define for_each_iommu(i, drhd)						\
	list_for_each_entry(drhd, &dmar_drhd_units, list)		\
		if (i=drhd->iommu, 0) {} else


> ---
> 
>  include/linux/dmar.h |    8 ++++++++
>  include/linux/list.h |   42 ++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 50 insertions(+)
> 
> diff --git a/include/linux/dmar.h b/include/linux/dmar.h
> index 2f34274..146d54b 100644
> --- a/include/linux/dmar.h
> +++ b/include/linux/dmar.h
> @@ -44,6 +44,14 @@ extern struct list_head dmar_drhd_units;
>  #define for_each_drhd_unit(drhd) \
>  	list_for_each_entry(drhd, &dmar_drhd_units, list)
>  
> +#define for_each_active_iommu(i, drhd)					\
> +	list_for_each_entry_selected_do(drhd, &dmar_drhd_units, list,	\
> +				!drhd->ignored, {i = drhd->iommu; })
> +
> +#define for_each_iommu(i, drhd)						\
> +	list_for_each_entry_selected_do(drhd, &dmar_drhd_units, list,	\
> +				1, {i = drhd->iommu; })
> +
>  extern int dmar_table_init(void);
>  extern int dmar_dev_scope_init(void);
>  
> diff --git a/include/linux/list.h b/include/linux/list.h
> index 969f6e9..6580f43 100644
> --- a/include/linux/list.h
> +++ b/include/linux/list.h
> @@ -530,6 +530,48 @@ static inline void list_splice_tail_init(struct list_head *list,
>  	     &pos->member != (head); 					\
>  	     pos = n, n = list_entry(n->member.prev, typeof(*n), member))
>  
> +/**
> + * list_first_entry_selected - get the first selected element from a list
> + * @pos:	the type * to use as a loop cursor.
> + * @head:	the head for your list.
> + * @member:	the name of the list_struct within the struct.
> + * @selector:	the selector to select the first entry.
> + */
> +#define list_first_entry_selected(pos, head, member, selector)		  \
> +	pos = list_entry((head)->next, typeof(*pos), member),		  \
> +	pos = ({while (!(selector) && (&pos->member != (head)))		  \
> +		pos = list_entry(pos->member.next, typeof(*pos), member); \
> +		pos; })
> +
> +/**
> + * list_next_entry_selected - get the next selected element from a list
> + * @pos:	the type * to use as a loop cursor.
> + * @head:	the head for your list.
> + * @member:	the name of the list_struct within the struct.
> + * @selector:	the selector to select the next entry.
> + */
> +#define list_next_entry_selected(pos, head, member, selector)		  \
> +	pos = list_entry(pos->member.next, typeof(*pos), member),	  \
> +	pos = ({while (!(selector) && (&pos->member != (head)))		  \
> +		pos = list_entry(pos->member.next, typeof(*pos), member); \
> +		pos; })
> +
> +/**
> + * list_for_each_entry_selected_do - iterate over list of given type and
> + * selected entries. Do something on each selected entry.
> + * @pos:	the type * to use as a loop cursor.
> + * @head:	the head for your list.
> + * @member:	the name of the list_struct within the struct.
> + * @selector:	the selector to select entries.
> + * @to_do:	things to do on each selected entry.
> + */
> +#define list_for_each_entry_selected_do(pos, head, member, selector, to_do) \
> +	for (list_first_entry_selected(pos, head, member, selector),	    \
> +		({if (&pos->member != (head)) to_do});			    \
> +		prefetch(pos->member.next), &pos->member != (head);	    \
> +		list_next_entry_selected(pos, head, member, selector),	    \
> +		({if (&pos->member != (head)) to_do}))
> +
>  /*
>   * Double linked lists with a single pointer list head.
>   * Mostly useful for hash tables where the two pointer list head is
> 
-- 
dwmw2


^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH] Intel IOMMU Pass Through Support
       [not found] ` <20090327212321.070229000@intel.com>
@ 2009-04-16  0:19   ` Fenghua Yu
  2009-04-16  2:13     ` Han, Weidong
                       ` (2 more replies)
  0 siblings, 3 replies; 61+ messages in thread
From: Fenghua Yu @ 2009-04-16  0:19 UTC (permalink / raw)
  To: David Woodhouse, Ingo Molnar, Linus Torvalds, Weidong Han; +Cc: LKML, IOMMU

This patch adds the kernel parameter intel_iommu=pt, which sets up pass-through
mode in the context mapping entry. It disables DMAR in the Linux kernel, but
KVM still runs on VT-d and interrupt remapping still works.

In this mode, the kernel uses swiotlb for the DMA API functions, but the other
VT-d functionality remains enabled for KVM. KVM always uses multi-level
translation page tables in VT-d. By default, pass-through mode is disabled in
the kernel.

This is useful when people don't want to enable VT-d DMAR in the kernel but
still want to use KVM and interrupt remapping, for reasons such as DMAR
performance concerns or debugging.
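As a usage sketch (not part of the patch): intel_iommu_setup() parses
comma-separated options, so pass-through would be requested from the boot
loader roughly like this. The kernel image name, root device, and GRUB file
location are placeholders:

```
# Illustrative GRUB kernel line (image/device names are placeholders)
kernel /boot/vmlinuz root=/dev/sda1 intel_iommu=on,pt
```

Since dmar_disabled defaults to 1, "on" is shown alongside "pt" here;
"pt" by itself matches the intel_iommu=pt form described above.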

Thanks.

-Fenghua

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>

---

 Documentation/kernel-parameters.txt |    5 
 arch/ia64/include/asm/iommu.h       |    1 
 arch/ia64/kernel/pci-swiotlb.c      |    2 
 arch/x86/include/asm/iommu.h        |    1 
 arch/x86/kernel/pci-swiotlb.c       |    3 
 drivers/pci/dmar.c                  |    9 +
 drivers/pci/intel-iommu.c           |  187 ++++++++++++++++++++++++++----------
 include/linux/dma_remapping.h       |    8 +
 include/linux/intel-iommu.h         |    2 
 9 files changed, 167 insertions(+), 51 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 6172e43..5594cdb 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -915,6 +915,11 @@ and is between 256 and 4096 characters. It is defined in the file
 			With this option on every unmap_single operation will
 			result in a hardware IOTLB flush operation as opposed
 			to batching them for performance.
+		pt	[Default no Pass Through]
+			This option enables Pass Through in context mapping if
+			Pass Through is supported in hardware. With this option
+			DMAR is disabled in kernel and kernel uses swiotlb, but
+			KVM can still use VT-d IOTLB hardware.
 
 	inttest=	[IA64]
 
diff --git a/arch/ia64/include/asm/iommu.h b/arch/ia64/include/asm/iommu.h
index 0490794..37d41ca 100644
--- a/arch/ia64/include/asm/iommu.h
+++ b/arch/ia64/include/asm/iommu.h
@@ -9,6 +9,7 @@ extern void pci_iommu_shutdown(void);
 extern void no_iommu_init(void);
 extern int force_iommu, no_iommu;
 extern int iommu_detected;
+extern int iommu_pass_through;
 extern void iommu_dma_init(void);
 extern void machvec_init(const char *name);
 
diff --git a/arch/ia64/kernel/pci-swiotlb.c b/arch/ia64/kernel/pci-swiotlb.c
index 285aae8..223abb1 100644
--- a/arch/ia64/kernel/pci-swiotlb.c
+++ b/arch/ia64/kernel/pci-swiotlb.c
@@ -46,7 +46,7 @@ void __init swiotlb_dma_init(void)
 
 void __init pci_swiotlb_init(void)
 {
-	if (!iommu_detected) {
+	if (!iommu_detected || iommu_pass_through) {
 #ifdef CONFIG_IA64_GENERIC
 		swiotlb = 1;
 		printk(KERN_INFO "PCI-DMA: Re-initialize machine vector.\n");
diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h
index af326a2..fd6d21b 100644
--- a/arch/x86/include/asm/iommu.h
+++ b/arch/x86/include/asm/iommu.h
@@ -6,6 +6,7 @@ extern void no_iommu_init(void);
 extern struct dma_map_ops nommu_dma_ops;
 extern int force_iommu, no_iommu;
 extern int iommu_detected;
+extern int iommu_pass_through;
 
 /* 10 seconds */
 #define DMAR_OPERATION_TIMEOUT ((cycles_t) tsc_khz*10*1000)
diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
index 34f12e9..42a0eb1 100644
--- a/arch/x86/kernel/pci-swiotlb.c
+++ b/arch/x86/kernel/pci-swiotlb.c
@@ -71,7 +71,8 @@ void __init pci_swiotlb_init(void)
 {
 	/* don't initialize swiotlb if iommu=off (no_iommu=1) */
 #ifdef CONFIG_X86_64
-	if (!iommu_detected && !no_iommu && max_pfn > MAX_DMA32_PFN)
+	if ((!iommu_detected && !no_iommu && max_pfn > MAX_DMA32_PFN) ||
+		iommu_pass_through)
 	       swiotlb = 1;
 #endif
 	if (swiotlb_force)
diff --git a/drivers/pci/dmar.c b/drivers/pci/dmar.c
index fa3a113..1ef1a19 100644
--- a/drivers/pci/dmar.c
+++ b/drivers/pci/dmar.c
@@ -515,6 +515,7 @@ int alloc_iommu(struct dmar_drhd_unit *drhd)
 	u32 ver;
 	static int iommu_allocated = 0;
 	int agaw = 0;
+	int msagaw = 0;
 
 	iommu = kzalloc(sizeof(*iommu), GFP_KERNEL);
 	if (!iommu)
@@ -539,8 +540,16 @@ int alloc_iommu(struct dmar_drhd_unit *drhd)
 			iommu->seq_id);
 		goto error;
 	}
+	msagaw = iommu_calculate_max_sagaw(iommu);
+	if (msagaw < 0) {
+		printk(KERN_ERR
+			"Cannot get a valid max agaw for iommu (seq_id = %d)\n",
+			iommu->seq_id);
+		goto error;
+	}
 #endif
 	iommu->agaw = agaw;
+	iommu->msagaw = msagaw;
 
 	/* the registers might be more than one page */
 	map_size = max_t(int, ecap_max_iotlb_offset(iommu->ecap),
diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index 001b328..205e4a1 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -53,6 +53,8 @@
 
 #define DEFAULT_DOMAIN_ADDRESS_WIDTH 48
 
+#define MAX_AGAW_WIDTH 64
+
 #define DOMAIN_MAX_ADDR(gaw) ((((u64)1) << gaw) - 1)
 
 #define IOVA_PFN(addr)		((addr) >> PAGE_SHIFT)
@@ -127,8 +129,6 @@ static inline void context_set_fault_enable(struct context_entry *context)
 	context->lo &= (((u64)-1) << 2) | 1;
 }
 
-#define CONTEXT_TT_MULTI_LEVEL 0
-
 static inline void context_set_translation_type(struct context_entry *context,
 						unsigned long value)
 {
@@ -288,6 +288,7 @@ int dmar_disabled = 1;
 static int __initdata dmar_map_gfx = 1;
 static int dmar_forcedac;
 static int intel_iommu_strict;
+int iommu_pass_through;
 
 #define DUMMY_DEVICE_DOMAIN_INFO ((struct device_domain_info *)(-1))
 static DEFINE_SPINLOCK(device_domain_lock);
@@ -318,6 +319,9 @@ static int __init intel_iommu_setup(char *str)
 			printk(KERN_INFO
 				"Intel-IOMMU: disable batched IOTLB flush\n");
 			intel_iommu_strict = 1;
+		} else if (!strncmp(str, "pt", 2)) {
+			iommu_pass_through = 1;
+			printk(KERN_INFO "Intel-IOMMU: Pass Through enabled\n");
 		}
 
 		str += strcspn(str, ",");
@@ -397,17 +401,13 @@ void free_iova_mem(struct iova *iova)
 
 static inline int width_to_agaw(int width);
 
-/* calculate agaw for each iommu.
- * "SAGAW" may be different across iommus, use a default agaw, and
- * get a supported less agaw for iommus that don't support the default agaw.
- */
-int iommu_calculate_agaw(struct intel_iommu *iommu)
+static int __iommu_calculate_agaw(struct intel_iommu *iommu, int max_gaw)
 {
 	unsigned long sagaw;
 	int agaw = -1;
 
 	sagaw = cap_sagaw(iommu->cap);
-	for (agaw = width_to_agaw(DEFAULT_DOMAIN_ADDRESS_WIDTH);
+	for (agaw = width_to_agaw(max_gaw);
 	     agaw >= 0; agaw--) {
 		if (test_bit(agaw, &sagaw))
 			break;
@@ -416,6 +416,24 @@ int iommu_calculate_agaw(struct intel_iommu *iommu)
 	return agaw;
 }
 
+/*
+ * Calculate max SAGAW for each iommu.
+ */
+int iommu_calculate_max_sagaw(struct intel_iommu *iommu)
+{
+	return __iommu_calculate_agaw(iommu, MAX_AGAW_WIDTH);
+}
+
+/*
+ * calculate agaw for each iommu.
+ * "SAGAW" may be different across iommus, use a default agaw, and
+ * get a supported less agaw for iommus that don't support the default agaw.
+ */
+int iommu_calculate_agaw(struct intel_iommu *iommu)
+{
+	return __iommu_calculate_agaw(iommu, DEFAULT_DOMAIN_ADDRESS_WIDTH);
+}
+
 /* in native case, each domain is related to only one iommu */
 static struct intel_iommu *domain_get_iommu(struct dmar_domain *domain)
 {
@@ -1321,8 +1339,8 @@ static void domain_exit(struct dmar_domain *domain)
 	free_domain_mem(domain);
 }
 
-static int domain_context_mapping_one(struct dmar_domain *domain,
-				      int segment, u8 bus, u8 devfn)
+static int domain_context_mapping_one(struct dmar_domain *domain, int segment,
+				 u8 bus, u8 devfn, int translation)
 {
 	struct context_entry *context;
 	unsigned long flags;
@@ -1335,7 +1353,10 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
 
 	pr_debug("Set context mapping for %02x:%02x.%d\n",
 		bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
+
 	BUG_ON(!domain->pgd);
+	BUG_ON(translation != CONTEXT_TT_PASS_THROUGH &&
+		translation != CONTEXT_TT_MULTI_LEVEL);
 
 	iommu = device_to_iommu(segment, bus, devfn);
 	if (!iommu)
@@ -1395,9 +1416,18 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
 	}
 
 	context_set_domain_id(context, id);
-	context_set_address_width(context, iommu->agaw);
-	context_set_address_root(context, virt_to_phys(pgd));
-	context_set_translation_type(context, CONTEXT_TT_MULTI_LEVEL);
+
+	/*
+	 * In pass through mode, AW must be programmed to indicate the largest
+	 * AGAW value supported by hardware. And ASR is ignored by hardware.
+	 */
+	if (likely(translation == CONTEXT_TT_MULTI_LEVEL)) {
+		context_set_address_width(context, iommu->agaw);
+		context_set_address_root(context, virt_to_phys(pgd));
+	} else
+		context_set_address_width(context, iommu->msagaw);
+
+	context_set_translation_type(context, translation);
 	context_set_fault_enable(context);
 	context_set_present(context);
 	domain_flush_cache(domain, context, sizeof(*context));
@@ -1422,13 +1452,15 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
 }
 
 static int
-domain_context_mapping(struct dmar_domain *domain, struct pci_dev *pdev)
+domain_context_mapping(struct dmar_domain *domain, struct pci_dev *pdev,
+			int translation)
 {
 	int ret;
 	struct pci_dev *tmp, *parent;
 
 	ret = domain_context_mapping_one(domain, pci_domain_nr(pdev->bus),
-					 pdev->bus->number, pdev->devfn);
+					pdev->bus->number, pdev->devfn,
+					translation);
 	if (ret)
 		return ret;
 
@@ -1440,9 +1472,9 @@ domain_context_mapping(struct dmar_domain *domain, struct pci_dev *pdev)
 	parent = pdev->bus->self;
 	while (parent != tmp) {
 		ret = domain_context_mapping_one(domain,
-						 pci_domain_nr(parent->bus),
-						 parent->bus->number,
-						 parent->devfn);
+						pci_domain_nr(parent->bus),
+						parent->bus->number,
+						parent->devfn, translation);
 		if (ret)
 			return ret;
 		parent = parent->bus->self;
@@ -1450,12 +1482,14 @@ domain_context_mapping(struct dmar_domain *domain, struct pci_dev *pdev)
 	if (tmp->is_pcie) /* this is a PCIE-to-PCI bridge */
 		return domain_context_mapping_one(domain,
 					pci_domain_nr(tmp->subordinate),
-					tmp->subordinate->number, 0);
+					tmp->subordinate->number, 0,
+					translation);
 	else /* this is a legacy PCI bridge */
 		return domain_context_mapping_one(domain,
 						  pci_domain_nr(tmp->bus),
 						  tmp->bus->number,
-						  tmp->devfn);
+						  tmp->devfn,
+						  translation);
 }
 
 static int domain_context_mapped(struct pci_dev *pdev)
@@ -1752,7 +1786,7 @@ static int iommu_prepare_identity_map(struct pci_dev *pdev,
 		goto error;
 
 	/* context entry init */
-	ret = domain_context_mapping(domain, pdev);
+	ret = domain_context_mapping(domain, pdev, CONTEXT_TT_MULTI_LEVEL);
 	if (!ret)
 		return 0;
 error:
@@ -1853,6 +1887,23 @@ static inline void iommu_prepare_isa(void)
 }
 #endif /* !CONFIG_DMAR_FLPY_WA */
 
+/* Initialize each context entry as pass through. */
+static int __init init_context_pass_through(void)
+{
+	struct pci_dev *pdev = NULL;
+	struct dmar_domain *domain;
+	int ret;
+
+	for_each_pci_dev(pdev) {
+		domain = get_domain_for_dev(pdev, DEFAULT_DOMAIN_ADDRESS_WIDTH);
+		ret = domain_context_mapping(domain, pdev,
+						CONTEXT_TT_PASS_THROUGH);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
 static int __init init_dmars(void)
 {
 	struct dmar_drhd_unit *drhd;
@@ -1860,6 +1911,7 @@ static int __init init_dmars(void)
 	struct pci_dev *pdev;
 	struct intel_iommu *iommu;
 	int i, ret;
+	int pass_through = 1;
 
 	/*
 	 * for each drhd
@@ -1913,7 +1965,15 @@ static int __init init_dmars(void)
 			printk(KERN_ERR "IOMMU: allocate root entry failed\n");
 			goto error;
 		}
+		if (!ecap_pass_through(iommu->ecap))
+			pass_through = 0;
 	}
+	if (iommu_pass_through)
+		if (!pass_through) {
+			printk(KERN_INFO
+				"Pass Through is not supported by hardware.\n");
+			iommu_pass_through = 0;
+		}
 
 	/*
 	 * Start from the sane iommu hardware state.
@@ -1976,37 +2036,57 @@ static int __init init_dmars(void)
 			       "IOMMU: enable interrupt remapping failed\n");
 	}
 #endif
+	/*
+	 * If pass through is set and enabled, context entries of all pci
+	 * devices are initialized by pass through translation type.
+	 */
+	if (iommu_pass_through) {
+		ret = init_context_pass_through();
+		if (ret) {
+			printk(KERN_ERR "IOMMU: Pass through init failed.\n");
+			iommu_pass_through = 0;
+		}
+	}
 
 	/*
-	 * For each rmrr
-	 *   for each dev attached to rmrr
-	 *   do
-	 *     locate drhd for dev, alloc domain for dev
-	 *     allocate free domain
-	 *     allocate page table entries for rmrr
-	 *     if context not allocated for bus
-	 *           allocate and init context
-	 *           set present in root table for this bus
-	 *     init context with domain, translation etc
-	 *    endfor
-	 * endfor
+	 * If pass through is not set or not enabled, setup context entries for
+	 * identity mappings for rmrr, gfx, and isa.
 	 */
-	for_each_rmrr_units(rmrr) {
-		for (i = 0; i < rmrr->devices_cnt; i++) {
-			pdev = rmrr->devices[i];
-			/* some BIOS lists non-exist devices in DMAR table */
-			if (!pdev)
-				continue;
-			ret = iommu_prepare_rmrr_dev(rmrr, pdev);
-			if (ret)
-				printk(KERN_ERR
+	if (!iommu_pass_through) {
+		/*
+		 * For each rmrr
+		 *   for each dev attached to rmrr
+		 *   do
+		 *     locate drhd for dev, alloc domain for dev
+		 *     allocate free domain
+		 *     allocate page table entries for rmrr
+		 *     if context not allocated for bus
+		 *           allocate and init context
+		 *           set present in root table for this bus
+		 *     init context with domain, translation etc
+		 *    endfor
+		 * endfor
+		 */
+		for_each_rmrr_units(rmrr) {
+			for (i = 0; i < rmrr->devices_cnt; i++) {
+				pdev = rmrr->devices[i];
+				/*
+				 * some BIOS lists non-exist devices in DMAR
+				 * table.
+				 */
+				if (!pdev)
+					continue;
+				ret = iommu_prepare_rmrr_dev(rmrr, pdev);
+				if (ret)
+					printk(KERN_ERR
 				 "IOMMU: mapping reserved region failed\n");
+			}
 		}
-	}
 
-	iommu_prepare_gfx_mapping();
+		iommu_prepare_gfx_mapping();
 
-	iommu_prepare_isa();
+		iommu_prepare_isa();
+	}
 
 	/*
 	 * for each drhd
@@ -2117,7 +2197,8 @@ get_valid_domain_for_dev(struct pci_dev *pdev)
 
 	/* make sure context mapping is ok */
 	if (unlikely(!domain_context_mapped(pdev))) {
-		ret = domain_context_mapping(domain, pdev);
+		ret = domain_context_mapping(domain, pdev,
+						 CONTEXT_TT_MULTI_LEVEL);
 		if (ret) {
 			printk(KERN_ERR
 				"Domain context map for %s failed",
@@ -2786,7 +2867,7 @@ int __init intel_iommu_init(void)
 	 * Check the need for DMA-remapping initialization now.
 	 * Above initialization will also be used by Interrupt-remapping.
 	 */
-	if (no_iommu || swiotlb || dmar_disabled)
+	if (no_iommu || (swiotlb && !iommu_pass_through) || dmar_disabled)
 		return -ENODEV;
 
 	iommu_init_mempool();
@@ -2806,7 +2887,15 @@ int __init intel_iommu_init(void)
 
 	init_timer(&unmap_timer);
 	force_iommu = 1;
-	dma_ops = &intel_dma_ops;
+
+	if (!iommu_pass_through) {
+		printk(KERN_INFO
+			"Multi-level page-table translation for DMAR.\n");
+		dma_ops = &intel_dma_ops;
+	} else
+		printk(KERN_INFO
+			"DMAR: Pass through translation for DMAR.\n");
+
 	init_iommu_sysfs();
 
 	register_iommu(&intel_iommu_ops);
@@ -3146,7 +3235,7 @@ static int intel_iommu_attach_device(struct iommu_domain *domain,
 		return -EFAULT;
 	}
 
-	ret = domain_context_mapping(dmar_domain, pdev);
+	ret = domain_context_mapping(dmar_domain, pdev, CONTEXT_TT_MULTI_LEVEL);
 	if (ret)
 		return ret;
 
diff --git a/include/linux/dma_remapping.h b/include/linux/dma_remapping.h
index 1a455f1..e0a03af 100644
--- a/include/linux/dma_remapping.h
+++ b/include/linux/dma_remapping.h
@@ -13,6 +13,9 @@
 #define DMA_PTE_WRITE (2)
 #define DMA_PTE_SNP (1 << 11)
 
+#define CONTEXT_TT_MULTI_LEVEL	0
+#define CONTEXT_TT_PASS_THROUGH 2
+
 struct intel_iommu;
 struct dmar_domain;
 struct root_entry;
@@ -21,11 +24,16 @@ extern void free_dmar_iommu(struct intel_iommu *iommu);
 
 #ifdef CONFIG_DMAR
 extern int iommu_calculate_agaw(struct intel_iommu *iommu);
+extern int iommu_calculate_max_sagaw(struct intel_iommu *iommu);
 #else
 static inline int iommu_calculate_agaw(struct intel_iommu *iommu)
 {
 	return 0;
 }
+static inline int iommu_calculate_max_sagaw(struct intel_iommu *iommu)
+{
+	return 0;
+}
 #endif
 
 extern int dmar_disabled;
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index aa8c531..7246971 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -120,6 +120,7 @@ static inline void dmar_writeq(void __iomem *addr, u64 val)
 	(ecap_iotlb_offset(e) + ecap_niotlb_iunits(e) * 16)
 #define ecap_coherent(e)	((e) & 0x1)
 #define ecap_qis(e)		((e) & 0x2)
+#define ecap_pass_through(e)	((e >> 6) & 0x1)
 #define ecap_eim_support(e)	((e >> 4) & 0x1)
 #define ecap_ir_support(e)	((e >> 3) & 0x1)
 #define ecap_max_handle_mask(e) ((e >> 20) & 0xf)
@@ -302,6 +303,7 @@ struct intel_iommu {
 	spinlock_t	register_lock; /* protect register handling */
 	int		seq_id;	/* sequence id of the iommu */
 	int		agaw; /* agaw of this iommu */
+	int		msagaw; /* max sagaw of this iommu */
 	unsigned int 	irq;
 	unsigned char 	name[13];    /* Device Name */
 

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* RE: [PATCH] Intel IOMMU Pass Through Support
  2009-04-16  0:19   ` [PATCH] Intel IOMMU Pass Through Support Fenghua Yu
@ 2009-04-16  2:13     ` Han, Weidong
  2009-04-19 10:05     ` David Woodhouse
  2009-04-30 23:29     ` [PATCH] Intel IOMMU Pass Through Support Andrew Morton
  2 siblings, 0 replies; 61+ messages in thread
From: Han, Weidong @ 2009-04-16  2:13 UTC (permalink / raw)
  To: Yu, Fenghua, David Woodhouse, Ingo Molnar, Linus Torvalds; +Cc: LKML, IOMMU

Acked-by: Weidong Han <weidong@intel.com>

Yu, Fenghua wrote:
> This patch adds the kernel parameter intel_iommu=pt, which sets up
> pass-through mode in the context mapping entry. It disables DMAR in the
> Linux kernel, but KVM still runs on VT-d and interrupt remapping still
> works.
>
> In this mode, the kernel uses swiotlb for the DMA API functions, but the
> other VT-d functionality remains enabled for KVM. KVM always uses
> multi-level translation page tables in VT-d. By default, pass-through mode
> is disabled in the kernel.
>
> This is useful when people don't want to enable VT-d DMAR in the kernel
> but still want to use KVM and interrupt remapping, for reasons such as
> DMAR performance concerns or debugging.
>
> Thanks.
>
> -Fenghua
>
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
>
> ---
>
>  Documentation/kernel-parameters.txt |    5
>  arch/ia64/include/asm/iommu.h       |    1
>  arch/ia64/kernel/pci-swiotlb.c      |    2
>  arch/x86/include/asm/iommu.h        |    1
>  arch/x86/kernel/pci-swiotlb.c       |    3
>  drivers/pci/dmar.c                  |    9 +
>  drivers/pci/intel-iommu.c           |  187 ++++++++++++++++++++++++++----------
>  include/linux/dma_remapping.h       |    8 +
>  include/linux/intel-iommu.h         |    2
>  9 files changed, 167 insertions(+), 51 deletions(-)
>
> diff --git a/Documentation/kernel-parameters.txt
> b/Documentation/kernel-parameters.txt
> index 6172e43..5594cdb 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -915,6 +915,11 @@ and is between 256 and 4096 characters. It is defined in the file
> 			With this option on every unmap_single operation will
> 			result in a hardware IOTLB flush operation as opposed
> 			to batching them for performance.
> +             pt      [Default no Pass Through]
> +                     This option enables Pass Through in context mapping if
> +                     Pass Through is supported in hardware. With this option
> +                     DMAR is disabled in kernel and kernel uses swiotlb, but
> +			KVM can still use VT-d IOTLB hardware.
>
>       inttest=        [IA64]
>
> diff --git a/arch/ia64/include/asm/iommu.h
> b/arch/ia64/include/asm/iommu.h
> index 0490794..37d41ca 100644
> --- a/arch/ia64/include/asm/iommu.h
> +++ b/arch/ia64/include/asm/iommu.h
> @@ -9,6 +9,7 @@ extern void pci_iommu_shutdown(void);
>  extern void no_iommu_init(void);
>  extern int force_iommu, no_iommu;
>  extern int iommu_detected;
> +extern int iommu_pass_through;
>  extern void iommu_dma_init(void);
>  extern void machvec_init(const char *name);
>
> diff --git a/arch/ia64/kernel/pci-swiotlb.c
> b/arch/ia64/kernel/pci-swiotlb.c
> index 285aae8..223abb1 100644
> --- a/arch/ia64/kernel/pci-swiotlb.c
> +++ b/arch/ia64/kernel/pci-swiotlb.c
> @@ -46,7 +46,7 @@ void __init swiotlb_dma_init(void)
>
>  void __init pci_swiotlb_init(void)
>  {
> -     if (!iommu_detected) {
> +     if (!iommu_detected || iommu_pass_through) {
>  #ifdef CONFIG_IA64_GENERIC
>               swiotlb = 1;
>               printk(KERN_INFO "PCI-DMA: Re-initialize machine vector.\n");
> diff --git a/arch/x86/include/asm/iommu.h
> b/arch/x86/include/asm/iommu.h
> index af326a2..fd6d21b 100644
> --- a/arch/x86/include/asm/iommu.h
> +++ b/arch/x86/include/asm/iommu.h
> @@ -6,6 +6,7 @@ extern void no_iommu_init(void);
>  extern struct dma_map_ops nommu_dma_ops;
>  extern int force_iommu, no_iommu;
>  extern int iommu_detected;
> +extern int iommu_pass_through;
>
>  /* 10 seconds */
>  #define DMAR_OPERATION_TIMEOUT ((cycles_t) tsc_khz*10*1000)
> diff --git a/arch/x86/kernel/pci-swiotlb.c
> b/arch/x86/kernel/pci-swiotlb.c
> index 34f12e9..42a0eb1 100644
> --- a/arch/x86/kernel/pci-swiotlb.c
> +++ b/arch/x86/kernel/pci-swiotlb.c
> @@ -71,7 +71,8 @@ void __init pci_swiotlb_init(void)
>  {
>       /* don't initialize swiotlb if iommu=off (no_iommu=1) */
>  #ifdef CONFIG_X86_64
> -     if (!iommu_detected && !no_iommu && max_pfn > MAX_DMA32_PFN)
> +     if ((!iommu_detected && !no_iommu && max_pfn > MAX_DMA32_PFN) ||
> +             iommu_pass_through)
>              swiotlb = 1;
>  #endif
>       if (swiotlb_force)
> diff --git a/drivers/pci/dmar.c b/drivers/pci/dmar.c
> index fa3a113..1ef1a19 100644
> --- a/drivers/pci/dmar.c
> +++ b/drivers/pci/dmar.c
> @@ -515,6 +515,7 @@ int alloc_iommu(struct dmar_drhd_unit *drhd)
>       u32 ver;
>       static int iommu_allocated = 0;
>       int agaw = 0;
> +     int msagaw = 0;
>
>       iommu = kzalloc(sizeof(*iommu), GFP_KERNEL);
>       if (!iommu)
> @@ -539,8 +540,16 @@ int alloc_iommu(struct dmar_drhd_unit *drhd)
>                       iommu->seq_id);
>               goto error;
>       }
> +     msagaw = iommu_calculate_max_sagaw(iommu);
> +     if (msagaw < 0) {
> +             printk(KERN_ERR
> +                     "Cannot get a valid max agaw for iommu (seq_id = %d)\n",
> +                     iommu->seq_id);
> +             goto error;
> +     }
>  #endif
>       iommu->agaw = agaw;
> +     iommu->msagaw = msagaw;
>
>       /* the registers might be more than one page */
>       map_size = max_t(int, ecap_max_iotlb_offset(iommu->ecap),
> diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
> index 001b328..205e4a1 100644
> --- a/drivers/pci/intel-iommu.c
> +++ b/drivers/pci/intel-iommu.c
> @@ -53,6 +53,8 @@
>
>  #define DEFAULT_DOMAIN_ADDRESS_WIDTH 48
>
> +#define MAX_AGAW_WIDTH 64
> +
>  #define DOMAIN_MAX_ADDR(gaw) ((((u64)1) << gaw) - 1)
>
>  #define IOVA_PFN(addr)               ((addr) >> PAGE_SHIFT)
> @@ -127,8 +129,6 @@ static inline void context_set_fault_enable(struct context_entry *context)
> 	context->lo &= (((u64)-1) << 2) | 1;
>  }
>
> -#define CONTEXT_TT_MULTI_LEVEL 0
> -
>  static inline void context_set_translation_type(struct context_entry *context,
> 						unsigned long value)
>  {
> @@ -288,6 +288,7 @@ int dmar_disabled = 1;
>  static int __initdata dmar_map_gfx = 1;
>  static int dmar_forcedac;
>  static int intel_iommu_strict;
> +int iommu_pass_through;
>
>  #define DUMMY_DEVICE_DOMAIN_INFO ((struct device_domain_info *)(-1))
>  static DEFINE_SPINLOCK(device_domain_lock);
> @@ -318,6 +319,9 @@ static int __init intel_iommu_setup(char *str)
>                       printk(KERN_INFO
>                               "Intel-IOMMU: disable batched IOTLB flush\n");
>                       intel_iommu_strict = 1;
> +             } else if (!strncmp(str, "pt", 2)) {
> +                     iommu_pass_through = 1;
> +                     printk(KERN_INFO "Intel-IOMMU: Pass Through enabled\n");
>               }
>
>               str += strcspn(str, ",");
> @@ -397,17 +401,13 @@ void free_iova_mem(struct iova *iova)
>
>  static inline int width_to_agaw(int width);
>
> -/* calculate agaw for each iommu.
> - * "SAGAW" may be different across iommus, use a default agaw, and
> - * get a supported less agaw for iommus that don't support the default agaw.
> - */
> -int iommu_calculate_agaw(struct intel_iommu *iommu)
> +static int __iommu_calculate_agaw(struct intel_iommu *iommu, int max_gaw)
>  {
>       unsigned long sagaw;
>       int agaw = -1;
>
>       sagaw = cap_sagaw(iommu->cap);
> -     for (agaw = width_to_agaw(DEFAULT_DOMAIN_ADDRESS_WIDTH);
> +     for (agaw = width_to_agaw(max_gaw);
>            agaw >= 0; agaw--) {
>               if (test_bit(agaw, &sagaw))
>                       break;
> @@ -416,6 +416,24 @@ int iommu_calculate_agaw(struct intel_iommu *iommu)
>       return agaw;
>  }
>
> +/*
> + * Calculate max SAGAW for each iommu.
> + */
> +int iommu_calculate_max_sagaw(struct intel_iommu *iommu)
> +{
> +     return __iommu_calculate_agaw(iommu, MAX_AGAW_WIDTH);
> +}
> +
> +/*
> + * calculate agaw for each iommu.
> + * "SAGAW" may be different across iommus, use a default agaw, and
> + * get a supported less agaw for iommus that don't support the default agaw.
> + */
> +int iommu_calculate_agaw(struct intel_iommu *iommu)
> +{
> +     return __iommu_calculate_agaw(iommu, DEFAULT_DOMAIN_ADDRESS_WIDTH);
> +}
> +
>  /* in native case, each domain is related to only one iommu */
>  static struct intel_iommu *domain_get_iommu(struct dmar_domain *domain)
>  {
> @@ -1321,8 +1339,8 @@ static void domain_exit(struct dmar_domain *domain)
>       free_domain_mem(domain);
>  }
>
> -static int domain_context_mapping_one(struct dmar_domain *domain,
> -                                   int segment, u8 bus, u8 devfn)
> +static int domain_context_mapping_one(struct dmar_domain *domain, int segment,
> +                                     u8 bus, u8 devfn, int translation)
>  {
>       struct context_entry *context;
>       unsigned long flags;
> @@ -1335,7 +1353,10 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
>
>       pr_debug("Set context mapping for %02x:%02x.%d\n",
>               bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
> +
>       BUG_ON(!domain->pgd);
> +     BUG_ON(translation != CONTEXT_TT_PASS_THROUGH &&
> +             translation != CONTEXT_TT_MULTI_LEVEL);
>
>       iommu = device_to_iommu(segment, bus, devfn);
>       if (!iommu)
> @@ -1395,9 +1416,18 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
>       }
>
>       context_set_domain_id(context, id);
> -     context_set_address_width(context, iommu->agaw);
> -     context_set_address_root(context, virt_to_phys(pgd));
> -     context_set_translation_type(context, CONTEXT_TT_MULTI_LEVEL);
> +
> +     /*
> +      * In pass through mode, AW must be programmed to indicate the largest
> +      * AGAW value supported by hardware. And ASR is ignored by hardware.
> +      */
> +     if (likely(translation == CONTEXT_TT_MULTI_LEVEL)) {
> +             context_set_address_width(context, iommu->agaw);
> +             context_set_address_root(context, virt_to_phys(pgd));
> +     } else
> +             context_set_address_width(context, iommu->msagaw);
> +
> +     context_set_translation_type(context, translation);
>       context_set_fault_enable(context);
>       context_set_present(context);
>       domain_flush_cache(domain, context, sizeof(*context));
> @@ -1422,13 +1452,15 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
>  }
>
>  static int
> -domain_context_mapping(struct dmar_domain *domain, struct pci_dev *pdev)
> +domain_context_mapping(struct dmar_domain *domain, struct pci_dev *pdev,
> +                      int translation)
>  {
>       int ret;
>       struct pci_dev *tmp, *parent;
>
>       ret = domain_context_mapping_one(domain, pci_domain_nr(pdev->bus),
> -                                      pdev->bus->number, pdev->devfn);
> +                                     pdev->bus->number, pdev->devfn,
> +                                     translation);
>       if (ret)
>               return ret;
>
> @@ -1440,9 +1472,9 @@ domain_context_mapping(struct dmar_domain *domain, struct pci_dev *pdev)
>       parent = pdev->bus->self;
>       while (parent != tmp) {
>               ret = domain_context_mapping_one(domain,
> -                                              pci_domain_nr(parent->bus),
> -                                              parent->bus->number,
> -                                              parent->devfn);
> +                                             pci_domain_nr(parent->bus),
> +                                             parent->bus->number,
> +                                             parent->devfn, translation);
>               if (ret)
>                       return ret;
>               parent = parent->bus->self;
> @@ -1450,12 +1482,14 @@ domain_context_mapping(struct dmar_domain *domain, struct pci_dev *pdev)
>       if (tmp->is_pcie) /* this is a PCIE-to-PCI bridge */
>               return domain_context_mapping_one(domain,
>                                       pci_domain_nr(tmp->subordinate),
> -                                     tmp->subordinate->number, 0);
> +                                     tmp->subordinate->number, 0,
> +                                     translation);
>       else /* this is a legacy PCI bridge */
>               return domain_context_mapping_one(domain,
>                                                 pci_domain_nr(tmp->bus),
>                                                 tmp->bus->number,
> -                                               tmp->devfn);
> +                                               tmp->devfn,
> +                                               translation);
>  }
>
>  static int domain_context_mapped(struct pci_dev *pdev)
> @@ -1752,7 +1786,7 @@ static int iommu_prepare_identity_map(struct pci_dev *pdev,
>               goto error;
>
>       /* context entry init */
> -     ret = domain_context_mapping(domain, pdev);
> +     ret = domain_context_mapping(domain, pdev, CONTEXT_TT_MULTI_LEVEL);
>       if (!ret)
>               return 0;
>  error:
> @@ -1853,6 +1887,23 @@ static inline void iommu_prepare_isa(void)
>  }
>  #endif /* !CONFIG_DMAR_FLPY_WA */
>
> +/* Initialize each context entry as pass through.*/
> +static int __init init_context_pass_through(void)
> +{
> +     struct pci_dev *pdev = NULL;
> +     struct dmar_domain *domain;
> +     int ret;
> +
> +     for_each_pci_dev(pdev) {
> +             domain = get_domain_for_dev(pdev, DEFAULT_DOMAIN_ADDRESS_WIDTH);
> +             ret = domain_context_mapping(domain, pdev,
> +                                             CONTEXT_TT_PASS_THROUGH);
> +             if (ret)
> +                     return ret;
> +     }
> +     return 0;
> +}
> +
>  static int __init init_dmars(void)
>  {
>       struct dmar_drhd_unit *drhd;
> @@ -1860,6 +1911,7 @@ static int __init init_dmars(void)
>       struct pci_dev *pdev;
>       struct intel_iommu *iommu;
>       int i, ret;
> +     int pass_through = 1;
>
>       /*
>        * for each drhd
> @@ -1913,7 +1965,15 @@ static int __init init_dmars(void)
>                       printk(KERN_ERR "IOMMU: allocate root entry failed\n");
>                       goto error;
>               }
> +             if (!ecap_pass_through(iommu->ecap))
> +                     pass_through = 0;
>       }
> +     if (iommu_pass_through)
> +             if (!pass_through) {
> +                     printk(KERN_INFO
> +                             "Pass Through is not supported by hardware.\n");
> +                     iommu_pass_through = 0;
> +             }
>
>       /*
>        * Start from the sane iommu hardware state.
> @@ -1976,37 +2036,57 @@ static int __init init_dmars(void)
>                              "IOMMU: enable interrupt remapping failed\n");
>       }
>  #endif
> +     /*
> +      * If pass through is set and enabled, context entries of all pci
> +      * devices are initialized by pass through translation type.
> +      */
> +     if (iommu_pass_through) {
> +             ret = init_context_pass_through();
> +             if (ret) {
> +                     printk(KERN_ERR "IOMMU: Pass through init failed.\n");
> +                     iommu_pass_through = 0;
> +             }
> +     }
>
>       /*
> -      * For each rmrr
> -      *   for each dev attached to rmrr
> -      *   do
> -      *     locate drhd for dev, alloc domain for dev
> -      *     allocate free domain
> -      *     allocate page table entries for rmrr
> -      *     if context not allocated for bus
> -      *           allocate and init context
> -      *           set present in root table for this bus
> -      *     init context with domain, translation etc
> -      *    endfor
> -      * endfor
> +      * If pass through is not set or not enabled, setup context entries for
> +      * identity mappings for rmrr, gfx, and isa.
>        */
> -     for_each_rmrr_units(rmrr) {
> -             for (i = 0; i < rmrr->devices_cnt; i++) {
> -                     pdev = rmrr->devices[i];
> -                     /* some BIOS lists non-exist devices in DMAR table */
> -                     if (!pdev)
> -                             continue;
> -                     ret = iommu_prepare_rmrr_dev(rmrr, pdev);
> -                     if (ret)
> -                             printk(KERN_ERR
> +     if (!iommu_pass_through) {
> +             /*
> +              * For each rmrr
> +              *   for each dev attached to rmrr
> +              *   do
> +              *     locate drhd for dev, alloc domain for dev
> +              *     allocate free domain
> +              *     allocate page table entries for rmrr
> +              *     if context not allocated for bus
> +              *           allocate and init context
> +              *           set present in root table for this bus
> +              *     init context with domain, translation etc
> +              *    endfor
> +              * endfor
> +              */
> +             for_each_rmrr_units(rmrr) {
> +                     for (i = 0; i < rmrr->devices_cnt; i++) {
> +                             pdev = rmrr->devices[i];
> +                             /*
> +                              * some BIOS lists non-exist devices in DMAR
> +                              * table.
> +                              */
> +                             if (!pdev)
> +                                     continue;
> +                             ret = iommu_prepare_rmrr_dev(rmrr, pdev);
> +                             if (ret)
> +                                     printk(KERN_ERR
>                                "IOMMU: mapping reserved region failed\n");
> +                     }
>               }
> -     }
>
> -     iommu_prepare_gfx_mapping();
> +             iommu_prepare_gfx_mapping();
>
> -     iommu_prepare_isa();
> +             iommu_prepare_isa();
> +     }
>
>       /*
>        * for each drhd
> @@ -2117,7 +2197,8 @@ get_valid_domain_for_dev(struct pci_dev *pdev)
>
>       /* make sure context mapping is ok */
>       if (unlikely(!domain_context_mapped(pdev))) {
> -             ret = domain_context_mapping(domain, pdev);
> +             ret = domain_context_mapping(domain, pdev,
> +                                              CONTEXT_TT_MULTI_LEVEL);
>               if (ret) {
>                       printk(KERN_ERR
>                               "Domain context map for %s failed",
> @@ -2786,7 +2867,7 @@ int __init intel_iommu_init(void)
>        * Check the need for DMA-remapping initialization now.
>        * Above initialization will also be used by Interrupt-remapping.
>        */
> -     if (no_iommu || swiotlb || dmar_disabled)
> +     if (no_iommu || (swiotlb && !iommu_pass_through) || dmar_disabled)
>               return -ENODEV;
>
>       iommu_init_mempool();
> @@ -2806,7 +2887,15 @@ int __init intel_iommu_init(void)
>
>       init_timer(&unmap_timer);
>       force_iommu = 1;
> -     dma_ops = &intel_dma_ops;
> +
> +     if (!iommu_pass_through) {
> +             printk(KERN_INFO
> +                     "Multi-level page-table translation for DMAR.\n");
> +             dma_ops = &intel_dma_ops;
> +     } else
> +             printk(KERN_INFO
> +                     "DMAR: Pass through translation for DMAR.\n");
> +
>       init_iommu_sysfs();
>
>       register_iommu(&intel_iommu_ops);
> @@ -3146,7 +3235,7 @@ static int intel_iommu_attach_device(struct iommu_domain *domain,
>               return -EFAULT;
>       }
>
> -     ret = domain_context_mapping(dmar_domain, pdev);
> +     ret = domain_context_mapping(dmar_domain, pdev,
> +                                     CONTEXT_TT_MULTI_LEVEL);
>       if (ret)
>               return ret;
>
> diff --git a/include/linux/dma_remapping.h b/include/linux/dma_remapping.h
> index 1a455f1..e0a03af 100644
> --- a/include/linux/dma_remapping.h
> +++ b/include/linux/dma_remapping.h
> @@ -13,6 +13,9 @@
>  #define DMA_PTE_WRITE (2)
>  #define DMA_PTE_SNP (1 << 11)
>
> +#define CONTEXT_TT_MULTI_LEVEL       0
> +#define CONTEXT_TT_PASS_THROUGH 2
> +
>  struct intel_iommu;
>  struct dmar_domain;
>  struct root_entry;
> @@ -21,11 +24,16 @@ extern void free_dmar_iommu(struct intel_iommu *iommu);
>
>  #ifdef CONFIG_DMAR
>  extern int iommu_calculate_agaw(struct intel_iommu *iommu);
> +extern int iommu_calculate_max_sagaw(struct intel_iommu *iommu);
>  #else
>  static inline int iommu_calculate_agaw(struct intel_iommu *iommu)
>  {
>       return 0;
>  }
> +static inline int iommu_calculate_max_sagaw(struct intel_iommu *iommu)
> +{
> +     return 0;
> +}
>  #endif
>
>  extern int dmar_disabled;
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index aa8c531..7246971 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -120,6 +120,7 @@ static inline void dmar_writeq(void __iomem *addr, u64 val)
>       (ecap_iotlb_offset(e) + ecap_niotlb_iunits(e) * 16)
>  #define ecap_coherent(e)     ((e) & 0x1)
>  #define ecap_qis(e)          ((e) & 0x2)
> +#define ecap_pass_through(e) ((e >> 6) & 0x1)
>  #define ecap_eim_support(e)  ((e >> 4) & 0x1)
>  #define ecap_ir_support(e)   ((e >> 3) & 0x1)
>  #define ecap_max_handle_mask(e) ((e >> 20) & 0xf)
> @@ -302,6 +303,7 @@ struct intel_iommu {
>       spinlock_t      register_lock; /* protect register handling */
>       int             seq_id; /* sequence id of the iommu */
>       int             agaw; /* agaw of this iommu */
> +     int             msagaw; /* max sagaw of this iommu */
>       unsigned int    irq;
>       unsigned char   name[13];    /* Device Name */


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH] Intel IOMMU Pass Through Support
  2009-04-16  0:19   ` [PATCH] Intel IOMMU Pass Through Support Fenghua Yu
  2009-04-16  2:13     ` Han, Weidong
@ 2009-04-19 10:05     ` David Woodhouse
  2009-04-20 17:27       ` Yu, Fenghua
  2009-04-30 23:29     ` [PATCH] Intel IOMMU Pass Through Support Andrew Morton
  2 siblings, 1 reply; 61+ messages in thread
From: David Woodhouse @ 2009-04-19 10:05 UTC (permalink / raw)
  To: Fenghua Yu; +Cc: Ingo Molnar, Linus Torvalds, Weidong Han, LKML, IOMMU

On Wed, 2009-04-15 at 17:19 -0700, Fenghua Yu wrote:
> +       if (iommu_pass_through)
> +               if (!pass_through) {
> +                       printk(KERN_INFO
> +                               "Pass Through is not supported by hardware.\n");
> +                       iommu_pass_through = 0;
> +               }

So if we ask for pass-through and the hardware doesn't support it, we
end up in full-translation mode? Wouldn't it be better to set up 1:1
mappings for each device instead?

(We probably want to fix up the code which sets up 1:1 mappings to
actually share page tables where appropriate, and to use superpages).

Also, did we already have a discussion about whether this should be an
intel_iommu= parameter, or a generic iommu= one?

>                 ret = domain_context_mapping_one(domain,
> -                                                pci_domain_nr(parent->bus),
> -                                                parent->bus->number,
> -                                                parent->devfn);
> +                                               pci_domain_nr(parent->bus),
> +                                               parent->bus->number,
> +                                               parent->devfn, translation);

Whitespace damage here -- you're modifying continuation lines which used
to be right underneath the 'domain' parameter, and you're making them
not line up properly.

You have the same problem elsewhere for newly-added code too, in fact:

> +               ret = domain_context_mapping(domain, pdev,
> +                                               CONTEXT_TT_PASS_THROUGH);


-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation


^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [PATCH] Intel IOMMU Pass Through Support
  2009-04-19 10:05     ` David Woodhouse
@ 2009-04-20 17:27       ` Yu, Fenghua
  2009-05-13 23:13         ` [PATCH] Fix Intel IOMMU Compilation Warnings on IA64 Fenghua Yu
  2009-05-20 17:42         ` [PATCH] Time out for possible dead loops during queued invalidation wait Fenghua Yu
  0 siblings, 2 replies; 61+ messages in thread
From: Yu, Fenghua @ 2009-04-20 17:27 UTC (permalink / raw)
  To: 'David Woodhouse'
  Cc: 'Ingo Molnar', 'Linus Torvalds',
	Han, Weidong, 'LKML', 'IOMMU'

>On Wed, 2009-04-15 at 17:19 -0700, Fenghua Yu wrote:
>> +       if (iommu_pass_through)
>> +               if (!pass_through) {
>> +                       printk(KERN_INFO
>> +                               "Pass Through is not supported by hardware.\n");
>> +                       iommu_pass_through = 0;
>> +               }
>
>So if we ask for pass-through and the hardware doesn't support it, we
>end up in full-translation mode? Wouldn't it be better to set up 1:1
>mappings for each device instead?
>
>(We probably want to fix up the code which sets up 1:1 mappings to
>actually share page tables where appropriate, and to use superpages).
>
>Also, did we already have a discussion about whether this should be an
>intel_iommu= parameter, or a generic iommu= one?

For this patch, I would just fall back to "normal" translation when the hardware lacks pass-through support.

I'm working on 1:1 mapping now to speed up mapping and unmapping. Once the 1:1 mapping is implemented, the fallback from pass-through will go to the 1:1 mapping, which will then be the "normal" translation.

BTW, I think most (or at least newer) VT-d hardware supports the pass-through feature now, so the fallback shouldn't trigger in practice when pass through is requested.

So at this point, this should be fine.

>
>>                 ret = domain_context_mapping_one(domain,
>> -                                                pci_domain_nr(parent->bus),
>> -                                                parent->bus->number,
>> -                                                parent->devfn);
>> +                                               pci_domain_nr(parent->bus),
>> +                                               parent->bus->number,
>> +                                               parent->devfn, translation);
>
>Whitespace damage here -- you're modifying continuation lines which used
>to be right underneath the 'domain' parameter, and you're making them
>not line up properly.
>
>You have the same problem elsewhere for newly-added code too, in fact:
>
>> +               ret = domain_context_mapping(domain, pdev,
>> +                                               CONTEXT_TT_PASS_THROUGH);
>

OK. I'll change this to align with the parameter above.

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH] Intel IOMMU Pass Through Support
  2009-04-16  0:19   ` [PATCH] Intel IOMMU Pass Through Support Fenghua Yu
  2009-04-16  2:13     ` Han, Weidong
  2009-04-19 10:05     ` David Woodhouse
@ 2009-04-30 23:29     ` Andrew Morton
  2009-04-30 23:37       ` Randy Dunlap
  2 siblings, 1 reply; 61+ messages in thread
From: Andrew Morton @ 2009-04-30 23:29 UTC (permalink / raw)
  To: Fenghua Yu; +Cc: dwmw2, mingo, torvalds, weidong.han, linux-kernel, iommu

On Wed, 15 Apr 2009 17:19:57 -0700
Fenghua Yu <fenghua.yu@intel.com> wrote:

> The patch adds kernel parameter intel_iommu=pt to set up pass through mode in
> context mapping entry. This disables DMAR in linux kernel; but KVM still runs on
> VT-d and interrupt remapping still works.
> 
> In this mode, kernel uses swiotlb for DMA API functions but other VT-d
> functionalities are enabled for KVM. KVM always uses multi level translation
> page table in VT-d. By default, pass through mode is disabled in kernel.
> 
> This is useful when people don't want to enable VT-d DMAR in kernel but still
> want to use KVM and interrupt remapping for reasons like DMAR performance
> concern or debug purpose.
> 

The patch is now in linux-next and broke my build.

arch/x86/built-in.o: In function `iommu_setup':
arch/x86/kernel/pci-dma.c:215: undefined reference to `iommu_pass_through'
arch/x86/built-in.o: In function `pci_swiotlb_init':
arch/x86/kernel/pci-swiotlb.c:74: undefined reference to `iommu_pass_through'

Because iommu_pass_through is defined in drivers/pci/intel-iommu.c and

# CONFIG_DMAR is not set

I'll need to cook up some local hack to work around that.



Also, the patch in linux-next (but not the one which I'm replying to
here) does:

: --- a/arch/x86/kernel/pci-dma.c
: +++ b/arch/x86/kernel/pci-dma.c
: @@ -160,6 +160,8 @@ again:
:         return page_address(page);
:  }
:  
: +extern int iommu_pass_through;
: +

Which is wrong and unnecessary - the variable was already declared in a
header file.

scripts/checkpatch.pl would have warned you about this error, but
apparently neither you nor the people who reviewed or merged the patch
bother to use it.


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH] Intel IOMMU Pass Through Support
  2009-04-30 23:29     ` [PATCH] Intel IOMMU Pass Through Support Andrew Morton
@ 2009-04-30 23:37       ` Randy Dunlap
  2009-05-01  0:00         ` Andrew Morton
  2009-05-01  0:05         ` Fenghua Yu
  0 siblings, 2 replies; 61+ messages in thread
From: Randy Dunlap @ 2009-04-30 23:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Fenghua Yu, dwmw2, mingo, torvalds, weidong.han, linux-kernel, iommu

Andrew Morton wrote:
> On Wed, 15 Apr 2009 17:19:57 -0700
> Fenghua Yu <fenghua.yu@intel.com> wrote:
> 
>> The patch adds kernel parameter intel_iommu=pt to set up pass through mode in
>> context mapping entry. This disables DMAR in linux kernel; but KVM still runs on
>> VT-d and interrupt remapping still works.
>>
>> In this mode, kernel uses swiotlb for DMA API functions but other VT-d
>> functionalities are enabled for KVM. KVM always uses multi level translation
>> page table in VT-d. By default, pass through mode is disabled in kernel.
>>
>> This is useful when people don't want to enable VT-d DMAR in kernel but still
>> want to use KVM and interrupt remapping for reasons like DMAR performance
>> concern or debug purpose.
>>
> 
> The patch is now in linux-next and broke my build.
> 
> arch/x86/built-in.o: In function `iommu_setup':
> arch/x86/kernel/pci-dma.c:215: undefined reference to `iommu_pass_through'
> arch/x86/built-in.o: In function `pci_swiotlb_init':
> arch/x86/kernel/pci-swiotlb.c:74: undefined reference to `iommu_pass_through'
> 
> Because iommu_pass_through is defined in drivers/pci/intel-iommu.c and
> 
> # CONFIG_DMAR is not set
> 
> I'll need to cook up some local hack to work around that.
> 

Patch just went to linux-next mailing list (but should have also
gone to lkml):

http://marc.info/?l=linux-next&m=124113213400748&w=2


> 
> Also, the patch in linux-next (but not the one which I'm replying to
> here) does:
> 
> : --- a/arch/x86/kernel/pci-dma.c
> : +++ b/arch/x86/kernel/pci-dma.c
> : @@ -160,6 +160,8 @@ again:
> :         return page_address(page);
> :  }
> :  
> : +extern int iommu_pass_through;
> : +
> 
> Which is wrong and unnecessary - the variable was already declared in a
> header file.
> 
> scripts/checkpatch.pl would have warned you about this error, but
> apparently neither you nor the people who reviewed or merged the patch
> bother to use it.


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH] Intel IOMMU Pass Through Support
  2009-04-30 23:37       ` Randy Dunlap
@ 2009-05-01  0:00         ` Andrew Morton
  2009-05-01  0:57           ` Fenghua Yu
  2009-05-01  0:05         ` Fenghua Yu
  1 sibling, 1 reply; 61+ messages in thread
From: Andrew Morton @ 2009-05-01  0:00 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: fenghua.yu, dwmw2, mingo, torvalds, weidong.han, linux-kernel, iommu

On Thu, 30 Apr 2009 16:37:52 -0700
Randy Dunlap <rdunlap@xenotime.net> wrote:

> > 
> > arch/x86/built-in.o: In function `iommu_setup':
> > arch/x86/kernel/pci-dma.c:215: undefined reference to `iommu_pass_through'
> > arch/x86/built-in.o: In function `pci_swiotlb_init':
> > arch/x86/kernel/pci-swiotlb.c:74: undefined reference to `iommu_pass_through'
> > 
> > Because iommu_pass_through is defined in drivers/pci/intel-iommu.c and
> > 
> > # CONFIG_DMAR is not set
> > 
> > I'll need to cook up some local hack to work around that.
> > 
> 
> Patch just went to linux-next mailing list (but should have also
> gone to lkml):
> 
> http://marc.info/?l=linux-next&m=124113213400748&w=2

OK, thanks.  My linux-next feed seems to have broken.

That patch modifies 

	arch/ia64/kernel/dma-mapping.c
	arch/x86/kernel/pci-dma.c

but it would be more logical to modify

	arch/ia64/kernel/pci-dma.c
	arch/x86/kernel/pci-dma.c

is there a reason why that cannot be done?

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH] Intel IOMMU Pass Through Support
  2009-04-30 23:37       ` Randy Dunlap
  2009-05-01  0:00         ` Andrew Morton
@ 2009-05-01  0:05         ` Fenghua Yu
  2009-05-01  0:14           ` Andrew Morton
  1 sibling, 1 reply; 61+ messages in thread
From: Fenghua Yu @ 2009-05-01  0:05 UTC (permalink / raw)
  To: Andrew Morton, Randy Dunlap
  Cc: Yu, Fenghua, dwmw2, mingo, torvalds, Han, Weidong, linux-kernel, iommu

On Thu, Apr 30, 2009 at 04:37:52PM -0700, Randy Dunlap wrote:
> Andrew Morton wrote:
> > On Wed, 15 Apr 2009 17:19:57 -0700
> > Fenghua Yu <fenghua.yu@intel.com> wrote:
> > 
> >> The patch adds kernel parameter intel_iommu=pt to set up pass through mode in
> >> context mapping entry. This disables DMAR in linux kernel; but KVM still runs on
> >> VT-d and interrupt remapping still works.
> >>
> >> In this mode, kernel uses swiotlb for DMA API functions but other VT-d
> >> functionalities are enabled for KVM. KVM always uses multi level translation
> >> page table in VT-d. By default, pass through mode is disabled in kernel.
> >>
> >> This is useful when people don't want to enable VT-d DMAR in kernel but still
> >> want to use KVM and interrupt remapping for reasons like DMAR performance
> >> concern or debug purpose.
> >>
> > 
> > The patch is now in linux-next and broke my build.
> > 
> > arch/x86/built-in.o: In function `iommu_setup':
> > arch/x86/kernel/pci-dma.c:215: undefined reference to `iommu_pass_through'
> > arch/x86/built-in.o: In function `pci_swiotlb_init':
> > arch/x86/kernel/pci-swiotlb.c:74: undefined reference to `iommu_pass_through'
> > 
> > Because iommu_pass_through is defined in drivers/pci/intel-iommu.c and
> > 
> > # CONFIG_DMAR is not set
> > 
> > I'll need to cook up some local hack to work around that.
> > 
> 
> Patch just went to linux-next mailing list (but should have also
> gone to lkml):
> 
> http://marc.info/?l=linux-next&m=124113213400748&w=2

That's right. I just posted this patch to fix the compiling errors.

> 
> 
> > 
> > Also, the patch in linux-next (but not the one which I'm replying to
> > here) does:
> > 
> > : --- a/arch/x86/kernel/pci-dma.c
> > : +++ b/arch/x86/kernel/pci-dma.c
> > : @@ -160,6 +160,8 @@ again:
> > :         return page_address(page);
> > :  }
> > :  
> > : +extern int iommu_pass_through;
> > : +
> > 
> > Which is wrong and unnecessary - the variable was already declared in a
> > header file.
> > 
> > scripts/checkpatch.pl would have warned you about this error, but
> > apparently neither you nor the people who reviewed or merged the patch
> > bother to use it.

Actually checkpatch.pl doesn't report anything wrong with this statement.

The following patch removes this statement on the top of linux-next.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>

---

 pci-dma.c |    2 --
 1 files changed, 2 deletions(-)

diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 8cad0d8..049005e 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -160,8 +162,6 @@ again:
 	return page_address(page);
 }
 
-extern int iommu_pass_through;
-
 /*
  * See <Documentation/x86_64/boot-options.txt> for the iommu kernel parameter
  * documentation.

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH] Intel IOMMU Pass Through Support
  2009-05-01  0:05         ` Fenghua Yu
@ 2009-05-01  0:14           ` Andrew Morton
  0 siblings, 0 replies; 61+ messages in thread
From: Andrew Morton @ 2009-05-01  0:14 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: rdunlap, fenghua.yu, dwmw2, mingo, torvalds, weidong.han,
	linux-kernel, iommu

On Thu, 30 Apr 2009 17:05:10 -0700
Fenghua Yu <fenghua.yu@intel.com> wrote:

> > > : --- a/arch/x86/kernel/pci-dma.c
> > > : +++ b/arch/x86/kernel/pci-dma.c
> > > : @@ -160,6 +160,8 @@ again:
> > > :         return page_address(page);
> > > :  }
> > > :  
> > > : +extern int iommu_pass_through;
> > > : +
> > > 
> > > Which is wrong and unnecessary - the variable was already declared in a
> > > header file.
> > > 
> > > scripts/checkpatch.pl would have warned you about this error, but
> > > apparently neither you nor the people who reviewed or merged the patch
> > > bother to use it.
> 
> Actually checkpatch.pl doesn't report anything wrong with this statement.

Running 4ed0d3e6c64cfd9ba4ceb2099b10d1cf8ece4320 through checkpatch produces
the below output:

WARNING: externs should be avoided in .c files
#83: FILE: arch/x86/kernel/pci-dma.c:163:
+extern int iommu_pass_through;

WARNING: suspect code indent for conditional statements (8, 15)
#108: FILE: arch/x86/kernel/pci-swiotlb.c:74:
+       if ((!iommu_detected && !no_iommu && max_pfn > MAX_DMA32_PFN) ||
[...]
               swiotlb = 1;


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH] Intel IOMMU Pass Through Support
  2009-05-01  0:00         ` Andrew Morton
@ 2009-05-01  0:57           ` Fenghua Yu
  0 siblings, 0 replies; 61+ messages in thread
From: Fenghua Yu @ 2009-05-01  0:57 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Randy Dunlap, Yu, Fenghua, dwmw2, mingo, torvalds, Han, Weidong,
	linux-kernel, iommu, Tony Luck, Stephen Rothwell

On Thu, Apr 30, 2009 at 05:00:34PM -0700, Andrew Morton wrote:
> On Thu, 30 Apr 2009 16:37:52 -0700
> Randy Dunlap <rdunlap@xenotime.net> wrote:
> 
> > > 
> > > arch/x86/built-in.o: In function `iommu_setup':
> > > arch/x86/kernel/pci-dma.c:215: undefined reference to `iommu_pass_through'
> > > arch/x86/built-in.o: In function `pci_swiotlb_init':
> > > arch/x86/kernel/pci-swiotlb.c:74: undefined reference to `iommu_pass_through'
> > > 
> > > Because iommu_pass_through is defined in drivers/pci/intel-iommu.c and
> > > 
> > > # CONFIG_DMAR is not set
> > > 
> > > I'll need to cook up some local hack to work around that.
> > > 
> > 
> > Patch just went to linux-next mailing list (but should have also
> > gone to lkml):
> > 
> > http://marc.info/?l=linux-next&m=124113213400748&w=2
> 
> OK, thanks.  My linux-next feed seems to have broken.
> 
> That patch modifies 
> 
> 	arch/ia64/kernel/dma-mapping.c
> 	arch/x86/kernel/pci-dma.c
> 
> but it would be more logical to modify
> 
> 	arch/ia64/kernel/pci-dma.c
> 	arch/x86/kernel/pci-dma.c
> 
> is there a reason why that cannot be done?

No particular reason. I just followed the iommu_detected definition, which is
in arch/ia64/kernel/dma-mapping.c.

Now I have updated the compile-error fix patch as follows:

This updated patch fixes the compile errors by defining iommu_pass_through in
the arch pci-dma.c files and removing the definition from
drivers/pci/intel-iommu.c.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>

---

 arch/ia64/kernel/pci-dma.c |    2 ++
 arch/x86/kernel/pci-dma.c  |    4 ++--
 drivers/pci/intel-iommu.c  |    1 -
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/ia64/kernel/pci-dma.c b/arch/ia64/kernel/pci-dma.c
index eb98738..ecdde25 100644
--- a/arch/ia64/kernel/pci-dma.c
+++ b/arch/ia64/kernel/pci-dma.c
@@ -32,6 +32,8 @@ int force_iommu __read_mostly = 1;
 int force_iommu __read_mostly;
 #endif
 
+int iommu_pass_through;
+
 /* Dummy device used for NULL arguments (normally ISA). Better would
    be probably a smaller DMA mask, but this is bug-to-bug compatible
    to i386. */
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 8cad0d8..049005e 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -32,6 +32,8 @@ int no_iommu __read_mostly;
 /* Set this to 1 if there is a HW IOMMU in the system */
 int iommu_detected __read_mostly = 0;
 
+int iommu_pass_through;
+
 dma_addr_t bad_dma_address __read_mostly = 0;
 EXPORT_SYMBOL(bad_dma_address);
 
@@ -160,8 +162,6 @@ again:
 	return page_address(page);
 }
 
-extern int iommu_pass_through;
-
 /*
  * See <Documentation/x86_64/boot-options.txt> for the iommu kernel parameter
  * documentation.
diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index d3ac5e5..66a9cba 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -288,7 +288,6 @@ int dmar_disabled = 1;
 static int __initdata dmar_map_gfx = 1;
 static int dmar_forcedac;
 static int intel_iommu_strict;
-int iommu_pass_through;
 
 #define DUMMY_DEVICE_DOMAIN_INFO ((struct device_domain_info *)(-1))
 static DEFINE_SPINLOCK(device_domain_lock);

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH] Fix Intel IOMMU Compilation Warnings on IA64
  2009-04-20 17:27       ` Yu, Fenghua
@ 2009-05-13 23:13         ` Fenghua Yu
  2009-05-14 15:17           ` David Woodhouse
  2009-05-20 17:42         ` [PATCH] Time out for possible dead loops during queued invalidation wait Fenghua Yu
  1 sibling, 1 reply; 61+ messages in thread
From: Fenghua Yu @ 2009-05-13 23:13 UTC (permalink / raw)
  To: 'David Woodhouse', 'Tony Luck'
  Cc: 'lkml', 'iommu', 'ia64'

Compiling kernel on IA64 reports two warnings in intel-iommu.c:

drivers/pci/intel-iommu.c:3150: warning: format ?%llx? expects type ?long long unsigned int?, but argument 4 has type ?u64?
drivers/pci/intel-iommu.c: In function ?intel_iommu_map_range?:
drivers/pci/intel-iommu.c:3201: warning: format ?%llx? expects type ?long long unsigned int?, but argument 4 has type ?u64?

The warnings are fixed by adding a cast to unsigned long long.

Signed-off-by: Fenghua Yu <fegnhua.yu@intel.com>

---

 intel-iommu.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index a563fbe..6f8cc21 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -3147,7 +3147,8 @@ static int intel_iommu_attach_device(struct iommu_domain *domain,
 	if (end < dmar_domain->max_addr) {
 		printk(KERN_ERR "%s: iommu agaw (%d) is not "
 		       "sufficient for the mapped address (%llx)\n",
-		       __func__, iommu->agaw, dmar_domain->max_addr);
+		       __func__, iommu->agaw,
+		       (unsigned long long)dmar_domain->max_addr);
 		return -EFAULT;
 	}
 
@@ -3198,7 +3199,8 @@ static int intel_iommu_map_range(struct iommu_domain *domain,
 		if (end < max_addr) {
 			printk(KERN_ERR "%s: iommu agaw (%d) is not "
 			       "sufficient for the mapped address (%llx)\n",
-			       __func__, min_agaw, max_addr);
+			       __func__, min_agaw,
+			       (unsigned long long)max_addr);
 			return -EFAULT;
 		}
 		dmar_domain->max_addr = max_addr;
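The cast technique in the patch above can be exercised in a userspace sketch; `format_addr()` is a hypothetical stand-in for the printk call sites:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* On l64 platforms (ia64's LP64 convention) a 64-bit type is
 * 'unsigned long', so %llx needs an explicit cast to
 * unsigned long long at the call site -- the same fix the patch
 * applies to the printk arguments. */
int format_addr(char *buf, size_t n, uint64_t max_addr)
{
    return snprintf(buf, n, "mapped address (%llx)",
                    (unsigned long long)max_addr);
}
```

The cast is a no-op on ll64 platforms and silences the format warning on l64 ones.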

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH] Fix Intel IOMMU Compilation Warnings on IA64
  2009-05-13 23:13         ` [PATCH] Fix Intel IOMMU Compilation Warnings on IA64 Fenghua Yu
@ 2009-05-14 15:17           ` David Woodhouse
  2009-05-14 15:31             ` Matthew Wilcox
  2009-05-14 17:59             ` Fenghua Yu
  0 siblings, 2 replies; 61+ messages in thread
From: David Woodhouse @ 2009-05-14 15:17 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: 'Tony Luck', 'lkml', 'iommu', 'ia64'

On Wed, 2009-05-13 at 16:13 -0700, Fenghua Yu wrote:
> Compiling kernel on IA64 reports two warnings in intel-iommu.c:
> 
> drivers/pci/intel-iommu.c:3150: warning: format ?%llx? expects
> type ?long long unsigned int?, but argument 4 has type ?u64?
> drivers/pci/intel-iommu.c: In function ?intel_iommu_map_range?:
> drivers/pci/intel-iommu.c:3201: warning: format ?%llx? expects
> type ?long long unsigned int?, but argument 4 has type ?u64?

Charset corruption there? I'm sure GCC didn't actually use question
marks...

> The warnings are fixed by adding a cast to unsigned long long.
> 
> Signed-off-by: Fenghua Yu <fegnhua.yu@intel.com>
> 
> ---
> 
>  intel-iommu.c |    6 ++++--
>  1 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
> index a563fbe..6f8cc21 100644
> --- a/drivers/pci/intel-iommu.c
> +++ b/drivers/pci/intel-iommu.c
> @@ -3147,7 +3147,8 @@ static int intel_iommu_attach_device(struct
> iommu_domain *domain,
>         if (end < dmar_domain->max_addr) {
>                 printk(KERN_ERR "%s: iommu agaw (%d) is not "
>                        "sufficient for the mapped address (%llx)\n",
> -                      __func__, iommu->agaw, dmar_domain->max_addr);
> +                      __func__, iommu->agaw,
> +                      (unsigned long long)dmar_domain->max_addr);
>                 return -EFAULT;
>         }

Perhaps this would be better, modelled after commit fe333321:

diff --git a/arch/ia64/include/asm/types.h b/arch/ia64/include/asm/types.h
index e36b371..b0ecc20 100644
--- a/arch/ia64/include/asm/types.h
+++ b/arch/ia64/include/asm/types.h
@@ -13,7 +13,11 @@
  *	David Mosberger-Tang <davidm@hpl.hp.com>, Hewlett-Packard Co
  */
 
+#ifdef __KERNEL__
+#include <asm-generic/int-ll64.h>
+#else
 #include <asm-generic/int-l64.h>
+#endif
 
 #ifdef __ASSEMBLY__
 # define __IA64_UL(x)		(x)

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH] Fix Intel IOMMU Compilation Warnings on IA64
  2009-05-14 15:17           ` David Woodhouse
@ 2009-05-14 15:31             ` Matthew Wilcox
  2009-05-14 17:59             ` Fenghua Yu
  1 sibling, 0 replies; 61+ messages in thread
From: Matthew Wilcox @ 2009-05-14 15:31 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Fenghua Yu, 'Tony Luck', 'lkml', 'iommu',
	'ia64'

On Thu, May 14, 2009 at 04:17:51PM +0100, David Woodhouse wrote:
> Perhaps this would be better, modelled after commit fe333321:
>  
> +#ifdef __KERNEL__
> +#include <asm-generic/int-ll64.h>
> +#else
>  #include <asm-generic/int-l64.h>
> +#endif

It's certainly something I've been lobbying for for a while.  There are
various new warnings that crop up, and I wasn't able to log into the
system Tony offered me to fix it on.  Maybe someone who cares about ia64
these days could take care of fixing up the remaining warnings?

I see http://kerneltrap.org/mailarchive/linux-ia64/2008/10/19/3945454
is out there.  There might be a more recent version ... somewhere ...
Perhaps Tony has a copy of it?

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH] Fix Intel IOMMU Compilation Warnings on IA64
  2009-05-14 15:17           ` David Woodhouse
  2009-05-14 15:31             ` Matthew Wilcox
@ 2009-05-14 17:59             ` Fenghua Yu
  2009-06-18 18:05               ` [PATCH 1/2] IOMMU Identity Mapping Support: iommu_identity_mapping definition Fenghua Yu
  1 sibling, 1 reply; 61+ messages in thread
From: Fenghua Yu @ 2009-05-14 17:59 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Yu, Fenghua, Luck, Tony, 'lkml', 'iommu', 'ia64'

On Thu, May 14, 2009 at 08:17:51AM -0700, David Woodhouse wrote:
> On Wed, 2009-05-13 at 16:13 -0700, Fenghua Yu wrote:
> > Compiling kernel on IA64 reports two warnings in intel-iommu.c:
> > 
> > drivers/pci/intel-iommu.c:3150: warning: format ?%llx? expects
> > type ?long long unsigned int?, but argument 4 has type ?u64?
> > drivers/pci/intel-iommu.c: In function ?intel_iommu_map_range?:
> > drivers/pci/intel-iommu.c:3201: warning: format ?%llx? expects
> > type ?long long unsigned int?, but argument 4 has type ?u64?
> 
> Charset corruption there? I'm sure GCC didn't actually use question
> marks...

Yes, somehow the charset got corrupted along the way. Below is the correct output:

drivers/pci/intel-iommu.c: In function ‘intel_iommu_attach_device’:
drivers/pci/intel-iommu.c:3150: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 4 has type ‘u64’
drivers/pci/intel-iommu.c: In function ‘intel_iommu_map_range’:
drivers/pci/intel-iommu.c:3201: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 4 has type ‘u64’

> 
> Perhaps this would be better, modelled after commit fe333321:
> 
> diff --git a/arch/ia64/include/asm/types.h b/arch/ia64/include/asm/types.h
> index e36b371..b0ecc20 100644
> --- a/arch/ia64/include/asm/types.h
> +++ b/arch/ia64/include/asm/types.h
> @@ -13,7 +13,11 @@
>   *	David Mosberger-Tang <davidm@hpl.hp.com>, Hewlett-Packard Co
>   */
>  
> +#ifdef __KERNEL__
> +#include <asm-generic/int-ll64.h>
> +#else
>  #include <asm-generic/int-l64.h>
> +#endif
>  
>  #ifdef __ASSEMBLY__
>  # define __IA64_UL(x)		(x)
> 

A lot of places in the IA64 kernel assume l64, so changing to ll64 would be a
big patch plus a lot of testing. I assume Matthew's patch will do that?

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH] Time out for possible dead loops during queued invalidation wait
  2009-04-20 17:27       ` Yu, Fenghua
  2009-05-13 23:13         ` [PATCH] Fix Intel IOMMU Compilation Warnings on IA64 Fenghua Yu
@ 2009-05-20 17:42         ` Fenghua Yu
  2009-05-27  5:51           ` Andrew Morton
  1 sibling, 1 reply; 61+ messages in thread
From: Fenghua Yu @ 2009-05-20 17:42 UTC (permalink / raw)
  To: 'David Woodhouse', 'Ingo Molnar',
	'Andrew Morton'
  Cc: 'LKML', 'IOMMU'

Two loops in qi_submit_sync() do not have time out. If hardware can not finish
the queued invalidation for some reason, the loops could end up in dead loops
without any hint for what is going on. I add time out for the loops and report
warning when time out happens.
 
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>

---

 dmar.c |   12 ++++++++++++
 1 files changed, 12 insertions(+)

diff --git a/drivers/pci/dmar.c b/drivers/pci/dmar.c
index fa3a113..95baacd 100644
--- a/drivers/pci/dmar.c
+++ b/drivers/pci/dmar.c
@@ -637,6 +637,7 @@ int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
 	struct qi_desc *hw, wait_desc;
 	int wait_index, index;
 	unsigned long flags;
+	cycles_t start_time;
 
 	if (!qi)
 		return 0;
@@ -644,8 +645,13 @@ int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
 	hw = qi->desc;
 
 	spin_lock_irqsave(&qi->q_lock, flags);
+	start_time = get_cycles();
 	while (qi->free_cnt < 3) {
 		spin_unlock_irqrestore(&qi->q_lock, flags);
+		if (DMAR_OPERATION_TIMEOUT < (get_cycles() - start_time)) {
+			WARN(1, "No space in invalidation queue.\n");
+			return -ENOSPC;
+		}
 		cpu_relax();
 		spin_lock_irqsave(&qi->q_lock, flags);
 	}
@@ -675,6 +681,7 @@ int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
 	 */
 	writel(qi->free_head << 4, iommu->reg + DMAR_IQT_REG);
 
+	start_time = get_cycles();
 	while (qi->desc_status[wait_index] != QI_DONE) {
 		/*
 		 * We will leave the interrupts disabled, to prevent interrupt
@@ -687,6 +694,11 @@ int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
 		if (rc)
 			goto out;
 
+		if (DMAR_OPERATION_TIMEOUT < (get_cycles() - start_time)) {
+			WARN(1, "Queued invalidation can not complete.\n");
+			goto out;
+		}
+
 		spin_unlock(&qi->q_lock);
 		cpu_relax();
 		spin_lock(&qi->q_lock);

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH] Time out for possible dead loops during queued invalidation wait
  2009-05-20 17:42         ` [PATCH] Time out for possible dead loops during queued invalidation wait Fenghua Yu
@ 2009-05-27  5:51           ` Andrew Morton
  2009-05-27 22:40             ` Yu, Fenghua
  0 siblings, 1 reply; 61+ messages in thread
From: Andrew Morton @ 2009-05-27  5:51 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: 'David Woodhouse', 'Ingo Molnar', 'LKML',
	'IOMMU'

On Wed, 20 May 2009 10:42:59 -0700 Fenghua Yu <fenghua.yu@intel.com> wrote:
>
> Subject: [PATCH] Time out for possible dead loops during queued invalidation wait

nits:

Please ensure that each patch title identifies the subsystem which is
being altered.  Because someone who reads this title has no clue what
part of the kernel is affected unless they dive in and read the actual
patch.  Suitable title prefixes for this one would be "dmar: " or
"drivers/pci/dmar.c: ".

The usual term for a timeout is "timeout", not "time out".

The term "dead loop" is unclear.  The reader might think that it refers
to a never-executed loop, as in "dead code".  A better term is
"infinite loop".

So I ended up with the title

	"drivers/pci/dmar.c: timeout for possible infinite loops during queued invalidation wait"

Welcome to my life :(

> Two loops in qi_submit_sync() do not have time out. If hardware can not finish
> the queued invalidation for some reason, the loops could end up in dead loops
> without any hint for what is going on. I add time out for the loops and report
> warning when time out happens.
>  
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>

ok...

>  dmar.c |   12 ++++++++++++
>  1 files changed, 12 insertions(+)
> 
> diff --git a/drivers/pci/dmar.c b/drivers/pci/dmar.c
> index fa3a113..95baacd 100644
> --- a/drivers/pci/dmar.c
> +++ b/drivers/pci/dmar.c
> @@ -637,6 +637,7 @@ int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
>  	struct qi_desc *hw, wait_desc;
>  	int wait_index, index;
>  	unsigned long flags;
> +	cycles_t start_time;

It seems strange to me that the driver chose to implement a ten second
timeout using such a high resolution thing as cycles_t.  Why not use
plain old jiffies?  The advantages are:

- jiffies can be read very efficiently

- there's more kernel support for manipulating jiffies-based values.

For example,

>  	if (!qi)
>  		return 0;
> @@ -644,8 +645,13 @@ int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
>  	hw = qi->desc;
>  
>  	spin_lock_irqsave(&qi->q_lock, flags);
> +	start_time = get_cycles();
>  	while (qi->free_cnt < 3) {
>  		spin_unlock_irqrestore(&qi->q_lock, flags);
> +		if (DMAR_OPERATION_TIMEOUT < (get_cycles() - start_time)) {

if we were using jiffies, this nasty thing could just use time_after().

But that's all outside the scope of your patch.


I'd find it more readable if the above were coded as

	if (get_cycles() - start_time >= DMAR_OPERATION_TIMEOUT)

but maybe that's just me.

> +			WARN(1, "No space in invalidation queue.\n");
> +			return -ENOSPC;

ENOSPC means "your disk filled up".  I think it makes no sense to use
that error code in this context, even though it kinda sounds the same.

> +		}
>  		cpu_relax();
>  		spin_lock_irqsave(&qi->q_lock, flags);
>  	}
> @@ -675,6 +681,7 @@ int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
>  	 */
>  	writel(qi->free_head << 4, iommu->reg + DMAR_IQT_REG);
>  
> +	start_time = get_cycles();
>  	while (qi->desc_status[wait_index] != QI_DONE) {
>  		/*
>  		 * We will leave the interrupts disabled, to prevent interrupt
> @@ -687,6 +694,11 @@ int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
>  		if (rc)
>  			goto out;
>  
> +		if (DMAR_OPERATION_TIMEOUT < (get_cycles() - start_time)) {
> +			WARN(1, "Queued invalidation can not complete.\n");
> +			goto out;

As `rc' now contains zero, this will cause the function to incorrectly
return the "success" code, even though it is known that it did not
succeed.

> +		}
> +
>  		spin_unlock(&qi->q_lock);
>  		cpu_relax();
>  		spin_lock(&qi->q_lock);
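The time_after() suggestion relies on a wraparound-safe comparison of unsigned tick counters; a userspace sketch of the idiom (function name hypothetical):

```c
#include <limits.h>

/* The idiom behind the kernel's time_after(a, b): subtract the
 * unsigned counters and test the sign of the result, which stays
 * correct even after the counter wraps around. */
int ticks_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}
```

In kernel code the equivalent check would be `time_after(jiffies, start + timeout)` with a jiffies-based deadline, rather than comparing raw cycle deltas.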

^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [PATCH] Time out for possible dead loops during queued invalidation wait
  2009-05-27  5:51           ` Andrew Morton
@ 2009-05-27 22:40             ` Yu, Fenghua
  2009-05-27 22:48               ` Andrew Morton
  2009-06-18 18:05               ` [PATCH 2/2] IOMMU Identity Mapping Support: Intel IOMMU implementation Fenghua Yu
  0 siblings, 2 replies; 61+ messages in thread
From: Yu, Fenghua @ 2009-05-27 22:40 UTC (permalink / raw)
  To: 'Andrew Morton'
  Cc: 'David Woodhouse', 'Ingo Molnar', 'LKML',
	'IOMMU'

>>
>> Subject: [PATCH] Time out for possible dead loops during queued
>invalidation wait
>
>nits:
>
>Please ensure that each patch title identifies the subsystem which is
>being altered.  Because someone who reads this title has no clue what
>part of the kernel is affected unless they dive in and read the actual
>patch.  Suitable title prefixes for this one would be "dmar: " or
>"drivers/pci/dmar.c: ".
>
>The usual term for a timeout is "timeout", not "time out".
>
>The term "dead loop" is unclear.  The reader might think that it refers
>to a never-executed loop, as in "dead code".  A better term is
>"infinite loop".
>
>So I ended up with the title
>
>	"drivers/pci/dmar.c: timeout for possible infinite loops during
>queued invalidation wait"
>
>Welcome to my life :(
>
>> Two loops in qi_submit_sync() do not have time out. If hardware can not
>finish
>> the queued invalidation for some reason, the loops could end up in dead
>loops
>> without any hint for what is going on. I add time out for the loops and
>report
>> warning when time out happens.
>>
>> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
>
>ok...
>
>>  dmar.c |   12 ++++++++++++
>>  1 files changed, 12 insertions(+)
>>
>> diff --git a/drivers/pci/dmar.c b/drivers/pci/dmar.c
>> index fa3a113..95baacd 100644
>> --- a/drivers/pci/dmar.c
>> +++ b/drivers/pci/dmar.c
>> @@ -637,6 +637,7 @@ int qi_submit_sync(struct qi_desc *desc, struct
>intel_iommu *iommu)
>>  	struct qi_desc *hw, wait_desc;
>>  	int wait_index, index;
>>  	unsigned long flags;
>> +	cycles_t start_time;

>It seems strange to me that the driver chose to implement a ten second
>timeout using such a high resolution thing as cycles_t.  Why not use
>plain old jiffies?  The advantages are:
>
>- jiffies can be read very efficiently
>
>- there's more kernel support for manipulating jiffies-based values.
>
>For example,

I'll use time_after() and jiffies in the updated patch. The get_cycles() approach is being used in other places in the iommu code; I'll change those too.

>
>>  	if (!qi)
>>  		return 0;
>> @@ -644,8 +645,13 @@ int qi_submit_sync(struct qi_desc *desc, struct
>intel_iommu *iommu)
>>  	hw = qi->desc;
>>
>>  	spin_lock_irqsave(&qi->q_lock, flags);
>> +	start_time = get_cycles();
>>  	while (qi->free_cnt < 3) {
>>  		spin_unlock_irqrestore(&qi->q_lock, flags);
>> +		if (DMAR_OPERATION_TIMEOUT < (get_cycles() - start_time)) {
>
>if we were using jiffies, this nasty thing could just use time_after().
>
>But that's all outside the scope of your patch.
>
>
>I'd find it more readable if the above were coded as
>
>	if (get_cycles() - start_time >= DMAR_OPERATION_TIMEOUT)
>
>but maybe that's just me.
>
>> +			WARN(1, "No space in invalidation queue.\n");
>> +			return -ENOSPC;
>
>ENOSPC means "your disk filled up".  I think it makes no sense to use
>that error code in this context, even though it kinda sounds the same.
>

Which error code is better? Is EAGAIN ok?

>> +		}
>>  		cpu_relax();
>>  		spin_lock_irqsave(&qi->q_lock, flags);
>>  	}
>> @@ -675,6 +681,7 @@ int qi_submit_sync(struct qi_desc *desc, struct
>intel_iommu *iommu)
>>  	 */
>>  	writel(qi->free_head << 4, iommu->reg + DMAR_IQT_REG);
>>
>> +	start_time = get_cycles();
>>  	while (qi->desc_status[wait_index] != QI_DONE) {
>>  		/*
>>  		 * We will leave the interrupts disabled, to prevent interrupt
>> @@ -687,6 +694,11 @@ int qi_submit_sync(struct qi_desc *desc, struct
>intel_iommu *iommu)
>>  		if (rc)
>>  			goto out;
>>
>> +		if (DMAR_OPERATION_TIMEOUT < (get_cycles() - start_time)) {
>> +			WARN(1, "Queued invalidation can not complete.\n");
>> +			goto out;
>
>As `rc' now contains zero, this will cause the function to incorrectly
>return the "success" code, even though it is known that it did not
>succeed.
>
>> +		}
>> +
>>  		spin_unlock(&qi->q_lock);
>>  		cpu_relax();
>>  		spin_lock(&qi->q_lock);

I'll change this code.

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH] Time out for possible dead loops during queued invalidation wait
  2009-05-27 22:40             ` Yu, Fenghua
@ 2009-05-27 22:48               ` Andrew Morton
  2009-05-27 23:25                 ` Yu, Fenghua
  2009-06-18 18:05               ` [PATCH 2/2] IOMMU Identity Mapping Support: Intel IOMMU implementation Fenghua Yu
  1 sibling, 1 reply; 61+ messages in thread
From: Andrew Morton @ 2009-05-27 22:48 UTC (permalink / raw)
  To: Yu, Fenghua; +Cc: dwmw2, mingo, linux-kernel, iommu

On Wed, 27 May 2009 15:40:35 -0700
"Yu, Fenghua" <fenghua.yu@intel.com> wrote:

> >> +			WARN(1, "No space in invalidation queue.\n");
> >> +			return -ENOSPC;
> >
> >ENOSPC means "your disk filled up".  I think it makes no sense to use
> >that error code in this context, even though it kinda sounds the same.
> >
> 
> Which error code is better? Is EAGAIN ok?

That depends on driver details - probably EIO would be suitable, dunno.

But all the callers of qi_submit_sync() seem to just drop the error
code on the floor:

	/* should never fail */
	qi_submit_sync(&desc, iommu);

and may well cause a kernel crash as a result.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [PATCH] Time out for possible dead loops during queued invalidation wait
  2009-05-27 22:48               ` Andrew Morton
@ 2009-05-27 23:25                 ` Yu, Fenghua
  2009-05-27 23:51                   ` Andrew Morton
  0 siblings, 1 reply; 61+ messages in thread
From: Yu, Fenghua @ 2009-05-27 23:25 UTC (permalink / raw)
  To: 'Andrew Morton'
  Cc: 'dwmw2@infradead.org', 'mingo@elte.hu',
	'linux-kernel@vger.kernel.org',
	'iommu@lists.linux-foundation.org'

>> Which error code is better? Is EAGAIN ok?
>
>That depends on driver details - probably EIO would be suitable, dunno.
>
>But all the callers of qi_submit_sync() seem to just drop the error
>code on the floor:
>
>	/* should never fail */
>	qi_submit_sync(&desc, iommu);
>
>and may well cause a kernel crash as a result.

Should the code go to kernel panic after a timeout in the qi_submit_sync() loops? When the loops time out (10 seconds), something in the hardware could be wrong.

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH] Time out for possible dead loops during queued invalidation wait
  2009-05-27 23:25                 ` Yu, Fenghua
@ 2009-05-27 23:51                   ` Andrew Morton
  2009-05-28  0:47                     ` Yu, Fenghua
  0 siblings, 1 reply; 61+ messages in thread
From: Andrew Morton @ 2009-05-27 23:51 UTC (permalink / raw)
  To: Yu, Fenghua; +Cc: dwmw2, mingo, linux-kernel, iommu

On Wed, 27 May 2009 16:25:52 -0700
"Yu, Fenghua" <fenghua.yu@intel.com> wrote:

> >> Which error code is better? Is EAGAIN ok?
> >
> >That depends on driver details - probably EIO would be suitable, dunno.
> >
> >But all the callers of qi_submit_sync() seem to just drop the error
> >code on the floor:
> >
> >	/* should never fail */
> >	qi_submit_sync(&desc, iommu);
> >
> >and may well cause a kernel crash as a result.
> 
> Should the code go to kernel panic after a timeout in the qi_submit_sync() loops? When the loops time out (10 seconds), something in the hardware could be wrong.
> 

The most important thing to do when an error is detected is to protect
the user's data, perhaps by reliably halting the kernel.

The second most important thing is to report what happened, so people
can fix things.

The third most important thing is to attempt to recover from the error
so that the kernel continues to function.  This third requirement often
makes the second one more successful: a still-running kernel can report
things which a crashed kernel cannot.

In this particular case I'd suggest that the driver be converted to
correctly recognise errors, clean things up and propagate the errors
back in an orderly fashion, as usual.


otoh...  How likely is it that this timeout actually occurs?  If it's
only conceivable that this can happen when the hardware is busted then
I'd suggest that we not patch the kernel at all - the kernel really has
little chance of surviving broken silicon so why bother adding a little
bit of code here to handle a tiny subset of it?

If the chip is indeed busted then afacit the kernel will get stuck in
an infinite loop.  That's OK, we can still diagnose those with NMI
watchdogs, sysrq-p handlers, etc.
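A userspace sketch of the loop shape being suggested here: recognise the timeout, report it, and hand a real error code back to the caller instead of falling through with a stale "success" value (names and the tick budget are illustrative):

```c
#include <errno.h>

/* Poll a completion flag for at most 'budget' iterations; on timeout,
 * return a negative errno instead of leaving the loop with a zero
 * "success" code the caller would misread. */
int wait_for_done(const volatile int *done, unsigned long budget)
{
    unsigned long spent = 0;

    while (!*done) {
        if (++spent > budget)
            return -ETIME;  /* propagate the failure, don't hide it */
    }
    return 0;
}
```

Callers can then decide whether to retry, clean up, or warn -- rather than dropping the error on the floor.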


^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [PATCH] Time out for possible dead loops during queued invalidation wait
  2009-05-27 23:51                   ` Andrew Morton
@ 2009-05-28  0:47                     ` Yu, Fenghua
  0 siblings, 0 replies; 61+ messages in thread
From: Yu, Fenghua @ 2009-05-28  0:47 UTC (permalink / raw)
  To: 'Andrew Morton'
  Cc: 'dwmw2@infradead.org', 'mingo@elte.hu',
	'linux-kernel@vger.kernel.org',
	'iommu@lists.linux-foundation.org',
	Siddha, Suresh B

>> >> Which error code is better? Is EAGAIN ok?
>> >
>> >That depends on driver details - probably EIO would be suitable, dunno.
>> >
>> >But all the callers of qi_submit_sync() seem to just drop the error
>> >code on the floor:
>> >
>> >	/* should never fail */
>> >	qi_submit_sync(&desc, iommu);
>> >
>> >and may well cause a kernel crash as a result.
>>
>> Should the code go to kernel panic after timeout in qi_submit_sync()
>loops? When timeout (10 seconds) in the loops, something in hardware could
>be wrong.
>>
>
>The most important thing to do when an error is detected is to protect
>the user's data, perhaps by reliably halting the kernel.
>
>The second most important thing is to report what happened, so people
>can fix things.
>
>The third most important thing is to attempt to recover from the error
>so that the kernel continues to function.  This third requirement often
>makes the second one more successful: a still-running kernel can report
>things which a crashed kernel cannot.
>
>In this particular case I'd suggest that the driver be converted to
>correctly recognise errors, clean things up and propagate the errors
>back in an orderly fashion, as usual.
>
>
>otoh...  How likely is it that this timeout actually occurs?  If it's
>only conceivable that this can happen when the hardware is busted then
>I'd suggest that we not patch the kernel at all - the kernel really has
>little chance of surviving broken silicon so why bother adding a little
>bit of code here to handle a tiny subset of it?
>
>If the chip is indeed busted then afacit the kernel will get stuck in
>an infinite loop.  That's OK, we can still diagnose those with NMI
>watchdogs, sysrq-p handlers, etc.

This timeout can occur either when the hardware is busted or when the IOMMU kernel software or BIOS has a bug. Timeout detection with a suitable error code can help debug such hardware or software issues.

In some cases (e.g. interrupt migration in interrupt remapping), the timeout is not a blocking issue; it just impacts performance. Without timeout detection, the CPU ends up in an infinite loop even though the system should be able to continue running. With it, the kernel can dump warning info for the user to debug and the system can keep going.

It seems I will also need to change the upper-level callers to correctly handle the error code (EIO) generated by qi_submit_sync().

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 1/2] IOMMU Identity Mapping Support: iommu_identity_mapping definition
  2009-05-14 17:59             ` Fenghua Yu
@ 2009-06-18 18:05               ` Fenghua Yu
  2009-06-18 18:08                 ` Muli Ben-Yehuda
  2009-06-18 18:13                 ` [PATCH 1/2] IOMMU Identity Mapping Support: iommu_identity_mapping definition Chris Wright
  0 siblings, 2 replies; 61+ messages in thread
From: Fenghua Yu @ 2009-06-18 18:05 UTC (permalink / raw)
  To: David Woodhouse, 'Linus Torvalds',
	'Stephen Rothwell', 'Andrew Morton',
	'Ingo Molnar', 'Christopher Wright',
	'Allen Kay'
  Cc: 'lkml', 'iommu'

IOMMU Identity Mapping Support: iommu_identity_mapping definition

Identity mapping for the IOMMU defines a single domain that 1:1 maps all PCI
devices to all usable memory.

This reduces map/unmap overhead in the DMA APIs and improves IOMMU performance.
On 10Gb network cards, netperf shows no performance degradation compared to
non-IOMMU performance.

This method may lose some of the DMA remapping benefits, such as isolation.

The first patch defines the iommu_identity_mapping variable, which controls the
identity mapping code and is 0 by default.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>

---

 Documentation/kernel-parameters.txt |    1 +
 arch/ia64/include/asm/iommu.h       |    2 ++
 arch/ia64/kernel/pci-dma.c          |    1 +
 arch/x86/include/asm/iommu.h        |    1 +
 arch/x86/kernel/pci-dma.c           |    5 +++++
 5 files changed, 10 insertions(+)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 4b78797..6b1d7b4 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -988,6 +988,7 @@ and is between 256 and 4096 characters. It is defined in the file
 		forcesac
 		soft
 		pt	[x86, IA64]
+		identity [x86, IA64]
 
 	io7=		[HW] IO7 for Marvel based alpha systems
 			See comment before marvel_specify_io7 in
diff --git a/arch/ia64/include/asm/iommu.h b/arch/ia64/include/asm/iommu.h
index 745e095..cc6d59f 100644
--- a/arch/ia64/include/asm/iommu.h
+++ b/arch/ia64/include/asm/iommu.h
@@ -11,8 +11,10 @@ extern int force_iommu, no_iommu;
 extern int iommu_detected;
 #ifdef	CONFIG_DMAR
 extern int iommu_pass_through;
+extern int iommu_identity_mapping;
 #else
 #define iommu_pass_through	(0)
+#define iommu_identity_mapping  (0)
 #endif
 extern void iommu_dma_init(void);
 extern void machvec_init(const char *name);
diff --git a/arch/ia64/kernel/pci-dma.c b/arch/ia64/kernel/pci-dma.c
index 0569596..15b8555 100644
--- a/arch/ia64/kernel/pci-dma.c
+++ b/arch/ia64/kernel/pci-dma.c
@@ -33,6 +33,7 @@ int force_iommu __read_mostly;
 #endif
 
 int iommu_pass_through;
+int iommu_identity_mapping;
 
 /* Dummy device used for NULL arguments (normally ISA). Better would
    be probably a smaller DMA mask, but this is bug-to-bug compatible
diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h
index fd6d21b..d2aee4a 100644
--- a/arch/x86/include/asm/iommu.h
+++ b/arch/x86/include/asm/iommu.h
@@ -7,6 +7,7 @@ extern struct dma_map_ops nommu_dma_ops;
 extern int force_iommu, no_iommu;
 extern int iommu_detected;
 extern int iommu_pass_through;
+extern int iommu_identity_mapping;
 
 /* 10 seconds */
 #define DMAR_OPERATION_TIMEOUT ((cycles_t) tsc_khz*10*1000)
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 049005e..489179a 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -33,6 +33,7 @@ int no_iommu __read_mostly;
 int iommu_detected __read_mostly = 0;
 
 int iommu_pass_through;
+int iommu_identity_mapping;
 
 dma_addr_t bad_dma_address __read_mostly = 0;
 EXPORT_SYMBOL(bad_dma_address);
@@ -208,6 +209,10 @@ static __init int iommu_setup(char *p)
 			forbid_dac = -1;
 			return 1;
 		}
+		if (!strncmp(p, "identity", 8)) {
+			iommu_identity_mapping = 1;
+			return 1;
+		}
 #ifdef CONFIG_SWIOTLB
 		if (!strncmp(p, "soft", 4))
 			swiotlb = 1;

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 2/2] IOMMU Identity Mapping Support: Intel IOMMU implementation
  2009-05-27 22:40             ` Yu, Fenghua
  2009-05-27 22:48               ` Andrew Morton
@ 2009-06-18 18:05               ` Fenghua Yu
  2009-06-18 19:15                 ` Chris Wright
  1 sibling, 1 reply; 61+ messages in thread
From: Fenghua Yu @ 2009-06-18 18:05 UTC (permalink / raw)
  To: David Woodhouse, 'Linus Torvalds',
	'Stephen Rothwell', 'Andrew Morton',
	'Ingo Molnar', 'Christopher Wright',
	'Allen Kay'
  Cc: 'lkml', 'iommu'

IOMMU Identity Mapping Support: Intel IOMMU implementation

Identity mapping for IOMMU defines a single domain to 1:1 map all pci devices
to all usable memory.

This reduces map/unmap overhead in the DMA APIs and improves IOMMU performance.
On 10Gb network cards, Netperf shows no performance degradation compared to
non-IOMMU performance.

This method may lose some of the DMA remapping benefits, such as isolation.

The second patch sets up identity mapping for all PCI devices to all usable
memory. In the DMA APIs, there is no overhead to maintain page tables,
invalidate the IOTLB, flush caches, etc.
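
The reason the overhead disappears is that with a static 1:1 map the DMA
address *is* the physical address, so the map path can return immediately.
A minimal sketch of that fast path (simplified from the patch's changes to
__intel_map_single(); the non-identity path is only a placeholder here):

```c
#include <stdint.h>

typedef uint64_t dma_addr_t;
typedef uint64_t phys_addr_t;

/* Flag as defined in patch 1/2; forced on for this sketch. */
static int iommu_identity_mapping = 1;

static dma_addr_t map_single(phys_addr_t paddr)
{
	if (iommu_identity_mapping)
		return (dma_addr_t)paddr;	/* 1:1: no IOVA alloc, no
						 * page-table update, no
						 * IOTLB flush */
	/* ... normal path: allocate IOVA, build PTEs, flush ... */
	return 0;
}
```

The unmap path is symmetric: when the device is identity-mapped there is
simply nothing to tear down.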

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>

---

 intel-iommu.c |  210 ++++++++++++++++++++++++++++++++++++++++++++++++++--------
 1 files changed, 182 insertions(+), 28 deletions(-)


diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index 178853a..26da407 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -39,6 +39,7 @@
 #include <linux/sysdev.h>
 #include <asm/cacheflush.h>
 #include <asm/iommu.h>
+#include <asm/e820.h>
 #include "pci.h"
 
 #define ROOT_SIZE		VTD_PAGE_SIZE
@@ -220,10 +221,26 @@ static inline bool dma_pte_present(struct dma_pte *pte)
 /* devices under the same p2p bridge are owned in one domain */
 #define DOMAIN_FLAG_P2P_MULTIPLE_DEVICES (1 << 0)
 
-/* domain represents a virtual machine, more than one devices
- * across iommus may be owned in one domain, e.g. kvm guest.
+/*
+ * This domain is a statically identity mapping domain.
+ *	1. This domain creates a static 1:1 mapping to all usable memory.
+ * 	2. It maps to each iommu if successful.
+ *	3. Each iommu maps to this domain if successful.
+ */
+struct dmar_domain *si_domain;
+
+/*
+ * There are three types of domains which are determined by flags in the
+ * domains:
+ *	0: A domain only containing one pci device.
+ *	1: A specific domain si_domain which has static 1:1 map to all
+ *	   usable memory for all pci devices.
+ *	2: Domain represents a virtual machine, more than one devices across
+ *	   iommus may be owned in one domain, e.g. kvm guest.
  */
-#define DOMAIN_FLAG_VIRTUAL_MACHINE	(1 << 1)
+#define DOMAIN_FLAG_SINGLE_DEVICE	1
+#define DOMAIN_FLAG_STATIC_IDENTITY	2
+#define DOMAIN_FLAG_VIRTUAL_MACHINE	4
 
 struct dmar_domain {
 	int	id;			/* domain id */
@@ -435,12 +452,13 @@ int iommu_calculate_agaw(struct intel_iommu *iommu)
 	return __iommu_calculate_agaw(iommu, DEFAULT_DOMAIN_ADDRESS_WIDTH);
 }
 
-/* in native case, each domain is related to only one iommu */
+/* This function only returns a single iommu in a domain */
 static struct intel_iommu *domain_get_iommu(struct dmar_domain *domain)
 {
 	int iommu_id;
 
-	BUG_ON(domain->flags & DOMAIN_FLAG_VIRTUAL_MACHINE);
+	/* si_domain and vm domain should not get here. */
+	BUG_ON(domain->flags & ~DOMAIN_FLAG_SINGLE_DEVICE);
 
 	iommu_id = find_first_bit(&domain->iommu_bmp, g_num_of_iommus);
 	if (iommu_id < 0 || iommu_id >= g_num_of_iommus)
@@ -1189,48 +1207,74 @@ void free_dmar_iommu(struct intel_iommu *iommu)
 	free_context_table(iommu);
 }
 
-static struct dmar_domain * iommu_alloc_domain(struct intel_iommu *iommu)
+/* Sequential domain id starting from 0. */
+static unsigned long domain_id;
+
+static struct dmar_domain *alloc_domain(void)
 {
-	unsigned long num;
-	unsigned long ndomains;
 	struct dmar_domain *domain;
-	unsigned long flags;
 
 	domain = alloc_domain_mem();
 	if (!domain)
 		return NULL;
 
+	domain->id = domain_id++;
+	memset(&domain->iommu_bmp, 0, sizeof(unsigned long));
+	domain->flags = 0;
+
+	return domain;
+}
+
+static int iommu_attach_domain(struct dmar_domain *domain,
+			       struct intel_iommu *iommu)
+{
+	int num;
+	unsigned long ndomains;
+	unsigned long flags;
+
 	ndomains = cap_ndoms(iommu->cap);
 
 	spin_lock_irqsave(&iommu->lock, flags);
+
 	num = find_first_zero_bit(iommu->domain_ids, ndomains);
 	if (num >= ndomains) {
 		spin_unlock_irqrestore(&iommu->lock, flags);
-		free_domain_mem(domain);
 		printk(KERN_ERR "IOMMU: no free domain ids\n");
-		return NULL;
+		return -ENOMEM;
 	}
 
 	set_bit(num, iommu->domain_ids);
-	domain->id = num;
-	memset(&domain->iommu_bmp, 0, sizeof(unsigned long));
 	set_bit(iommu->seq_id, &domain->iommu_bmp);
-	domain->flags = 0;
 	iommu->domains[num] = domain;
 	spin_unlock_irqrestore(&iommu->lock, flags);
 
-	return domain;
+	return 0;
 }
 
-static void iommu_free_domain(struct dmar_domain *domain)
+static void iommu_detach_domain(struct dmar_domain *domain,
+				struct intel_iommu *iommu)
 {
 	unsigned long flags;
-	struct intel_iommu *iommu;
-
-	iommu = domain_get_iommu(domain);
+	int num, ndomains;
+	int found = 0;
 
 	spin_lock_irqsave(&iommu->lock, flags);
-	clear_bit(domain->id, iommu->domain_ids);
+	ndomains = cap_ndoms(iommu->cap);
+	num = find_first_bit(iommu->domain_ids, ndomains);
+	for (; num < ndomains; ) {
+		if (iommu->domains[num] == domain) {
+			found = 1;
+			break;
+		}
+		num = find_next_bit(iommu->domain_ids,
+				    cap_ndoms(iommu->cap), num+1);
+	}
+
+	if (found) {
+		clear_bit(num, iommu->domain_ids);
+		clear_bit(iommu->seq_id, &domain->iommu_bmp);
+		iommu->domains[num] = NULL;
+	}
 	spin_unlock_irqrestore(&iommu->lock, flags);
 }
 
@@ -1310,6 +1354,7 @@ static int domain_init(struct dmar_domain *domain, int guest_width)
 
 	domain_reserve_special_ranges(domain);
 
+	domain->flags = DOMAIN_FLAG_SINGLE_DEVICE;
 	/* calculate AGAW */
 	iommu = domain_get_iommu(domain);
 	if (guest_width > cap_mgaw(iommu->cap))
@@ -1350,6 +1395,8 @@ static int domain_init(struct dmar_domain *domain, int guest_width)
 
 static void domain_exit(struct dmar_domain *domain)
 {
+	struct dmar_drhd_unit *drhd;
+	struct intel_iommu *iommu;
 	u64 end;
 
 	/* Domain 0 is reserved, so dont process it */
@@ -1368,7 +1415,10 @@ static void domain_exit(struct dmar_domain *domain)
 	/* free page tables */
 	dma_pte_free_pagetable(domain, 0, end);
 
-	iommu_free_domain(domain);
+	for_each_active_iommu(iommu, drhd)
+		if (test_bit(iommu->seq_id, &domain->iommu_bmp))
+			iommu_detach_domain(domain, iommu);
+
 	free_domain_mem(domain);
 }
 
@@ -1383,6 +1433,7 @@ static int domain_context_mapping_one(struct dmar_domain *domain, int segment,
 	unsigned long ndomains;
 	int id;
 	int agaw;
+	int found = 0;
 	struct device_domain_info *info = NULL;
 
 	pr_debug("Set context mapping for %02x:%02x.%d\n",
@@ -1408,8 +1459,7 @@ static int domain_context_mapping_one(struct dmar_domain *domain, int segment,
 	id = domain->id;
 	pgd = domain->pgd;
 
-	if (domain->flags & DOMAIN_FLAG_VIRTUAL_MACHINE) {
-		int found = 0;
+	if (domain->flags != DOMAIN_FLAG_SINGLE_DEVICE) {
 
 		/* find an available domain id for this device in iommu */
 		ndomains = cap_ndoms(iommu->cap);
@@ -1433,6 +1483,7 @@ static int domain_context_mapping_one(struct dmar_domain *domain, int segment,
 			}
 
 			set_bit(num, iommu->domain_ids);
+			set_bit(iommu->seq_id, &domain->iommu_bmp);
 			iommu->domains[num] = domain;
 			id = num;
 		}
@@ -1675,6 +1726,7 @@ static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)
 	unsigned long flags;
 	int bus = 0, devfn = 0;
 	int segment;
+	int ret;
 
 	domain = find_domain(pdev);
 	if (domain)
@@ -1707,6 +1759,10 @@ static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)
 		}
 	}
 
+	domain = alloc_domain();
+	if (!domain)
+		goto error;
+
 	/* Allocate new domain for the device */
 	drhd = dmar_find_matched_drhd_unit(pdev);
 	if (!drhd) {
@@ -1716,9 +1772,11 @@ static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)
 	}
 	iommu = drhd->iommu;
 
-	domain = iommu_alloc_domain(iommu);
-	if (!domain)
+	ret = iommu_attach_domain(domain, iommu);
+	if (ret) {
+		domain_exit(domain);
 		goto error;
+	}
 
 	if (domain_init(domain, gaw)) {
 		domain_exit(domain);
@@ -1804,8 +1862,11 @@ static int iommu_prepare_identity_map(struct pci_dev *pdev,
 	printk(KERN_INFO
 		"IOMMU: Setting identity map for device %s [0x%Lx - 0x%Lx]\n",
 		pci_name(pdev), start, end);
-	/* page table init */
-	domain = get_domain_for_dev(pdev, DEFAULT_DOMAIN_ADDRESS_WIDTH);
+	if (iommu_identity_mapping)
+		domain = si_domain;
+	else
+		/* page table init */
+		domain = get_domain_for_dev(pdev, DEFAULT_DOMAIN_ADDRESS_WIDTH);
 	if (!domain)
 		return -ENOMEM;
 
@@ -1952,7 +2013,76 @@ static int __init init_context_pass_through(void)
 	return 0;
 }
 
-static int __init init_dmars(void)
+static int si_domain_init(void)
+{
+	struct dmar_drhd_unit *drhd;
+	struct intel_iommu *iommu;
+	int ret = 0;
+
+	si_domain = alloc_domain();
+	if (!si_domain)
+		return -EFAULT;
+
+	si_domain->flags = DOMAIN_FLAG_STATIC_IDENTITY;
+
+	for_each_active_iommu(iommu, drhd) {
+		ret = iommu_attach_domain(si_domain, iommu);
+		if (ret) {
+			domain_exit(si_domain);
+			return -EFAULT;
+		}
+	}
+
+	if (domain_init(si_domain, DEFAULT_DOMAIN_ADDRESS_WIDTH)) {
+		domain_exit(si_domain);
+		return -EFAULT;
+	}
+
+	return 0;
+}
+
+static int identity_list(struct pci_dev *pdev)
+{
+	if (iommu_identity_mapping)
+		return 1;
+
+	return 0;
+}
+
+static int iommu_prepare_static_identity_mapping(void)
+{
+	int i;
+	struct pci_dev *pdev = NULL;
+	int ret;
+
+	ret = si_domain_init();
+	if (ret)
+		return -EFAULT;
+
+	printk(KERN_INFO "IOMMU: Setting identity map:\n");
+	for_each_pci_dev(pdev) {
+		/* Devices not in the identity list won't do identity map. */
+		if (!identity_list(pdev))
+			continue;
+
+		for (i = 0; i < e820.nr_map; i++) {
+			struct e820entry *ei = &e820.map[i];
+
+			if (ei->type == E820_RAM) {
+				ret = iommu_prepare_identity_map(pdev,
+					ei->addr, ei->addr + ei->size);
+				if (ret)  {
+					printk(KERN_INFO "1:1 mapping to one domain failed.\n");
+					return -EFAULT;
+				}
+			}
+		}
+	}
+
+	return 0;
+}
+
+int __init init_dmars(void)
 {
 	struct dmar_drhd_unit *drhd;
 	struct dmar_rmrr_unit *rmrr;
@@ -2076,6 +2206,7 @@ static int __init init_dmars(void)
 		}
 	}
 
+
 	/*
 	 * If pass through is set and enabled, context entries of all pci
 	 * devices are intialized by pass through translation type.
@@ -2093,6 +2224,9 @@ static int __init init_dmars(void)
 	 * identity mappings for rmrr, gfx, and isa.
 	 */
 	if (!iommu_pass_through) {
+		if (iommu_identity_mapping)
+			iommu_prepare_static_identity_mapping();
+
 		/*
 		 * For each rmrr
 		 *   for each dev attached to rmrr
@@ -2107,6 +2241,7 @@ static int __init init_dmars(void)
 		 *    endfor
 		 * endfor
 		 */
+		printk(KERN_INFO "IOMMU: Setting RMRR:\n");
 		for_each_rmrr_units(rmrr) {
 			for (i = 0; i < rmrr->devices_cnt; i++) {
 				pdev = rmrr->devices[i];
@@ -2259,6 +2394,9 @@ static dma_addr_t __intel_map_single(struct device *hwdev, phys_addr_t paddr,
 	int ret;
 	struct intel_iommu *iommu;
 
+	if (identity_list(pdev))
+		return paddr;
+
 	BUG_ON(dir == DMA_NONE);
 	if (pdev->dev.archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO)
 		return paddr;
@@ -2401,6 +2539,9 @@ static void intel_unmap_page(struct device *dev, dma_addr_t dev_addr,
 	struct iova *iova;
 	struct intel_iommu *iommu;
 
+	if (identity_list(pdev))
+		return;
+
 	if (pdev->dev.archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO)
 		return;
 	domain = find_domain(pdev);
@@ -2492,6 +2633,9 @@ static void intel_unmap_sg(struct device *hwdev, struct scatterlist *sglist,
 	struct scatterlist *sg;
 	struct intel_iommu *iommu;
 
+	if (identity_list(pdev))
+		return;
+
 	if (pdev->dev.archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO)
 		return;
 
@@ -2552,6 +2696,16 @@ static int intel_map_sg(struct device *hwdev, struct scatterlist *sglist, int ne
 	unsigned long start_addr;
 	struct intel_iommu *iommu;
 
+	if (identity_list(pdev)) {
+		for_each_sg(sglist, sg, nelems, i) {
+			addr = page_to_phys(sg_page(sg)) + sg->offset;
+			sg->dma_address = addr;
+			sg->dma_length = sg->length;
+		}
+
+		return nelems;
+	}
+
 	BUG_ON(dir == DMA_NONE);
 	if (pdev->dev.archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO)
 		return intel_nontranslate_map_sg(hwdev, sglist, nelems, dir);

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/2] IOMMU Identity Mapping Support: iommu_identity_mapping definition
  2009-06-18 18:05               ` [PATCH 1/2] IOMMU Identity Mapping Support: iommu_identity_mapping definition Fenghua Yu
@ 2009-06-18 18:08                 ` Muli Ben-Yehuda
  2009-06-18 18:13                   ` Chris Wright
  2009-06-18 18:14                   ` Yu, Fenghua
  2009-06-18 18:13                 ` [PATCH 1/2] IOMMU Identity Mapping Support: iommu_identity_mapping definition Chris Wright
  1 sibling, 2 replies; 61+ messages in thread
From: Muli Ben-Yehuda @ 2009-06-18 18:08 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: David Woodhouse, 'Linus Torvalds',
	'Stephen Rothwell', 'Andrew Morton',
	'Ingo Molnar', 'Christopher Wright',
	'Allen Kay', 'lkml', 'iommu'

On Thu, Jun 18, 2009 at 11:05:14AM -0700, Fenghua Yu wrote:

> IOMMU Identity Mapping Support: iommu_identity_mapping definition
> 
> Identity mapping for IOMMU defines a single domain to 1:1 map all
> pci devices to all usable memory.

Why use VT-d at all in this case? Do you have a use-case in mind?

Cheers,
Muli

> This method may lose some of the DMA remapping benefits like isolation.

I think you meant s/may/will/.

Cheers,
Muli
-- 
Muli Ben-Yehuda | muli@il.ibm.com | +972-4-8281080
Manager, Virtualization and Systems Architecture
Master Inventor, IBM Haifa Research Laboratory

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/2] IOMMU Identity Mapping Support: iommu_identity_mapping definition
  2009-06-18 18:05               ` [PATCH 1/2] IOMMU Identity Mapping Support: iommu_identity_mapping definition Fenghua Yu
  2009-06-18 18:08                 ` Muli Ben-Yehuda
@ 2009-06-18 18:13                 ` Chris Wright
  2009-06-18 18:28                   ` Yu, Fenghua
  2009-07-04 18:40                   ` David Woodhouse
  1 sibling, 2 replies; 61+ messages in thread
From: Chris Wright @ 2009-06-18 18:13 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: David Woodhouse, 'Linus Torvalds',
	'Stephen Rothwell', 'Andrew Morton',
	'Ingo Molnar', 'Christopher Wright',
	'Allen Kay', 'iommu', 'lkml'

* Fenghua Yu (fenghua.yu@intel.com) wrote:
> IOMMU Identity Mapping Support: iommu_identity_mapping definition
> 
> Identity mapping for IOMMU defines a single domain to 1:1 map all pci devices
> to all usable memory.
> 
> This will reduce map/unmap overhead in the DMA APIs and improve IOMMU performance.
> On 10Gb network cards, Netperf shows no performance degradation compared to
> non-IOMMU performance.
> 
> This method may lose some of the DMA remapping benefits like isolation.
> 
> The first patch defines the iommu_identity_mapping variable, which controls the
> identity mapping code and is 0 by default.

The only real difference between "pt" and "identity" is hardware support.
We should have a single value so we don't have to tell users to do different
things depending on their hardware (they won't even know what they have)
to achieve the same result.

thanks,
-chris

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/2] IOMMU Identity Mapping Support: iommu_identity_mapping definition
  2009-06-18 18:08                 ` Muli Ben-Yehuda
@ 2009-06-18 18:13                   ` Chris Wright
  2009-06-18 18:14                   ` Yu, Fenghua
  1 sibling, 0 replies; 61+ messages in thread
From: Chris Wright @ 2009-06-18 18:13 UTC (permalink / raw)
  To: Muli Ben-Yehuda
  Cc: Fenghua Yu, David Woodhouse, 'Linus Torvalds',
	'Stephen Rothwell', 'Andrew Morton',
	'Ingo Molnar', 'Christopher Wright',
	'Allen Kay', 'lkml', 'iommu'

* Muli Ben-Yehuda (muli@il.ibm.com) wrote:
> On Thu, Jun 18, 2009 at 11:05:14AM -0700, Fenghua Yu wrote:
> 
> > IOMMU Identity Mapping Support: iommu_identity_mapping definition
> > 
> > Identity mapping for IOMMU defines a single domain to 1:1 map all
> > pci devices to all usable memory.
> 
> Why use VT-d at all in this case? Do you have a use-case in mind?

Device assignment.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [PATCH 1/2] IOMMU Identity Mapping Support: iommu_identity_mapping definition
  2009-06-18 18:08                 ` Muli Ben-Yehuda
  2009-06-18 18:13                   ` Chris Wright
@ 2009-06-18 18:14                   ` Yu, Fenghua
  2009-06-18 18:25                     ` Muli Ben-Yehuda
  2009-06-25  0:38                     ` [PATCH] IA64 Compilation Error Fix for Intel IOMMU Identity Mapping Support Fenghua Yu
  1 sibling, 2 replies; 61+ messages in thread
From: Yu, Fenghua @ 2009-06-18 18:14 UTC (permalink / raw)
  To: 'Muli Ben-Yehuda'
  Cc: 'David Woodhouse', 'Linus Torvalds',
	'Stephen Rothwell', 'Andrew Morton',
	'Ingo Molnar', 'Christopher Wright',
	Kay, Allen M, 'lkml', 'iommu'

>
>On Thu, Jun 18, 2009 at 11:05:14AM -0700, Fenghua Yu wrote:
>
>> IOMMU Identity Mapping Support: iommu_identity_mapping definition
>>
>> Identity mapping for IOMMU defines a single domain to 1:1 map all
>> pci devices to all usable memory.
>
>Why use VT-d at all in this case? Do you have a use-case in mind?

Some users want to use VT-d in KVM but are concerned about DMA remapping performance. They can use identity mapping and still have KVM on VT-d. They can also use the pass through patch (sent out before) if hardware supports pass through.

By default, identity mapping is turned off. "iommu=identity" turns it on.

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/2] IOMMU Identity Mapping Support: iommu_identity_mapping definition
  2009-06-18 18:14                   ` Yu, Fenghua
@ 2009-06-18 18:25                     ` Muli Ben-Yehuda
  2009-06-18 18:31                       ` Chris Wright
  2009-06-25  0:38                     ` [PATCH] IA64 Compilation Error Fix for Intel IOMMU Identity Mapping Support Fenghua Yu
  1 sibling, 1 reply; 61+ messages in thread
From: Muli Ben-Yehuda @ 2009-06-18 18:25 UTC (permalink / raw)
  To: Yu, Fenghua
  Cc: 'David Woodhouse', 'Linus Torvalds',
	'Stephen Rothwell', 'Andrew Morton',
	'Ingo Molnar', 'Christopher Wright',
	Kay, Allen M, 'lkml', 'iommu'

On Thu, Jun 18, 2009 at 11:14:51AM -0700, Yu, Fenghua wrote:
> >
> >On Thu, Jun 18, 2009 at 11:05:14AM -0700, Fenghua Yu wrote:
> >
> >> IOMMU Identity Mapping Support: iommu_identity_mapping definition
> >>
> >> Identity mapping for IOMMU defines a single domain to 1:1 map all
> >> pci devices to all usable memory.
> >
> >Why use VT-d at all in this case? Do you have a use-case in mind?
> 
> Some users want to use VT-d in KVM but are concerned about DMA
> remapping performance. They can use identity mapping and still have
> KVM on VT-d. They can also use pass through patch (sent out before)
> if hardware supports pass through.

Sorry, I must be missing something. For the normal device assignment
case, we want the IOMMU page tables to have gpa->hpa mappings rather
than the 1-1 identity mapping. How do you envision the 1-1 mapping
being used in the device assignment case?

Cheers,
Muli
-- 
Muli Ben-Yehuda | muli@il.ibm.com | +972-4-8281080
Manager, Virtualization and Systems Architecture
Master Inventor, IBM Haifa Research Laboratory

^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [PATCH 1/2] IOMMU Identity Mapping Support: iommu_identity_mapping definition
  2009-06-18 18:13                 ` [PATCH 1/2] IOMMU Identity Mapping Support: iommu_identity_mapping definition Chris Wright
@ 2009-06-18 18:28                   ` Yu, Fenghua
  2009-06-18 18:34                     ` Chris Wright
  2009-07-04 18:40                   ` David Woodhouse
  1 sibling, 1 reply; 61+ messages in thread
From: Yu, Fenghua @ 2009-06-18 18:28 UTC (permalink / raw)
  To: 'Chris Wright'
  Cc: 'David Woodhouse', 'Linus Torvalds',
	'Stephen Rothwell', 'Andrew Morton',
	'Ingo Molnar', 'Christopher Wright',
	Kay, Allen M, 'iommu', 'lkml'

>>
>> The first patch defines the iommu_identity_mapping variable which controls the
>> identity mapping code and is 0 by default.
>
>The only real difference between "pt" and "identity" is hardware support.
>We should have a single value so we don't have to tell users to do different
>things depending on their hardware (they won't even know what they have)
>to achieve the same result.

Technically, keeping two separate options in the base kernel might be clearer and easier to understand. A distro might merge them together or adopt another usage model.

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/2] IOMMU Identity Mapping Support: iommu_identity_mapping definition
  2009-06-18 18:25                     ` Muli Ben-Yehuda
@ 2009-06-18 18:31                       ` Chris Wright
  2009-06-18 18:41                         ` Muli Ben-Yehuda
  0 siblings, 1 reply; 61+ messages in thread
From: Chris Wright @ 2009-06-18 18:31 UTC (permalink / raw)
  To: Muli Ben-Yehuda
  Cc: Yu, Fenghua, 'David Woodhouse', 'Linus Torvalds',
	'Stephen Rothwell', 'Andrew Morton',
	'Ingo Molnar', 'Christopher Wright',
	Kay, Allen M, 'lkml', 'iommu'

* Muli Ben-Yehuda (muli@il.ibm.com) wrote:
> On Thu, Jun 18, 2009 at 11:14:51AM -0700, Yu, Fenghua wrote:
> > >
> > >On Thu, Jun 18, 2009 at 11:05:14AM -0700, Fenghua Yu wrote:
> > >
> > >> IOMMU Identity Mapping Support: iommu_identity_mapping definition
> > >>
> > >> Identity mapping for IOMMU defines a single domain to 1:1 map all
> > >> pci devices to all usable memory.
> > >
> > >Why use VT-d at all in this case? Do you have a use-case in mind?
> > 
> > Some users want to use VT-d in KVM but are concerned about DMA
> > remapping performance. They can use identity mapping and still have
> > KVM on VT-d. They can also use pass through patch (sent out before)
> > if hardware supports pass through.
> 
> Sorry, I must be missing something. For the normal device assignment
> case, we want the IOMMU page tables to have gpa->hpa mappings rather
> than the 1-1 identity mapping. How do you envision the 1-1 mapping
> being used in the device assignment case?

The 1-1 mapping is for all the host devices _not_ assigned to guests.
It eliminates the I/O remapping overhead imposed on guests not using an
assigned device, and on I/O from the host.

It's just the same as VT-d PassThrough mode for hardware that doesn't
support it.

thanks,
-chris

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/2] IOMMU Identity Mapping Support: iommu_identity_mapping definition
  2009-06-18 18:28                   ` Yu, Fenghua
@ 2009-06-18 18:34                     ` Chris Wright
  0 siblings, 0 replies; 61+ messages in thread
From: Chris Wright @ 2009-06-18 18:34 UTC (permalink / raw)
  To: Yu, Fenghua
  Cc: 'Chris Wright', 'David Woodhouse',
	'Linus Torvalds', 'Stephen Rothwell',
	'Andrew Morton', 'Ingo Molnar',
	'Christopher Wright', Kay, Allen M, 'iommu',
	'lkml'

* Yu, Fenghua (fenghua.yu@intel.com) wrote:
> >>
> >> The first patch defines the iommu_identity_mapping variable which controls the
> >> identity mapping code and is 0 by default.
> >
> >The only real difference between "pt" and "identity" is hardware support.
> >We should have a single value so we don't have to tell users to do different
> >things depending on their hardware (they won't even know what they have)
> >to achieve the same result.
> 
> Technically keeping two separate options in base kernel might be clear and easy to understand. A distro might merge them together or have other usage model.

This pushes burden to distros and users for no obvious gain.  Just like
queued invalidation vs. register based invalidation...it's a hardware
detail that users don't really care that much about, they just care
about user visible functionality.

thanks,
-chris

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/2] IOMMU Identity Mapping Support: iommu_identity_mapping definition
  2009-06-18 18:31                       ` Chris Wright
@ 2009-06-18 18:41                         ` Muli Ben-Yehuda
  2009-06-18 18:50                           ` Yu, Fenghua
  2009-06-18 18:51                           ` Chris Wright
  0 siblings, 2 replies; 61+ messages in thread
From: Muli Ben-Yehuda @ 2009-06-18 18:41 UTC (permalink / raw)
  To: Chris Wright
  Cc: Yu, Fenghua, 'David Woodhouse', 'Linus Torvalds',
	'Stephen Rothwell', 'Andrew Morton',
	'Ingo Molnar', Kay, Allen M, 'lkml',
	'iommu'

On Thu, Jun 18, 2009 at 11:31:21AM -0700, Chris Wright wrote:

> The 1-1 mapping is for all the host devices _not_ assigned to
> guests.  To eliminate the i/o overhead imposed on all guests not
> using an assigned device or from i/o from host.
>
> It's just the same as VT-d PassThrough mode for hardware that
> doesn't support it.

Ok, that makes sense. Thanks, Chris. However, that doesn't appear to
be what the patch does---unless I'm misreading, if
iommu_identity_mapping is set, *all* devices get identity
mapping. Instead of a global command line option, we need to provide a
way to enable/disable pt or identity mapping (I agree that the user
shouldn't know or care which is used, the kernel should pick the best
one automatically) on a per BDF basis.
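
The per-BDF control suggested here would amount to replacing the patch's
global-flag identity_list() stub with a real device lookup. A hypothetical
sketch, assuming a simple static table; the struct, the table contents, and
the function name are all illustrative, not from the patch:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-BDF identity list: instead of one global flag, each
 * bus/devfn pair opts in individually.  Entries here are made up. */
struct bdf { unsigned char bus, devfn; };

static const struct bdf identity_devs[] = { { 0x00, 0x1f }, { 0x02, 0x00 } };

static bool device_is_identity_mapped(unsigned char bus, unsigned char devfn)
{
	for (size_t i = 0;
	     i < sizeof(identity_devs) / sizeof(identity_devs[0]); i++)
		if (identity_devs[i].bus == bus &&
		    identity_devs[i].devfn == devfn)
			return true;
	return false;	/* not listed: gets normal DMA remapping */
}
```

Whether the entry means "pt" or "identity" could then be decided by the
kernel from hardware capability, as argued earlier in the thread.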

Cheers,
Muli
-- 
Muli Ben-Yehuda | muli@il.ibm.com | +972-4-8281080
Manager, Virtualization and Systems Architecture
Master Inventor, IBM Haifa Research Laboratory

^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [PATCH 1/2] IOMMU Identity Mapping Support: iommu_identity_mapping definition
  2009-06-18 18:41                         ` Muli Ben-Yehuda
@ 2009-06-18 18:50                           ` Yu, Fenghua
  2009-06-18 18:51                           ` Chris Wright
  1 sibling, 0 replies; 61+ messages in thread
From: Yu, Fenghua @ 2009-06-18 18:50 UTC (permalink / raw)
  To: 'Muli Ben-Yehuda', 'Chris Wright'
  Cc: 'David Woodhouse', 'Linus Torvalds',
	'Stephen Rothwell', 'Andrew Morton',
	'Ingo Molnar', Kay, Allen M, 'lkml',
	'iommu'

>
>On Thu, Jun 18, 2009 at 11:31:21AM -0700, Chris Wright wrote:
>
>> The 1-1 mapping is for all the host devices _not_ assigned to
>> guests.  To eliminate the i/o overhead imposed on all guests not
>> using an assigned device or from i/o from host.
>>
>> It's just the same as VT-d PassThrough mode for hardware that
>> doesn't support it.
>
>Ok, that makes sense. Thanks, Chris. However, that doesn't appear to
>be what the patch does---unless I'm misreading, if
>iommu_identity_mapping is set, *all* devices get identity
>mapping. Instead of a global command line option, we need to provide a
>way to enable/disable pt or identity mapping (I agree that the user
>shouldn't know or care which is used, the kernel should pick the best
>one automatically) on a per BDF basis.

The device in KVM is attached to a KVM domain by intel_iommu_attach_device(). In this function, domain_context_mapping() switches the device's domain from si_domain to the KVM domain.

Actually, there is a bug when a device is detached from a KVM domain... in intel_iommu_detach_device(), I should assign the device back to si_domain so the device can be used natively again.
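
The fix being described can be sketched roughly as follows. This is a
hypothetical user-space model, not the real intel-iommu code: the structs
are stubs and detach_device() stands in for the missing re-attach step in
intel_iommu_detach_device().

```c
#include <stddef.h>

/* Stub of the static identity domain from the patch. */
struct dmar_domain { int id; };
static struct dmar_domain si_dom = { 0 };
static struct dmar_domain *si_domain = &si_dom;

struct pci_dev_stub { struct dmar_domain *domain; };

static void detach_device(struct pci_dev_stub *pdev)
{
	/* Proposed fix: instead of leaving the device with no domain
	 * after KVM detach, fall back to the static identity domain so
	 * host ("native") DMA keeps working. */
	pdev->domain = si_domain;
}
```

In the real driver this would mean a domain_context_mapping() back to
si_domain, mirroring what attach does in the other direction.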

Thanks.

-Fenghua


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/2] IOMMU Identity Mapping Support: iommu_identity_mapping definition
  2009-06-18 18:41                         ` Muli Ben-Yehuda
  2009-06-18 18:50                           ` Yu, Fenghua
@ 2009-06-18 18:51                           ` Chris Wright
  2009-06-18 19:09                             ` Yu, Fenghua
  1 sibling, 1 reply; 61+ messages in thread
From: Chris Wright @ 2009-06-18 18:51 UTC (permalink / raw)
  To: Muli Ben-Yehuda
  Cc: Chris Wright, Yu, Fenghua, 'David Woodhouse',
	'Linus Torvalds', 'Stephen Rothwell',
	'Andrew Morton', 'Ingo Molnar',
	Kay, Allen M, 'lkml', 'iommu'

* Muli Ben-Yehuda (muli@il.ibm.com) wrote:
> On Thu, Jun 18, 2009 at 11:31:21AM -0700, Chris Wright wrote:
> 
> > The 1-1 mapping is for all the host devices _not_ assigned to
> > guests.  To eliminate the i/o overhead imposed on all guests not
> > using an assigned device or from i/o from host.
> >
> > It's just the same as VT-d PassThrough mode for hardware that
> > doesn't support it.
> 
> Ok, that makes sense. Thanks, Chris. However, that doesn't appear to
> be what the patch does---unless I'm misreading, if
> iommu_identity_mapping is set, *all* devices get identity

Correct (and same as pt mode).  Of course, a subsequent device assignment
will set that mapping to gpa<->hpa, so you'll get proper isolation for
the assigned device.  Essentially it's saying the only reason you cared
about the IOMMU was for device assignment.

> mapping. Instead of a global command line option, we need to provide a
> way to enable/disable pt or identity mapping (I agree that the user
> shouldn't know or care which is used, the kernel should pick the best
> one automatically) on a per BDF basis.

Yeah, that's a possible enhancement.  Although once you've disabled one
device's isolation you've created a hole...

thanks,
-chris

^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [PATCH 1/2] IOMMU Identity Mapping Support: iommu_identity_mapping definition
  2009-06-18 18:51                           ` Chris Wright
@ 2009-06-18 19:09                             ` Yu, Fenghua
  0 siblings, 0 replies; 61+ messages in thread
From: Yu, Fenghua @ 2009-06-18 19:09 UTC (permalink / raw)
  To: 'Chris Wright', 'Muli Ben-Yehuda'
  Cc: 'David Woodhouse', 'Linus Torvalds',
	'Stephen Rothwell', 'Andrew Morton',
	'Ingo Molnar', Kay, Allen M, 'lkml',
	'iommu'

>> mapping. Instead of a global command line option, we need to provide a
>> way to enable/disable pt or identity mapping (I agree that the user
>> shouldn't know or care which is used, the kernel should pick the best
>> one automatically) on a per BDF basis.
>
>Yeah, that's a possible enhancement.  Although once you've disabled one
>device's isolation you've created a hole...
>
The isolation hole is not much better than a global 1:1 mapping. If one
device is 1:1 mapped to all memory, all other devices can access part of
that device's DMA range, and that device can access all of the other
devices' DMA ranges. The only advantage over a global 1:1 mapping is that
the remaining devices are still isolated from each other.

Thanks.

-Fenghua


* Re: [PATCH 2/2] IOMMU Identity Mapping Support: Intel IOMMU implementation
  2009-06-18 18:05               ` [PATCH 2/2] IOMMU Identity Mapping Support: Intel IOMMU implementation Fenghua Yu
@ 2009-06-18 19:15                 ` Chris Wright
  2009-06-18 19:40                   ` Yu, Fenghua
  0 siblings, 1 reply; 61+ messages in thread
From: Chris Wright @ 2009-06-18 19:15 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: David Woodhouse, 'Linus Torvalds',
	'Stephen Rothwell', 'Andrew Morton',
	'Ingo Molnar', 'Christopher Wright',
	'Allen Kay', 'iommu', 'lkml'

* Fenghua Yu (fenghua.yu@intel.com) wrote:
> IOMMU Identity Mapping Support: Intel IOMMU implementation

BTW, this doesn't apply at all to Linus' current tree, and applies w/
minor fuzz and offset to David's VT-d tree.

Some quick feedback.

> @@ -1189,48 +1207,74 @@ void free_dmar_iommu(struct intel_iommu *iommu)
>  	free_context_table(iommu);
>  }
>  
> -static struct dmar_domain * iommu_alloc_domain(struct intel_iommu *iommu)
> +/* Sequential domain id starting from 0. */
> +static unsigned long domain_id;

This doesn't look SMP safe.

> +static int iommu_prepare_static_identity_mapping(void)
> +{
> +	int i;
> +	struct pci_dev *pdev = NULL;
> +	int ret;
> +
> +	ret = si_domain_init();
> +	if (ret)
> +		return -EFAULT;
> +
> +	printk(KERN_INFO "IOMMU: Setting identity map:\n");
> +	for_each_pci_dev(pdev) {
> +		/* Devices not in the identity list won't do identity map. */
> +		if (!identity_list(pdev))
> +			continue;
> +
> +		for (i = 0; i < e820.nr_map; i++) {
> +			struct e820entry *ei = &e820.map[i];
> +
> +			if (ei->type == E820_RAM) {
> +				ret = iommu_prepare_identity_map(pdev,

What about RMRR?

And in this mode,  you shouldn't need gfx workaround.

> +					ei->addr, ei->addr + ei->size);
> +				if (ret)  {
> +					printk(KERN_INFO "1:1 mapping to one domain failed.\n");
> +					return -EFAULT;
> +				}
> +			}
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +int __init init_dmars(void)
>  {
>  	struct dmar_drhd_unit *drhd;
>  	struct dmar_rmrr_unit *rmrr;
> @@ -2076,6 +2206,7 @@ static int __init init_dmars(void)
>  		}
>  	}
>  
> +
>  	/*
>  	 * If pass through is set and enabled, context entries of all pci
>  	 * devices are intialized by pass through translation type.
> @@ -2093,6 +2224,9 @@ static int __init init_dmars(void)
>  	 * identity mappings for rmrr, gfx, and isa.
>  	 */
>  	if (!iommu_pass_through) {
> +		if (iommu_identity_mapping)
> +			iommu_prepare_static_identity_mapping();
> +
>  		/*
>  		 * For each rmrr
>  		 *   for each dev attached to rmrr
> @@ -2107,6 +2241,7 @@ static int __init init_dmars(void)
>  		 *    endfor
>  		 * endfor
>  		 */

...Ah, RMRR looks OK.

> +		printk(KERN_INFO "IOMMU: Setting RMRR:\n");
>  		for_each_rmrr_units(rmrr) {
>  			for (i = 0; i < rmrr->devices_cnt; i++) {
>  				pdev = rmrr->devices[i];
> @@ -2259,6 +2394,9 @@ static dma_addr_t __intel_map_single(struct device *hwdev, phys_addr_t paddr,
>  	int ret;
>  	struct intel_iommu *iommu;
>  
> +	if (identity_list(pdev))
> +		return paddr;
> +

This is the same as DUMMY_DEVICE_DOMAIN_INFO.  Please consolidate to a test
that just says "do I need translation".

And what about a DMA mask smaller than physical memory?  The PT mode drops
back to swiotlb, iirc.


>  	BUG_ON(dir == DMA_NONE);
>  	if (pdev->dev.archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO)
>  		return paddr;
> @@ -2401,6 +2539,9 @@ static void intel_unmap_page(struct device *dev, dma_addr_t dev_addr,
>  	struct iova *iova;
>  	struct intel_iommu *iommu;
>  
> +	if (identity_list(pdev))
> +		return;
> +

Same...duplicate test

>  	if (pdev->dev.archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO)
>  		return;
>  	domain = find_domain(pdev);
> @@ -2492,6 +2633,9 @@ static void intel_unmap_sg(struct device *hwdev, struct scatterlist *sglist,
>  	struct scatterlist *sg;
>  	struct intel_iommu *iommu;
>  
> +	if (identity_list(pdev))
> +		return;
> +

Same duplicate test

>  	if (pdev->dev.archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO)
>  		return;
>  
> @@ -2552,6 +2696,16 @@ static int intel_map_sg(struct device *hwdev, struct scatterlist *sglist, int ne
>  	unsigned long start_addr;
>  	struct intel_iommu *iommu;
>  
> +	if (identity_list(pdev)) {
> +		for_each_sg(sglist, sg, nelems, i) {
> +			addr = page_to_phys(sg_page(sg)) + sg->offset;
> +			sg->dma_address = addr;
> +			sg->dma_length = sg->length;
> +		}
> +

This is just the same as intel_nontranslate_map_sg()
Please consolidate this.

thanks,
-chris


* RE: [PATCH 2/2] IOMMU Identity Mapping Support: Intel IOMMU implementation
  2009-06-18 19:15                 ` Chris Wright
@ 2009-06-18 19:40                   ` Yu, Fenghua
  2009-06-18 20:02                     ` Chris Wright
  2009-06-19 20:47                     ` [PATCH v2] IOMMU Identity Mapping Support (drivers/pci/intel_iommu.c) Fenghua Yu
  0 siblings, 2 replies; 61+ messages in thread
From: Yu, Fenghua @ 2009-06-18 19:40 UTC (permalink / raw)
  To: 'Chris Wright'
  Cc: 'David Woodhouse', 'Linus Torvalds',
	'Stephen Rothwell', 'Andrew Morton',
	'Ingo Molnar', 'Christopher Wright',
	Kay, Allen M, 'iommu', 'lkml'

>
>* Fenghua Yu (fenghua.yu@intel.com) wrote:
>> IOMMU Identity Mapping Support: Intel IOMMU implementation
>
>BTW, this doesn't apply at all to Linus' current tree, and applies w/
>minor fuzz and offset to David's VT-d tree.
>

That's right. This patch needs to be applied on top of the pass-through
patch. Linus's tree doesn't have the pass-through patch yet; David's tree
already has it, and so does linux-next, so you can use linux-next as well.


>Some quick feedback.
>
>> @@ -1189,48 +1207,74 @@ void free_dmar_iommu(struct intel_iommu *iommu)
>>  	free_context_table(iommu);
>>  }
>>
>> -static struct dmar_domain * iommu_alloc_domain(struct intel_iommu
>*iommu)
>> +/* Sequential domain id starting from 0. */
>> +static unsigned long domain_id;
>
>This doesn't look SMP safe.

I'll use device_domain_lock to protect it.
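For illustration, here is a minimal userspace sketch of that fix, with a pthread mutex standing in for device_domain_lock (names are illustrative; this is not the kernel code):

```c
#include <pthread.h>

/* Sequential domain id, as in the patch, but every increment happens
 * under a lock so concurrent callers can never hand out the same id. */
static unsigned long domain_id;
static pthread_mutex_t device_domain_lock = PTHREAD_MUTEX_INITIALIZER;

unsigned long next_domain_id(void)
{
	unsigned long id;

	pthread_mutex_lock(&device_domain_lock);
	id = domain_id++;
	pthread_mutex_unlock(&device_domain_lock);
	return id;
}
```

In the kernel the same shape applies with spin_lock_irqsave()/spin_unlock_irqrestore() around the increment.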

>
>> +static int iommu_prepare_static_identity_mapping(void)
>> +{
>> +	int i;
>> +	struct pci_dev *pdev = NULL;
>> +	int ret;
>> +
>> +	ret = si_domain_init();
>> +	if (ret)
>> +		return -EFAULT;
>> +
>> +	printk(KERN_INFO "IOMMU: Setting identity map:\n");
>> +	for_each_pci_dev(pdev) {
>> +		/* Devices not in the identity list won't do identity map. */
>> +		if (!identity_list(pdev))
>> +			continue;
>> +
>> +		for (i = 0; i < e820.nr_map; i++) {
>> +			struct e820entry *ei = &e820.map[i];
>> +
>> +			if (ei->type == E820_RAM) {
>> +				ret = iommu_prepare_identity_map(pdev,
>
>What about RMRR?

RMRR will be set up on top of si_domain, so it should be OK.

>
>And in this mode,  you shouldn't need gfx workaround.
>

The thing is that gfx might access some memory ranges which are reserved.
So to be safe, I still call the gfx workaround on top of si_domain. If the
si_domain and the gfx workaround overlap, the workaround just sets the 1:1
map there again; running the gfx w/a on top of si_domain won't hurt anything.

I tried not using the gfx w/a, but then graphics doesn't start.

If we're really paranoid about this, I won't call it.

>> +					ei->addr, ei->addr + ei->size);
>> +				if (ret)  {
>> +					printk(KERN_INFO "1:1 mapping to one domain
>failed.\n");
>> +					return -EFAULT;
>> +				}
>> +			}
>> +		}
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +int __init init_dmars(void)
>>  {
>>  	struct dmar_drhd_unit *drhd;
>>  	struct dmar_rmrr_unit *rmrr;
>> @@ -2076,6 +2206,7 @@ static int __init init_dmars(void)
>>  		}
>>  	}
>>
>> +
>>  	/*
>>  	 * If pass through is set and enabled, context entries of all pci
>>  	 * devices are intialized by pass through translation type.
>> @@ -2093,6 +2224,9 @@ static int __init init_dmars(void)
>>  	 * identity mappings for rmrr, gfx, and isa.
>>  	 */
>>  	if (!iommu_pass_through) {
>> +		if (iommu_identity_mapping)
>> +			iommu_prepare_static_identity_mapping();
>> +
>>  		/*
>>  		 * For each rmrr
>>  		 *   for each dev attached to rmrr
>> @@ -2107,6 +2241,7 @@ static int __init init_dmars(void)
>>  		 *    endfor
>>  		 * endfor
>>  		 */
>
>...Ah, RMRR looks OK.
>
>> +		printk(KERN_INFO "IOMMU: Setting RMRR:\n");
>>  		for_each_rmrr_units(rmrr) {
>>  			for (i = 0; i < rmrr->devices_cnt; i++) {
>>  				pdev = rmrr->devices[i];
>> @@ -2259,6 +2394,9 @@ static dma_addr_t __intel_map_single(struct device
>*hwdev, phys_addr_t paddr,
>>  	int ret;
>>  	struct intel_iommu *iommu;
>>
>> +	if (identity_list(pdev))
>> +		return paddr;
>> +
>
>This is the same as DUMMY_DEVICE_DOMAIN_INFO.  Please consolidate to a test
>that just says "do I need translation".
>

The purpose of identity_list() is to provide an interface for the future,
when we may need to set up 1:1 mapping for specific devices instead of all
of them (just as Muli Ben-Yehuda suggested earlier in this thread).

Right now it's almost empty. Yeah, I can change this check to simply:
if (iommu_identity_mapping)
	return paddr;

>And what about a DMA mask smaller than physical memory?  The PT mode drops
>back to swiotlb, iirc.
>
>
>>  	BUG_ON(dir == DMA_NONE);
>>  	if (pdev->dev.archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO)
>>  		return paddr;
>> @@ -2401,6 +2539,9 @@ static void intel_unmap_page(struct device *dev,
>dma_addr_t dev_addr,
>>  	struct iova *iova;
>>  	struct intel_iommu *iommu;
>>
>> +	if (identity_list(pdev))
>> +		return;
>> +
>
>Same...duplicate test
>
>>  	if (pdev->dev.archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO)
>>  		return;
>>  	domain = find_domain(pdev);
>> @@ -2492,6 +2633,9 @@ static void intel_unmap_sg(struct device *hwdev,
>struct scatterlist *sglist,
>>  	struct scatterlist *sg;
>>  	struct intel_iommu *iommu;
>>
>> +	if (identity_list(pdev))
>> +		return;
>> +
>
>Same duplicate test
>
>>  	if (pdev->dev.archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO)
>>  		return;
>>
>> @@ -2552,6 +2696,16 @@ static int intel_map_sg(struct device *hwdev,
>struct scatterlist *sglist, int ne
>>  	unsigned long start_addr;
>>  	struct intel_iommu *iommu;
>>
>> +	if (identity_list(pdev)) {
>> +		for_each_sg(sglist, sg, nelems, i) {
>> +			addr = page_to_phys(sg_page(sg)) + sg->offset;
>> +			sg->dma_address = addr;
>> +			sg->dma_length = sg->length;
>> +		}
>> +
>
>This is just the same as intel_nontranslate_map_sg()
>Please consolidate this.
>


Will do this.

>thanks,
>-chris


* Re: [PATCH 2/2] IOMMU Identity Mapping Support: Intel IOMMU implementation
  2009-06-18 19:40                   ` Yu, Fenghua
@ 2009-06-18 20:02                     ` Chris Wright
  2009-06-19 20:47                     ` [PATCH v2] IOMMU Identity Mapping Support (drivers/pci/intel_iommu.c) Fenghua Yu
  1 sibling, 0 replies; 61+ messages in thread
From: Chris Wright @ 2009-06-18 20:02 UTC (permalink / raw)
  To: Yu, Fenghua
  Cc: 'Chris Wright', 'David Woodhouse',
	'Linus Torvalds', 'Stephen Rothwell',
	'Andrew Morton', 'Ingo Molnar',
	'Christopher Wright', Kay, Allen M, 'iommu',
	'lkml'

* Yu, Fenghua (fenghua.yu@intel.com) wrote:
> >> @@ -2259,6 +2394,9 @@ static dma_addr_t __intel_map_single(struct device
> >*hwdev, phys_addr_t paddr,
> >>  	int ret;
> >>  	struct intel_iommu *iommu;
> >>
> >> +	if (identity_list(pdev))
> >> +		return paddr;
> >> +
> >
> >This is the same as DUMMY_DEVICE_DOMAIN_INFO.  Please consolidate to a test
> >that just says "do I need translation".
> >
> 
> The purpose of identity_list() is to provide an interface for the
> future, when we may need to set up 1:1 mapping for specific devices
> instead of all of them (just as Muli Ben-Yehuda suggested earlier in
> this thread).
> 
> Right now it's almost empty. Yeah, I can change this check to simply:
> if (iommu_identity_mapping)
> 	return paddr;
> 

That's fine, I figured that's why you had the list check.  But my point is
there are 2 ways to test for the same thing, which is whether a translation
is needed or not.

So a simple helper to help clarify what is going on.  Like this:

static int iommu_dummy(struct pci_dev *pdev)
{
	return (pdev->dev.archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO);
}

static int iommu_no_mapping(struct pci_dev *pdev)
{
	if (identity_list(pdev) || iommu_dummy(pdev))
		return 1;
	return 0;
}

> >And what about a DMA mask smaller than physical memory?  The PT mode drops
> >back to swiotlb, iirc.

Also...this?


* [PATCH v2] IOMMU Identity Mapping Support (drivers/pci/intel_iommu.c)
  2009-06-18 19:40                   ` Yu, Fenghua
  2009-06-18 20:02                     ` Chris Wright
@ 2009-06-19 20:47                     ` Fenghua Yu
  1 sibling, 0 replies; 61+ messages in thread
From: Fenghua Yu @ 2009-06-19 20:47 UTC (permalink / raw)
  To: 'Chris Wright', 'Christopher Wright',
	'Stephen Rothwell', 'David Woodhouse',
	'Ingo Molnar', 'Linus Torvalds',
	'Andrew Morton', 'Muli Ben-Yehuda'
  Cc: 'lkml', 'iommu'

Intel IOMMU Identity Mapping Support

Identity mapping for IOMMU defines a single domain that 1:1 maps all PCI
devices to all usable memory.

This reduces map/unmap overhead in the DMA APIs and improves IOMMU
performance. On 10Gb network cards, netperf shows no performance
degradation compared to non-IOMMU performance.

This method may lose some of the DMA remapping benefits, like isolation.

The patch sets up identity mapping for all PCI devices to all usable
memory. In the DMA APIs, there is no overhead to maintain page tables,
invalidate the IOTLB, flush caches, etc.

32-bit DMA devices don't use the identity mapping domain; they fall back
to DMA remapping so that they can still reach memory beyond 4GB.

When the kernel option iommu=pt is given, pass-through is tried first. If
pass-through succeeds, the IOMMU runs in pass-through mode. If pass-through
is not supported in hardware or fails for whatever reason, the IOMMU falls
back to identity mapping.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>

---

The patch needs pass through patch. It can be applied cleanly on iommu-2.6 tree
or linux-next tree.

 intel-iommu.c |  319 +++++++++++++++++++++++++++++++++++++++++++++++-----------
 1 files changed, 260 insertions(+), 59 deletions(-)

diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index 178853a..2c41aef 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -39,6 +39,7 @@
 #include <linux/sysdev.h>
 #include <asm/cacheflush.h>
 #include <asm/iommu.h>
+#include <asm/e820.h>
 #include "pci.h"
 
 #define ROOT_SIZE		VTD_PAGE_SIZE
@@ -217,6 +218,14 @@ static inline bool dma_pte_present(struct dma_pte *pte)
 	return (pte->val & 3) != 0;
 }
 
+/*
+ * This domain is a static identity mapping domain.
+ *	1. This domain creates a static 1:1 mapping to all usable memory.
+ * 	2. It maps to each iommu if successful.
+ *	3. Each iommu maps to this domain if successful.
+ */
+struct dmar_domain *si_domain;
+
 /* devices under the same p2p bridge are owned in one domain */
 #define DOMAIN_FLAG_P2P_MULTIPLE_DEVICES (1 << 0)
 
@@ -225,6 +234,9 @@ static inline bool dma_pte_present(struct dma_pte *pte)
  */
 #define DOMAIN_FLAG_VIRTUAL_MACHINE	(1 << 1)
 
+/* si_domain contains multiple devices */
+#define DOMAIN_FLAG_STATIC_IDENTITY	(1 << 2)
+
 struct dmar_domain {
 	int	id;			/* domain id */
 	unsigned long iommu_bmp;	/* bitmap of iommus this domain uses*/
@@ -435,12 +447,14 @@ int iommu_calculate_agaw(struct intel_iommu *iommu)
 	return __iommu_calculate_agaw(iommu, DEFAULT_DOMAIN_ADDRESS_WIDTH);
 }
 
-/* in native case, each domain is related to only one iommu */
+/* This function only returns a single iommu in a domain */
 static struct intel_iommu *domain_get_iommu(struct dmar_domain *domain)
 {
 	int iommu_id;
 
+	/* si_domain and vm domain should not get here. */
 	BUG_ON(domain->flags & DOMAIN_FLAG_VIRTUAL_MACHINE);
+	BUG_ON(domain->flags & DOMAIN_FLAG_STATIC_IDENTITY);
 
 	iommu_id = find_first_bit(&domain->iommu_bmp, g_num_of_iommus);
 	if (iommu_id < 0 || iommu_id >= g_num_of_iommus)
@@ -1189,10 +1203,11 @@ void free_dmar_iommu(struct intel_iommu *iommu)
 	free_context_table(iommu);
 }
 
-static struct dmar_domain * iommu_alloc_domain(struct intel_iommu *iommu)
+/* Sequential domain id starting from 0. */
+static unsigned long domain_id;
+
+static struct dmar_domain *alloc_domain(void)
 {
-	unsigned long num;
-	unsigned long ndomains;
 	struct dmar_domain *domain;
 	unsigned long flags;
 
@@ -1200,37 +1215,66 @@ static struct dmar_domain * iommu_alloc_domain(struct intel_iommu *iommu)
 	if (!domain)
 		return NULL;
 
+	spin_lock_irqsave(&device_domain_lock, flags);
+	domain->id = domain_id++;
+	spin_unlock_irqrestore(&device_domain_lock, flags);
+
+	memset(&domain->iommu_bmp, 0, sizeof(unsigned long));
+	domain->flags = 0;
+
+	return domain;
+}
+
+static int iommu_attach_domain(struct dmar_domain *domain,
+			       struct intel_iommu *iommu)
+{
+	int num;
+	unsigned long ndomains;
+	unsigned long flags;
+
 	ndomains = cap_ndoms(iommu->cap);
 
 	spin_lock_irqsave(&iommu->lock, flags);
+
 	num = find_first_zero_bit(iommu->domain_ids, ndomains);
 	if (num >= ndomains) {
 		spin_unlock_irqrestore(&iommu->lock, flags);
-		free_domain_mem(domain);
 		printk(KERN_ERR "IOMMU: no free domain ids\n");
-		return NULL;
+		return -ENOMEM;
 	}
 
 	set_bit(num, iommu->domain_ids);
-	domain->id = num;
-	memset(&domain->iommu_bmp, 0, sizeof(unsigned long));
 	set_bit(iommu->seq_id, &domain->iommu_bmp);
-	domain->flags = 0;
 	iommu->domains[num] = domain;
 	spin_unlock_irqrestore(&iommu->lock, flags);
 
-	return domain;
+	return 0;
 }
 
-static void iommu_free_domain(struct dmar_domain *domain)
+static void iommu_detach_domain(struct dmar_domain *domain,
+				struct intel_iommu *iommu)
 {
 	unsigned long flags;
-	struct intel_iommu *iommu;
-
-	iommu = domain_get_iommu(domain);
+	int num, ndomains;
+	int found = 0;
 
 	spin_lock_irqsave(&iommu->lock, flags);
-	clear_bit(domain->id, iommu->domain_ids);
+	ndomains = cap_ndoms(iommu->cap);
+	num = find_first_bit(iommu->domain_ids, ndomains);
+	for (; num < ndomains; ) {
+		if (iommu->domains[num] == domain) {
+			found = 1;
+			break;
+		}
+		num = find_next_bit(iommu->domain_ids,
+				    cap_ndoms(iommu->cap), num+1);
+	}
+
+	if (found) {
+		clear_bit(num, iommu->domain_ids);
+		clear_bit(iommu->seq_id, &domain->iommu_bmp);
+		iommu->domains[num] = NULL;
+	}
 	spin_unlock_irqrestore(&iommu->lock, flags);
 }
 
@@ -1350,6 +1394,8 @@ static int domain_init(struct dmar_domain *domain, int guest_width)
 
 static void domain_exit(struct dmar_domain *domain)
 {
+	struct dmar_drhd_unit *drhd;
+	struct intel_iommu *iommu;
 	u64 end;
 
 	/* Domain 0 is reserved, so dont process it */
@@ -1368,7 +1414,10 @@ static void domain_exit(struct dmar_domain *domain)
 	/* free page tables */
 	dma_pte_free_pagetable(domain, 0, end);
 
-	iommu_free_domain(domain);
+	for_each_active_iommu(iommu, drhd)
+		if (test_bit(iommu->seq_id, &domain->iommu_bmp))
+			iommu_detach_domain(domain, iommu);
+
 	free_domain_mem(domain);
 }
 
@@ -1408,7 +1457,8 @@ static int domain_context_mapping_one(struct dmar_domain *domain, int segment,
 	id = domain->id;
 	pgd = domain->pgd;
 
-	if (domain->flags & DOMAIN_FLAG_VIRTUAL_MACHINE) {
+	if (domain->flags & DOMAIN_FLAG_VIRTUAL_MACHINE ||
+	    domain->flags & DOMAIN_FLAG_STATIC_IDENTITY) {
 		int found = 0;
 
 		/* find an available domain id for this device in iommu */
@@ -1433,6 +1483,7 @@ static int domain_context_mapping_one(struct dmar_domain *domain, int segment,
 			}
 
 			set_bit(num, iommu->domain_ids);
+			set_bit(iommu->seq_id, &domain->iommu_bmp);
 			iommu->domains[num] = domain;
 			id = num;
 		}
@@ -1675,6 +1726,7 @@ static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)
 	unsigned long flags;
 	int bus = 0, devfn = 0;
 	int segment;
+	int ret;
 
 	domain = find_domain(pdev);
 	if (domain)
@@ -1707,6 +1759,10 @@ static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)
 		}
 	}
 
+	domain = alloc_domain();
+	if (!domain)
+		goto error;
+
 	/* Allocate new domain for the device */
 	drhd = dmar_find_matched_drhd_unit(pdev);
 	if (!drhd) {
@@ -1716,9 +1772,11 @@ static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)
 	}
 	iommu = drhd->iommu;
 
-	domain = iommu_alloc_domain(iommu);
-	if (!domain)
+	ret = iommu_attach_domain(domain, iommu);
+	if (ret) {
+		domain_exit(domain);
 		goto error;
+	}
 
 	if (domain_init(domain, gaw)) {
 		domain_exit(domain);
@@ -1792,6 +1850,8 @@ error:
 	return find_domain(pdev);
 }
 
+static int iommu_identity_mapping;
+
 static int iommu_prepare_identity_map(struct pci_dev *pdev,
 				      unsigned long long start,
 				      unsigned long long end)
@@ -1804,8 +1864,11 @@ static int iommu_prepare_identity_map(struct pci_dev *pdev,
 	printk(KERN_INFO
 		"IOMMU: Setting identity map for device %s [0x%Lx - 0x%Lx]\n",
 		pci_name(pdev), start, end);
-	/* page table init */
-	domain = get_domain_for_dev(pdev, DEFAULT_DOMAIN_ADDRESS_WIDTH);
+	if (iommu_identity_mapping)
+		domain = si_domain;
+	else
+		/* page table init */
+		domain = get_domain_for_dev(pdev, DEFAULT_DOMAIN_ADDRESS_WIDTH);
 	if (!domain)
 		return -ENOMEM;
 
@@ -1952,7 +2015,110 @@ static int __init init_context_pass_through(void)
 	return 0;
 }
 
-static int __init init_dmars(void)
+static int md_domain_init(struct dmar_domain *domain, int guest_width);
+static int si_domain_init(void)
+{
+	struct dmar_drhd_unit *drhd;
+	struct intel_iommu *iommu;
+	int ret = 0;
+
+	si_domain = alloc_domain();
+	if (!si_domain)
+		return -EFAULT;
+
+
+	for_each_active_iommu(iommu, drhd) {
+		ret = iommu_attach_domain(si_domain, iommu);
+		if (ret) {
+			domain_exit(si_domain);
+			return -EFAULT;
+		}
+	}
+
+	if (md_domain_init(si_domain, DEFAULT_DOMAIN_ADDRESS_WIDTH)) {
+		domain_exit(si_domain);
+		return -EFAULT;
+	}
+
+	si_domain->flags = DOMAIN_FLAG_STATIC_IDENTITY;
+
+	return 0;
+}
+
+static void domain_remove_one_dev_info(struct dmar_domain *domain,
+					  struct pci_dev *pdev);
+static int identity_mapping(struct pci_dev *pdev)
+{
+	struct device_domain_info *info;
+
+	if (likely(!iommu_identity_mapping))
+		return 0;
+
+
+	list_for_each_entry(info, &si_domain->devices, link)
+		if (info->dev == pdev)
+			return 1;
+	return 0;
+}
+
+static int domain_add_dev_info(struct dmar_domain *domain,
+				  struct pci_dev *pdev)
+{
+	struct device_domain_info *info;
+	unsigned long flags;
+
+	info = alloc_devinfo_mem();
+	if (!info)
+		return -ENOMEM;
+
+	info->segment = pci_domain_nr(pdev->bus);
+	info->bus = pdev->bus->number;
+	info->devfn = pdev->devfn;
+	info->dev = pdev;
+	info->domain = domain;
+
+	spin_lock_irqsave(&device_domain_lock, flags);
+	list_add(&info->link, &domain->devices);
+	list_add(&info->global, &device_domain_list);
+	pdev->dev.archdata.iommu = info;
+	spin_unlock_irqrestore(&device_domain_lock, flags);
+
+	return 0;
+}
+
+static int iommu_prepare_static_identity_mapping(void)
+{
+	int i;
+	struct pci_dev *pdev = NULL;
+	int ret;
+
+	ret = si_domain_init();
+	if (ret)
+		return -EFAULT;
+
+	printk(KERN_INFO "IOMMU: Setting identity map:\n");
+	for_each_pci_dev(pdev) {
+		for (i = 0; i < e820.nr_map; i++) {
+			struct e820entry *ei = &e820.map[i];
+
+			if (ei->type == E820_RAM) {
+				ret = iommu_prepare_identity_map(pdev,
+					ei->addr, ei->addr + ei->size);
+				if (ret)  {
+					printk(KERN_INFO "1:1 mapping to one domain failed.\n");
+					return -EFAULT;
+				}
+			}
+		}
+		ret = domain_add_dev_info(si_domain, pdev);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+int __init init_dmars(void)
 {
 	struct dmar_drhd_unit *drhd;
 	struct dmar_rmrr_unit *rmrr;
@@ -1962,6 +2128,13 @@ static int __init init_dmars(void)
 	int pass_through = 1;
 
 	/*
+	 * In case pass through can not be enabled, iommu tries to use identity
+	 * mapping.
+	 */
+	if (iommu_pass_through)
+		iommu_identity_mapping = 1;
+
+	/*
 	 * for each drhd
 	 *    allocate root
 	 *    initialize and program root entry to not present
@@ -2090,9 +2263,12 @@ static int __init init_dmars(void)
 
 	/*
 	 * If pass through is not set or not enabled, setup context entries for
-	 * identity mappings for rmrr, gfx, and isa.
+	 * identity mappings for rmrr, gfx, and isa and may fall back to static
+	 * identity mapping if iommu_identity_mapping is set.
 	 */
 	if (!iommu_pass_through) {
+		if (iommu_identity_mapping)
+			iommu_prepare_static_identity_mapping();
 		/*
 		 * For each rmrr
 		 *   for each dev attached to rmrr
@@ -2107,6 +2283,7 @@ static int __init init_dmars(void)
 		 *    endfor
 		 * endfor
 		 */
+		printk(KERN_INFO "IOMMU: Setting RMRR:\n");
 		for_each_rmrr_units(rmrr) {
 			for (i = 0; i < rmrr->devices_cnt; i++) {
 				pdev = rmrr->devices[i];
@@ -2248,6 +2425,52 @@ get_valid_domain_for_dev(struct pci_dev *pdev)
 	return domain;
 }
 
+static int iommu_dummy(struct pci_dev *pdev)
+{
+	return pdev->dev.archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO;
+}
+
+/* Check if the pdev needs to go through non-identity map and unmap process. */
+static int iommu_no_mapping(struct pci_dev *pdev)
+{
+	int found;
+
+	if (!iommu_identity_mapping)
+		return iommu_dummy(pdev);
+
+	found = identity_mapping(pdev);
+	if (found) {
+		if (pdev->dma_mask > DMA_BIT_MASK(32))
+			return 1;
+		else {
+			/*
+			 * 32 bit DMA is removed from si_domain and falls back
+			 * to non-identity mapping.
+			 */
+			domain_remove_one_dev_info(si_domain, pdev);
+			printk(KERN_INFO "32bit %s uses non-identity mapping\n",
+			       pci_name(pdev));
+			return 0;
+		}
+	} else {
+		/*
+		 * In case of a detached 64 bit DMA device from vm, the device
+		 * is put into si_domain for identity mapping.
+		 */
+		if (pdev->dma_mask > DMA_BIT_MASK(32)) {
+			int ret;
+			ret = domain_add_dev_info(si_domain, pdev);
+			if (!ret) {
+				printk(KERN_INFO "64bit %s uses identity mapping\n",
+				       pci_name(pdev));
+				return 1;
+			}
+		}
+	}
+
+	return iommu_dummy(pdev);
+}
+
 static dma_addr_t __intel_map_single(struct device *hwdev, phys_addr_t paddr,
 				     size_t size, int dir, u64 dma_mask)
 {
@@ -2260,7 +2483,8 @@ static dma_addr_t __intel_map_single(struct device *hwdev, phys_addr_t paddr,
 	struct intel_iommu *iommu;
 
 	BUG_ON(dir == DMA_NONE);
-	if (pdev->dev.archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO)
+
+	if (iommu_no_mapping(pdev))
 		return paddr;
 
 	domain = get_valid_domain_for_dev(pdev);
@@ -2401,8 +2625,9 @@ static void intel_unmap_page(struct device *dev, dma_addr_t dev_addr,
 	struct iova *iova;
 	struct intel_iommu *iommu;
 
-	if (pdev->dev.archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO)
+	if (iommu_no_mapping(pdev))
 		return;
+
 	domain = find_domain(pdev);
 	BUG_ON(!domain);
 
@@ -2492,7 +2717,7 @@ static void intel_unmap_sg(struct device *hwdev, struct scatterlist *sglist,
 	struct scatterlist *sg;
 	struct intel_iommu *iommu;
 
-	if (pdev->dev.archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO)
+	if (iommu_no_mapping(pdev))
 		return;
 
 	domain = find_domain(pdev);
@@ -2553,7 +2778,7 @@ static int intel_map_sg(struct device *hwdev, struct scatterlist *sglist, int ne
 	struct intel_iommu *iommu;
 
 	BUG_ON(dir == DMA_NONE);
-	if (pdev->dev.archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO)
+	if (iommu_no_mapping(pdev))
 		return intel_nontranslate_map_sg(hwdev, sglist, nelems, dir);
 
 	domain = get_valid_domain_for_dev(pdev);
@@ -2951,31 +3176,6 @@ int __init intel_iommu_init(void)
 	return 0;
 }
 
-static int vm_domain_add_dev_info(struct dmar_domain *domain,
-				  struct pci_dev *pdev)
-{
-	struct device_domain_info *info;
-	unsigned long flags;
-
-	info = alloc_devinfo_mem();
-	if (!info)
-		return -ENOMEM;
-
-	info->segment = pci_domain_nr(pdev->bus);
-	info->bus = pdev->bus->number;
-	info->devfn = pdev->devfn;
-	info->dev = pdev;
-	info->domain = domain;
-
-	spin_lock_irqsave(&device_domain_lock, flags);
-	list_add(&info->link, &domain->devices);
-	list_add(&info->global, &device_domain_list);
-	pdev->dev.archdata.iommu = info;
-	spin_unlock_irqrestore(&device_domain_lock, flags);
-
-	return 0;
-}
-
 static void iommu_detach_dependent_devices(struct intel_iommu *iommu,
 					   struct pci_dev *pdev)
 {
@@ -3003,7 +3203,7 @@ static void iommu_detach_dependent_devices(struct intel_iommu *iommu,
 	}
 }
 
-static void vm_domain_remove_one_dev_info(struct dmar_domain *domain,
+static void domain_remove_one_dev_info(struct dmar_domain *domain,
 					  struct pci_dev *pdev)
 {
 	struct device_domain_info *info;
@@ -3136,7 +3336,7 @@ static struct dmar_domain *iommu_alloc_vm_domain(void)
 	return domain;
 }
 
-static int vm_domain_init(struct dmar_domain *domain, int guest_width)
+static int md_domain_init(struct dmar_domain *domain, int guest_width)
 {
 	int adjust_width;
 
@@ -3227,7 +3427,7 @@ static int intel_iommu_domain_init(struct iommu_domain *domain)
 			"intel_iommu_domain_init: dmar_domain == NULL\n");
 		return -ENOMEM;
 	}
-	if (vm_domain_init(dmar_domain, DEFAULT_DOMAIN_ADDRESS_WIDTH)) {
+	if (md_domain_init(dmar_domain, DEFAULT_DOMAIN_ADDRESS_WIDTH)) {
 		printk(KERN_ERR
 			"intel_iommu_domain_init() failed\n");
 		vm_domain_exit(dmar_domain);
@@ -3262,8 +3462,9 @@ static int intel_iommu_attach_device(struct iommu_domain *domain,
 
 		old_domain = find_domain(pdev);
 		if (old_domain) {
-			if (dmar_domain->flags & DOMAIN_FLAG_VIRTUAL_MACHINE)
-				vm_domain_remove_one_dev_info(old_domain, pdev);
+			if (dmar_domain->flags & DOMAIN_FLAG_VIRTUAL_MACHINE ||
+			    dmar_domain->flags & DOMAIN_FLAG_STATIC_IDENTITY)
+				domain_remove_one_dev_info(old_domain, pdev);
 			else
 				domain_remove_dev_info(old_domain);
 		}
@@ -3285,7 +3486,7 @@ static int intel_iommu_attach_device(struct iommu_domain *domain,
 		return -EFAULT;
 	}
 
-	ret = vm_domain_add_dev_info(dmar_domain, pdev);
+	ret = domain_add_dev_info(dmar_domain, pdev);
 	if (ret)
 		return ret;
 
@@ -3299,7 +3500,7 @@ static void intel_iommu_detach_device(struct iommu_domain *domain,
 	struct dmar_domain *dmar_domain = domain->priv;
 	struct pci_dev *pdev = to_pci_dev(dev);
 
-	vm_domain_remove_one_dev_info(dmar_domain, pdev);
+	domain_remove_one_dev_info(dmar_domain, pdev);
 }
 
 static int intel_iommu_map_range(struct iommu_domain *domain,


* [PATCH] IA64 Compilation Error Fix for Intel IOMMU Identity Mapping Support
  2009-06-18 18:14                   ` Yu, Fenghua
  2009-06-18 18:25                     ` Muli Ben-Yehuda
@ 2009-06-25  0:38                     ` Fenghua Yu
  2009-06-25  1:00                       ` FUJITA Tomonori
  1 sibling, 1 reply; 61+ messages in thread
From: Fenghua Yu @ 2009-06-25  0:38 UTC (permalink / raw)
  To: 'David Woodhouse', 'Tony Luck',
	'Christopher Wright', 'Linus Torvalds',
	'Andrew Morton'
  Cc: 'lkml', 'iommu', 'linux-ia64'

IA64 Compilation error fix for Intel IOMMU identity mapping

The Intel IOMMU identity mapping code uses e820 to walk the memory map.
This causes a compilation error on IA64 when the IOMMU is configured:

drivers/pci/intel-iommu.c:42:22: error: asm/e820.h: No such file or directory

This patch fixes the compilation error on IA64. It defines
architecture-dependent memory map walk functions on x86 and IA64
separately to set up identity mapping for all devices.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>

---

This patch is against the latest upstream tree.

 arch/ia64/kernel/pci-dma.c  |   27 +++++++++++++++++++++++++++
 arch/x86/kernel/pci-dma.c   |   22 ++++++++++++++++++++++
 drivers/pci/intel-iommu.c   |   38 ++++++++++++++++++++++----------------
 include/linux/intel-iommu.h |    4 ++++
 4 files changed, 75 insertions(+), 16 deletions(-)

diff --git a/arch/ia64/kernel/pci-dma.c b/arch/ia64/kernel/pci-dma.c
index 0569596..9cb3700 100644
--- a/arch/ia64/kernel/pci-dma.c
+++ b/arch/ia64/kernel/pci-dma.c
@@ -10,7 +10,9 @@
 #include <linux/dmar.h>
 #include <asm/iommu.h>
 #include <asm/machvec.h>
+#include <linux/efi.h>
 #include <linux/dma-mapping.h>
+#include <linux/intel-iommu.h>
 
 #include <asm/system.h>
 
@@ -122,4 +124,29 @@ void __init pci_iommu_alloc(void)
 #endif
 }
 
+static int __initdata identity_mapped = 1;
+
+static int __init __iommu_prepare_identity_map(u64 start, u64 end, void *pdev)
+{
+	int ret;
+
+	ret = iommu_prepare_identity_map((struct pci_dev *)pdev, start, end);
+	if (ret)  {
+		printk(KERN_INFO "1:1 mapping to one domain failed.\n");
+		identity_mapped = 0;
+		return -EFAULT;
+	}
+
+	return 0;
+}
+
+int __init iommu_setup_identity_map(struct pci_dev *pdev)
+{
+	efi_memmap_walk(__iommu_prepare_identity_map, pdev);
+	if (!identity_mapped)
+		return -EFAULT;
+
+	return 0;
+}
+
 #endif
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 4763047..775ece5 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -10,6 +10,7 @@
 #include <asm/gart.h>
 #include <asm/calgary.h>
 #include <asm/amd_iommu.h>
+#include <linux/intel-iommu.h>
 
 static int forbid_dac __read_mostly;
 
@@ -314,3 +315,24 @@ static __devinit void via_no_dac(struct pci_dev *dev)
 }
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_VIA, PCI_ANY_ID, via_no_dac);
 #endif
+
+int iommu_setup_identity_map(struct pci_dev *pdev)
+{
+	int i;
+	int ret;
+
+	for (i = 0; i < e820.nr_map; i++) {
+		struct e820entry *ei = &e820.map[i];
+
+		if (ei->type == E820_RAM) {
+			ret = iommu_prepare_identity_map(pdev,
+				ei->addr, ei->addr + ei->size);
+			if (ret)  {
+				printk(KERN_INFO "1:1 mapping to one domain failed.\n");
+				return -EFAULT;
+			}
+		}
+	}
+
+	return 0;
+}
diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index e53eacd..009a79a 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -39,7 +39,6 @@
 #include <linux/sysdev.h>
 #include <asm/cacheflush.h>
 #include <asm/iommu.h>
-#include <asm/e820.h>
 #include "pci.h"
 
 #define ROOT_SIZE		VTD_PAGE_SIZE
@@ -1845,9 +1844,8 @@ error:
 
 static int iommu_identity_mapping;
 
-static int iommu_prepare_identity_map(struct pci_dev *pdev,
-				      unsigned long long start,
-				      unsigned long long end)
+int iommu_prepare_identity_map(struct pci_dev *pdev, unsigned long long start,
+			       unsigned long long end)
 {
 	struct dmar_domain *domain;
 	unsigned long size;
@@ -2081,8 +2079,8 @@ static int domain_add_dev_info(struct dmar_domain *domain,
 
 static int iommu_prepare_static_identity_mapping(void)
 {
-	int i;
 	struct pci_dev *pdev = NULL;
+	int first_pdev = 1;
 	int ret;
 
 	ret = si_domain_init();
@@ -2091,17 +2089,25 @@ static int iommu_prepare_static_identity_mapping(void)
 
 	printk(KERN_INFO "IOMMU: Setting identity map:\n");
 	for_each_pci_dev(pdev) {
-		for (i = 0; i < e820.nr_map; i++) {
-			struct e820entry *ei = &e820.map[i];
-
-			if (ei->type == E820_RAM) {
-				ret = iommu_prepare_identity_map(pdev,
-					ei->addr, ei->addr + ei->size);
-				if (ret)  {
-					printk(KERN_INFO "1:1 mapping to one domain failed.\n");
-					return -EFAULT;
-				}
-			}
+		if (first_pdev) {
+			/*
+			 * first pdev sets up the whole page table for
+			 * si_domain.
+			 */
+			first_pdev = 0;
+			ret = iommu_setup_identity_map(pdev);
+			if (ret)
+				return ret;
+		} else {
+			/*
+			 * following pdev's just use the page table.
+			 */
+			printk(KERN_INFO "IOMMU: identity mapping for device %s\n",
+			       pci_name(pdev));
+			ret = domain_context_mapping(si_domain, pdev,
+						     CONTEXT_TT_MULTI_LEVEL);
+			if (ret)
+				return ret;
 		}
 		ret = domain_add_dev_info(si_domain, pdev);
 		if (ret)
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 482dc91..5d470cf 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -359,5 +359,9 @@ extern void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 qdep,
 			       u64 addr, unsigned mask);
 
 extern int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu);
+extern int iommu_prepare_identity_map(struct pci_dev *pdev,
+				      unsigned long long start,
+				      unsigned long long end);
 
+extern int iommu_setup_identity_map(struct pci_dev *pdev);
 #endif

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH] IA64 Compilation Error Fix for Intel IOMMU Identity Mapping Support
  2009-06-25  0:38                     ` [PATCH] IA64 Compilation Error Fix for Intel IOMMU Identity Mapping Support Fenghua Yu
@ 2009-06-25  1:00                       ` FUJITA Tomonori
  2009-06-25  4:16                         ` [PATCH v2] " Fenghua Yu
  0 siblings, 1 reply; 61+ messages in thread
From: FUJITA Tomonori @ 2009-06-25  1:00 UTC (permalink / raw)
  To: fenghua.yu
  Cc: dwmw2, tony.luck, chrisw, torvalds, akpm, linux-kernel, iommu,
	linux-ia64

On Wed, 24 Jun 2009 17:38:41 -0700
Fenghua Yu <fenghua.yu@intel.com> wrote:

> IA64 Compilation error fix for Intel IOMMU identity mapping
> 
> The Intel IOMMU identity mapping uses e820 to walk the memory map. This causes
> compilation error on IA64 when IOMMU is configured:
> 
> drivers/pci/intel-iommu.c:42:22: error: asm/e820.h: No such file or directory
> 
> This patch fixes this compilation error on IA64. It defines architecture
> dependent memory map walk functions on x86 and ia64 seperately to set up
> identity mapping for all devices.
> 
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> Acked-by: Tony Luck <tony.luck@intel.com>
> 
> ---
> 
> This patch is against the latest upstream tree.
> 
>  arch/ia64/kernel/pci-dma.c  |   27 +++++++++++++++++++++++++++
>  arch/x86/kernel/pci-dma.c   |   22 ++++++++++++++++++++++
>  drivers/pci/intel-iommu.c   |   38 ++++++++++++++++++++++----------------
>  include/linux/intel-iommu.h |    4 ++++
>  4 files changed, 75 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/ia64/kernel/pci-dma.c b/arch/ia64/kernel/pci-dma.c
> index 0569596..9cb3700 100644
> --- a/arch/ia64/kernel/pci-dma.c
> +++ b/arch/ia64/kernel/pci-dma.c
> @@ -10,7 +10,9 @@
>  #include <linux/dmar.h>
>  #include <asm/iommu.h>
>  #include <asm/machvec.h>
> +#include <linux/efi.h>
>  #include <linux/dma-mapping.h>
> +#include <linux/intel-iommu.h>
>  
>  #include <asm/system.h>
>  
> @@ -122,4 +124,29 @@ void __init pci_iommu_alloc(void)
>  #endif
>  }
>  
> +static int __initdata identity_mapped = 1;
> +
> +static int __init __iommu_prepare_identity_map(u64 start, u64 end, void *pdev)
> +{
> +	int ret;
> +
> +	ret = iommu_prepare_identity_map((struct pci_dev *)pdev, start, end);
> +	if (ret)  {
> +		printk(KERN_INFO "1:1 mapping to one domain failed.\n");
> +		identity_mapped = 0;
> +		return -EFAULT;
> +	}
> +
> +	return 0;
> +}
> +
> +int __init iommu_setup_identity_map(struct pci_dev *pdev)
> +{
> +	efi_memmap_walk(__iommu_prepare_identity_map, pdev);
> +	if (!identity_mapped)
> +		return -EFAULT;
> +
> +	return 0;
> +}
> +
>  #endif
> diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
> index 4763047..775ece5 100644
> --- a/arch/x86/kernel/pci-dma.c
> +++ b/arch/x86/kernel/pci-dma.c
> @@ -10,6 +10,7 @@
>  #include <asm/gart.h>
>  #include <asm/calgary.h>
>  #include <asm/amd_iommu.h>
> +#include <linux/intel-iommu.h>
>  
>  static int forbid_dac __read_mostly;
>  
> @@ -314,3 +315,24 @@ static __devinit void via_no_dac(struct pci_dev *dev)
>  }
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_VIA, PCI_ANY_ID, via_no_dac);
>  #endif
> +
> +int iommu_setup_identity_map(struct pci_dev *pdev)
> +{
> +	int i;
> +	int ret;
> +
> +	for (i = 0; i < e820.nr_map; i++) {
> +		struct e820entry *ei = &e820.map[i];
> +
> +		if (ei->type == E820_RAM) {
> +			ret = iommu_prepare_identity_map(pdev,
> +				ei->addr, ei->addr + ei->size);

iommu_prepare_identity_map() is in drivers/pci/intel-iommu.c. So this
patch breaks the x86_64 build on !CONFIG_DMAR. IA64 works since
arch/ia64/kernel/pci-dma.c is only built on CONFIG_DMAR though.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH v2] IA64 Compilation Error Fix for Intel IOMMU Identity Mapping Support
  2009-06-25  1:00                       ` FUJITA Tomonori
@ 2009-06-25  4:16                         ` Fenghua Yu
  2009-06-25  4:48                           ` FUJITA Tomonori
  0 siblings, 1 reply; 61+ messages in thread
From: Fenghua Yu @ 2009-06-25  4:16 UTC (permalink / raw)
  To: FUJITA Tomonori, David Woodhouse, Christopher Wright,
	Linus Torvalds, Andrew Morton, Tony Luck
  Cc: linux-kernel, iommu, linux-ia64

IA64 compilation error fix for Intel IOMMU identity mapping
 
The Intel IOMMU identity mapping uses e820 to walk the memory map. This causes
compilation error on IA64 when IOMMU is configured:
 
drivers/pci/intel-iommu.c:42:22: error: asm/e820.h: No such file or directory
 
This patch fixes this compilation error on IA64. It defines architecture
dependent memory map walk functions on x86 and ia64 separately to set up
identity mapping for all devices.
 
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
 
---
 
This patch is against the latest upstream tree.

 arch/ia64/kernel/pci-dma.c  |   27 +++++++++++++++++++++++++++
 arch/x86/kernel/pci-dma.c   |   29 +++++++++++++++++++++++++++++
 drivers/pci/intel-iommu.c   |   38 ++++++++++++++++++++++----------------
 include/linux/intel-iommu.h |    4 ++++
 4 files changed, 82 insertions(+), 16 deletions(-)

diff --git a/arch/ia64/kernel/pci-dma.c b/arch/ia64/kernel/pci-dma.c
index 0569596..9cb3700 100644
--- a/arch/ia64/kernel/pci-dma.c
+++ b/arch/ia64/kernel/pci-dma.c
@@ -10,7 +10,9 @@
 #include <linux/dmar.h>
 #include <asm/iommu.h>
 #include <asm/machvec.h>
+#include <linux/efi.h>
 #include <linux/dma-mapping.h>
+#include <linux/intel-iommu.h>
 
 #include <asm/system.h>
 
@@ -122,4 +124,29 @@ void __init pci_iommu_alloc(void)
 #endif
 }
 
+static int __initdata identity_mapped = 1;
+
+static int __init __iommu_prepare_identity_map(u64 start, u64 end, void *pdev)
+{
+	int ret;
+
+	ret = iommu_prepare_identity_map((struct pci_dev *)pdev, start, end);
+	if (ret)  {
+		printk(KERN_INFO "1:1 mapping to one domain failed.\n");
+		identity_mapped = 0;
+		return -EFAULT;
+	}
+
+	return 0;
+}
+
+int __init iommu_setup_identity_map(struct pci_dev *pdev)
+{
+	efi_memmap_walk(__iommu_prepare_identity_map, pdev);
+	if (!identity_mapped)
+		return -EFAULT;
+
+	return 0;
+}
+
 #endif
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 4763047..16dcc61 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -10,6 +10,7 @@
 #include <asm/gart.h>
 #include <asm/calgary.h>
 #include <asm/amd_iommu.h>
+#include <linux/intel-iommu.h>
 
 static int forbid_dac __read_mostly;
 
@@ -314,3 +315,31 @@ static __devinit void via_no_dac(struct pci_dev *dev)
 }
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_VIA, PCI_ANY_ID, via_no_dac);
 #endif
+
+#ifdef CONFIG_DMAR
+int iommu_setup_identity_map(struct pci_dev *pdev)
+{
+	int i;
+	int ret;
+
+	for (i = 0; i < e820.nr_map; i++) {
+		struct e820entry *ei = &e820.map[i];
+
+		if (ei->type == E820_RAM) {
+			ret = iommu_prepare_identity_map(pdev,
+				ei->addr, ei->addr + ei->size);
+			if (ret)  {
+				printk(KERN_INFO "1:1 mapping to one domain failed.\n");
+				return -EFAULT;
+			}
+		}
+	}
+
+	return 0;
+}
+#else
+int iommu_setup_identity_map(struct pci_dev *pdev)
+{
+	return 0;
+}
+#endif
diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index e53eacd..009a79a 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -39,7 +39,6 @@
 #include <linux/sysdev.h>
 #include <asm/cacheflush.h>
 #include <asm/iommu.h>
-#include <asm/e820.h>
 #include "pci.h"
 
 #define ROOT_SIZE		VTD_PAGE_SIZE
@@ -1845,9 +1844,8 @@ error:
 
 static int iommu_identity_mapping;
 
-static int iommu_prepare_identity_map(struct pci_dev *pdev,
-				      unsigned long long start,
-				      unsigned long long end)
+int iommu_prepare_identity_map(struct pci_dev *pdev, unsigned long long start,
+			       unsigned long long end)
 {
 	struct dmar_domain *domain;
 	unsigned long size;
@@ -2081,8 +2079,8 @@ static int domain_add_dev_info(struct dmar_domain *domain,
 
 static int iommu_prepare_static_identity_mapping(void)
 {
-	int i;
 	struct pci_dev *pdev = NULL;
+	int first_pdev = 1;
 	int ret;
 
 	ret = si_domain_init();
@@ -2091,17 +2089,25 @@ static int iommu_prepare_static_identity_mapping(void)
 
 	printk(KERN_INFO "IOMMU: Setting identity map:\n");
 	for_each_pci_dev(pdev) {
-		for (i = 0; i < e820.nr_map; i++) {
-			struct e820entry *ei = &e820.map[i];
-
-			if (ei->type == E820_RAM) {
-				ret = iommu_prepare_identity_map(pdev,
-					ei->addr, ei->addr + ei->size);
-				if (ret)  {
-					printk(KERN_INFO "1:1 mapping to one domain failed.\n");
-					return -EFAULT;
-				}
-			}
+		if (first_pdev) {
+			/*
+			 * first pdev sets up the whole page table for
+			 * si_domain.
+			 */
+			first_pdev = 0;
+			ret = iommu_setup_identity_map(pdev);
+			if (ret)
+				return ret;
+		} else {
+			/*
+			 * following pdev's just use the page table.
+			 */
+			printk(KERN_INFO "IOMMU: identity mapping for device %s\n",
+			       pci_name(pdev));
+			ret = domain_context_mapping(si_domain, pdev,
+						     CONTEXT_TT_MULTI_LEVEL);
+			if (ret)
+				return ret;
 		}
 		ret = domain_add_dev_info(si_domain, pdev);
 		if (ret)
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 482dc91..5d470cf 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -359,5 +359,9 @@ extern void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 qdep,
 			       u64 addr, unsigned mask);
 
 extern int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu);
+extern int iommu_prepare_identity_map(struct pci_dev *pdev,
+				      unsigned long long start,
+				      unsigned long long end);
 
+extern int iommu_setup_identity_map(struct pci_dev *pdev);
 #endif

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH v2] IA64 Compilation Error Fix for Intel IOMMU Identity Mapping Support
  2009-06-25  4:16                         ` [PATCH v2] " Fenghua Yu
@ 2009-06-25  4:48                           ` FUJITA Tomonori
  2009-06-25  7:11                             ` David Woodhouse
  0 siblings, 1 reply; 61+ messages in thread
From: FUJITA Tomonori @ 2009-06-25  4:48 UTC (permalink / raw)
  To: fenghua.yu
  Cc: fujita.tomonori, dwmw2, chrisw, torvalds, akpm, tony.luck,
	linux-kernel, iommu, linux-ia64

On Wed, 24 Jun 2009 21:16:05 -0700
Fenghua Yu <fenghua.yu@intel.com> wrote:

> IA64 compilation error fix for Intel IOMMU identity mapping
>  
> The Intel IOMMU identity mapping uses e820 to walk the memory map. This causes
> compilation error on IA64 when IOMMU is configured:
>  
> drivers/pci/intel-iommu.c:42:22: error: asm/e820.h: No such file or directory
>  
> This patch fixes this compilation error on IA64. It defines architecture
> dependent memory map walk functions on x86 and ia64 seperately to set up
> identity mapping for all devices.
>  
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> Acked-by: Tony Luck <tony.luck@intel.com>
>  
> ---
>  
> This patch is against the latest upstream tree.
> 
>  arch/ia64/kernel/pci-dma.c  |   27 +++++++++++++++++++++++++++
>  arch/x86/kernel/pci-dma.c   |   29 +++++++++++++++++++++++++++++
>  drivers/pci/intel-iommu.c   |   38 ++++++++++++++++++++++----------------
>  include/linux/intel-iommu.h |    4 ++++
>  4 files changed, 82 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/ia64/kernel/pci-dma.c b/arch/ia64/kernel/pci-dma.c
> index 0569596..9cb3700 100644
> --- a/arch/ia64/kernel/pci-dma.c
> +++ b/arch/ia64/kernel/pci-dma.c
> @@ -10,7 +10,9 @@
>  #include <linux/dmar.h>
>  #include <asm/iommu.h>
>  #include <asm/machvec.h>
> +#include <linux/efi.h>
>  #include <linux/dma-mapping.h>
> +#include <linux/intel-iommu.h>
>  
>  #include <asm/system.h>
>  
> @@ -122,4 +124,29 @@ void __init pci_iommu_alloc(void)
>  #endif
>  }
>  
> +static int __initdata identity_mapped = 1;
> +
> +static int __init __iommu_prepare_identity_map(u64 start, u64 end, void *pdev)
> +{
> +	int ret;
> +
> +	ret = iommu_prepare_identity_map((struct pci_dev *)pdev, start, end);
> +	if (ret)  {
> +		printk(KERN_INFO "1:1 mapping to one domain failed.\n");
> +		identity_mapped = 0;
> +		return -EFAULT;
> +	}
> +
> +	return 0;
> +}
> +
> +int __init iommu_setup_identity_map(struct pci_dev *pdev)
> +{
> +	efi_memmap_walk(__iommu_prepare_identity_map, pdev);
> +	if (!identity_mapped)
> +		return -EFAULT;
> +
> +	return 0;
> +}
> +
>  #endif
> diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
> index 4763047..16dcc61 100644
> --- a/arch/x86/kernel/pci-dma.c
> +++ b/arch/x86/kernel/pci-dma.c
> @@ -10,6 +10,7 @@
>  #include <asm/gart.h>
>  #include <asm/calgary.h>
>  #include <asm/amd_iommu.h>
> +#include <linux/intel-iommu.h>
>  
>  static int forbid_dac __read_mostly;
>  
> @@ -314,3 +315,31 @@ static __devinit void via_no_dac(struct pci_dev *dev)
>  }
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_VIA, PCI_ANY_ID, via_no_dac);
>  #endif
> +
> +#ifdef CONFIG_DMAR
> +int iommu_setup_identity_map(struct pci_dev *pdev)
> +{
> +	int i;
> +	int ret;
> +
> +	for (i = 0; i < e820.nr_map; i++) {
> +		struct e820entry *ei = &e820.map[i];
> +
> +		if (ei->type == E820_RAM) {
> +			ret = iommu_prepare_identity_map(pdev,
> +				ei->addr, ei->addr + ei->size);
> +			if (ret)  {
> +				printk(KERN_INFO "1:1 mapping to one domain failed.\n");
> +				return -EFAULT;
> +			}
> +		}
> +	}
> +
> +	return 0;
> +}
> +#else
> +int iommu_setup_identity_map(struct pci_dev *pdev)
> +{
> +	return 0;
> +}
> +#endif

I don't think that we need #else part.

This patch is really ugly; adds another ifdef and VT-D specific code
to the generic place (arch/x86/kernel/pci-dma.c).

However, I guess that we need to live with this for now since this
fixes the build error.

Another complaint from me is that, IIRC, the patch that causes this
build error was first posted during the merge window (18 June) and
merged a few days later without being properly reviewed (or compile
tested).

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2] IA64 Compilation Error Fix for Intel IOMMU Identity Mapping Support
  2009-06-25  4:48                           ` FUJITA Tomonori
@ 2009-06-25  7:11                             ` David Woodhouse
  2009-06-25 21:52                               ` David Woodhouse
  0 siblings, 1 reply; 61+ messages in thread
From: David Woodhouse @ 2009-06-25  7:11 UTC (permalink / raw)
  To: FUJITA Tomonori
  Cc: fenghua.yu, chrisw, torvalds, akpm, tony.luck, linux-kernel,
	iommu, linux-ia64

On Thu, 2009-06-25 at 13:48 +0900, FUJITA Tomonori wrote:
> This patch is really ugly; adds another ifdef and VT-D specific code
> to the generic place (arch/x86/kernel/pci-dma.c).
> 
> However, I guess that we need to live with this for now since this
> fixes the build error.

It raises the question: Why are we using firmware-specific interfaces to
list the available memory -- can't we get that from somewhere _generic_?

The less we tie our code to these crappy BIOS, EFI and ACPI interfaces,
the better off we'll be.

> Another complaint from me is that, IIRC, the patch causes this build
> error was posted was posted first during the merge window (18 June)
> and merged few days later without properly reviewed (or compile
> tested).

I apologise for that. I got to it rather late, just as I was reviewing
my outstanding code for Linus to pull. It _had_ been reviewed though --
this was the second incarnation of the patch, after review.

I didn't include it in my original pull request, and instead let it sit
in linux-next for a day (which is not long, I know, but better than
nothing). I then spent that day testing it myself, and tracking down a
bug in it.

I sent the fixed version to Linus at the end of the day -- I _thought_
I would have been told of any IA64 build failures in linux-next by then,
even if the original hadn't been tested on IA64 -- evidently I was
wrong.

I really must set myself up an IA64 cross-compiler. Every time I've
looked at that so far I've got distracted into the whole "why is
GCC/glibc build so incestuously broken" quagmire -- but I only need gcc,
so I'll steel myself to ignore the braindamage and just build a simple
compiler without glibc support.

Better still would be if I could find some IA64 hardware with this
IOMMU. Would kind of help with maintenance...

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2] IA64 Compilation Error Fix for Intel IOMMU Identity Mapping Support
  2009-06-25  7:11                             ` David Woodhouse
@ 2009-06-25 21:52                               ` David Woodhouse
  2009-06-25 21:56                                 ` Yu, Fenghua
                                                   ` (2 more replies)
  0 siblings, 3 replies; 61+ messages in thread
From: David Woodhouse @ 2009-06-25 21:52 UTC (permalink / raw)
  To: FUJITA Tomonori
  Cc: fenghua.yu, chrisw, torvalds, akpm, tony.luck, linux-kernel,
	iommu, linux-ia64

On Thu, 2009-06-25 at 08:11 +0100, David Woodhouse wrote:
> It raises the question: Why are we using firmware-specific interfaces to
> list the available memory -- can't we get that from somewhere _generic_?
> 
> The less we tie our code to these crappy BIOS, EFI and ACPI interfaces,
> the better off we'll be.

Does this work everywhere... ?

(Fenghua, why are we doing the whole setup per pci dev anyway -- why not
set up the page tables once and point all devices at the same page
tables?)

diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index e53eacd..1f0830a 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -39,7 +39,6 @@
 #include <linux/sysdev.h>
 #include <asm/cacheflush.h>
 #include <asm/iommu.h>
-#include <asm/e820.h>
 #include "pci.h"
 
 #define ROOT_SIZE		VTD_PAGE_SIZE
@@ -2081,7 +2080,6 @@ static int domain_add_dev_info(struct dmar_domain *domain,
 
 static int iommu_prepare_static_identity_mapping(void)
 {
-	int i;
 	struct pci_dev *pdev = NULL;
 	int ret;
 
@@ -2091,16 +2089,18 @@ static int iommu_prepare_static_identity_mapping(void)
 
 	printk(KERN_INFO "IOMMU: Setting identity map:\n");
 	for_each_pci_dev(pdev) {
-		for (i = 0; i < e820.nr_map; i++) {
-			struct e820entry *ei = &e820.map[i];
-
-			if (ei->type == E820_RAM) {
-				ret = iommu_prepare_identity_map(pdev,
-					ei->addr, ei->addr + ei->size);
-				if (ret)  {
-					printk(KERN_INFO "1:1 mapping to one domain failed.\n");
-					return -EFAULT;
-				}
+		struct zone *zone;
+
+		for_each_populated_zone(zone) {
+			unsigned long zone_end_pfn = zone->zone_start_pfn +
+						     zone->spanned_pages;
+
+			ret = iommu_prepare_identity_map(pdev,
+				 (uint64_t)zone->zone_start_pfn << PAGE_SHIFT,
+				 (uint64_t)zone_end_pfn << PAGE_SHIFT);
+			if (ret) {
+				printk(KERN_INFO "1:1 mapping to one domain failed.\n");
+				return -EFAULT;
 			}
 		}
 		ret = domain_add_dev_info(si_domain, pdev);



-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* RE: [PATCH v2] IA64 Compilation Error Fix for Intel IOMMU Identity Mapping Support
  2009-06-25 21:52                               ` David Woodhouse
@ 2009-06-25 21:56                                 ` Yu, Fenghua
  2009-06-26 18:21                                   ` David Woodhouse
  2009-06-25 22:00                                 ` Linus Torvalds
  2009-06-25 23:43                                 ` Chris Wright
  2 siblings, 1 reply; 61+ messages in thread
From: Yu, Fenghua @ 2009-06-25 21:56 UTC (permalink / raw)
  To: 'David Woodhouse', 'FUJITA Tomonori'
  Cc: 'chrisw@redhat.com',
	'torvalds@linux-foundation.org',
	'akpm@linux-foundation.org',
	Luck, Tony, 'linux-kernel@vger.kernel.org',
	'iommu@lists.linux-foundation.org',
	'linux-ia64@vger.kernel.org'

>-----Original Message-----
>From: David Woodhouse [mailto:dwmw2@infradead.org]
>Sent: Thursday, June 25, 2009 2:52 PM
>To: FUJITA Tomonori
>Cc: Yu, Fenghua; chrisw@redhat.com; torvalds@linux-foundation.org;
>akpm@linux-foundation.org; Luck, Tony; linux-kernel@vger.kernel.org;
>iommu@lists.linux-foundation.org; linux-ia64@vger.kernel.org
>Subject: Re: [PATCH v2] IA64 Compilation Error Fix for Intel IOMMU Identity
>Mapping Support
>
>On Thu, 2009-06-25 at 08:11 +0100, David Woodhouse wrote:
>> It raises the question: Why are we using firmware-specific interfaces to
>> list the available memory -- can't we get that from somewhere _generic_?
>>
>> The less we tie our code to these crappy BIOS, EFI and ACPI interfaces,
>> the better off we'll be.
>
>Does this work everywhere... ?
>
>(Fenghua, why are we doing the whole setup per pci dev anyway -- why not
>set up the page tables once and point all devices at the same page
>tables?)
>

That's what the ia64 fix patch does: the first pci device sets up the page table, then all following pci devices use that page table by setting up the context mapping.

+		if (first_pdev) {
+			/*
+			 * first pdev sets up the whole page table for
+			 * si_domain.
+			 */
+			first_pdev = 0;
+			ret = iommu_setup_identity_map(pdev);
+			if (ret)
+				return ret;
+		} else {
+			/*
+			 * following pdev's just use the page table.
+			 */
+			printk(KERN_INFO "IOMMU: identity mapping for device %s\n",
+			       pci_name(pdev));
+			ret = domain_context_mapping(si_domain, pdev,
+						     CONTEXT_TT_MULTI_LEVEL);
+			if (ret)
Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2] IA64 Compilation Error Fix for Intel IOMMU Identity Mapping Support
  2009-06-25 21:52                               ` David Woodhouse
  2009-06-25 21:56                                 ` Yu, Fenghua
@ 2009-06-25 22:00                                 ` Linus Torvalds
  2009-06-25 22:46                                   ` Tony Luck
  2009-06-25 23:43                                 ` Chris Wright
  2 siblings, 1 reply; 61+ messages in thread
From: Linus Torvalds @ 2009-06-25 22:00 UTC (permalink / raw)
  To: David Woodhouse
  Cc: FUJITA Tomonori, fenghua.yu, chrisw, akpm, tony.luck,
	linux-kernel, iommu, linux-ia64



On Thu, 25 Jun 2009, David Woodhouse wrote:

> On Thu, 2009-06-25 at 08:11 +0100, David Woodhouse wrote:
> > It raises the question: Why are we using firmware-specific interfaces to
> > list the available memory -- can't we get that from somewhere _generic_?
> > 
> > The less we tie our code to these crappy BIOS, EFI and ACPI interfaces,
> > the better off we'll be.
> 
> Does this work everywhere... ?

Whimper.

Please please please please.. Pretty please, let this work (instead of 
duplicating the insanity).

		Linus



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2] IA64 Compilation Error Fix for Intel IOMMU Identity  Mapping Support
  2009-06-25 22:00                                 ` Linus Torvalds
@ 2009-06-25 22:46                                   ` Tony Luck
  0 siblings, 0 replies; 61+ messages in thread
From: Tony Luck @ 2009-06-25 22:46 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Woodhouse, FUJITA Tomonori, fenghua.yu, chrisw, akpm,
	linux-kernel, iommu, linux-ia64

On Thu, Jun 25, 2009 at 3:00 PM, Linus
Torvalds<torvalds@linux-foundation.org> wrote:
>> Does this work everywhere... ?
>
> Whimper.
>
> Please please please please.. Pretty please, let this work (instead of
> duplicating the insanity).

Well it compiles on all the ia64 configurations (with and without
iommu configured).  So that is a big step forward.  It boots on
a machine w/o an iommu.  Fenghua will have to test on his machine
that has an iommu to confirm that this works for ia64.

-Tony

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2] IA64 Compilation Error Fix for Intel IOMMU Identity Mapping Support
  2009-06-25 21:52                               ` David Woodhouse
  2009-06-25 21:56                                 ` Yu, Fenghua
  2009-06-25 22:00                                 ` Linus Torvalds
@ 2009-06-25 23:43                                 ` Chris Wright
  2009-06-26  1:35                                   ` Linus Torvalds
  2 siblings, 1 reply; 61+ messages in thread
From: Chris Wright @ 2009-06-25 23:43 UTC (permalink / raw)
  To: David Woodhouse
  Cc: FUJITA Tomonori, fenghua.yu, chrisw, torvalds, akpm, tony.luck,
	linux-kernel, iommu, linux-ia64

* David Woodhouse (dwmw2@infradead.org) wrote:
> On Thu, 2009-06-25 at 08:11 +0100, David Woodhouse wrote:
> > It raises the question: Why are we using firmware-specific interfaces to
> > list the available memory -- can't we get that from somewhere _generic_?
> > 
> > The less we tie our code to these crappy BIOS, EFI and ACPI interfaces,
> > the better off we'll be.
> 
> Does this work everywhere... ?

Why don't we just use what's already there?  That should already be
working w/ IA-64.

Then it's clearer for later consolidation (there's no reason to have all
these different copies of same page tables).

thanks,
-chris
---
From: Chris Wright <chrisw@redhat.com>
Subject: [PATCH] intel-iommu: fix Identity Mapping to be arch independent

Drop the e820 scanning and use existing function for finding valid RAM
regions to add to 1:1 mapping.

Signed-off-by: Chris Wright <chrisw@redhat.com>
---
 drivers/pci/intel-iommu.c |   17 ++++-------------
 1 files changed, 4 insertions(+), 13 deletions(-)

diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index e53eacd..151e7d9 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -39,7 +39,6 @@
 #include <linux/sysdev.h>
 #include <asm/cacheflush.h>
 #include <asm/iommu.h>
-#include <asm/e820.h>
 #include "pci.h"
 
 #define ROOT_SIZE		VTD_PAGE_SIZE
@@ -2081,7 +2080,6 @@ static int domain_add_dev_info(struct dmar_domain *domain,
 
 static int iommu_prepare_static_identity_mapping(void)
 {
-	int i;
 	struct pci_dev *pdev = NULL;
 	int ret;
 
@@ -2091,17 +2089,10 @@ static int iommu_prepare_static_identity_mapping(void)
 
 	printk(KERN_INFO "IOMMU: Setting identity map:\n");
 	for_each_pci_dev(pdev) {
-		for (i = 0; i < e820.nr_map; i++) {
-			struct e820entry *ei = &e820.map[i];
-
-			if (ei->type == E820_RAM) {
-				ret = iommu_prepare_identity_map(pdev,
-					ei->addr, ei->addr + ei->size);
-				if (ret)  {
-					printk(KERN_INFO "1:1 mapping to one domain failed.\n");
-					return -EFAULT;
-				}
-			}
+		ret = iommu_prepare_with_active_regions(pdev);
+		if (ret) {
+			printk(KERN_INFO "1:1 mapping to one domain failed.\n");
+			return -EFAULT;
 		}
 		ret = domain_add_dev_info(si_domain, pdev);
 		if (ret)

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH v2] IA64 Compilation Error Fix for Intel IOMMU Identity Mapping Support
  2009-06-25 23:43                                 ` Chris Wright
@ 2009-06-26  1:35                                   ` Linus Torvalds
  2009-06-26  1:52                                     ` Chris Wright
  0 siblings, 1 reply; 61+ messages in thread
From: Linus Torvalds @ 2009-06-26  1:35 UTC (permalink / raw)
  To: Chris Wright
  Cc: David Woodhouse, FUJITA Tomonori, fenghua.yu, akpm, tony.luck,
	linux-kernel, iommu, linux-ia64



On Thu, 25 Jun 2009, Chris Wright wrote:
>
> From: Chris Wright <chrisw@redhat.com>
> Subject: [PATCH] intel-iommu: fix Identity Mapping to be arch independent
> 
> Drop the e820 scanning and use existing function for finding valid RAM
> regions to add to 1:1 mapping.
> 
> Signed-off-by: Chris Wright <chrisw@redhat.com>

Is this tested on something that actually uses IOMMU?

If so, of the patches I've seen so far, this is obviously my favorite.

		Linus

* Re: [PATCH v2] IA64 Compilation Error Fix for Intel IOMMU Identity Mapping Support
  2009-06-26  1:35                                   ` Linus Torvalds
@ 2009-06-26  1:52                                     ` Chris Wright
  2009-06-26  2:00                                       ` Linus Torvalds
  0 siblings, 1 reply; 61+ messages in thread
From: Chris Wright @ 2009-06-26  1:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Chris Wright, David Woodhouse, FUJITA Tomonori, fenghua.yu, akpm,
	tony.luck, linux-kernel, iommu, linux-ia64

* Linus Torvalds (torvalds@linux-foundation.org) wrote:
> On Thu, 25 Jun 2009, Chris Wright wrote:
> >
> > From: Chris Wright <chrisw@redhat.com>
> > Subject: [PATCH] intel-iommu: fix Identity Mapping to be arch independent
> > 
> > Drop the e820 scanning and use existing function for finding valid RAM
> > regions to add to 1:1 mapping.
> > 
> > Signed-off-by: Chris Wright <chrisw@redhat.com>
> 
> Is this tested on something that actually uses IOMMU?
> 
> If so, of the patches I've seen so far, this is obviously my favorite.

I tested it on x86; works fine and creates a slightly tighter set of page
tables (matching to e820).  But I don't have an IA-64 box w/ an IOMMU.

It does have one problem, however: the graphics workaround code isn't
compiled in on IA-64, so it needs:

(a) a small ifdef move
(b) to be tested on IA-64, since my hypothesis was based on the fact that
it was already being used on IA-64

Here's v2, fixing (a) above (compile tested on IA-64), still needing (b).

thanks,
-chris
---
From: Chris Wright <chrisw@redhat.com>
Subject: [PATCH v2] intel-iommu: fix Identity Mapping to be arch independent

Drop the e820 scanning and use existing function for finding valid RAM
regions to add to 1:1 mapping.

Signed-off-by: Chris Wright <chrisw@redhat.com>
---
 drivers/pci/intel-iommu.c |   19 +++++--------------
 1 files changed, 5 insertions(+), 14 deletions(-)

diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index e53eacd..420afa8 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -39,7 +39,6 @@
 #include <linux/sysdev.h>
 #include <asm/cacheflush.h>
 #include <asm/iommu.h>
-#include <asm/e820.h>
 #include "pci.h"
 
 #define ROOT_SIZE		VTD_PAGE_SIZE
@@ -1908,7 +1907,6 @@ static inline int iommu_prepare_rmrr_dev(struct dmar_rmrr_unit *rmrr,
 		rmrr->end_address + 1);
 }
 
-#ifdef CONFIG_DMAR_GFX_WA
 struct iommu_prepare_data {
 	struct pci_dev *pdev;
 	int ret;
@@ -1943,6 +1941,7 @@ static int __init iommu_prepare_with_active_regions(struct pci_dev *pdev)
 	return data.ret;
 }
 
+#ifdef CONFIG_DMAR_GFX_WA
 static void __init iommu_prepare_gfx_mapping(void)
 {
 	struct pci_dev *pdev = NULL;
@@ -2081,7 +2080,6 @@ static int domain_add_dev_info(struct dmar_domain *domain,
 
 static int iommu_prepare_static_identity_mapping(void)
 {
-	int i;
 	struct pci_dev *pdev = NULL;
 	int ret;
 
@@ -2091,17 +2089,10 @@ static int iommu_prepare_static_identity_mapping(void)
 
 	printk(KERN_INFO "IOMMU: Setting identity map:\n");
 	for_each_pci_dev(pdev) {
-		for (i = 0; i < e820.nr_map; i++) {
-			struct e820entry *ei = &e820.map[i];
-
-			if (ei->type == E820_RAM) {
-				ret = iommu_prepare_identity_map(pdev,
-					ei->addr, ei->addr + ei->size);
-				if (ret)  {
-					printk(KERN_INFO "1:1 mapping to one domain failed.\n");
-					return -EFAULT;
-				}
-			}
+		ret = iommu_prepare_with_active_regions(pdev);
+		if (ret) {
+			printk(KERN_INFO "1:1 mapping to one domain failed.\n");
+			return -EFAULT;
 		}
 		ret = domain_add_dev_info(si_domain, pdev);
 		if (ret)

* Re: [PATCH v2] IA64 Compilation Error Fix for Intel IOMMU Identity Mapping Support
  2009-06-26  1:52                                     ` Chris Wright
@ 2009-06-26  2:00                                       ` Linus Torvalds
  2009-06-26  2:08                                         ` Chris Wright
  2009-06-27 11:44                                         ` David Woodhouse
  0 siblings, 2 replies; 61+ messages in thread
From: Linus Torvalds @ 2009-06-26  2:00 UTC (permalink / raw)
  To: Chris Wright
  Cc: David Woodhouse, FUJITA Tomonori, fenghua.yu, akpm, tony.luck,
	linux-kernel, iommu, linux-ia64



On Thu, 25 Jun 2009, Chris Wright wrote:
> I tested it on x86; works fine and creates a slightly tighter set of page
> tables (matching to e820).  But I don't have an IA-64 box w/ an IOMMU.
> 
> It does have one problem, however: the graphics workaround code isn't
> compiled in on IA-64, so it needs:

Ok. Sounds like what I want is to get this patch through the ia64 people, 
after it has gotten some testing there. Since ia64 is where the current 
kernel fails anyway, that sounds like the motivation will be there too.

And there's little point in me merging it until it's been tested to fix 
the actual problem we have now.

Thanks,

		Linus

* Re: [PATCH v2] IA64 Compilation Error Fix for Intel IOMMU Identity Mapping Support
  2009-06-26  2:00                                       ` Linus Torvalds
@ 2009-06-26  2:08                                         ` Chris Wright
  2009-06-26 11:15                                           ` David Woodhouse
  2009-06-27 11:44                                         ` David Woodhouse
  1 sibling, 1 reply; 61+ messages in thread
From: Chris Wright @ 2009-06-26  2:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Chris Wright, David Woodhouse, FUJITA Tomonori, fenghua.yu, akpm,
	tony.luck, linux-kernel, iommu, linux-ia64

* Linus Torvalds (torvalds@linux-foundation.org) wrote:
> Ok. Sounds like what I want is to get this patch through the ia64 people, 
> after it has gotten some testing there. Since ia64 is where the current 
> kernel fails anyway, that sounds like the motivation will be there too.

Yup, I agree.

thanks,
-chris

* Re: [PATCH v2] IA64 Compilation Error Fix for Intel IOMMU Identity Mapping Support
  2009-06-26  2:08                                         ` Chris Wright
@ 2009-06-26 11:15                                           ` David Woodhouse
  2009-06-27  0:03                                             ` Chris Wright
  0 siblings, 1 reply; 61+ messages in thread
From: David Woodhouse @ 2009-06-26 11:15 UTC (permalink / raw)
  To: Chris Wright
  Cc: Linus Torvalds, FUJITA Tomonori, fenghua.yu, akpm, tony.luck,
	linux-kernel, iommu, linux-ia64

On Thu, 25 Jun 2009, Chris Wright wrote:

> * Linus Torvalds (torvalds@linux-foundation.org) wrote:
>> Ok. Sounds like what I want is to get this patch through the ia64 people,
>> after it has gotten some testing there. Since ia64 is where the current
>> kernel fails anyway, that sounds like the motivation will be there too.
>
> Yup, I agree.

Tony, please could you test what's in git://git.infradead.org/iommu-2.6.git

It builds, but we'd like to check that it works correctly with iommu=pt on 
IA64.

(I'm leaving Fenghua's performance improvement for later.)

-- 
dwmw2

* RE: [PATCH v2] IA64 Compilation Error Fix for Intel IOMMU Identity Mapping Support
  2009-06-25 21:56                                 ` Yu, Fenghua
@ 2009-06-26 18:21                                   ` David Woodhouse
  0 siblings, 0 replies; 61+ messages in thread
From: David Woodhouse @ 2009-06-26 18:21 UTC (permalink / raw)
  To: Yu, Fenghua
  Cc: 'FUJITA Tomonori', 'chrisw@redhat.com',
	'torvalds@linux-foundation.org',
	'akpm@linux-foundation.org',
	Luck, Tony, 'linux-kernel@vger.kernel.org',
	'iommu@lists.linux-foundation.org',
	'linux-ia64@vger.kernel.org'

On Thu, 2009-06-25 at 14:56 -0700, Yu, Fenghua wrote:
> 
> >(Fenghua, why are we doing the whole setup per pci dev anyway -- why
> not
> >set up the page tables once and point all devices at the same page
> >tables?)
> >
> 
> That's what the ia64 fix patch does... the first PCI device sets up the
> page table. Then all following PCI devices use that page table by
> setting up the context mapping.

I've sorted that out separately -- it's not necessarily part of this
fix, although it _would_ be nice to have it in 2.6.31, since your
original 1:1 mapping patch is quite slow (and takes quite a lot of
memory) without it.

This time, I _am_ going to let it bake in linux-next for a while first
though.

Until Linus pulls the IA64 build fix from the real iommu-2.6.git tree,
I've put this into git://git.infradead.org/~dwmw2/iommu-pt.git

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation


* Re: [PATCH v2] IA64 Compilation Error Fix for Intel IOMMU Identity Mapping Support
  2009-06-26 11:15                                           ` David Woodhouse
@ 2009-06-27  0:03                                             ` Chris Wright
  0 siblings, 0 replies; 61+ messages in thread
From: Chris Wright @ 2009-06-27  0:03 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Chris Wright, tony.luck, linux-ia64, linux-kernel,
	FUJITA Tomonori, iommu, akpm, Linus Torvalds

* David Woodhouse (dwmw2@infradead.org) wrote:
> On Thu, 25 Jun 2009, Chris Wright wrote:
> 
> > * Linus Torvalds (torvalds@linux-foundation.org) wrote:
> >> Ok. Sounds like what I want is to get this patch through the ia64 people,
> >> after it has gotten some testing there. Since ia64 is where the current
> >> kernel fails anyway, that sounds like the motivation will be there too.
> >
> > Yup, I agree.
> 
> Tony, please could you test what's in git://git.infradead.org/iommu-2.6.git
> 
> It builds, but we'd like to check that it works correctly with iommu=pt on 
> IA64.

In the meantime, I booted this on an IA-64 box (w/out VT-d), forced it
to call intel_iommu_init() as if it had an IOMMU, and added a small bit
of debugging.  Looks like it's doing the right thing.

efi based
e000000001000000-e000000078000000
e00000007c000000-e00000007e990000
e00000007f300000-e00000007fda0000
e000000100000000-e00000047b7f0000
e00000047f800000-e00000047fdc0000
e00000047fe80000-e00000047ffb0000
zone based
1000000-100000000
100000000-47ffb0000
online node based
1000000-78000000
7c000000-7e990000
7f300000-7fda0000
100000000-47b7f0000
47f800000-47fdc0000
47fe80000-47ffb0000

The efi-based mem walker is returning virtual addresses; Fenghua, did
you test that one?  I'm surprised it worked.

This is quite similar to the type of output I see on x86_64 (except the
e820 map gives phys addrs, of course).

e820 based
10000-9d800
100000-bf341000
bf67e000-bf800000
100000000-340000000
zone based
10000-1000000
1000000-100000000
100000000-1c0000000
1c0000000-340000000
online node based
10000-9d000
100000-bf341000
bf67e000-bf800000
100000000-1c0000000
1c0000000-340000000
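The e820/EFI, zone, and node listings above are three sources for the same underlying information: a set of usable-RAM ranges to feed the 1:1 mapping. As a rough user-space illustration (not kernel code; the types and names here are invented for the sketch), filtering a sorted memory map for RAM entries and coalescing ranges that touch looks like this:

```c
#include <stddef.h>
#include <stdint.h>

enum region_type { REGION_RAM, REGION_RESERVED };

struct region {
	uint64_t start, end;          /* half-open: [start, end) */
	enum region_type type;
};

/*
 * Copy the RAM entries of a memory map (sorted by start address) into
 * out[], merging entries that touch or overlap.  Returns the number of
 * coalesced RAM ranges written to out[].
 */
static size_t coalesce_ram(const struct region *map, size_t n,
			   struct region *out)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (map[i].type != REGION_RAM)
			continue;
		if (count > 0 && map[i].start <= out[count - 1].end) {
			/* touches/overlaps the previous RAM range: extend it */
			if (map[i].end > out[count - 1].end)
				out[count - 1].end = map[i].end;
		} else {
			out[count++] = map[i];
		}
	}
	return count;
}
```

With an x86_64-style map like the one above, the reserved holes (e.g. at 0x9d800 and 0xbf341000) survive as gaps in the output, which is why an identity map built this way comes out slightly tighter than one flat range over all of memory.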

thanks,
-chris
---
debug patch for the really bored...

Index: linus-2.6/drivers/pci/intel-iommu.c
===================================================================
--- linus-2.6.orig/drivers/pci/intel-iommu.c
+++ linus-2.6/drivers/pci/intel-iommu.c
@@ -3110,10 +3110,66 @@ static int __init init_iommu_sysfs(void)
 }
 #endif	/* CONFIG_PM */
 
+#ifdef CONFIG_X86
+#include <asm/e820.h>
+static void __init print_memmap(void)
+{
+	int i;
+	printk("e820 based\n");
+	for (i = 0; i < e820.nr_map; i++) {
+		struct e820entry *ei = &e820.map[i];
+
+		if (ei->type == E820_RAM)
+			printk("%llx-%llx\n", ei->addr, ei->addr + ei->size);
+	}
+}
+#elif defined(CONFIG_IA64)
+#include <linux/efi.h>
+static int __init efi_print_map(u64 start, u64 end, void *unused)
+{
+	printk("%llx-%llx\n", start, end);
+	return 0;
+}
+
+static void __init print_memmap(void)
+{
+	printk("efi based\n");
+	efi_memmap_walk(efi_print_map, NULL);
+}
+#endif
+
+static int __init print_region(unsigned long start_pfn, unsigned long end_pfn,
+			       void *unused)
+{
+	printk("%lx-%lx\n", start_pfn << PAGE_SHIFT, end_pfn << PAGE_SHIFT);
+	return 0;
+}
+
+static void __init show_mem_regions(void)
+{
+	int  nid;
+	struct zone *zone;
+
+	print_memmap();
+	printk("zone based\n");
+	for_each_populated_zone(zone) {
+		unsigned long zone_end_pfn = zone->zone_start_pfn +
+					     zone->spanned_pages;
+		printk("%lx-%lx\n", zone->zone_start_pfn << PAGE_SHIFT,
+				    zone_end_pfn << PAGE_SHIFT);
+	}
+
+	printk("online node based\n");
+	for_each_online_node(nid)
+		work_with_active_regions(nid, print_region, NULL);
+}
+
 int __init intel_iommu_init(void)
 {
 	int ret = 0;
 
+	show_mem_regions();
+
 	if (dmar_table_init())
 		return 	-ENODEV;
 
Index: linus-2.6/arch/ia64/kernel/pci-dma.c
===================================================================
--- linus-2.6.orig/arch/ia64/kernel/pci-dma.c
+++ linus-2.6/arch/ia64/kernel/pci-dma.c
@@ -47,7 +47,7 @@ extern struct dma_map_ops intel_dma_ops;
 
 static int __init pci_iommu_init(void)
 {
-	if (iommu_detected)
+//	if (iommu_detected)
 		intel_iommu_init();
 
 	return 0;

* Re: [PATCH v2] IA64 Compilation Error Fix for Intel IOMMU Identity Mapping Support
  2009-06-26  2:00                                       ` Linus Torvalds
  2009-06-26  2:08                                         ` Chris Wright
@ 2009-06-27 11:44                                         ` David Woodhouse
  1 sibling, 0 replies; 61+ messages in thread
From: David Woodhouse @ 2009-06-27 11:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Chris Wright, FUJITA Tomonori, fenghua.yu, akpm, tony.luck,
	linux-kernel, iommu, linux-ia64

On Thu, 2009-06-25 at 19:00 -0700, Linus Torvalds wrote:
> 
> On Thu, 25 Jun 2009, Chris Wright wrote:
> > I tested it on x86; works fine and creates a slightly tighter set of page
> > tables (matching to e820).  But I don't have an IA-64 box w/ an IOMMU.
> > 
> > It does have one problem, however: the graphics workaround code isn't
> > compiled in on IA-64, so it needs:
> 
> Ok. Sounds like what I want is to get this patch through the ia64 people, 
> after it has gotten some testing there. Since ia64 is where the current 
> kernel fails anyway, that sounds like the motivation will be there too.
> 
> And there's little point in me merging it until it's been tested to fix 
> the actual problem we have now.

Well, the main problem is that the kernel doesn't build -- and it _has_
been tested to fix that.

As long as the daily linux-next test builds work, nobody really seems to
_care_ about VT-d on IA64 -- certainly not enough to get me a box when
they asked me to become VT-d maintainer.

Now that I chase it up, I've received only one test report and that was
"oh, even 2.6.29 doesn't boot on this box with VT-d enabled".

The patch _does_ fix the build breakage, definitely seems to be doing
the right thing, and the worst case is that the new feature doesn't work
on IA64. So I think we should merge it.

I'll bash some heads together and try to get myself some hardware so I
can actually test it on IA64 -- it looks like we have more problems
there than the small possibility that the new passthrough feature might
do the wrong thing (and in fact I suspect all IA64 VT-d hardware is new
enough to have the _hardware_ passthrough anyway, so won't need this
identity mapping set up at all).

Please pull Chris's fix from
	git://git.infradead.org/iommu-2.6.git

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation


* Re: [PATCH 1/2] IOMMU Identity Mapping Support: iommu_identity_mapping definition
  2009-06-18 18:13                 ` [PATCH 1/2] IOMMU Identity Mapping Support: iommu_identity_mapping definition Chris Wright
  2009-06-18 18:28                   ` Yu, Fenghua
@ 2009-07-04 18:40                   ` David Woodhouse
  1 sibling, 0 replies; 61+ messages in thread
From: David Woodhouse @ 2009-07-04 18:40 UTC (permalink / raw)
  To: Chris Wright
  Cc: Fenghua Yu, 'Linus Torvalds', 'Stephen Rothwell',
	'Andrew Morton', 'Ingo Molnar',
	'Christopher Wright', 'Allen Kay',
	'iommu', 'lkml'

On Thu, 2009-06-18 at 11:13 -0700, Chris Wright wrote:
> * Fenghua Yu (fenghua.yu@intel.com) wrote:
> > IOMMU Identity Mapping Support: iommu_identity_mapping definition
> > 
> > Identity mapping for IOMMU defines a single domain to 1:1 map all pci devices
> > to all usable memory.
> > 
> > This will reduce map/unmap overhead in the DMA APIs and improve IOMMU performance.
> > On 10Gb network cards, Netperf shows no performance degradation compared to
> > non-IOMMU performance.
> > 
> > This method may lose some of the DMA remapping benefits, like isolation.
> > 
> > The first patch defines the iommu_identity_mapping variable, which controls the
> > identity mapping code and is 0 by default.
> 
> The only real difference between "pt" and "identity" is hardware support.
> We should have a single value, so we don't have to tell users to do different
> things depending on their hardware (they won't even know what they have)
> to achieve the same result.

The _code_ ought to be a lot more shared than it is, too. Currently, the
hardware pass-through support has bugs that the software identity
mapping doesn't. It doesn't remove devices from the identity map if they
are limited to 32-bit DMA and a driver tries to set up mappings, which
is quite suboptimal. And it doesn't put them _back_ into the identity
map after they're detached from a VM, AFAICT.

I was going to fix that and unify the code paths, but then I found a bug
with the software identity mapping too -- if you have a PCI device which
is only capable of 32-bit DMA and it's behind a bridge (such as the
ohci1394 device on a Tylersburg SDV, although you'll have to hack the
kernel to pretend not to have the hardware PT support), it'll cause a
BUG() when it first sets up a mapping. What happens is this:

First it removes that device from si_domain because it can only address
4GiB of RAM, then get_domain_for_dev() will put it right back _in_ the
si_domain again, because it inherits its domain from the upstream PCI
bridge. And then we BUG() in domain_get_iommu() which _really_ doesn't
want to see the si_domain.
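That bounce back into si_domain can be condensed into a toy model -- hypothetical names throughout, not the driver's actual structures, just the control flow described above:

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_domain { const char *name; };

static struct toy_domain si_domain_model = { "si_domain" };
static struct toy_domain private_domain  = { "private" };

struct toy_dev {
	struct toy_domain *domain;
	struct toy_dev *bridge;      /* upstream PCI-PCI bridge, or NULL */
	bool dma32_only;
};

/* Step 1: a device that can only address 4GiB of RAM is dropped from
 * the identity map when it first sets up a mapping. */
static void drop_from_identity_map(struct toy_dev *dev)
{
	if (dev->dma32_only && dev->domain == &si_domain_model)
		dev->domain = NULL;
}

/* Step 2: the domain lookup inherits the domain of the upstream bridge
 * (same source-id on the bus), which is still the identity domain --
 * so the device lands right back in si_domain. */
static struct toy_domain *lookup_domain(struct toy_dev *dev)
{
	if (!dev->domain)
		dev->domain = dev->bridge ? dev->bridge->domain
					  : &private_domain;
	return dev->domain;
}
```

Running the two steps for a 32-bit-only device behind a bridge leaves it in the identity domain again, which is exactly the state domain_get_iommu() refuses to handle.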

I _think_ this is the best fix for that...

>From 3dfc813d94bba2046c6aed216e0fd69ac93a8e03 Mon Sep 17 00:00:00 2001
From: David Woodhouse <David.Woodhouse@intel.com>
Date: Sat, 4 Jul 2009 19:11:08 +0100
Subject: [PATCH] intel-iommu: Don't use identity mapping for PCI devices behind bridges

Our current strategy for pass-through mode is to put all devices into
the 1:1 domain at startup (which is before we know what their dma_mask
will be), and only _later_ take them out of that domain, if it turns out
that they really can't address all of memory.

However, when there are a bunch of PCI devices behind a bridge, they all
end up with the same source-id on their DMA transactions, and hence in
the same IOMMU domain. This means that we _can't_ easily move them from
the 1:1 domain into their own domain at runtime, because there might be DMA
in-flight from their siblings.

So we have to adjust our pass-through strategy: For PCI devices not on
the root bus, and for the bridges which will take responsibility for
their transactions, we have to start up _out_ of the 1:1 domain, just in
case.

This fixes the BUG() we see when we have 32-bit-capable devices behind a
PCI-PCI bridge, and use the software identity mapping.

It does mean that we might end up using 'normal' mapping mode for some
devices which could actually live with the faster 1:1 mapping -- but
this is only for PCI devices behind bridges, which presumably aren't the
devices for which people are most concerned about performance.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
---
 drivers/pci/intel-iommu.c |   30 ++++++++++++++++++++++++++++++
 1 files changed, 30 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index f9fc4f3..360fb67 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -2122,6 +2122,36 @@ static int iommu_should_identity_map(struct pci_dev *pdev, int startup)
 	if (iommu_identity_mapping == 2)
 		return IS_GFX_DEVICE(pdev);
 
+	/*
+	 * We want to start off with all devices in the 1:1 domain, and
+	 * take them out later if we find they can't access all of memory.
+	 *
+	 * However, we can't do this for PCI devices behind bridges,
+	 * because all PCI devices behind the same bridge will end up
+	 * with the same source-id on their transactions.
+	 *
+	 * Practically speaking, we can't change things around for these
+	 * devices at run-time, because we can't be sure there'll be no
+	 * DMA transactions in flight for any of their siblings.
+	 * 
+	 * So PCI devices (unless they're on the root bus) as well as
+	 * their parent PCI-PCI or PCIe-PCI bridges must be left _out_ of
+	 * the 1:1 domain, just in _case_ one of their siblings turns out
+	 * not to be able to map all of memory.
+	 */
+	if (!pdev->is_pcie) {
+		if (!pci_is_root_bus(pdev->bus))
+			return 0;
+		if (pdev->class >> 8 == PCI_CLASS_BRIDGE_PCI)
+			return 0;
+	} else if (pdev->pcie_type == PCI_EXP_TYPE_PCI_BRIDGE)
+		return 0;
+
+	/* 
+	 * At boot time, we don't yet know if devices will be 64-bit capable.
+	 * Assume that they will -- if they turn out not to be, then we can 
+	 * take them out of the 1:1 domain later.
+	 */
 	if (!startup)
 		return pdev->dma_mask > DMA_BIT_MASK(32);
 
-- 
1.6.2.5


-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation


end of thread, other threads:[~2009-07-04 18:40 UTC | newest]

Thread overview: 61+ messages
     [not found] <20090327212241.234500000@intel.com>
2009-03-28 14:24 ` [patch 0/4] Intel IOMMU Supspend/Resume Support Andrew Lutomirski
2009-03-30 23:01 ` David Woodhouse
     [not found] ` <20090327212321.520992000@intel.com>
2009-04-03 12:37   ` [patch 4/4] Intel IOMMU Suspend/Resume Support - Code Clean Up David Woodhouse
     [not found] ` <20090327212321.070229000@intel.com>
2009-04-16  0:19   ` [PATCH] Intel IOMMU Pass Through Support Fenghua Yu
2009-04-16  2:13     ` Han, Weidong
2009-04-19 10:05     ` David Woodhouse
2009-04-20 17:27       ` Yu, Fenghua
2009-05-13 23:13         ` [PATCH] Fix Intel IOMMU Compilation Warnings on IA64 Fenghua Yu
2009-05-14 15:17           ` David Woodhouse
2009-05-14 15:31             ` Matthew Wilcox
2009-05-14 17:59             ` Fenghua Yu
2009-06-18 18:05               ` [PATCH 1/2] IOMMU Identity Mapping Support: iommu_identity_mapping definition Fenghua Yu
2009-06-18 18:08                 ` Muli Ben-Yehuda
2009-06-18 18:13                   ` Chris Wright
2009-06-18 18:14                   ` Yu, Fenghua
2009-06-18 18:25                     ` Muli Ben-Yehuda
2009-06-18 18:31                       ` Chris Wright
2009-06-18 18:41                         ` Muli Ben-Yehuda
2009-06-18 18:50                           ` Yu, Fenghua
2009-06-18 18:51                           ` Chris Wright
2009-06-18 19:09                             ` Yu, Fenghua
2009-06-25  0:38                     ` [PATCH] IA64 Compilation Error Fix for Intel IOMMU Identity Mapping Support Fenghua Yu
2009-06-25  1:00                       ` FUJITA Tomonori
2009-06-25  4:16                         ` [PATCH v2] " Fenghua Yu
2009-06-25  4:48                           ` FUJITA Tomonori
2009-06-25  7:11                             ` David Woodhouse
2009-06-25 21:52                               ` David Woodhouse
2009-06-25 21:56                                 ` Yu, Fenghua
2009-06-26 18:21                                   ` David Woodhouse
2009-06-25 22:00                                 ` Linus Torvalds
2009-06-25 22:46                                   ` Tony Luck
2009-06-25 23:43                                 ` Chris Wright
2009-06-26  1:35                                   ` Linus Torvalds
2009-06-26  1:52                                     ` Chris Wright
2009-06-26  2:00                                       ` Linus Torvalds
2009-06-26  2:08                                         ` Chris Wright
2009-06-26 11:15                                           ` David Woodhouse
2009-06-27  0:03                                             ` Chris Wright
2009-06-27 11:44                                         ` David Woodhouse
2009-06-18 18:13                 ` [PATCH 1/2] IOMMU Identity Mapping Support: iommu_identity_mapping definition Chris Wright
2009-06-18 18:28                   ` Yu, Fenghua
2009-06-18 18:34                     ` Chris Wright
2009-07-04 18:40                   ` David Woodhouse
2009-05-20 17:42         ` [PATCH] Time out for possible dead loops during queued invalidation wait Fenghua Yu
2009-05-27  5:51           ` Andrew Morton
2009-05-27 22:40             ` Yu, Fenghua
2009-05-27 22:48               ` Andrew Morton
2009-05-27 23:25                 ` Yu, Fenghua
2009-05-27 23:51                   ` Andrew Morton
2009-05-28  0:47                     ` Yu, Fenghua
2009-06-18 18:05               ` [PATCH 2/2] IOMMU Identity Mapping Support: Intel IOMMU implementation Fenghua Yu
2009-06-18 19:15                 ` Chris Wright
2009-06-18 19:40                   ` Yu, Fenghua
2009-06-18 20:02                     ` Chris Wright
2009-06-19 20:47                     ` [PATCH v2] IOMMU Identity Mapping Support (drivers/pci/intel_iommu.c) Fenghua Yu
2009-04-30 23:29     ` [PATCH] Intel IOMMU Pass Through Support Andrew Morton
2009-04-30 23:37       ` Randy Dunlap
2009-05-01  0:00         ` Andrew Morton
2009-05-01  0:57           ` Fenghua Yu
2009-05-01  0:05         ` Fenghua Yu
2009-05-01  0:14           ` Andrew Morton
