All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] dma-debug: New interfaces to debug dma mapping errors
@ 2012-09-17  0:52 Shuah Khan
  2012-09-17  2:07 ` Greg KH
                   ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Shuah Khan @ 2012-09-17  0:52 UTC (permalink / raw)
  To: konrad.wilk, joerg.roedel, tglx, mingo, hpa, rob, akpm, bhelgaas, stern
  Cc: LKML, linux-doc, devel, shuahkhan, x86

A recent dma mapping error analysis effort showed that a large percentage
of dma_map_single() and dma_map_page() returns are not checked for mapping
errors.

Reference:
http://linuxdriverproject.org/mediawiki/index.php/DMA_Mapping_Error_Analysis

Adding support for tracking dma mapping and unmapping errors to help assess
the following:

When do dma mapping errors get detected?
How often do these errors occur?
Why don't we see failures related to missing dma mapping error checks?
Are they silent failures?

Four new fields are added to struct device when CONFIG_DMA_API_DEBUG is
enabled, to track the following:

dma_map_errors:
  Total number of dma mapping errors returned by the dma mapping interfaces,
  in response to mapping requests from this device.
dma_map_errors_not_checked:
  Total number of dma mapping errors the device failed to check before using
  the returned address.
dma_unmap_errors:
  Total number of times the device tried to unmap or free an invalid dma
  address.
iotlb_overflow_cnt:
  Tracks how many times a swiotlb overflow buffer is returned to this device
  when regular iotlb is full.

Enhancements to dma-debug api are made to add new debugfs interfaces to
report total dma errors, dma errors that are not checked, and unmap errors
for the entire system. Please note that these are counts for all devices in
the system.

The following new dma-debug interfaces are added:

debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr):
	Tracks dma mapping errors checked by the device. It decrements
	the dma_map_errors_not_checked counter that was incremented by
	debug_dma_map_page() when it checked for errors.
debug_dma_dump_map_errors(struct device *dev, int all):
	Allows dump of dma mapping error summary or just the errors if any.

The following existing dma-debug api are changed to support this feature:
debug_dma_map_page():
	Increments dma_map_errors and dma_map_errors_not_checked errors for
	the current device as well as totals for the system, dma-debug api
	keeps track of, when dma_addr is invalid. Please note that this
	routine can no longer call dma_mapping_error(), because of the newly
	added debug_dma_mapping_error() interface. Calling this routine at the
	time dma error unchecked state is registered, will not help if state
	gets changed right away.
check_unmap():
	This is an existing internal routines that checks for unmap errors,
	changed to increment dma_unmap_errors for the current device, as well
	as the dma_unmap_errors counter for the system, dma-debug api keeps
	track of, when a device requests an invalid address to be unmapped.
	Please note that this routine can no longer call dma_mapping_error(),
	because of the newly added debug_dma_mapping_error() interface. Calling
	dma_mapping_error() from this routine will decrement
	dma_map_errors_not_checked counter incorrectly.

The following new swiotlb interface is changed:
swiotlb_map_page():
	Increments iotlb_overflow_cnt for the device when iotlb overflow
	buffer is returned when swiotlb is full.

Changed arch/x86/include/asm/dma-mapping.h to call debug_dma_mapping_error()
to validate these new interfaces on x86_64. Other architectures will be
changed in a subsequent patch.

The current dma-debug infrastructure is designed to track dma mappings, and
debug entries are added only for correctly mapped addresses and not when
mapping fails. Enhancing the current infrastructure to track failed mappings
will result in unnecessary complexity. The approach to add counters to
struct device eliminates the need for maintaining failed mappings in dma-debug
infrastructure and is cleaner and simpler without impacting the existing
dma-debug infrastructure.

Signed-off-by: Shuah Khan <shuah.khan@hp.com>
---
 Documentation/DMA-API.txt          |   13 +++++
 arch/x86/include/asm/dma-mapping.h |    1 +
 include/linux/device.h             |    6 ++
 include/linux/dma-debug.h          |   13 +++++
 lib/dma-debug.c                    |  106 ++++++++++++++++++++++++++++++++++--
 lib/swiotlb.c                      |    3 +
 6 files changed, 136 insertions(+), 6 deletions(-)

diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
index 66bd97a..59d58b9 100644
--- a/Documentation/DMA-API.txt
+++ b/Documentation/DMA-API.txt
@@ -638,6 +638,19 @@ this directory the following files can currently be found:
 	dma-api/error_count	This file is read-only and shows the total
 				numbers of errors found.
 
+	dma-api/dma_map_errors  This file is read-only and shows the total
+				number of dma mapping errors detected.
+
+	dma-api/dma_map_errors_not_checked
+				This file is read-only and shows the total
+				number of dma mapping errors, device drivers
+				failed to check prior to using the returned
+				address.
+
+	dma-api/dma_unmap_errors
+				This file is read-only and shows the total
+				number of invalid dma unmapping attempts.
+
 	dma-api/num_errors	The number in this file shows how many
 				warnings will be printed to the kernel log
 				before it stops. This number is initialized to
diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
index f7b4c79..808dae6 100644
--- a/arch/x86/include/asm/dma-mapping.h
+++ b/arch/x86/include/asm/dma-mapping.h
@@ -47,6 +47,7 @@ static inline struct dma_map_ops *get_dma_ops(struct device *dev)
 static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
 {
 	struct dma_map_ops *ops = get_dma_ops(dev);
+	debug_dma_mapping_error(dev, dma_addr);
 	if (ops->mapping_error)
 		return ops->mapping_error(dev, dma_addr);
 
diff --git a/include/linux/device.h b/include/linux/device.h
index dd0d684..38d8917 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -693,6 +693,12 @@ struct device {
 
 	void	(*release)(struct device *dev);
 	struct iommu_group	*iommu_group;
+#ifdef CONFIG_DMA_API_DEBUG
+	int			dma_map_errors;
+	int			dma_map_errors_not_checked;
+	int			dma_unmap_errors;
+	int			iotlb_overflow_cnt;
+#endif
 };
 
 static inline struct device *kobj_to_dev(struct kobject *kobj)
diff --git a/include/linux/dma-debug.h b/include/linux/dma-debug.h
index 171ad8a..6bec75b 100644
--- a/include/linux/dma-debug.h
+++ b/include/linux/dma-debug.h
@@ -39,6 +39,8 @@ extern void debug_dma_map_page(struct device *dev, struct page *page,
 			       int direction, dma_addr_t dma_addr,
 			       bool map_single);
 
+extern void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr);
+
 extern void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
 				 size_t size, int direction, bool map_single);
 
@@ -83,6 +85,8 @@ extern void debug_dma_sync_sg_for_device(struct device *dev,
 
 extern void debug_dma_dump_mappings(struct device *dev);
 
+extern void debug_dma_dump_map_errors(struct device *dev, int all);
+
 #else /* CONFIG_DMA_API_DEBUG */
 
 static inline void dma_debug_add_bus(struct bus_type *bus)
@@ -105,6 +109,11 @@ static inline void debug_dma_map_page(struct device *dev, struct page *page,
 {
 }
 
+static inline void debug_dma_mapping_error(struct device *dev,
+					  dma_addr_t dma_addr)
+{
+}
+
 static inline void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
 					size_t size, int direction,
 					bool map_single)
@@ -176,6 +185,10 @@ static inline void debug_dma_dump_mappings(struct device *dev)
 {
 }
 
+static inline void debug_dma_dump_map_errors(struct device *dev, int all)
+{
+}
+
 #endif /* CONFIG_DMA_API_DEBUG */
 
 #endif /* __DMA_DEBUG_H */
diff --git a/lib/dma-debug.c b/lib/dma-debug.c
index 66ce414..d44e283 100644
--- a/lib/dma-debug.c
+++ b/lib/dma-debug.c
@@ -83,6 +83,11 @@ static u32 global_disable __read_mostly;
 /* Global error count */
 static u32 error_count;
 
+/* dma mapping error counts */
+static u32 dma_map_errors;
+static u32 dma_map_errors_not_checked;
+static u32 dma_unmap_errors;
+
 /* Global error show enable*/
 static u32 show_all_errors __read_mostly;
 /* Number of errors to show */
@@ -104,6 +109,9 @@ static struct dentry *show_num_errors_dent  __read_mostly;
 static struct dentry *num_free_entries_dent __read_mostly;
 static struct dentry *min_free_entries_dent __read_mostly;
 static struct dentry *filter_dent           __read_mostly;
+static struct dentry *dma_map_errors_dent   __read_mostly;
+static struct dentry *dma_map_errors_not_checked_dent   __read_mostly;
+static struct dentry *dma_unmap_errors_dent __read_mostly;
 
 /* per-driver filter related state */
 
@@ -365,12 +373,57 @@ static void hash_bucket_del(struct dma_debug_entry *entry)
 }
 
 /*
+ * Dump map/unmap errors for debugging purposes
+ */
+void debug_dma_dump_map_errors(struct device *dev, int all)
+{
+	if (all) {
+		dev_info(dev,
+			 "DMA-API: DMA map error summary:\n"
+			 "DMA map errors returned = %d\n"
+			 "DMA map errors not checked = %d\n"
+			 "DMA unmap errors = %d\n"
+			 "SWIOTLB overflow triggers = %d\n",
+			 dev->dma_map_errors,
+			 dev->dma_map_errors_not_checked,
+			 dev->dma_unmap_errors,
+			 dev->iotlb_overflow_cnt);
+		return;
+	}
+
+	if (dev->dma_map_errors_not_checked) {
+		dev_err(dev,
+			"DMA-API: Driver failed to check\n"
+			"%d out of a total of %d DMA map errors returned\n"
+			"by DMA mapping interfaces\n",
+			dev->dma_map_errors_not_checked,
+			dev->dma_map_errors);
+	}
+
+	if (dev->dma_unmap_errors) {
+		dev_err(dev,
+			"DMA-API: Driver tried to free invalid\n"
+			"DMA addresses %d times\n",
+			dev->dma_unmap_errors);
+	}
+
+	if (dev->iotlb_overflow_cnt) {
+		dev_err(dev,
+			"DMA-API: SWIOTLB overflow buffer triggered %d times\n",
+			dev->iotlb_overflow_cnt);
+	}
+}
+EXPORT_SYMBOL(debug_dma_dump_map_errors);
+
+/*
  * Dump mapping entries for debugging purposes
  */
 void debug_dma_dump_mappings(struct device *dev)
 {
 	int idx;
 
+	debug_dma_dump_map_errors(dev, 1);
+
 	for (idx = 0; idx < HASH_SIZE; idx++) {
 		struct hash_bucket *bucket = &dma_entry_hash[idx];
 		struct dma_debug_entry *entry;
@@ -695,6 +748,28 @@ static int dma_debug_fs_init(void)
 	if (!filter_dent)
 		goto out_err;
 
+	dma_map_errors_dent = debugfs_create_u32("dma_map_errors", 0444,
+			dma_debug_dent,
+			&dma_map_errors);
+
+	if (!dma_map_errors_dent)
+		goto out_err;
+
+	dma_map_errors_not_checked_dent = debugfs_create_u32(
+			"dma_map_errors_not_checked",
+			0444,
+			dma_debug_dent,
+			&dma_map_errors_not_checked);
+
+	if (!dma_map_errors_not_checked_dent)
+		goto out_err;
+
+	dma_unmap_errors_dent = debugfs_create_u32("dma_unmap_errors", 0444,
+			dma_debug_dent,
+			&dma_unmap_errors);
+	if (!dma_unmap_errors_dent)
+		goto out_err;
+
 	return 0;
 
 out_err:
@@ -843,13 +918,15 @@ static __init int dma_debug_entries_cmdline(char *str)
 __setup("dma_debug=", dma_debug_cmdline);
 __setup("dma_debug_entries=", dma_debug_entries_cmdline);
 
-static void check_unmap(struct dma_debug_entry *ref)
+static void check_unmap(struct device *dev, struct dma_debug_entry *ref)
 {
 	struct dma_debug_entry *entry;
 	struct hash_bucket *bucket;
 	unsigned long flags;
 
-	if (dma_mapping_error(ref->dev, ref->dev_addr)) {
+	if (ref->dev_addr == DMA_ERROR_CODE) {
+		dev->dma_unmap_errors += 1;
+		dma_unmap_errors += 1;
 		err_printk(ref->dev, NULL, "DMA-API: device driver tries "
 			   "to free an invalid DMA memory address\n");
 		return;
@@ -1022,8 +1099,13 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
 	if (unlikely(global_disable))
 		return;
 
-	if (unlikely(dma_mapping_error(dev, dma_addr)))
+	if (dma_addr == DMA_ERROR_CODE) {
+		dma_map_errors += 1;
+		dev->dma_map_errors += 1;
+		dma_map_errors_not_checked += 1;
+		dev->dma_map_errors_not_checked += 1;
 		return;
+	}
 
 	entry = dma_entry_alloc();
 	if (!entry)
@@ -1050,6 +1132,18 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
 }
 EXPORT_SYMBOL(debug_dma_map_page);
 
+void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
+{
+	if (unlikely(global_disable))
+		return;
+
+	if (dma_addr == DMA_ERROR_CODE) {
+		dma_map_errors_not_checked -= 1;
+		dev->dma_map_errors_not_checked -= 1;
+	}
+}
+EXPORT_SYMBOL(debug_dma_mapping_error);
+
 void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
 			  size_t size, int direction, bool map_single)
 {
@@ -1067,7 +1161,7 @@ void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
 	if (map_single)
 		ref.type = dma_debug_single;
 
-	check_unmap(&ref);
+	check_unmap(dev, &ref);
 }
 EXPORT_SYMBOL(debug_dma_unmap_page);
 
@@ -1151,7 +1245,7 @@ void debug_dma_unmap_sg(struct device *dev, struct scatterlist *sglist,
 		if (!i)
 			mapped_ents = get_nr_mapped_entries(dev, &ref);
 
-		check_unmap(&ref);
+		check_unmap(dev, &ref);
 	}
 }
 EXPORT_SYMBOL(debug_dma_unmap_sg);
@@ -1197,7 +1291,7 @@ void debug_dma_free_coherent(struct device *dev, size_t size,
 	if (unlikely(global_disable))
 		return;
 
-	check_unmap(&ref);
+	check_unmap(dev, &ref);
 }
 EXPORT_SYMBOL(debug_dma_free_coherent);
 
diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index 45bc1f8..ebf2d07 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -682,6 +682,9 @@ dma_addr_t swiotlb_map_page(struct device *dev, struct page *page,
 	if (!map) {
 		swiotlb_full(dev, size, dir, 1);
 		map = io_tlb_overflow_buffer;
+#ifdef CONFIG_DMA_API_DEBUG
+		dev->iotlb_overflow_cnt += 1;
+#endif
 	}
 
 	dev_addr = swiotlb_virt_to_bus(dev, map);
-- 
1.7.9.5




^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH] dma-debug: New interfaces to debug dma mapping errors
  2012-09-17  0:52 [PATCH] dma-debug: New interfaces to debug dma mapping errors Shuah Khan
@ 2012-09-17  2:07 ` Greg KH
  2012-09-17 14:45   ` Shuah Khan
  2012-09-17 13:39 ` Konrad Rzeszutek Wilk
  2012-09-26  1:05 ` [PATCH v2] " Shuah Khan
  2 siblings, 1 reply; 30+ messages in thread
From: Greg KH @ 2012-09-17  2:07 UTC (permalink / raw)
  To: Shuah Khan
  Cc: konrad.wilk, joerg.roedel, tglx, mingo, hpa, rob, akpm, bhelgaas,
	stern, shuahkhan, devel, x86, LKML, linux-doc

On Sun, Sep 16, 2012 at 06:52:51PM -0600, Shuah Khan wrote:
> +void debug_dma_dump_map_errors(struct device *dev, int all)
> +{
> +	if (all) {
> +		dev_info(dev,
> +			 "DMA-API: DMA map error summary:\n"
> +			 "DMA map errors returned = %d\n"
> +			 "DMA map errors not checked = %d\n"
> +			 "DMA unmap errors = %d\n"
> +			 "SWIOTLB overflow triggers = %d\n",
> +			 dev->dma_map_errors,
> +			 dev->dma_map_errors_not_checked,
> +			 dev->dma_unmap_errors,
> +			 dev->iotlb_overflow_cnt);
> +		return;
> +	}
> +
> +	if (dev->dma_map_errors_not_checked) {
> +		dev_err(dev,
> +			"DMA-API: Driver failed to check\n"
> +			"%d out of a total of %d DMA map errors returned\n"
> +			"by DMA mapping interfaces\n",
> +			dev->dma_map_errors_not_checked,
> +			dev->dma_map_errors);
> +	}
> +
> +	if (dev->dma_unmap_errors) {
> +		dev_err(dev,
> +			"DMA-API: Driver tried to free invalid\n"
> +			"DMA addresses %d times\n",
> +			dev->dma_unmap_errors);
> +	}
> +
> +	if (dev->iotlb_overflow_cnt) {
> +		dev_err(dev,
> +			"DMA-API: SWIOTLB overflow buffer triggered %d times\n",
> +			dev->iotlb_overflow_cnt);
> +	}
> +}
> +EXPORT_SYMBOL(debug_dma_dump_map_errors);

Don't use syslog for stuff like this, that's not good.  Why not use
debugfs?

greg k-h

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] dma-debug: New interfaces to debug dma mapping errors
  2012-09-17  0:52 [PATCH] dma-debug: New interfaces to debug dma mapping errors Shuah Khan
  2012-09-17  2:07 ` Greg KH
@ 2012-09-17 13:39 ` Konrad Rzeszutek Wilk
  2012-09-17 15:52   ` Shuah Khan
  2012-09-26  1:05 ` [PATCH v2] " Shuah Khan
  2 siblings, 1 reply; 30+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-09-17 13:39 UTC (permalink / raw)
  To: Shuah Khan
  Cc: joerg.roedel, tglx, mingo, hpa, rob, akpm, bhelgaas, stern, LKML,
	linux-doc, devel, shuahkhan, x86

On Sun, Sep 16, 2012 at 06:52:51PM -0600, Shuah Khan wrote:
> A recent dma mapping error analysis effort showed that a large percentage
> of dma_map_single() and dma_map_page() returns are not checked for mapping
> errors.
> 
> Reference:
> http://linuxdriverproject.org/mediawiki/index.php/DMA_Mapping_Error_Analysis
> 
> Adding support for tracking dma mapping and unmapping errors to help assess
> the following:
> 
> When do dma mapping errors get detected?
> How often do these errors occur?
> Why don't we see failures related to missing dma mapping error checks?
> Are they silent failures?
> 
> Four new fields are added to struct device when CONFIG_DMA_API_DEBUG is
> enabled, to track the following:
> 
> dma_map_errors:
>   Total number of dma mapping errors returned by the dma mapping interfaces,
>   in response to mapping requests from this device.
> dma_map_errors_not_checked:
>   Total number of dma mapping errors the device failed to check before using
>   the returned address.
> dma_unmap_errors:
>   Total number of times the device tried to unmap or free an invalid dma
>   address.
> iotlb_overflow_cnt:
>   Tracks how many times a swiotlb overflow buffer is returned to this device
>   when regular iotlb is full.
> 
> Enhancements to dma-debug api are made to add new debugfs interfaces to
> report total dma errors, dma errors that are not checked, and unmap errors
> for the entire system. Please note that these are counts for all devices in
> the system.
> 
> The following new dma-debug interfaces are added:
> 
> debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr):
> 	Tracks dma mapping errors checked by the device. It decrements
> 	the dma_map_errors_not_checked counter that was incremented by
> 	debug_dma_map_page() when it checked for errors.
> debug_dma_dump_map_errors(struct device *dev, int all):
> 	Allows dump of dma mapping error summary or just the errors if any.
> 
> The following existing dma-debug api are changed to support this feature:
> debug_dma_map_page():
> 	Increments dma_map_errors and dma_map_errors_not_checked errors for
> 	the current device as well as totals for the system, dma-debug api
> 	keeps track of, when dma_addr is invalid. Please note that this
> 	routine can no longer call dma_mapping_error(), because of the newly
> 	added debug_dma_mapping_error() interface. Calling this routine at the
> 	time dma error unchecked state is registered, will not help if state
> 	gets changed right away.
> check_unmap():
> 	This is an existing internal routines that checks for unmap errors,
> 	changed to increment dma_unmap_errors for the current device, as well
> 	as the dma_unmap_errors counter for the system, dma-debug api keeps
> 	track of, when a device requests an invalid address to be unmapped.
> 	Please note that this routine can no longer call dma_mapping_error(),
> 	because of the newly added debug_dma_mapping_error() interface. Calling
> 	dma_mapping_error() from this routine will decrement
> 	dma_map_errors_not_checked counter incorrectly.


I like the direction of this patch. That said I am wondering why you
choose to do it this way? Was there no way to have all of the logic within
debug dma file, and within check_unmap?

> 
> The following new swiotlb interface is changed:
> swiotlb_map_page():
> 	Increments iotlb_overflow_cnt for the device when iotlb overflow
> 	buffer is returned when swiotlb is full.
> 
> Changed arch/x86/include/asm/dma-mapping.h to call debug_dma_mapping_error()
> to validate these new interfaces on x86_64. Other architectures will be
> changed in a subsequent patch.
> 
> The current dma-debug infrastructure is designed to track dma mappings, and
> debug entries are added only for correctly mapped addresses and not when
> mapping fails. Enhancing the current infrastructure to track failed mappings
> will result in unnecessary complexity. The approach to add counters to

What is the extra complexity? Can you explain as if I was a newbie to debug DMA
API - perhaps there is still some hope in doing it there?

> struct device eliminates the need for maintaining failed mappings in dma-debug
> infrastructure and is cleaner and simpler without impacting the existing
> dma-debug infrastructure.

Could you explain please why it would be more difficult to do it in the existing
dma-debug infrastructure?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] dma-debug: New interfaces to debug dma mapping errors
  2012-09-17  2:07 ` Greg KH
@ 2012-09-17 14:45   ` Shuah Khan
  2012-09-17 15:25     ` Greg KH
  0 siblings, 1 reply; 30+ messages in thread
From: Shuah Khan @ 2012-09-17 14:45 UTC (permalink / raw)
  To: Greg KH
  Cc: konrad.wilk, joerg.roedel, tglx, mingo, hpa, rob, akpm, bhelgaas,
	stern, devel, x86, LKML, linux-doc, shuahkhan

On Sun, 2012-09-16 at 19:07 -0700, Greg KH wrote:
> On Sun, Sep 16, 2012 at 06:52:51PM -0600, Shuah Khan wrote:
> > +void debug_dma_dump_map_errors(struct device *dev, int all)
> > +{
> > +	if (all) {
> > +		dev_info(dev,
> > +			 "DMA-API: DMA map error summary:\n"
> > +			 "DMA map errors returned = %d\n"
> > +			 "DMA map errors not checked = %d\n"
> > +			 "DMA unmap errors = %d\n"
> > +			 "SWIOTLB overflow triggers = %d\n",
> > +			 dev->dma_map_errors,
> > +			 dev->dma_map_errors_not_checked,
> > +			 dev->dma_unmap_errors,
> > +			 dev->iotlb_overflow_cnt);
> > +		return;
> > +	}
> > +
> > +	if (dev->dma_map_errors_not_checked) {
> > +		dev_err(dev,
> > +			"DMA-API: Driver failed to check\n"
> > +			"%d out of a total of %d DMA map errors returned\n"
> > +			"by DMA mapping interfaces\n",
> > +			dev->dma_map_errors_not_checked,
> > +			dev->dma_map_errors);
> > +	}
> > +
> > +	if (dev->dma_unmap_errors) {
> > +		dev_err(dev,
> > +			"DMA-API: Driver tried to free invalid\n"
> > +			"DMA addresses %d times\n",
> > +			dev->dma_unmap_errors);
> > +	}
> > +
> > +	if (dev->iotlb_overflow_cnt) {
> > +		dev_err(dev,
> > +			"DMA-API: SWIOTLB overflow buffer triggered %d times\n",
> > +			dev->iotlb_overflow_cnt);
> > +	}
> > +}
> > +EXPORT_SYMBOL(debug_dma_dump_map_errors);
> 
> Don't use syslog for stuff like this, that's not good.  Why not use
> debugfs?

This debug interface debug_dma_dump_map_errors() is enabled only when
CONFIG_DMA_API_DEBUG is enabled, however I agree that debugfs is a
better choice than logging to syslog. I created debugfs interfaces for
these error counts at system level. Are recommending the same approach
for per device counts as well? 

-- Shuah


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] dma-debug: New interfaces to debug dma mapping errors
  2012-09-17 14:45   ` Shuah Khan
@ 2012-09-17 15:25     ` Greg KH
  0 siblings, 0 replies; 30+ messages in thread
From: Greg KH @ 2012-09-17 15:25 UTC (permalink / raw)
  To: Shuah Khan
  Cc: konrad.wilk, joerg.roedel, tglx, mingo, hpa, rob, akpm, bhelgaas,
	stern, devel, x86, LKML, linux-doc, shuahkhan

On Mon, Sep 17, 2012 at 08:45:58AM -0600, Shuah Khan wrote:
> On Sun, 2012-09-16 at 19:07 -0700, Greg KH wrote:
> > On Sun, Sep 16, 2012 at 06:52:51PM -0600, Shuah Khan wrote:
> > > +void debug_dma_dump_map_errors(struct device *dev, int all)
> > > +{
> > > +	if (all) {
> > > +		dev_info(dev,
> > > +			 "DMA-API: DMA map error summary:\n"
> > > +			 "DMA map errors returned = %d\n"
> > > +			 "DMA map errors not checked = %d\n"
> > > +			 "DMA unmap errors = %d\n"
> > > +			 "SWIOTLB overflow triggers = %d\n",
> > > +			 dev->dma_map_errors,
> > > +			 dev->dma_map_errors_not_checked,
> > > +			 dev->dma_unmap_errors,
> > > +			 dev->iotlb_overflow_cnt);
> > > +		return;
> > > +	}
> > > +
> > > +	if (dev->dma_map_errors_not_checked) {
> > > +		dev_err(dev,
> > > +			"DMA-API: Driver failed to check\n"
> > > +			"%d out of a total of %d DMA map errors returned\n"
> > > +			"by DMA mapping interfaces\n",
> > > +			dev->dma_map_errors_not_checked,
> > > +			dev->dma_map_errors);
> > > +	}
> > > +
> > > +	if (dev->dma_unmap_errors) {
> > > +		dev_err(dev,
> > > +			"DMA-API: Driver tried to free invalid\n"
> > > +			"DMA addresses %d times\n",
> > > +			dev->dma_unmap_errors);
> > > +	}
> > > +
> > > +	if (dev->iotlb_overflow_cnt) {
> > > +		dev_err(dev,
> > > +			"DMA-API: SWIOTLB overflow buffer triggered %d times\n",
> > > +			dev->iotlb_overflow_cnt);
> > > +	}
> > > +}
> > > +EXPORT_SYMBOL(debug_dma_dump_map_errors);
> > 
> > Don't use syslog for stuff like this, that's not good.  Why not use
> > debugfs?
> 
> This debug interface debug_dma_dump_map_errors() is enabled only when
> CONFIG_DMA_API_DEBUG is enabled, however I agree that debugfs is a
> better choice than logging to syslog. I created debugfs interfaces for
> these error counts at system level. Are recommending the same approach
> for per device counts as well? 

No, I have no idea about that (but hint, I really don't like adding
stuff to struct device for something like this, especially with a #ifdef
in it, that seems really wrong here.)  All I wanted to point out was
that do not use the syslog for this, use debugfs instead.

greg k-h

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] dma-debug: New interfaces to debug dma mapping errors
  2012-09-17 13:39 ` Konrad Rzeszutek Wilk
@ 2012-09-17 15:52   ` Shuah Khan
  2012-09-17 17:23     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 30+ messages in thread
From: Shuah Khan @ 2012-09-17 15:52 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Greg KH
  Cc: joerg.roedel, tglx, mingo, hpa, rob, akpm, bhelgaas, stern, LKML,
	linux-doc, devel, x86, shuahkhan

On Mon, 2012-09-17 at 09:39 -0400, Konrad Rzeszutek Wilk wrote:

> > check_unmap():
> > 	This is an existing internal routines that checks for unmap errors,
> > 	changed to increment dma_unmap_errors for the current device, as well
> > 	as the dma_unmap_errors counter for the system, dma-debug api keeps
> > 	track of, when a device requests an invalid address to be unmapped.
> > 	Please note that this routine can no longer call dma_mapping_error(),
> > 	because of the newly added debug_dma_mapping_error() interface. Calling
> > 	dma_mapping_error() from this routine will decrement
> > 	dma_map_errors_not_checked counter incorrectly.
> 
> 
> I like the direction of this patch. That said I am wondering why you
> choose to do it this way? Was there no way to have all of the logic within
> debug dma file, and within check_unmap?

> What is the extra complexity? Can you explain as if I was a newbie to debug DMA
> API - perhaps there is still some hope in doing it there?
> 
> > struct device eliminates the need for maintaining failed mappings in dma-debug
> > infrastructure and is cleaner and simpler without impacting the existing
> > dma-debug infrastructure.
> 
> Could you explain please why it would be more difficult to do it in the existing
> dma-debug infrastructure?

I started out with a goal to provide a debug infrastructure to track all
the cases where dma mapping errors go unchecked.

I could have gone the route to track system wide counts and not be
concerned about per device counts. In which case, it would be a sub-set
of the functionality in this pacth.  i.e debug_dma_map_page() increments
dma_map_errors and dma_map_errors_not_checked. The new interface
debug_dma_mapping_error() simply decrements dma_map_errors_not_checked.

check_unmap() can increment the third system wide counter,
dma_unmap_errors.

However, system wide counters are of limited use, per device counters
will give us the ability to identify the drivers that need fixing and
fix them and have a way to regression test old drivers and sanity check
the new in the future.

Having decided on per device counters, my first approach was looking
into enhancing dma-debug infrastructure and contain this logic within
that module by enhancing the existing dma_debug_entry to track these
errors. 

One issue with this approach is that the current dma-debug
infrastructure tracks only the successful mappings. Entries are added to
the dma_debug_entry able from mapping interfaces for good maps. This
table is hashed using the mapped address (dma_addr). When dma mapping
error is detected by the debug interfaces debug_dma_map_page() namely,
nothing gets added to the dma_debug_entry table.

Tracking failed mappings would require either changing the current table
usage to include failed maps and change the hash function to use some
other key instead of the mapped address. I didn't want to go that route.
One option I considered was to maintain device list with dma-debug
module and at that point adding fields to struct device sounded like one
way to go instead of adding another set of parallel data structures to
maintain the association between these counts and devices.

But from what I am hearing as feedback "changing struct device for this
purpose is not a desirable." :)

I will go back and take a look at another approach to not disturb the
existing dma_debug_entry usage, still provide per device counts
contained within the dma-debug module. I have couple of ideas to pursue
further and see if they work.

-- Shuah







^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] dma-debug: New interfaces to debug dma mapping errors
  2012-09-17 15:52   ` Shuah Khan
@ 2012-09-17 17:23     ` Konrad Rzeszutek Wilk
  2012-09-17 22:45       ` Shuah Khan
  0 siblings, 1 reply; 30+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-09-17 17:23 UTC (permalink / raw)
  To: Shuah Khan
  Cc: Greg KH, joerg.roedel, tglx, mingo, hpa, rob, akpm, bhelgaas,
	stern, LKML, linux-doc, devel, x86, shuahkhan

On Mon, Sep 17, 2012 at 09:52:52AM -0600, Shuah Khan wrote:
> On Mon, 2012-09-17 at 09:39 -0400, Konrad Rzeszutek Wilk wrote:
> 
> > > check_unmap():
> > > 	This is an existing internal routines that checks for unmap errors,
> > > 	changed to increment dma_unmap_errors for the current device, as well
> > > 	as the dma_unmap_errors counter for the system, dma-debug api keeps
> > > 	track of, when a device requests an invalid address to be unmapped.
> > > 	Please note that this routine can no longer call dma_mapping_error(),
> > > 	because of the newly added debug_dma_mapping_error() interface. Calling
> > > 	dma_mapping_error() from this routine will decrement
> > > 	dma_map_errors_not_checked counter incorrectly.
> > 
> > 
> > I like the direction of this patch. That said I am wondering why you
> > choose to do it this way? Was there no way to have all of the logic within
> > debug dma file, and within check_unmap?
> 
> > What is the extra complexity? Can you explain as if I was a newbie to debug DMA
> > API - perhaps there is still some hope in doing it there?
> > 
> > > struct device eliminates the need for maintaining failed mappings in dma-debug
> > > infrastructure and is cleaner and simpler without impacting the existing
> > > dma-debug infrastructure.
> > 
> > Could you explain please why it would be more difficult to do it in the existing
> > dma-debug infrastructure?
> 
> I started out with a goal to provide a debug infrastructure to track all
> the cases where dma mapping errors go unchecked.
> 
> I could have gone the route to track system wide counts and not be
> concerned about per device counts. In which case, it would be a sub-set
> of the functionality in this pacth.  i.e debug_dma_map_page() increments
> dma_map_errors and dma_map_errors_not_checked. The new interface
> debug_dma_mapping_error() simply decrements dma_map_errors_not_checked.
> 
> check_unmap() can increment the third system wide counter,
> dma_unmap_errors.
> 
> However, system wide counters are of limited use, per device counters
> wil gine us the ability to identify the drivers that need fixing and
> fix them and have a way to regression test old drivers and sanity check
> the new in the future.
> 
> Having decided on per device counters, my first approach was looking
> into enhancing dma-debug infrastructure and contain this logic within
> that module by enhancing the existing dma_debug_entry to track these
> errors. 
> 
> One issue with this approach is that the current dma-debug
> infrastructure tracks only the successful mappings. Entries are added to
> the dma_debug_entry able from mapping interfaces for good maps. This
> table is hashed using the mapped address (dma_addr). When dma mapping
> error is detected by the debug interfaces debug_dma_map_page() namely,
> nothing gets added to the dma_debug_entry table.

The check for the violations you are trying to find is to find that
during the life-time of 'map_page' -> 'unmap_page' that 'dma_mapping_error'
has been called. Presumarily part of that are good maps?

So what would it take to keep that state for that scenario? Could you
use the existing system of lookup?

For the scenario where the result of 'map_page' is invalid it seems
that you would need to use a completely different hash key anyway, as:

extern dma_addr_t swiotlb_map_page(struct device *dev, struct page *page,
                                   unsigned long offset, size_t size,
                                   enum dma_data_direction dir,
                                   struct dma_attrs *attrs);

on the input side you get 'struct device','struct page'... and that is it.
The DMA API is responsible for providing you with the 'dma_addr' which
is going to be zero or -1, or some valid DMA scratch address, depending on the IOMMU.

On the later invocation, so 'unmap_page', you have:

extern void swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
                               size_t size, enum dma_data_direction dir,
                               struct dma_attrs *attrs);

you are feed the 'dev_addr' that 'map_page' came up with (-1 or 0) and the
'struct device'.

Perhaps you could use just 'struct device' and 'dma_addr' and life with
the possiblity of the device doing multiple of these map_page where it gets
a invalid address and does nothing about it. If it does the 'dma_mapping_error'
you could deduct the count of invalid DMA address?


> 
> Tracking failed mappings would require either changing the current table
> usage to include failed maps and change the hash function to use some
> other key instead of the mapped address. I didn't want to go that route.
> One option I considered was to maintain device list with dma-debug
> module and at that point adding fields to struct device sounded like one
> way to go instead of adding another set of parallel data structures to
> maintain the association between these counts and devices.
> 
> But from what I am hearing as feedback "changing struct device for this
> purpose is not a desirable." :)

Yup.
> 
> I will go back and take a look at another approach to not disturb the
> existing dma_debug_entry usage, still provide per device counts
> contained within the dma-debug module. I have couple of ideas to pursue
> further and see if they work.

OK. Would it help if we suggested some ideas or do you want to try to
mull some of your ideas first?
> 
> -- Shuah
> 
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] dma-debug: New interfaces to debug dma mapping errors
  2012-09-17 17:23     ` Konrad Rzeszutek Wilk
@ 2012-09-17 22:45       ` Shuah Khan
  2012-09-18 13:34         ` Joerg Roedel
  0 siblings, 1 reply; 30+ messages in thread
From: Shuah Khan @ 2012-09-17 22:45 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Greg KH, joerg.roedel, tglx, mingo, hpa, rob, akpm, bhelgaas,
	stern, LKML, linux-doc, devel, x86, shuahkhan

On Mon, 2012-09-17 at 13:23 -0400, Konrad Rzeszutek Wilk wrote:
> On Mon, Sep 17, 2012 at 09:52:52AM -0600, Shuah Khan wrote:
> > On Mon, 2012-09-17 at 09:39 -0400, Konrad Rzeszutek Wilk wrote:
> > 
> > > > check_unmap():
> > > > 	This is an existing internal routines that checks for unmap errors,
> > > > 	changed to increment dma_unmap_errors for the current device, as well
> > > > 	as the dma_unmap_errors counter for the system, dma-debug api keeps
> > > > 	track of, when a device requests an invalid address to be unmapped.
> > > > 	Please note that this routine can no longer call dma_mapping_error(),
> > > > 	because of the newly added debug_dma_mapping_error() interface. Calling
> > > > 	dma_mapping_error() from this routine will decrement
> > > > 	dma_map_errors_not_checked counter incorrectly.
> > > 
> > > 
> > > I like the direction of this patch. That said I am wondering why you
> > > choose to do it this way? Was there no way to have all of the logic within
> > > debug dma file, and within check_unmap?
> > 
> > > What is the extra complexity? Can you explain as if I was a newbie to debug DMA
> > > API - perhaps there is still some hope in doing it there?
> > > 
> > > > struct device eliminates the need for maintaining failed mappings in dma-debug
> > > > infrastructure and is cleaner and simpler without impacting the existing
> > > > dma-debug infrastructure.
> > > 
> > > Could you explain please why it would be more difficult to do it in the existing
> > > dma-debug infrastructure?
> > 
> > I started out with a goal to provide a debug infrastructure to track all
> > the cases where dma mapping errors go unchecked.
> > 
> > I could have gone the route to track system wide counts and not be
> > concerned about per device counts. In which case, it would be a sub-set
> > of the functionality in this pacth.  i.e debug_dma_map_page() increments
> > dma_map_errors and dma_map_errors_not_checked. The new interface
> > debug_dma_mapping_error() simply decrements dma_map_errors_not_checked.
> > 
> > check_unmap() can increment the third system wide counter,
> > dma_unmap_errors.
> > 
> > However, system wide counters are of limited use, per device counters
> > wil gine us the ability to identify the drivers that need fixing and
> > fix them and have a way to regression test old drivers and sanity check
> > the new in the future.
> > 
> > Having decided on per device counters, my first approach was looking
> > into enhancing dma-debug infrastructure and contain this logic within
> > that module by enhancing the existing dma_debug_entry to track these
> > errors. 
> > 
> > One issue with this approach is that the current dma-debug
> > infrastructure tracks only the successful mappings. Entries are added to
> > the dma_debug_entry able from mapping interfaces for good maps. This
> > table is hashed using the mapped address (dma_addr). When dma mapping
> > error is detected by the debug interfaces debug_dma_map_page() namely,
> > nothing gets added to the dma_debug_entry table.
> 
> The check for the violations you are trying to find is to find that
> during the life-time of 'map_page' -> 'unmap_page' that 'dma_mapping_error'
> has been called. Presumarily part of that are good maps?
> 
> So what would it take to keep that state for that scenario? Could you
> use the existing system of lookup?
> 
> For the scenario where the result of 'map_page' is invalid it seems
> that you would need to use a completely different hash key anyway, as:
> 
> extern dma_addr_t swiotlb_map_page(struct device *dev, struct page *page,
>                                    unsigned long offset, size_t size,
>                                    enum dma_data_direction dir,
>                                    struct dma_attrs *attrs);
> 
> on the input side you get 'struct device','struct page'... and that is it.
> The DMA API is responsible for providing you with the 'dma_addr' which
> is going to be zero or -1, or some valid DMA scratch address, depending on the IOMMU.
> 
> On the later invocation, so 'unmap_page', you have:
> 
> extern void swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
>                                size_t size, enum dma_data_direction dir,
>                                struct dma_attrs *attrs);
> 
> you are feed the 'dev_addr' that 'map_page' came up with (-1 or 0) and the
> 'struct device'.
> 
> Perhaps you could use just 'struct device' and 'dma_addr' and life with
> the possiblity of the device doing multiple of these map_page where it gets
> a invalid address and does nothing about it. If it does the 'dma_mapping_error'
> you could deduct the count of invalid DMA address?

Right. I could do that. Hoping I can find a way to get full coverage
still :)

> 
> 
> > 
> > Tracking failed mappings would require either changing the current table
> > usage to include failed maps and change the hash function to use some
> > other key instead of the mapped address. I didn't want to go that route.
> > One option I considered was to maintain device list with dma-debug
> > module and at that point adding fields to struct device sounded like one
> > way to go instead of adding another set of parallel data structures to
> > maintain the association between these counts and devices.
> > 
> > But from what I am hearing as feedback "changing struct device for this
> > purpose is not a desirable." :)
> 
> Yup.
> > 
> > I will go back and take a look at another approach to not disturb the
> > existing dma_debug_entry usage, still provide per device counts
> > contained within the dma-debug module. I have couple of ideas to pursue
> > further and see if they work.
> 
> OK. Would it help if we suggested some ideas or do you want to try to
> mull some of your ideas first?

Yeah. I will firm up my ideas a bit and summarize in a day or two. Would
like to hear your ideas as well at that time, so we can pick the one
that works the best.

-- Shuah




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] dma-debug: New interfaces to debug dma mapping errors
  2012-09-17 22:45       ` Shuah Khan
@ 2012-09-18 13:34         ` Joerg Roedel
  2012-09-18 19:42           ` Shuah Khan
  0 siblings, 1 reply; 30+ messages in thread
From: Joerg Roedel @ 2012-09-18 13:34 UTC (permalink / raw)
  To: Shuah Khan
  Cc: Konrad Rzeszutek Wilk, Greg KH, tglx, mingo, hpa, rob, akpm,
	bhelgaas, stern, LKML, linux-doc, devel, x86, shuahkhan

On Mon, Sep 17, 2012 at 04:45:15PM -0600, Shuah Khan wrote:
> Yeah. I will firm up my ideas a bit and summarize in a day or two. Would
> like to hear your ideas as well at that time, so we can pick the one
> that works the best.

I think the best approach for this functionality is to add a flag to
'struct dma_debug_entry' which tells whether the address has been
checked with dma_mapping error or not. On unmap or driver unload you can
then check for that flag and print a warning when an unchecked address
is detected.


	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] dma-debug: New interfaces to debug dma mapping errors
  2012-09-18 13:34         ` Joerg Roedel
@ 2012-09-18 19:42           ` Shuah Khan
  2012-09-18 19:45             ` Konrad Rzeszutek Wilk
  2012-09-19 13:08             ` Joerg Roedel
  0 siblings, 2 replies; 30+ messages in thread
From: Shuah Khan @ 2012-09-18 19:42 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Konrad Rzeszutek Wilk, Greg KH, tglx, mingo, hpa, rob, akpm,
	bhelgaas, stern, LKML, linux-doc, devel, x86, shuahkhan

On Tue, 2012-09-18 at 15:34 +0200, Joerg Roedel wrote:
> On Mon, Sep 17, 2012 at 04:45:15PM -0600, Shuah Khan wrote:
> > Yeah. I will firm up my ideas a bit and summarize in a day or two. Would
> > like to hear your ideas as well at that time, so we can pick the one
> > that works the best.
> 
> I think the best approach for this functionality is to add a flag to
> 'struct dma_debug_entry' which tells whether the address has been
> checked with dma_mapping error or not. On unmap or driver unload you can
> then check for that flag and print a warning when an unchecked address
> is detected.

Was hoping to get comments from you as well. You are original author for
this dam-debug module.

Are you ok with the system wide and per device error counts I added? Any
comments on the overall approach?

The approach you suggested will cover the cases where drivers fail to
check good map cases. We won't able to catch failed maps that get used
without checks. Are you not concerned about these cases? These could
cause a silent error with wild writes or could bring the system down. Or
are you recommending changing the infrastructure to track failed maps as
well?

I am still pursuing a way to track failed map cases. I combined the flag
idea with one of the ideas I am looking into. Details below: (if this
sounds like a reasonable approach, I can do v2 patch and we can discuss
the code)

. Add new fields dma_map_errors, dma_map_errors_not_checked,
dma_unmap_errors, iotlb_overflow_cnt, and flag to struct
dma_debug_entry. Maybe flag is not even needed if
dma_map_errors_not_checked can double as status.

. Enhance dma_debug_init() to create a second table to track failed maps
with PREALLOC_DMA_DEBUG_ENTRIES/64 = 64. 64 devices probably is good
enough.

. Entries added to this new table when debug_dma_map_page() detects
error when mapping error is detected for the first time. Subsequent
errors, will increment dma_map_errors, dma_map_errors_not_checked for
that the device that is tracked by this entry. Note: paddr field could
work as an index into this table (existing table uses dma_addr)

. Decrement dma_map_errors_not_checked from debug_dma_mapping_error(),
clear the flag.

. check_unmap() when it detects mapping error, checks flag (status) and
prints warn message.

-- Shuah


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] dma-debug: New interfaces to debug dma mapping errors
  2012-09-18 19:42           ` Shuah Khan
@ 2012-09-18 19:45             ` Konrad Rzeszutek Wilk
  2012-09-18 20:34               ` Shuah Khan
  2012-09-19 13:08             ` Joerg Roedel
  1 sibling, 1 reply; 30+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-09-18 19:45 UTC (permalink / raw)
  To: Shuah Khan
  Cc: Joerg Roedel, Greg KH, tglx, mingo, hpa, rob, akpm, bhelgaas,
	stern, LKML, linux-doc, devel, x86, shuahkhan

On Tue, Sep 18, 2012 at 01:42:49PM -0600, Shuah Khan wrote:
> On Tue, 2012-09-18 at 15:34 +0200, Joerg Roedel wrote:
> > On Mon, Sep 17, 2012 at 04:45:15PM -0600, Shuah Khan wrote:
> > > Yeah. I will firm up my ideas a bit and summarize in a day or two. Would
> > > like to hear your ideas as well at that time, so we can pick the one
> > > that works the best.
> > 
> > I think the best approach for this functionality is to add a flag to
> > 'struct dma_debug_entry' which tells whether the address has been
> > checked with dma_mapping error or not. On unmap or driver unload you can
> > then check for that flag and print a warning when an unchecked address
> > is detected.
> 
> Was hoping to get comments from you as well. You are original author for
> this dam-debug module.
> 
> Are you ok with the system wide and per device error counts I added? Any
> comments on the overall approach?
> 
> The approach you suggested will cover the cases where drivers fail to
> check good map cases. We won't able to catch failed maps that get used
> without checks. Are you not concerned about these cases? These could
> cause a silent error with wild writes or could bring the system down. Or
> are you recommending changing the infrastructure to track failed maps as
> well?
> 
> I am still pursuing a way to track failed map cases. I combined the flag
> idea with one of the ideas I am looking into. Details below: (if this
> sounds like a reasonable approach, I can do v2 patch and we can discuss
> the code)
> 
> . Add new fields dma_map_errors, dma_map_errors_not_checked,
> dma_unmap_errors, iotlb_overflow_cnt, and flag to struct
> dma_debug_entry. Maybe flag is not even needed if
> dma_map_errors_not_checked can double as status.

Not sure if you need the iotlb_overflow_cnt anymore. Just having
dma_map_errors_not_checked and the dma_map_errors
(which you can increment/decrement) would suffice. Unless you
were thinking to check that dma_map_errors == dma_unmap_errors and
if they != then produce a warning?
> 
> . Enhance dma_debug_init() to create a second table to track failed maps
> with PREALLOC_DMA_DEBUG_ENTRIES/64 = 64. 64 devices probably is good
> enough.
> 
> . Entries added to this new table when debug_dma_map_page() detects
> error when mapping error is detected for the first time. Subsequent
> errors, will increment dma_map_errors, dma_map_errors_not_checked for
> that the device that is tracked by this entry. Note: paddr field could
> work as an index into this table (existing table uses dma_addr)

> 
> . Decrement dma_map_errors_not_checked from debug_dma_mapping_error(),
> clear the flag.
> 
> . check_unmap() when it detects mapping error, checks flag (status) and
> prints warn message.

<nods>
> 
> -- Shuah

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] dma-debug: New interfaces to debug dma mapping errors
  2012-09-18 19:45             ` Konrad Rzeszutek Wilk
@ 2012-09-18 20:34               ` Shuah Khan
  0 siblings, 0 replies; 30+ messages in thread
From: Shuah Khan @ 2012-09-18 20:34 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Joerg Roedel, Greg KH, tglx, mingo, hpa, rob, akpm, bhelgaas,
	stern, LKML, linux-doc, devel, x86, shuahkhan

On Tue, 2012-09-18 at 15:45 -0400, Konrad Rzeszutek Wilk wrote:
> On Tue, Sep 18, 2012 at 01:42:49PM -0600, Shuah Khan wrote:
> > On Tue, 2012-09-18 at 15:34 +0200, Joerg Roedel wrote:
> > > On Mon, Sep 17, 2012 at 04:45:15PM -0600, Shuah Khan wrote:
> > > > Yeah. I will firm up my ideas a bit and summarize in a day or two. Would
> > > > like to hear your ideas as well at that time, so we can pick the one
> > > > that works the best.
> > > 
> > > I think the best approach for this functionality is to add a flag to
> > > 'struct dma_debug_entry' which tells whether the address has been
> > > checked with dma_mapping error or not. On unmap or driver unload you can
> > > then check for that flag and print a warning when an unchecked address
> > > is detected.
> > 
> > Was hoping to get comments from you as well. You are original author for
> > this dam-debug module.
> > 
> > Are you ok with the system wide and per device error counts I added? Any
> > comments on the overall approach?
> > 
> > The approach you suggested will cover the cases where drivers fail to
> > check good map cases. We won't able to catch failed maps that get used
> > without checks. Are you not concerned about these cases? These could
> > cause a silent error with wild writes or could bring the system down. Or
> > are you recommending changing the infrastructure to track failed maps as
> > well?
> > 
> > I am still pursuing a way to track failed map cases. I combined the flag
> > idea with one of the ideas I am looking into. Details below: (if this
> > sounds like a reasonable approach, I can do v2 patch and we can discuss
> > the code)
> > 
> > . Add new fields dma_map_errors, dma_map_errors_not_checked,
> > dma_unmap_errors, iotlb_overflow_cnt, and flag to struct
> > dma_debug_entry. Maybe flag is not even needed if
> > dma_map_errors_not_checked can double as status.
> 
> Not sure if you need the iotlb_overflow_cnt anymore. Just having
> dma_map_errors_not_checked and the dma_map_errors
> (which you can increment/decrement) would suffice. Unless you
> were thinking to check that dma_map_errors == dma_unmap_errors and
> if they != then produce a warning?

Right. I wsn't thinking about that, but I get it. Don't need
iotlb_overflow_cnt as it is included in the failed map count. What I
meant was dma_map_errors_not_checked > 0 is same as the status this flag
is intended to track can be a trigger for warn. But that is not going
work because it will generate warnings as soon as
dma_map_errors_not_checked becomes > 0 and stays that way. Need the
flag. :) So dropping iotlb_overflow_cnt and keeping the status flag.

-- Shuah


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] dma-debug: New interfaces to debug dma mapping errors
  2012-09-18 19:42           ` Shuah Khan
  2012-09-18 19:45             ` Konrad Rzeszutek Wilk
@ 2012-09-19 13:08             ` Joerg Roedel
  2012-09-19 19:16               ` Shuah Khan
  1 sibling, 1 reply; 30+ messages in thread
From: Joerg Roedel @ 2012-09-19 13:08 UTC (permalink / raw)
  To: Shuah Khan
  Cc: Konrad Rzeszutek Wilk, Greg KH, tglx, mingo, hpa, rob, akpm,
	bhelgaas, stern, LKML, linux-doc, devel, x86, shuahkhan

On Tue, Sep 18, 2012 at 01:42:49PM -0600, Shuah Khan wrote:
> Are you ok with the system wide and per device error counts I added? Any
> comments on the overall approach?

The general approach of having error counters is fine. But the addresses
allocated/addresses checked thing should be done per allocation and not
with counter comparison for several reasons:

	1. When doing it per-allocation we know exactly which allocation
	   was not checked and can tell the driver developer. The code
	   saves stack-traces for that. This is much more useful than
	   telling the developer 'somewhere you do not check your
	   dma-handles'

	2. Checking this per-allocation gives you the per-device and
	   also the per-driver checking you want.

	3. You don't need to change 'struct device' for that.

There are more reasons, like that this approach fits a lot better to the
general idea of the DMA-API debugging code.

> The approach you suggested will cover the cases where drivers fail to
> check good map cases. We won't able to catch failed maps that get used
> without checks. Are you not concerned about these cases? These could
> cause a silent error with wild writes or could bring the system down. Or
> are you recommending changing the infrastructure to track failed maps as
> well?

It is fine to only check the good-map cases. Think about what
DMA-debugging is good for: It is a tool for driver developers to find
bugs in their code they wouldn't notice otherwise. An unchecked bad-map
case is a bug they would notice otherwise. So if we check only the
good-map cases and warn the driver developers about non-checked
addresses they fix it and make the drivers more robust against failed
allocations, fixing also the bad-map cases.

> I am still pursuing a way to track failed map cases. I combined the flag
> idea with one of the ideas I am looking into. Details below: (if this
> sounds like a reasonable approach, I can do v2 patch and we can discuss
> the code)

Why do you want to track the bad-map cases?


	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] dma-debug: New interfaces to debug dma mapping errors
  2012-09-19 13:08             ` Joerg Roedel
@ 2012-09-19 19:16               ` Shuah Khan
  0 siblings, 0 replies; 30+ messages in thread
From: Shuah Khan @ 2012-09-19 19:16 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Konrad Rzeszutek Wilk, Greg KH, tglx, mingo, hpa, rob, akpm,
	bhelgaas, stern, LKML, linux-doc, devel, x86, shuahkhan

On Wed, 2012-09-19 at 15:08 +0200, Joerg Roedel wrote:
> On Tue, Sep 18, 2012 at 01:42:49PM -0600, Shuah Khan wrote:
> > Are you ok with the system wide and per device error counts I added? Any
> > comments on the overall approach?
> 
> The general approach of having error counters is fine. But the addresses
> allocated/addresses checked thing should be done per allocation and not
> with counter comparison for several reasons:
> 
> 	1. When doing it per-allocation we know exactly which allocation
> 	   was not checked and can tell the driver developer. The code
> 	   saves stack-traces for that. This is much more useful than
> 	   telling the developer 'somewhere you do not check your
> 	   dma-handles'
Right. It would point directly the actual mapping instead of a blind
count.

> 
> 	2. Checking this per-allocation gives you the per-device and
> 	   also the per-driver checking you want.

Yes it would.
> 
> 	3. You don't need to change 'struct device' for that.

Right - heard from others as well on this one :)

> 
> There are more reasons, like that this approach fits a lot better to the
> general idea of the DMA-API debugging code.
> 
> > The approach you suggested will cover the cases where drivers fail to
> > check good map cases. We won't able to catch failed maps that get used
> > without checks. Are you not concerned about these cases? These could
> > cause a silent error with wild writes or could bring the system down. Or
> > are you recommending changing the infrastructure to track failed maps as
> > well?
> 
> It is fine to only check the good-map cases. Think about what
> DMA-debugging is good for: It is a tool for driver developers to find
> bugs in their code they wouldn't notice otherwise. An unchecked bad-map
> case is a bug they would notice otherwise. So if we check only the
> good-map cases and warn the driver developers about non-checked
> addresses they fix it and make the drivers more robust against failed
> allocations, fixing also the bad-map cases.

ok makes sense now that understand the scope of the dma-debug api. Here
is what I will do then, do checks on good maps. With that scope, there
is no need for another table.

> 
> > I am still pursuing a way to track failed map cases. I combined the flag
> > idea with one of the ideas I am looking into. Details below: (if this
> > sounds like a reasonable approach, I can do v2 patch and we can discuss
> > the code)
> 
> Why do you want to track the bad-map cases?

I am still concerned about data corruption type issues that will be hard
to debug and hoping having a error count might be an indicator. However,
I agree with what you said about not having the actual mapping
association is not very useful.

-- Shuah


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v2] dma-debug: New interfaces to debug dma mapping errors
  2012-09-17  0:52 [PATCH] dma-debug: New interfaces to debug dma mapping errors Shuah Khan
  2012-09-17  2:07 ` Greg KH
  2012-09-17 13:39 ` Konrad Rzeszutek Wilk
@ 2012-09-26  1:05 ` Shuah Khan
  2012-09-26 13:12   ` Konrad Rzeszutek Wilk
                     ` (2 more replies)
  2 siblings, 3 replies; 30+ messages in thread
From: Shuah Khan @ 2012-09-26  1:05 UTC (permalink / raw)
  To: konrad.wilk, joerg.roedel, tglx, mingo, hpa, rob, akpm, bhelgaas, stern
  Cc: LKML, linux-doc, devel, x86, shuahkhan

A recent dma mapping error analysis effort showed that a large percentage
of dma_map_single() and dma_map_page() returns are not checked for mapping
errors.

Reference:
http://linuxdriverproject.org/mediawiki/index.php/DMA_Mapping_Error_Analysis

Adding support for tracking dma mapping and unmapping errors to help assess
the following:

When do dma mapping errors get detected?
How often do these errors occur?
Why don't we see failures related to missing dma mapping error checks?
Are they silent failures?

Enhance dma-debug infrastructure to track dma mapping, and unmapping errors.

dma_map_errors: (system wide counter)
  Total number of dma mapping errors returned by the dma mapping interfaces,
  in response to mapping requests from all devices in the system.
dma_map_errors_not_checked: (system wide counter)
  Total number of dma mapping errors devices failed to check before using
  the returned address.
dma_unmap_errors: (system wide counter)
  Total number of times devices tried to unmap or free an invalid dma
  address.
dma_map_error_flag: (new field added to dma_debug_entry structure)
  New field to maintain dma mapping error check status. This flag is applicable
  to dma map page and dma map single entries tracked by dma-debug API. This
  status indicates whether or not a good mapping is checked by the device
  before it is used. dma_map_single() and dma_map_page() could fail to create
  a mapping in some cases, and drivers are expected to call dma_mapping_error()
  to check for errors.

Enhancements to dma-debug API are made to add new debugfs interfaces to
report total dma errors, dma errors that are not checked, and unmap errors
for the entire system. Please note that these are system wide counters for
all devices in the system.

The following new dma-debug interface is added:

debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr):
	Sets dma map error checked status for the dma map entry if one is
	found. Decrements the system wide dma_map_errors_not_checked counter
	that is incremented by debug_dma_map_page() when it checks for
	mapping error before adding it to the dma debug entry table.

The following existing dma-debug APIs are changed to support this feature:

debug_dma_map_page():
	Increments dma_map_errors and dma_map_errors_not_checked error totals
	for the system when dma_addr is invalid. Please note that this routine
	can no longer call dma_mapping_error(), because of the newly added
	debug_dma_mapping_error() interface. Calling this routine at the time
	dma error unchecked state is registered, will not help if state gets
	changed right away.

check_unmap():
	This is an existing internal routine that checks for various mapping
	errors. Changed to increment system wide dma_unmap_errors, when a
	device requests an invalid address to be unmapped. Please note that
	this routine can no longer call dma_mapping_error(), because of the
	newly added debug_dma_mapping_error() interface. Calling
	dma_mapping_error() from this routine will change the dma map error
	flag erroneously.

Changed arch/x86/include/asm/dma-mapping.h to call debug_dma_mapping_error()
to validate these new interfaces on x86_64. Other architectures will be
changed in a subsequent patch.

Tested: Intel iommu and swiotlb (iommu=soft) on x86-64 with
        CONFIG_DMA_API_DEBUG enabled and disabled.

Signed-off-by: Shuah Khan <shuah.khan@hp.com>
---
 Documentation/DMA-API.txt          |   13 +++++
 arch/x86/include/asm/dma-mapping.h |    1 +
 include/linux/dma-debug.h          |    7 +++
 lib/dma-debug.c                    |   92 ++++++++++++++++++++++++++++++++++--
 4 files changed, 109 insertions(+), 4 deletions(-)

diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
index 66bd97a..59d58b9 100644
--- a/Documentation/DMA-API.txt
+++ b/Documentation/DMA-API.txt
@@ -638,6 +638,19 @@ this directory the following files can currently be found:
 	dma-api/error_count	This file is read-only and shows the total
 				numbers of errors found.
 
+	dma-api/dma_map_errors  This file is read-only and shows the total
+				number of dma mapping errors detected.
+
+	dma-api/dma_map_errors_not_checked
+				This file is read-only and shows the total
+				number of dma mapping errors, device drivers
+				failed to check prior to using the returned
+				address.
+
+	dma-api/dma_unmap_errors
+				This file is read-only and shows the total
+				number of invalid dma unmapping attempts.
+
 	dma-api/num_errors	The number in this file shows how many
 				warnings will be printed to the kernel log
 				before it stops. This number is initialized to
diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
index f7b4c79..808dae6 100644
--- a/arch/x86/include/asm/dma-mapping.h
+++ b/arch/x86/include/asm/dma-mapping.h
@@ -47,6 +47,7 @@ static inline struct dma_map_ops *get_dma_ops(struct device *dev)
 static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
 {
 	struct dma_map_ops *ops = get_dma_ops(dev);
+	debug_dma_mapping_error(dev, dma_addr);
 	if (ops->mapping_error)
 		return ops->mapping_error(dev, dma_addr);
 
diff --git a/include/linux/dma-debug.h b/include/linux/dma-debug.h
index 171ad8a..fc0e34c 100644
--- a/include/linux/dma-debug.h
+++ b/include/linux/dma-debug.h
@@ -39,6 +39,8 @@ extern void debug_dma_map_page(struct device *dev, struct page *page,
 			       int direction, dma_addr_t dma_addr,
 			       bool map_single);
 
+extern void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr);
+
 extern void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
 				 size_t size, int direction, bool map_single);
 
@@ -105,6 +107,11 @@ static inline void debug_dma_map_page(struct device *dev, struct page *page,
 {
 }
 
+static inline void debug_dma_mapping_error(struct device *dev,
+					  dma_addr_t dma_addr)
+{
+}
+
 static inline void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
 					size_t size, int direction,
 					bool map_single)
diff --git a/lib/dma-debug.c b/lib/dma-debug.c
index 66ce414..70724a5 100644
--- a/lib/dma-debug.c
+++ b/lib/dma-debug.c
@@ -57,6 +57,7 @@ struct dma_debug_entry {
 	int              direction;
 	int		 sg_call_ents;
 	int		 sg_mapped_ents;
+	int		 dma_map_error_flag;
 #ifdef CONFIG_STACKTRACE
 	struct		 stack_trace stacktrace;
 	unsigned long	 st_entries[DMA_DEBUG_STACKTRACE_ENTRIES];
@@ -83,6 +84,11 @@ static u32 global_disable __read_mostly;
 /* Global error count */
 static u32 error_count;
 
+/* dma mapping error counts */
+static u32 dma_map_errors;
+static u32 dma_map_errors_not_checked;
+static u32 dma_unmap_errors;
+
 /* Global error show enable*/
 static u32 show_all_errors __read_mostly;
 /* Number of errors to show */
@@ -104,6 +110,9 @@ static struct dentry *show_num_errors_dent  __read_mostly;
 static struct dentry *num_free_entries_dent __read_mostly;
 static struct dentry *min_free_entries_dent __read_mostly;
 static struct dentry *filter_dent           __read_mostly;
+static struct dentry *dma_map_errors_dent   __read_mostly;
+static struct dentry *dma_map_errors_not_checked_dent   __read_mostly;
+static struct dentry *dma_unmap_errors_dent __read_mostly;
 
 /* per-driver filter related state */
 
@@ -120,6 +129,15 @@ static const char *type2name[4] = { "single", "page",
 static const char *dir2name[4] = { "DMA_BIDIRECTIONAL", "DMA_TO_DEVICE",
 				   "DMA_FROM_DEVICE", "DMA_NONE" };
 
+enum {
+	dma_map_error_check_not_applicable,
+	dma_map_error_not_checked,
+	dma_map_error_checked,
+};
+static const char *maperr2str[3] = { "dma map error check not applicable",
+				     "dma map error not checked",
+				     "dma map error checked" };
+
 /* little merge helper - remove it after the merge window */
 #ifndef BUS_NOTIFY_UNBOUND_DRIVER
 #define BUS_NOTIFY_UNBOUND_DRIVER 0x0005
@@ -381,11 +399,12 @@ void debug_dma_dump_mappings(struct device *dev)
 		list_for_each_entry(entry, &bucket->list, list) {
 			if (!dev || dev == entry->dev) {
 				dev_info(entry->dev,
-					 "%s idx %d P=%Lx D=%Lx L=%Lx %s\n",
+					 "%s idx %d P=%Lx D=%Lx L=%Lx %s %s\n",
 					 type2name[entry->type], idx,
 					 (unsigned long long)entry->paddr,
 					 entry->dev_addr, entry->size,
-					 dir2name[entry->direction]);
+					 dir2name[entry->direction],
+					 maperr2str[entry->dma_map_error_flag]);
 			}
 		}
 
@@ -695,6 +714,28 @@ static int dma_debug_fs_init(void)
 	if (!filter_dent)
 		goto out_err;
 
+	dma_map_errors_dent = debugfs_create_u32("dma_map_errors", 0444,
+			dma_debug_dent,
+			&dma_map_errors);
+
+	if (!dma_map_errors_dent)
+		goto out_err;
+
+	dma_map_errors_not_checked_dent = debugfs_create_u32(
+			"dma_map_errors_not_checked",
+			0444,
+			dma_debug_dent,
+			&dma_map_errors_not_checked);
+
+	if (!dma_map_errors_not_checked_dent)
+		goto out_err;
+
+	dma_unmap_errors_dent = debugfs_create_u32("dma_unmap_errors", 0444,
+			dma_debug_dent,
+			&dma_unmap_errors);
+	if (!dma_unmap_errors_dent)
+		goto out_err;
+
 	return 0;
 
 out_err:
@@ -849,7 +890,8 @@ static void check_unmap(struct dma_debug_entry *ref)
 	struct hash_bucket *bucket;
 	unsigned long flags;
 
-	if (dma_mapping_error(ref->dev, ref->dev_addr)) {
+	if (ref->dev_addr == DMA_ERROR_CODE) {
+		dma_unmap_errors += 1;
 		err_printk(ref->dev, NULL, "DMA-API: device driver tries "
 			   "to free an invalid DMA memory address\n");
 		return;
@@ -915,6 +957,15 @@ static void check_unmap(struct dma_debug_entry *ref)
 			   dir2name[ref->direction]);
 	}
 
+	if (entry->dma_map_error_flag == dma_map_error_not_checked) {
+		err_printk(ref->dev, entry,
+			   "DMA-API: device driver failed to check map error"
+			   "[device address=0x%016llx] [size=%llu bytes] "
+			   "[mapped as %s]",
+			   ref->dev_addr, ref->size,
+			   type2name[entry->type]);
+	}
+
 	hash_bucket_del(entry);
 	dma_entry_free(entry);
 
@@ -1022,8 +1073,11 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
 	if (unlikely(global_disable))
 		return;
 
-	if (unlikely(dma_mapping_error(dev, dma_addr)))
+	if (dma_addr == DMA_ERROR_CODE) {
+		dma_map_errors += 1;
+		dma_map_errors_not_checked += 1;
 		return;
+	}
 
 	entry = dma_entry_alloc();
 	if (!entry)
@@ -1035,6 +1089,7 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
 	entry->dev_addr  = dma_addr;
 	entry->size      = size;
 	entry->direction = direction;
+	entry->dma_map_error_flag = dma_map_error_not_checked;
 
 	if (map_single)
 		entry->type = dma_debug_single;
@@ -1050,6 +1105,35 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
 }
 EXPORT_SYMBOL(debug_dma_map_page);
 
+void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
+{
+	struct dma_debug_entry ref;
+	struct dma_debug_entry *entry;
+	struct hash_bucket *bucket;
+	unsigned long flags;
+
+	if (unlikely(global_disable))
+		return;
+
+	if (dma_addr == DMA_ERROR_CODE) {
+		dma_map_errors_not_checked -= 1;
+		return;
+	}
+
+	ref.dev = dev;
+	ref.dev_addr = dma_addr;
+	bucket = get_hash_bucket(&ref, &flags);
+	entry = bucket_find_exact(bucket, &ref);
+
+	if (!entry) /* very likley dma-api didn't call debug_dma_map_page() */
+		goto out;
+
+	entry->dma_map_error_flag = dma_map_error_checked;
+out:
+	put_hash_bucket(bucket, &flags);
+}
+EXPORT_SYMBOL(debug_dma_mapping_error);
+
 void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
 			  size_t size, int direction, bool map_single)
 {
-- 
1.7.9.5





^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] dma-debug: New interfaces to debug dma mapping errors
  2012-09-26  1:05 ` [PATCH v2] " Shuah Khan
@ 2012-09-26 13:12   ` Konrad Rzeszutek Wilk
  2012-09-26 16:23     ` Shuah Khan
  2012-09-27 10:20   ` Joerg Roedel
  2012-10-03 14:55   ` [PATCH v3] " Shuah Khan
  2 siblings, 1 reply; 30+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-09-26 13:12 UTC (permalink / raw)
  To: Shuah Khan
  Cc: joerg.roedel, tglx, mingo, hpa, rob, akpm, bhelgaas, stern, LKML,
	linux-doc, devel, x86, shuahkhan

> Enhancements to dma-debug API are made to add new debugfs interfaces to
> report total dma errors, dma errors that are not checked, and unmap errors
> for the entire system. Please note that these are system wide counters for
> all devices in the system.
> 
> The following new dma-debug interface is added:
> 
> debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr):
> 	Sets dma map error checked status for the dma map entry if one is
> 	found. Decrements the system wide dma_map_errors_not_checked counter
> 	that is incremented by debug_dma_map_page() when it checks for
> 	mapping error before adding it to the dma debug entry table.
> 
> The following existing dma-debug APIs are changed to support this feature:
> 
> debug_dma_map_page():
> 	Increments dma_map_errors and dma_map_errors_not_checked error totals
> 	for the system when dma_addr is invalid. Please note that this routine
> 	can no longer call dma_mapping_error(), because of the newly added
> 	debug_dma_mapping_error() interface. Calling this routine at the time
> 	dma error unchecked state is registered, will not help if state gets
> 	changed right away.
> 
> check_unmap():
> 	This is an existing internal routine that checks for various mapping
> 	errors. Changed to increment system wide dma_unmap_errors, when a
> 	device requests an invalid address to be unmapped. Please note that
> 	this routine can no longer call dma_mapping_error(), because of the
> 	newly added debug_dma_mapping_error() interface. Calling
> 	dma_mapping_error() from this routine will change the dma map error
> 	flag erroneously.
> 
> Changed arch/x86/include/asm/dma-mapping.h to call debug_dma_mapping_error()
> to validate these new interfaces on x86_64. Other architectures will be
> changed in a subsequent patch.

Thanks for improving this patch. It is looking more and more ready for
the kernel. With that in mind, I've some comments below.

> 
> Tested: Intel iommu and swiotlb (iommu=soft) on x86-64 with
>         CONFIG_DMA_API_DEBUG enabled and disabled.
> 
> Signed-off-by: Shuah Khan <shuah.khan@hp.com>
> ---
>  Documentation/DMA-API.txt          |   13 +++++
>  arch/x86/include/asm/dma-mapping.h |    1 +
>  include/linux/dma-debug.h          |    7 +++
>  lib/dma-debug.c                    |   92 ++++++++++++++++++++++++++++++++++--
>  4 files changed, 109 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
> index 66bd97a..59d58b9 100644
> --- a/Documentation/DMA-API.txt
> +++ b/Documentation/DMA-API.txt
> @@ -638,6 +638,19 @@ this directory the following files can currently be found:
>  	dma-api/error_count	This file is read-only and shows the total
>  				numbers of errors found.
>  
> +	dma-api/dma_map_errors  This file is read-only and shows the total
> +				number of dma mapping errors detected.
> +
> +	dma-api/dma_map_errors_not_checked
> +				This file is read-only and shows the total
> +				number of dma mapping errors, device drivers
> +				failed to check prior to using the returned
> +				address.
> +
> +	dma-api/dma_unmap_errors
> +				This file is read-only and shows the total
> +				number of invalid dma unmapping attempts.
> +
>  	dma-api/num_errors	The number in this file shows how many
>  				warnings will be printed to the kernel log
>  				before it stops. This number is initialized to
> diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
> index f7b4c79..808dae6 100644
> --- a/arch/x86/include/asm/dma-mapping.h
> +++ b/arch/x86/include/asm/dma-mapping.h
> @@ -47,6 +47,7 @@ static inline struct dma_map_ops *get_dma_ops(struct device *dev)
>  static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
>  {
>  	struct dma_map_ops *ops = get_dma_ops(dev);
> +	debug_dma_mapping_error(dev, dma_addr);
>  	if (ops->mapping_error)
>  		return ops->mapping_error(dev, dma_addr);
>  
> diff --git a/include/linux/dma-debug.h b/include/linux/dma-debug.h
> index 171ad8a..fc0e34c 100644
> --- a/include/linux/dma-debug.h
> +++ b/include/linux/dma-debug.h
> @@ -39,6 +39,8 @@ extern void debug_dma_map_page(struct device *dev, struct page *page,
>  			       int direction, dma_addr_t dma_addr,
>  			       bool map_single);
>  
> +extern void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr);
> +
>  extern void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
>  				 size_t size, int direction, bool map_single);
>  
> @@ -105,6 +107,11 @@ static inline void debug_dma_map_page(struct device *dev, struct page *page,
>  {
>  }
>  
> +static inline void debug_dma_mapping_error(struct device *dev,
> +					  dma_addr_t dma_addr)
> +{
> +}
> +
>  static inline void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
>  					size_t size, int direction,
>  					bool map_single)
> diff --git a/lib/dma-debug.c b/lib/dma-debug.c
> index 66ce414..70724a5 100644
> --- a/lib/dma-debug.c
> +++ b/lib/dma-debug.c
> @@ -57,6 +57,7 @@ struct dma_debug_entry {
>  	int              direction;
>  	int		 sg_call_ents;
>  	int		 sg_mapped_ents;
> +	int		 dma_map_error_flag;

I don't think you need 'dma_map' as a prefix a this is in an
dma_debug structure. Perhaps 'map_error_cnt' ?

But looking at the implementation this is actually an enum?
Can you do this instead:

	enum map_err_type	map_err_type;
?

>  #ifdef CONFIG_STACKTRACE
>  	struct		 stack_trace stacktrace;
>  	unsigned long	 st_entries[DMA_DEBUG_STACKTRACE_ENTRIES];
> @@ -83,6 +84,11 @@ static u32 global_disable __read_mostly;
>  /* Global error count */
>  static u32 error_count;
>  
> +/* dma mapping error counts */
> +static u32 dma_map_errors;
> +static u32 dma_map_errors_not_checked;
> +static u32 dma_unmap_errors;

s/dma//

> +
>  /* Global error show enable*/
>  static u32 show_all_errors __read_mostly;
>  /* Number of errors to show */
> @@ -104,6 +110,9 @@ static struct dentry *show_num_errors_dent  __read_mostly;
>  static struct dentry *num_free_entries_dent __read_mostly;
>  static struct dentry *min_free_entries_dent __read_mostly;
>  static struct dentry *filter_dent           __read_mostly;
> +static struct dentry *dma_map_errors_dent   __read_mostly;
> +static struct dentry *dma_map_errors_not_checked_dent   __read_mostly;
> +static struct dentry *dma_unmap_errors_dent __read_mostly;

Ditto.

>  
>  /* per-driver filter related state */
>  
> @@ -120,6 +129,15 @@ static const char *type2name[4] = { "single", "page",
>  static const char *dir2name[4] = { "DMA_BIDIRECTIONAL", "DMA_TO_DEVICE",
>  				   "DMA_FROM_DEVICE", "DMA_NONE" };
>  
> +enum {
> +	dma_map_error_check_not_applicable,
> +	dma_map_error_not_checked,
> +	dma_map_error_checked,
> +};

s/dma//
s/error/err//

And perhaps make them uppercase?

Can you name the enum? Say 'map_err_types'


> +static const char *maperr2str[3] = { "dma map error check not applicable",
> +				     "dma map error not checked",
> +				     "dma map error checked" };
> +

Just do this:

static const char *const names[] = {
		[err_check_na] = {"check n/a"},
		[err_not_checked] = {"not checked"},
		.. snip..
		};

That way you don't have to worry about the size and can just
use ARRAY_SIZE(names) to check for valid enum size.

>  /* little merge helper - remove it after the merge window */
>  #ifndef BUS_NOTIFY_UNBOUND_DRIVER
>  #define BUS_NOTIFY_UNBOUND_DRIVER 0x0005
> @@ -381,11 +399,12 @@ void debug_dma_dump_mappings(struct device *dev)
>  		list_for_each_entry(entry, &bucket->list, list) {
>  			if (!dev || dev == entry->dev) {
>  				dev_info(entry->dev,
> -					 "%s idx %d P=%Lx D=%Lx L=%Lx %s\n",
> +					 "%s idx %d P=%Lx D=%Lx L=%Lx %s %s\n",
>  					 type2name[entry->type], idx,
>  					 (unsigned long long)entry->paddr,
>  					 entry->dev_addr, entry->size,
> -					 dir2name[entry->direction]);
> +					 dir2name[entry->direction],
> +					 maperr2str[entry->dma_map_error_flag]);
>  			}
>  		}
>  
> @@ -695,6 +714,28 @@ static int dma_debug_fs_init(void)
>  	if (!filter_dent)
>  		goto out_err;
>  
> +	dma_map_errors_dent = debugfs_create_u32("dma_map_errors", 0444,

Just call it 'map_errors'

> +			dma_debug_dent,
> +			&dma_map_errors);
> +
> +	if (!dma_map_errors_dent)
> +		goto out_err;
> +
> +	dma_map_errors_not_checked_dent = debugfs_create_u32(
> +			"dma_map_errors_not_checked",

Ditto. s/dma//
> +			0444,
> +			dma_debug_dent,
> +			&dma_map_errors_not_checked);
> +
> +	if (!dma_map_errors_not_checked_dent)
> +		goto out_err;
> +
> +	dma_unmap_errors_dent = debugfs_create_u32("dma_unmap_errors", 0444,

s/dma//
> +			dma_debug_dent,
> +			&dma_unmap_errors);
> +	if (!dma_unmap_errors_dent)
> +		goto out_err;
> +

This whole function could use a a loop to set this up instead of doing
one by one... But that is another patch that can be done later.

>  	return 0;
>  
>  out_err:
> @@ -849,7 +890,8 @@ static void check_unmap(struct dma_debug_entry *ref)
>  	struct hash_bucket *bucket;
>  	unsigned long flags;
>  
> -	if (dma_mapping_error(ref->dev, ref->dev_addr)) {
> +	if (ref->dev_addr == DMA_ERROR_CODE) {

The DMA_ERROR_CODE is not exported on every architecture. Worst yet,
it is not used universally on all IOMMUs - if you look in the GART
(gart_mapping_error) it does not check for the DMA_ERROR_CODE but for
its own bad_dma_addr address. I think using the dma_mapping_errors here
is still a good idea.

> +		dma_unmap_errors += 1;
>  		err_printk(ref->dev, NULL, "DMA-API: device driver tries "
>  			   "to free an invalid DMA memory address\n");
>  		return;
> @@ -915,6 +957,15 @@ static void check_unmap(struct dma_debug_entry *ref)
>  			   dir2name[ref->direction]);
>  	}
>  
> +	if (entry->dma_map_error_flag == dma_map_error_not_checked) {

Wait. Aren't you using dma_map_error_flag to only be up to 3 - as you
are using it to lookup in the string table for its proper name?

Perhaps the string table lookup should use a different flag??

> +		err_printk(ref->dev, entry,
> +			   "DMA-API: device driver failed to check map error"
> +			   "[device address=0x%016llx] [size=%llu bytes] "
> +			   "[mapped as %s]",
> +			   ref->dev_addr, ref->size,
> +			   type2name[entry->type]);
> +	}
> +
>  	hash_bucket_del(entry);
>  	dma_entry_free(entry);
>  
> @@ -1022,8 +1073,11 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
>  	if (unlikely(global_disable))
>  		return;
>  
> -	if (unlikely(dma_mapping_error(dev, dma_addr)))
> +	if (dma_addr == DMA_ERROR_CODE) {

Ditto
> +		dma_map_errors += 1;
> +		dma_map_errors_not_checked += 1;
>  		return;
> +	}
>  
>  	entry = dma_entry_alloc();
>  	if (!entry)
> @@ -1035,6 +1089,7 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
>  	entry->dev_addr  = dma_addr;
>  	entry->size      = size;
>  	entry->direction = direction;
> +	entry->dma_map_error_flag = dma_map_error_not_checked;

So if it is greater than three, then maperr2str[entry->dma_map_error_flag] is going
to blow up, right?

>  
>  	if (map_single)
>  		entry->type = dma_debug_single;
> @@ -1050,6 +1105,35 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
>  }
>  EXPORT_SYMBOL(debug_dma_map_page);
>  
> +void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
> +{
> +	struct dma_debug_entry ref;
> +	struct dma_debug_entry *entry;
> +	struct hash_bucket *bucket;
> +	unsigned long flags;
> +
> +	if (unlikely(global_disable))
> +		return;
> +
> +	if (dma_addr == DMA_ERROR_CODE) {

Don't use that... Just the IOMMUs' dma_mapping_error call..

> +		dma_map_errors_not_checked -= 1;

Should you check in case the user calls dma_mapping_error more than once on
the same dma_addr?

> +		return;
> +	}
> +
> +	ref.dev = dev;
> +	ref.dev_addr = dma_addr;
> +	bucket = get_hash_bucket(&ref, &flags);
> +	entry = bucket_find_exact(bucket, &ref);
> +
> +	if (!entry) /* very likley dma-api didn't call debug_dma_map_page() */

Ok, so should we re-adjust dma_map_errors_not_checked? Or we
don't care?

> +		goto out;
> +
> +	entry->dma_map_error_flag = dma_map_error_checked;
> +out:
> +	put_hash_bucket(bucket, &flags);
> +}
> +EXPORT_SYMBOL(debug_dma_mapping_error);
> +
>  void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
>  			  size_t size, int direction, bool map_single)
>  {
> -- 
> 1.7.9.5
> 
> 
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] dma-debug: New interfaces to debug dma mapping errors
  2012-09-26 13:12   ` Konrad Rzeszutek Wilk
@ 2012-09-26 16:23     ` Shuah Khan
  0 siblings, 0 replies; 30+ messages in thread
From: Shuah Khan @ 2012-09-26 16:23 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: joerg.roedel, tglx, mingo, hpa, rob, akpm, bhelgaas, stern, LKML,
	linux-doc, devel, x86, shuahkhan

On Wed, 2012-09-26 at 09:12 -0400, Konrad Rzeszutek Wilk wrote:
> >
> 
> Thanks for improving this patch. It is looking more and more ready for
> the kernel. With that in mind, I've some comments below.

Good. Thanks for the comments and good suggestions. Will work on v3.
> 
> > 
> > 
> > --- a/lib/dma-debug.c
> > +++ b/lib/dma-debug.c
> > @@ -57,6 +57,7 @@ struct dma_debug_entry {
> >  	int              direction;
> >  	int		 sg_call_ents;
> >  	int		 sg_mapped_ents;
> > +	int		 dma_map_error_flag;
> 
> I don't think you need 'dma_map' as a prefix a this is in an
> dma_debug structure. Perhaps 'map_error_cnt' ?

I can drop dma prefix from all the counts I added.
> 
> But looking at the implementation this is actually an enum?
> Can you do this instead:
> 
> 	enum map_err_type	map_err_type;

Correct. It can be a enum type.

> ?
> 
> >  #ifdef CONFIG_STACKTRACE
> >  	struct		 stack_trace stacktrace;
> >  	unsigned long	 st_entries[DMA_DEBUG_STACKTRACE_ENTRIES];
> > @@ -83,6 +84,11 @@ static u32 global_disable __read_mostly;
> >  /* Global error count */
> >  static u32 error_count;
> >  
> > +/* dma mapping error counts */
> > +static u32 dma_map_errors;
> > +static u32 dma_map_errors_not_checked;
> > +static u32 dma_unmap_errors;
> 
> s/dma//

Will drop dma prefix here.

> 
> > +
> >  /* Global error show enable*/
> >  static u32 show_all_errors __read_mostly;
> >  /* Number of errors to show */
> > @@ -104,6 +110,9 @@ static struct dentry *show_num_errors_dent  __read_mostly;
> >  static struct dentry *num_free_entries_dent __read_mostly;
> >  static struct dentry *min_free_entries_dent __read_mostly;
> >  static struct dentry *filter_dent           __read_mostly;
> > +static struct dentry *dma_map_errors_dent   __read_mostly;
> > +static struct dentry *dma_map_errors_not_checked_dent   __read_mostly;
> > +static struct dentry *dma_unmap_errors_dent __read_mostly;
> 
> Ditto.
ok
> 
> >  
> >  /* per-driver filter related state */
> >  
> > @@ -120,6 +129,15 @@ static const char *type2name[4] = { "single", "page",
> >  static const char *dir2name[4] = { "DMA_BIDIRECTIONAL", "DMA_TO_DEVICE",
> >  				   "DMA_FROM_DEVICE", "DMA_NONE" };
> >  
> > +enum {
> > +	dma_map_error_check_not_applicable,
> > +	dma_map_error_not_checked,
> > +	dma_map_error_checked,
> > +};
> 
> s/dma//
> s/error/err//
> 
> And perhaps make them uppercase?
Yup. Uppercase will be in line with the general enum convention.
> 
> Can you name the enum? Say 'map_err_types'
> 
> 
> > +static const char *maperr2str[3] = { "dma map error check not applicable",
> > +				     "dma map error not checked",
> > +				     "dma map error checked" };
> > +
> 
> Just do this:
> 
> static const char *const names[] = {
> 		[err_check_na] = {"check n/a"},
> 		[err_not_checked] = {"not checked"},
> 		.. snip..
> 		};
> 

yes. Will work better than the current.

> That way you don't have to worry about the size and can just
> use ARRAY_SIZE(names) to check for valid enum size.
> 
> >  /* little merge helper - remove it after the merge window */
> >  #ifndef BUS_NOTIFY_UNBOUND_DRIVER
> >  #define BUS_NOTIFY_UNBOUND_DRIVER 0x0005
> > @@ -381,11 +399,12 @@ void debug_dma_dump_mappings(struct device *dev)
> >  		list_for_each_entry(entry, &bucket->list, list) {
> >  			if (!dev || dev == entry->dev) {
> >  				dev_info(entry->dev,
> > -					 "%s idx %d P=%Lx D=%Lx L=%Lx %s\n",
> > +					 "%s idx %d P=%Lx D=%Lx L=%Lx %s %s\n",
> >  					 type2name[entry->type], idx,
> >  					 (unsigned long long)entry->paddr,
> >  					 entry->dev_addr, entry->size,
> > -					 dir2name[entry->direction]);
> > +					 dir2name[entry->direction],
> > +					 maperr2str[entry->dma_map_error_flag]);
> >  			}
> >  		}
> >  
> > @@ -695,6 +714,28 @@ static int dma_debug_fs_init(void)
> >  	if (!filter_dent)
> >  		goto out_err;
> >  
> > +	dma_map_errors_dent = debugfs_create_u32("dma_map_errors", 0444,
> 
> Just call it 'map_errors'
> 
> > +			dma_debug_dent,
> > +			&dma_map_errors);
> > +
> > +	if (!dma_map_errors_dent)
> > +		goto out_err;
> > +
> > +	dma_map_errors_not_checked_dent = debugfs_create_u32(
> > +			"dma_map_errors_not_checked",
> 
> Ditto. s/dma//
> > +			0444,
> > +			dma_debug_dent,
> > +			&dma_map_errors_not_checked);
> > +
> > +	if (!dma_map_errors_not_checked_dent)
> > +		goto out_err;
> > +
> > +	dma_unmap_errors_dent = debugfs_create_u32("dma_unmap_errors", 0444,
> 
> s/dma//
> > +			dma_debug_dent,
> > +			&dma_unmap_errors);
> > +	if (!dma_unmap_errors_dent)
> > +		goto out_err;
> > +
> 
> This whole function could use a a loop to set this up instead of doing
> one by one... But that is another patch that can be done later.

Yes.
> 
> >  	return 0;
> >  
> >  out_err:
> > @@ -849,7 +890,8 @@ static void check_unmap(struct dma_debug_entry *ref)
> >  	struct hash_bucket *bucket;
> >  	unsigned long flags;
> >  
> > -	if (dma_mapping_error(ref->dev, ref->dev_addr)) {
> > +	if (ref->dev_addr == DMA_ERROR_CODE) {
> 
> The DMA_ERROR_CODE is not exported on every architecture. Worst yet,
> it is not used universally on all IOMMUs - if you look in the GART
> (gart_mapping_error) it does not check for the DMA_ERROR_CODE but for
> its own bad_dma_addr address. I think using the dma_mapping_errors here
> is still a good idea.

Good point.
> 
> > +		dma_unmap_errors += 1;
> >  		err_printk(ref->dev, NULL, "DMA-API: device driver tries "
> >  			   "to free an invalid DMA memory address\n");
> >  		return;
> > @@ -915,6 +957,15 @@ static void check_unmap(struct dma_debug_entry *ref)
> >  			   dir2name[ref->direction]);
> >  	}
> >  
> > +	if (entry->dma_map_error_flag == dma_map_error_not_checked) {
> 
> Wait. Aren't you using dma_map_error_flag to only be up to 3 - as you
> are using it to lookup in the string table for its proper name?
> 
> Perhaps the string table lookup should use a different flag??
> 
> > +		err_printk(ref->dev, entry,
> > +			   "DMA-API: device driver failed to check map error"
> > +			   "[device address=0x%016llx] [size=%llu bytes] "
> > +			   "[mapped as %s]",
> > +			   ref->dev_addr, ref->size,
> > +			   type2name[entry->type]);
> > +	}
> > +
> >  	hash_bucket_del(entry);
> >  	dma_entry_free(entry);
> >  
> > @@ -1022,8 +1073,11 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
> >  	if (unlikely(global_disable))
> >  		return;
> >  
> > -	if (unlikely(dma_mapping_error(dev, dma_addr)))
> > +	if (dma_addr == DMA_ERROR_CODE) {
> 
> Ditto
> > +		dma_map_errors += 1;
> > +		dma_map_errors_not_checked += 1;
> >  		return;
> > +	}
> >  
> >  	entry = dma_entry_alloc();
> >  	if (!entry)
> > @@ -1035,6 +1089,7 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
> >  	entry->dev_addr  = dma_addr;
> >  	entry->size      = size;
> >  	entry->direction = direction;
> > +	entry->dma_map_error_flag = dma_map_error_not_checked;
> 
> So if it is greater than three, then maperr2str[entry->dma_map_error_flag] is going
> to blow up, right?
Yeah. It shouldn't, but would  be good to check.
> 
> >  
> >  	if (map_single)
> >  		entry->type = dma_debug_single;
> > @@ -1050,6 +1105,35 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
> >  }
> >  EXPORT_SYMBOL(debug_dma_map_page);
> >  
> > +void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
> > +{
> > +	struct dma_debug_entry ref;
> > +	struct dma_debug_entry *entry;
> > +	struct hash_bucket *bucket;
> > +	unsigned long flags;
> > +
> > +	if (unlikely(global_disable))
> > +		return;
> > +
> > +	if (dma_addr == DMA_ERROR_CODE) {
> 
> Don't use that... Just the IOMMUs' dma_mapping_error call..
> 
> > +		dma_map_errors_not_checked -= 1;
> 
> Should you check in case the user calls dma_mapping_error more than once on
> the same dma_addr?

Correct. Check is needed especially coupled with dma-debug internal use
of dma_mapping_error().

> 
> > +		return;
> > +	}
> > +
> > +	ref.dev = dev;
> > +	ref.dev_addr = dma_addr;
> > +	bucket = get_hash_bucket(&ref, &flags);
> > +	entry = bucket_find_exact(bucket, &ref);
> > +
> > +	if (!entry) /* very likley dma-api didn't call debug_dma_map_page() */
> 
> Ok, so should we re-adjust dma_map_errors_not_checked? Or we
> don't care?
> 
> > +		goto out;
> > +
> > +	entry->dma_map_error_flag = dma_map_error_checked;
> > +out:
> > +	put_hash_bucket(bucket, &flags);
> > +}
> > +EXPORT_SYMBOL(debug_dma_mapping_error);
> > +
> >  void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
> >  			  size_t size, int direction, bool map_single)
> >  {
> > -- 
> > 1.7.9.5
> > 
> > 
> > 



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] dma-debug: New interfaces to debug dma mapping errors
  2012-09-26  1:05 ` [PATCH v2] " Shuah Khan
  2012-09-26 13:12   ` Konrad Rzeszutek Wilk
@ 2012-09-27 10:20   ` Joerg Roedel
  2012-09-27 14:13     ` Shuah Khan
  2012-10-03 14:55   ` [PATCH v3] " Shuah Khan
  2 siblings, 1 reply; 30+ messages in thread
From: Joerg Roedel @ 2012-09-27 10:20 UTC (permalink / raw)
  To: Shuah Khan
  Cc: konrad.wilk, tglx, mingo, hpa, rob, akpm, bhelgaas, stern, LKML,
	linux-doc, devel, x86, shuahkhan

Hi Shuah,

the patch looks better then the older versions. It comes closer to a
merge, but I see one issue here:

On Tue, Sep 25, 2012 at 07:05:17PM -0600, Shuah Khan wrote:
> debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr):
> 	Sets dma map error checked status for the dma map entry if one is
> 	found. Decrements the system wide dma_map_errors_not_checked counter
> 	that is incremented by debug_dma_map_page() when it checks for
> 	mapping error before adding it to the dma debug entry table.

It is problematic that the DMA debug code can no longer call
dma_mapping_error because of this. How about adding a special version of
that function which does no checking and use it in the DMA debug code
instead?


Regards,

	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] dma-debug: New interfaces to debug dma mapping errors
  2012-09-27 10:20   ` Joerg Roedel
@ 2012-09-27 14:13     ` Shuah Khan
  0 siblings, 0 replies; 30+ messages in thread
From: Shuah Khan @ 2012-09-27 14:13 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: konrad.wilk, tglx, mingo, hpa, rob, akpm, bhelgaas, stern, LKML,
	linux-doc, devel, x86, shuahkhan

Hi Joerg,

On Thu, 2012-09-27 at 12:20 +0200, Joerg Roedel wrote:
> Hi Shuah,
> 
> the patch looks better then the older versions. It comes closer to a
> merge, but I see one issue here:
> 
> On Tue, Sep 25, 2012 at 07:05:17PM -0600, Shuah Khan wrote:
> > debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr):
> > 	Sets dma map error checked status for the dma map entry if one is
> > 	found. Decrements the system wide dma_map_errors_not_checked counter
> > 	that is incremented by debug_dma_map_page() when it checks for
> > 	mapping error before adding it to the dma debug entry table.
> 
> It is problematic that the DMA debug code can no longer call
> dma_mapping_error because of this. How about adding a special version of
> that function which does no checking and use it in the DMA debug code
> instead?

Yes. Konrad expressed the same concern and I understand the reasons.
Defining a special version of dma_mapping_error() will eliminate the
problem that prevents this routine being used from dma-debug api. Will
do that. Working on a v3 patch to fix this problem as well as the other
review comments from Konrad. Hoping I can have it ready in by the end of
this week(end)

Thanks,
-- Shuah


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v3] dma-debug: New interfaces to debug dma mapping errors
  2012-09-26  1:05 ` [PATCH v2] " Shuah Khan
  2012-09-26 13:12   ` Konrad Rzeszutek Wilk
  2012-09-27 10:20   ` Joerg Roedel
@ 2012-10-03 14:55   ` Shuah Khan
  2012-10-03 21:45     ` Andrew Morton
                       ` (2 more replies)
  2 siblings, 3 replies; 30+ messages in thread
From: Shuah Khan @ 2012-10-03 14:55 UTC (permalink / raw)
  To: konrad.wilk, tglx, mingo, hpa, rob, akpm, stern, joerg.roedel, bhelgaas
  Cc: LKML, linux-doc, devel, x86, shuahkhan

A recent dma mapping error analysis effort showed that a large percentage
of dma_map_single() and dma_map_page() returns are not checked for mapping
errors.

Reference:
http://linuxdriverproject.org/mediawiki/index.php/DMA_Mapping_Error_Analysis

Adding support for tracking dma mapping and unmapping errors to help assess
the following:

When do dma mapping errors get detected?
How often do these errors occur?
Why don't we see failures related to missing dma mapping error checks?
Are they silent failures?

Patch v3: Addresses review and design comments from Patch v2.
Patch v2: Addressed design issues from Patch v1.

Enhance dma-debug infrastructure to track dma mapping, and unmapping errors.

map_errors: (system wide counter)
  Total number of dma mapping errors returned by the dma mapping interfaces,
  in response to mapping requests from all devices in the system.
map_errors_not_checked: (system wide counter)
  Total number of dma mapping errors devices failed to check before using
  the returned address.
unmap_errors: (system wide counter)
  Total number of times devices tried to unmap or free an invalid dma
  address.
map_err_type: (new field added to dma_debug_entry structure)
  New field to maintain dma mapping error check status. This error type
  is applicable to the dma map page and dma map single entries tracked by
  dma-debug api. This status indicates whether or not a good mapping is
  checked by the device before its use. dma_map_single() and dma_map_page()
  could fail to create a mapping in some cases, and drivers are expected to
  call dma_mapping_error() to check for errors. Please note that this is not
  counter.

Enhancements to dma-debug api are made to add new debugfs interfaces to
report total dma errors, dma errors that are not checked, and unmap errors
for the entire system. Please note that these are system wide counters for
all devices in the system.

The following new dma-debug interface is added:

debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
	Sets dma map error checked status for the dma map entry if one is
	found. Decrements the system wide dma_map_errors_not_checked counter
	that is incremented by debug_dma_map_page() when it checks for
	mapping error before adding it to the dma debug entry table.

New dma-debug internal interface:
check_mapping_error(struct device *dev, dma_addr_t dma_addr)
Calling dma_mapping_error() from dma-debug api will result in dma mapping
check when it shouldn't. This internal routine checks dma mapping error
without any debug checks. 

The following existing dma-debug api are changed to support this feature:
debug_dma_map_page()
	Increments dma_map_errors and dma_map_errors_not_checked errors totals
	for the system, dma-debug api keeps track of, when dma_addr is invalid.
	This routine now calls internal check_mapping_error() interface to
	avoid doing dma mapping debug checks from dma-debug internal mapping
	error checks.
check_unmap()
	This is an existing internal routines that checks for various mapping
	errors. Changed to increment system wide dma_unmap_errors, when a
	device requests an invalid address to be unmapped. This routine now
	calls internal check_mapping_error() interface to avoid doing dma
	mapping debug checks from dma-debug internal mapping error checks.

Changed arch/x86/include/asm/dma-mapping.h to call debug_dma_mapping_error()
to validate these new interfaces on x86_64. Other architectures will be
changed in a subsequent patch.

Tested: Intel iommu and swiotlb (iommu=soft) on x86-64 with
        CONFIG_DMA_API_DEBUG enabled and disabled.

Signed-off-by: Shuah Khan <shuah.khan@hp.com>
---
 Documentation/DMA-API.txt          |   13 ++++
 arch/x86/include/asm/dma-mapping.h |    1 +
 include/linux/dma-debug.h          |    7 ++
 lib/dma-debug.c                    |  128 ++++++++++++++++++++++++++++++++----
 4 files changed, 136 insertions(+), 13 deletions(-)

diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
index 66bd97a..886aaf9 100644
--- a/Documentation/DMA-API.txt
+++ b/Documentation/DMA-API.txt
@@ -638,6 +638,19 @@ this directory the following files can currently be found:
 	dma-api/error_count	This file is read-only and shows the total
 				numbers of errors found.
 
+	dma-api/map_errors	This file is read-only and shows the total
+				number of dma mapping errors detected.
+
+	dma-api/map_errors_not_checked
+				This file is read-only and shows the total
+				number of dma mapping errors, device drivers
+				failed to check prior to using the returned
+				address.
+
+	dma-api/unmap_errors
+				This file is read-only and shows the total
+				number of invalid dma unmapping attempts.
+
 	dma-api/num_errors	The number in this file shows how many
 				warnings will be printed to the kernel log
 				before it stops. This number is initialized to
diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
index f7b4c79..808dae6 100644
--- a/arch/x86/include/asm/dma-mapping.h
+++ b/arch/x86/include/asm/dma-mapping.h
@@ -47,6 +47,7 @@ static inline struct dma_map_ops *get_dma_ops(struct device *dev)
 static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
 {
 	struct dma_map_ops *ops = get_dma_ops(dev);
+	debug_dma_mapping_error(dev, dma_addr);
 	if (ops->mapping_error)
 		return ops->mapping_error(dev, dma_addr);
 
diff --git a/include/linux/dma-debug.h b/include/linux/dma-debug.h
index 171ad8a..fc0e34c 100644
--- a/include/linux/dma-debug.h
+++ b/include/linux/dma-debug.h
@@ -39,6 +39,8 @@ extern void debug_dma_map_page(struct device *dev, struct page *page,
 			       int direction, dma_addr_t dma_addr,
 			       bool map_single);
 
+extern void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr);
+
 extern void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
 				 size_t size, int direction, bool map_single);
 
@@ -105,6 +107,11 @@ static inline void debug_dma_map_page(struct device *dev, struct page *page,
 {
 }
 
+static inline void debug_dma_mapping_error(struct device *dev,
+					  dma_addr_t dma_addr)
+{
+}
+
 static inline void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
 					size_t size, int direction,
 					bool map_single)
diff --git a/lib/dma-debug.c b/lib/dma-debug.c
index 66ce414..1708a10 100644
--- a/lib/dma-debug.c
+++ b/lib/dma-debug.c
@@ -45,18 +45,25 @@ enum {
 	dma_debug_coherent,
 };
 
+enum map_err_types {
+	MAP_ERR_CHECK_NOT_APPLICABLE,
+	MAP_ERR_NOT_CHECKED,
+	MAP_ERR_CHECKED,
+};
+
 #define DMA_DEBUG_STACKTRACE_ENTRIES 5
 
 struct dma_debug_entry {
-	struct list_head list;
-	struct device    *dev;
-	int              type;
-	phys_addr_t      paddr;
-	u64              dev_addr;
-	u64              size;
-	int              direction;
-	int		 sg_call_ents;
-	int		 sg_mapped_ents;
+	struct list_head    list;
+	struct device       *dev;
+	int                 type;
+	phys_addr_t         paddr;
+	u64                 dev_addr;
+	u64                 size;
+	int                 direction;
+	int		    sg_call_ents;
+	int		    sg_mapped_ents;
+	enum map_err_types  map_err_type;
 #ifdef CONFIG_STACKTRACE
 	struct		 stack_trace stacktrace;
 	unsigned long	 st_entries[DMA_DEBUG_STACKTRACE_ENTRIES];
@@ -83,6 +90,11 @@ static u32 global_disable __read_mostly;
 /* Global error count */
 static u32 error_count;
 
+/* dma mapping error counts */
+static u32 map_errors;
+static u32 map_errors_not_checked;
+static u32 unmap_errors;
+
 /* Global error show enable*/
 static u32 show_all_errors __read_mostly;
 /* Number of errors to show */
@@ -104,6 +116,9 @@ static struct dentry *show_num_errors_dent  __read_mostly;
 static struct dentry *num_free_entries_dent __read_mostly;
 static struct dentry *min_free_entries_dent __read_mostly;
 static struct dentry *filter_dent           __read_mostly;
+static struct dentry *map_errors_dent   __read_mostly;
+static struct dentry *map_errors_not_checked_dent   __read_mostly;
+static struct dentry *unmap_errors_dent __read_mostly;
 
 /* per-driver filter related state */
 
@@ -114,6 +129,12 @@ static struct device_driver *current_driver                    __read_mostly;
 
 static DEFINE_RWLOCK(driver_name_lock);
 
+static const char *const maperr2str[] = {
+	[MAP_ERR_CHECK_NOT_APPLICABLE] = "dma map error check not applicable",
+	[MAP_ERR_NOT_CHECKED] = "dma map error not checked",
+	[MAP_ERR_CHECKED] = "dma map error checked",
+};
+
 static const char *type2name[4] = { "single", "page",
 				    "scather-gather", "coherent" };
 
@@ -381,11 +402,12 @@ void debug_dma_dump_mappings(struct device *dev)
 		list_for_each_entry(entry, &bucket->list, list) {
 			if (!dev || dev == entry->dev) {
 				dev_info(entry->dev,
-					 "%s idx %d P=%Lx D=%Lx L=%Lx %s\n",
+					 "%s idx %d P=%Lx D=%Lx L=%Lx %s %s\n",
 					 type2name[entry->type], idx,
 					 (unsigned long long)entry->paddr,
 					 entry->dev_addr, entry->size,
-					 dir2name[entry->direction]);
+					 dir2name[entry->direction],
+					 maperr2str[entry->map_err_type]);
 			}
 		}
 
@@ -695,6 +717,28 @@ static int dma_debug_fs_init(void)
 	if (!filter_dent)
 		goto out_err;
 
+	map_errors_dent = debugfs_create_u32("map_errors", 0444,
+			dma_debug_dent,
+			&map_errors);
+
+	if (!map_errors_dent)
+		goto out_err;
+
+	map_errors_not_checked_dent = debugfs_create_u32(
+			"map_errors_not_checked",
+			0444,
+			dma_debug_dent,
+			&map_errors_not_checked);
+
+	if (!map_errors_not_checked_dent)
+		goto out_err;
+
+	unmap_errors_dent = debugfs_create_u32("unmap_errors", 0444,
+			dma_debug_dent,
+			&unmap_errors);
+	if (!unmap_errors_dent)
+		goto out_err;
+
 	return 0;
 
 out_err:
@@ -843,13 +887,29 @@ static __init int dma_debug_entries_cmdline(char *str)
 __setup("dma_debug=", dma_debug_cmdline);
 __setup("dma_debug_entries=", dma_debug_entries_cmdline);
 
+/* Calling dma_mapping_error() from dma-debug api will result in calling
+   debug_dma_mapping_error() - need internal mapping error routine to
+   avoid debug checks */
+#ifndef DMA_ERROR_CODE
+#define DMA_ERROR_CODE 0
+#endif
+static inline int check_mapping_error(struct device *dev, dma_addr_t dma_addr)
+{
+	const struct dma_map_ops *ops = get_dma_ops(dev);
+	if (ops->mapping_error)
+		return ops->mapping_error(dev, dma_addr);
+
+	return (dma_addr == DMA_ERROR_CODE);
+}
+
 static void check_unmap(struct dma_debug_entry *ref)
 {
 	struct dma_debug_entry *entry;
 	struct hash_bucket *bucket;
 	unsigned long flags;
 
-	if (dma_mapping_error(ref->dev, ref->dev_addr)) {
+	if (unlikely(check_mapping_error(ref->dev, ref->dev_addr))) {
+		unmap_errors += 1;
 		err_printk(ref->dev, NULL, "DMA-API: device driver tries "
 			   "to free an invalid DMA memory address\n");
 		return;
@@ -915,6 +975,15 @@ static void check_unmap(struct dma_debug_entry *ref)
 			   dir2name[ref->direction]);
 	}
 
+	if (entry->map_err_type == MAP_ERR_NOT_CHECKED) {
+		err_printk(ref->dev, entry,
+			   "DMA-API: device driver failed to check map error"
+			   "[device address=0x%016llx] [size=%llu bytes] "
+			   "[mapped as %s]",
+			   ref->dev_addr, ref->size,
+			   type2name[entry->type]);
+	}
+
 	hash_bucket_del(entry);
 	dma_entry_free(entry);
 
@@ -1022,8 +1091,11 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
 	if (unlikely(global_disable))
 		return;
 
-	if (unlikely(dma_mapping_error(dev, dma_addr)))
+	if (unlikely(check_mapping_error(dev, dma_addr))) {
+		map_errors += 1;
+		map_errors_not_checked += 1;
 		return;
+	}
 
 	entry = dma_entry_alloc();
 	if (!entry)
@@ -1035,6 +1107,7 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
 	entry->dev_addr  = dma_addr;
 	entry->size      = size;
 	entry->direction = direction;
+	entry->map_err_type = MAP_ERR_NOT_CHECKED;
 
 	if (map_single)
 		entry->type = dma_debug_single;
@@ -1050,6 +1123,35 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
 }
 EXPORT_SYMBOL(debug_dma_map_page);
 
+void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
+{
+	struct dma_debug_entry ref;
+	struct dma_debug_entry *entry;
+	struct hash_bucket *bucket;
+	unsigned long flags;
+
+	if (unlikely(global_disable))
+		return;
+
+	ref.dev = dev;
+	ref.dev_addr = dma_addr;
+	bucket = get_hash_bucket(&ref, &flags);
+	entry = bucket_find_exact(bucket, &ref);
+
+	if (!entry) {
+		/* very likley dma-api didn't call debug_dma_map_page() or
+		   debug_dma_map_page() detected mapping error */
+		if (map_errors_not_checked)
+			map_errors_not_checked -= 1;
+		goto out;
+	}
+
+	entry->map_err_type = MAP_ERR_CHECKED;
+out:
+	put_hash_bucket(bucket, &flags);
+}
+EXPORT_SYMBOL(debug_dma_mapping_error);
+
 void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
 			  size_t size, int direction, bool map_single)
 {
-- 
1.7.9.5




^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v3] dma-debug: New interfaces to debug dma mapping errors
  2012-10-03 14:55   ` [PATCH v3] " Shuah Khan
@ 2012-10-03 21:45     ` Andrew Morton
  2012-10-04 14:01       ` Konrad Rzeszutek Wilk
  2012-10-04 17:38     ` Konrad Rzeszutek Wilk
  2012-10-05  1:23     ` [PATCH v4] " Shuah Khan
  2 siblings, 1 reply; 30+ messages in thread
From: Andrew Morton @ 2012-10-03 21:45 UTC (permalink / raw)
  To: shuah.khan
  Cc: konrad.wilk, tglx, mingo, hpa, rob, stern, joerg.roedel,
	bhelgaas, LKML, linux-doc, devel, x86, shuahkhan

On Wed, 03 Oct 2012 08:55:59 -0600
Shuah Khan <shuah.khan@hp.com> wrote:

> A recent dma mapping error analysis effort showed that a large percentage
> of dma_map_single() and dma_map_page() returns are not checked for mapping
> errors.
> 
> Reference:
> http://linuxdriverproject.org/mediawiki/index.php/DMA_Mapping_Error_Analysis
> 
> Adding support for tracking dma mapping and unmapping errors to help assess
> the following:
> 
> When do dma mapping errors get detected?
> How often do these errors occur?
> Why don't we see failures related to missing dma mapping error checks?
> Are they silent failures?

This seems to be a strange way of addressing kernel programming errors.
Instead of fixing them up, we generate lots of statistics about how
often they happen!

Would it not be better to find and fix the buggy code sites?  A
coccinelle script wold probably help here.

And let's also look at *why* we keep doing this.  Partly it's because
these things are self-propagating - people copy-n-paste bad code so we
get more bad code.


Another reason surely is the poor documentation.  Suppose our diligent
programmer goes to the dma_map_single() definition site:

#define dma_map_single(d, a, s, r) dma_map_single_attrs(d, a, s, r, NULL)

No documentation at all.  Because it's a stupid macro it doesn't even
give the types and names of the arguments or the type of the return
value.

So he goes to dma_map_single_attrs() and finds that is altogether
undocmented.

So he goes into Documentation/DMA-API-HOWTO.txt, searches for
"dma_map_single" and finds

: To map a single region, you do:
: 
: 	struct device *dev = &my_dev->dev;
: 	dma_addr_t dma_handle;
: 	void *addr = buffer->ptr;
: 	size_t size = buffer->len;
: 
: 	dma_handle = dma_map_single(dev, addr, size, direction);
: 
: and to unmap it:
: 
: 	dma_unmap_single(dev, dma_handle, size, direction);


So it is hardly surprising that we keep screwing this up!

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3] dma-debug: New interfaces to debug dma mapping errors
  2012-10-03 21:45     ` Andrew Morton
@ 2012-10-04 14:01       ` Konrad Rzeszutek Wilk
  2012-10-04 22:16         ` Shuah Khan
  0 siblings, 1 reply; 30+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-10-04 14:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: shuah.khan, konrad.wilk, tglx, mingo, hpa, rob, stern,
	joerg.roedel, bhelgaas, LKML, linux-doc, devel, x86, shuahkhan

On Wed, Oct 03, 2012 at 02:45:11PM -0700, Andrew Morton wrote:
> On Wed, 03 Oct 2012 08:55:59 -0600
> Shuah Khan <shuah.khan@hp.com> wrote:
> 
> > A recent dma mapping error analysis effort showed that a large percentage
> > of dma_map_single() and dma_map_page() returns are not checked for mapping
> > errors.
> > 
> > Reference:
> > http://linuxdriverproject.org/mediawiki/index.php/DMA_Mapping_Error_Analysis
> > 
> > Adding support for tracking dma mapping and unmapping errors to help assess
> > the following:
> > 
> > When do dma mapping errors get detected?
> > How often do these errors occur?
> > Why don't we see failures related to missing dma mapping error checks?
> > Are they silent failures?
> 
> This seems to be a strange way of addressing kernel programming errors.
> Instead of fixing them up, we generate lots of statistics about how
> often they happen!

And by using this we can fix the drivers.
> 
> Would it not be better to find and fix the buggy code sites?  A
> coccinelle script wold probably help here.

That is the end goal (fixing the buggy code sites). Shuah has
identified the bad culprits.

> 
> And let's also look at *why* we keep doing this.  Partly it's because
> these things are self-propagating - people copy-n-paste bad code so we
> get more bad code.


And this patch will now tell the poor engineers that write new code that
they pasted bad code.

> 
> 
> Another reason surely is the poor documentation.  Suppose our diligent
> programmer goes to the dma_map_single() definition site:
> 
> #define dma_map_single(d, a, s, r) dma_map_single_attrs(d, a, s, r, NULL)
> 
> No documentation at all.  Because it's a stupid macro it doesn't even
> give the types and names of the arguments or the type of the return
> value.
> 
> So he goes to dma_map_single_attrs() and finds that is altogether
> undocmented.
> 
> So he goes into Documentation/DMA-API-HOWTO.txt, searches for
> "dma_map_single" and finds
> 
> : To map a single region, you do:
> : 
> : 	struct device *dev = &my_dev->dev;
> : 	dma_addr_t dma_handle;
> : 	void *addr = buffer->ptr;
> : 	size_t size = buffer->len;
> : 
> : 	dma_handle = dma_map_single(dev, addr, size, direction);
> : 
> : and to unmap it:
> : 
> : 	dma_unmap_single(dev, dma_handle, size, direction);
> 
> 
> So it is hardly surprising that we keep screwing this up!

Right, so that should be also modified (Thank you for looking at that)!

> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3] dma-debug: New interfaces to debug dma mapping errors
  2012-10-03 14:55   ` [PATCH v3] " Shuah Khan
  2012-10-03 21:45     ` Andrew Morton
@ 2012-10-04 17:38     ` Konrad Rzeszutek Wilk
  2012-10-04 22:19       ` Shuah Khan
  2012-10-05  1:23     ` [PATCH v4] " Shuah Khan
  2 siblings, 1 reply; 30+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-10-04 17:38 UTC (permalink / raw)
  To: Shuah Khan
  Cc: konrad.wilk, tglx, mingo, hpa, rob, akpm, stern, joerg.roedel,
	bhelgaas, LKML, linux-doc, devel, x86, shuahkhan

On Wed, Oct 03, 2012 at 08:55:59AM -0600, Shuah Khan wrote:
> A recent dma mapping error analysis effort showed that a large percentage
> of dma_map_single() and dma_map_page() returns are not checked for mapping
> errors.
> 
> Reference:
> http://linuxdriverproject.org/mediawiki/index.php/DMA_Mapping_Error_Analysis
> 
> Adding support for tracking dma mapping and unmapping errors to help assess
> the following:
> 
> When do dma mapping errors get detected?
> How often do these errors occur?
> Why don't we see failures related to missing dma mapping error checks?
> Are they silent failures?
> 
> Patch v3: Addresses review and design comments from Patch v2.
> Patch v2: Addressed design issues from Patch v1.
> 
> Enhance dma-debug infrastructure to track dma mapping, and unmapping errors.
> 
> map_errors: (system wide counter)
>   Total number of dma mapping errors returned by the dma mapping interfaces,
>   in response to mapping requests from all devices in the system.
> map_errors_not_checked: (system wide counter)
>   Total number of dma mapping errors devices failed to check before using
>   the returned address.
> unmap_errors: (system wide counter)
>   Total number of times devices tried to unmap or free an invalid dma
>   address.
> map_err_type: (new field added to dma_debug_entry structure)
>   New field to maintain dma mapping error check status. This error type
>   is applicable to the dma map page and dma map single entries tracked by
>   dma-debug api. This status indicates whether or not a good mapping is
>   checked by the device before its use. dma_map_single() and dma_map_page()
>   could fail to create a mapping in some cases, and drivers are expected to
>   call dma_mapping_error() to check for errors. Please note that this is not
>   counter.
> 
> Enhancements to dma-debug api are made to add new debugfs interfaces to
> report total dma errors, dma errors that are not checked, and unmap errors
> for the entire system. Please note that these are system wide counters for
> all devices in the system.
> 
> The following new dma-debug interface is added:
> 
> debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
> 	Sets dma map error checked status for the dma map entry if one is
> 	found. Decrements the system wide dma_map_errors_not_checked counter
> 	that is incremented by debug_dma_map_page() when it checks for
> 	mapping error before adding it to the dma debug entry table.
> 
> New dma-debug internal interface:
> check_mapping_error(struct device *dev, dma_addr_t dma_addr)
> Calling dma_mapping_error() from dma-debug api will result in dma mapping
> check when it shouldn't. This internal routine checks dma mapping error
> without any debug checks. 
> 
> The following existing dma-debug api are changed to support this feature:
> debug_dma_map_page()
> 	Increments dma_map_errors and dma_map_errors_not_checked errors totals
> 	for the system, dma-debug api keeps track of, when dma_addr is invalid.
> 	This routine now calls internal check_mapping_error() interface to
> 	avoid doing dma mapping debug checks from dma-debug internal mapping
> 	error checks.
> check_unmap()
> 	This is an existing internal routines that checks for various mapping
> 	errors. Changed to increment system wide dma_unmap_errors, when a
> 	device requests an invalid address to be unmapped. This routine now
> 	calls internal check_mapping_error() interface to avoid doing dma
> 	mapping debug checks from dma-debug internal mapping error checks.
> 
> Changed arch/x86/include/asm/dma-mapping.h to call debug_dma_mapping_error()
> to validate these new interfaces on x86_64. Other architectures will be
> changed in a subsequent patch.
> 
> Tested: Intel iommu and swiotlb (iommu=soft) on x86-64 with
>         CONFIG_DMA_API_DEBUG enabled and disabled.
> 
> Signed-off-by: Shuah Khan <shuah.khan@hp.com>
> ---
>  Documentation/DMA-API.txt          |   13 ++++
>  arch/x86/include/asm/dma-mapping.h |    1 +
>  include/linux/dma-debug.h          |    7 ++
>  lib/dma-debug.c                    |  128 ++++++++++++++++++++++++++++++++----
>  4 files changed, 136 insertions(+), 13 deletions(-)
> 
> diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
> index 66bd97a..886aaf9 100644
> --- a/Documentation/DMA-API.txt
> +++ b/Documentation/DMA-API.txt
> @@ -638,6 +638,19 @@ this directory the following files can currently be found:
>  	dma-api/error_count	This file is read-only and shows the total
>  				numbers of errors found.
>  
> +	dma-api/map_errors	This file is read-only and shows the total
> +				number of dma mapping errors detected.
> +
> +	dma-api/map_errors_not_checked
> +				This file is read-only and shows the total
> +				number of dma mapping errors, device drivers
> +				failed to check prior to using the returned
> +				address.
> +
> +	dma-api/unmap_errors
> +				This file is read-only and shows the total
> +				number of invalid dma unmapping attempts.
> +
>  	dma-api/num_errors	The number in this file shows how many
>  				warnings will be printed to the kernel log
>  				before it stops. This number is initialized to
> diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
> index f7b4c79..808dae6 100644
> --- a/arch/x86/include/asm/dma-mapping.h
> +++ b/arch/x86/include/asm/dma-mapping.h
> @@ -47,6 +47,7 @@ static inline struct dma_map_ops *get_dma_ops(struct device *dev)
>  static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
>  {
>  	struct dma_map_ops *ops = get_dma_ops(dev);
> +	debug_dma_mapping_error(dev, dma_addr);
>  	if (ops->mapping_error)
>  		return ops->mapping_error(dev, dma_addr);
>  
> diff --git a/include/linux/dma-debug.h b/include/linux/dma-debug.h
> index 171ad8a..fc0e34c 100644
> --- a/include/linux/dma-debug.h
> +++ b/include/linux/dma-debug.h
> @@ -39,6 +39,8 @@ extern void debug_dma_map_page(struct device *dev, struct page *page,
>  			       int direction, dma_addr_t dma_addr,
>  			       bool map_single);
>  
> +extern void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr);
> +
>  extern void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
>  				 size_t size, int direction, bool map_single);
>  
> @@ -105,6 +107,11 @@ static inline void debug_dma_map_page(struct device *dev, struct page *page,
>  {
>  }
>  
> +static inline void debug_dma_mapping_error(struct device *dev,
> +					  dma_addr_t dma_addr)
> +{
> +}
> +
>  static inline void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
>  					size_t size, int direction,
>  					bool map_single)
> diff --git a/lib/dma-debug.c b/lib/dma-debug.c
> index 66ce414..1708a10 100644
> --- a/lib/dma-debug.c
> +++ b/lib/dma-debug.c
> @@ -45,18 +45,25 @@ enum {
>  	dma_debug_coherent,
>  };
>  
> +enum map_err_types {
> +	MAP_ERR_CHECK_NOT_APPLICABLE,
> +	MAP_ERR_NOT_CHECKED,
> +	MAP_ERR_CHECKED,
> +};
> +
>  #define DMA_DEBUG_STACKTRACE_ENTRIES 5
>  
>  struct dma_debug_entry {
> -	struct list_head list;
> -	struct device    *dev;
> -	int              type;
> -	phys_addr_t      paddr;
> -	u64              dev_addr;
> -	u64              size;
> -	int              direction;
> -	int		 sg_call_ents;
> -	int		 sg_mapped_ents;
> +	struct list_head    list;
> +	struct device       *dev;
> +	int                 type;
> +	phys_addr_t         paddr;
> +	u64                 dev_addr;
> +	u64                 size;
> +	int                 direction;
> +	int		    sg_call_ents;
> +	int		    sg_mapped_ents;
> +	enum map_err_types  map_err_type;

Something gone wild with your tabs/space. It should be only
one extra field added.

Otherwise:
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

>  #ifdef CONFIG_STACKTRACE
>  	struct		 stack_trace stacktrace;
>  	unsigned long	 st_entries[DMA_DEBUG_STACKTRACE_ENTRIES];
> @@ -83,6 +90,11 @@ static u32 global_disable __read_mostly;
>  /* Global error count */
>  static u32 error_count;
>  
> +/* dma mapping error counts */
> +static u32 map_errors;
> +static u32 map_errors_not_checked;
> +static u32 unmap_errors;
> +
>  /* Global error show enable*/
>  static u32 show_all_errors __read_mostly;
>  /* Number of errors to show */
> @@ -104,6 +116,9 @@ static struct dentry *show_num_errors_dent  __read_mostly;
>  static struct dentry *num_free_entries_dent __read_mostly;
>  static struct dentry *min_free_entries_dent __read_mostly;
>  static struct dentry *filter_dent           __read_mostly;
> +static struct dentry *map_errors_dent   __read_mostly;
> +static struct dentry *map_errors_not_checked_dent   __read_mostly;
> +static struct dentry *unmap_errors_dent __read_mostly;
>  
>  /* per-driver filter related state */
>  
> @@ -114,6 +129,12 @@ static struct device_driver *current_driver                    __read_mostly;
>  
>  static DEFINE_RWLOCK(driver_name_lock);
>  
> +static const char *const maperr2str[] = {
> +	[MAP_ERR_CHECK_NOT_APPLICABLE] = "dma map error check not applicable",
> +	[MAP_ERR_NOT_CHECKED] = "dma map error not checked",
> +	[MAP_ERR_CHECKED] = "dma map error checked",
> +};
> +
>  static const char *type2name[4] = { "single", "page",
>  				    "scather-gather", "coherent" };
>  
> @@ -381,11 +402,12 @@ void debug_dma_dump_mappings(struct device *dev)
>  		list_for_each_entry(entry, &bucket->list, list) {
>  			if (!dev || dev == entry->dev) {
>  				dev_info(entry->dev,
> -					 "%s idx %d P=%Lx D=%Lx L=%Lx %s\n",
> +					 "%s idx %d P=%Lx D=%Lx L=%Lx %s %s\n",
>  					 type2name[entry->type], idx,
>  					 (unsigned long long)entry->paddr,
>  					 entry->dev_addr, entry->size,
> -					 dir2name[entry->direction]);
> +					 dir2name[entry->direction],
> +					 maperr2str[entry->map_err_type]);
>  			}
>  		}
>  
> @@ -695,6 +717,28 @@ static int dma_debug_fs_init(void)
>  	if (!filter_dent)
>  		goto out_err;
>  
> +	map_errors_dent = debugfs_create_u32("map_errors", 0444,
> +			dma_debug_dent,
> +			&map_errors);
> +
> +	if (!map_errors_dent)
> +		goto out_err;
> +
> +	map_errors_not_checked_dent = debugfs_create_u32(
> +			"map_errors_not_checked",
> +			0444,
> +			dma_debug_dent,
> +			&map_errors_not_checked);
> +
> +	if (!map_errors_not_checked_dent)
> +		goto out_err;
> +
> +	unmap_errors_dent = debugfs_create_u32("unmap_errors", 0444,
> +			dma_debug_dent,
> +			&unmap_errors);
> +	if (!unmap_errors_dent)
> +		goto out_err;
> +
>  	return 0;
>  
>  out_err:
> @@ -843,13 +887,29 @@ static __init int dma_debug_entries_cmdline(char *str)
>  __setup("dma_debug=", dma_debug_cmdline);
>  __setup("dma_debug_entries=", dma_debug_entries_cmdline);
>  
> +/* Calling dma_mapping_error() from dma-debug api will result in calling
> +   debug_dma_mapping_error() - need internal mapping error routine to
> +   avoid debug checks */
> +#ifndef DMA_ERROR_CODE
> +#define DMA_ERROR_CODE 0
> +#endif
> +static inline int check_mapping_error(struct device *dev, dma_addr_t dma_addr)
> +{
> +	const struct dma_map_ops *ops = get_dma_ops(dev);
> +	if (ops->mapping_error)
> +		return ops->mapping_error(dev, dma_addr);
> +
> +	return (dma_addr == DMA_ERROR_CODE);
> +}
> +
>  static void check_unmap(struct dma_debug_entry *ref)
>  {
>  	struct dma_debug_entry *entry;
>  	struct hash_bucket *bucket;
>  	unsigned long flags;
>  
> -	if (dma_mapping_error(ref->dev, ref->dev_addr)) {
> +	if (unlikely(check_mapping_error(ref->dev, ref->dev_addr))) {
> +		unmap_errors += 1;
>  		err_printk(ref->dev, NULL, "DMA-API: device driver tries "
>  			   "to free an invalid DMA memory address\n");
>  		return;
> @@ -915,6 +975,15 @@ static void check_unmap(struct dma_debug_entry *ref)
>  			   dir2name[ref->direction]);
>  	}
>  
> +	if (entry->map_err_type == MAP_ERR_NOT_CHECKED) {
> +		err_printk(ref->dev, entry,
> +			   "DMA-API: device driver failed to check map error"
> +			   "[device address=0x%016llx] [size=%llu bytes] "
> +			   "[mapped as %s]",
> +			   ref->dev_addr, ref->size,
> +			   type2name[entry->type]);
> +	}
> +
>  	hash_bucket_del(entry);
>  	dma_entry_free(entry);
>  
> @@ -1022,8 +1091,11 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
>  	if (unlikely(global_disable))
>  		return;
>  
> -	if (unlikely(dma_mapping_error(dev, dma_addr)))
> +	if (unlikely(check_mapping_error(dev, dma_addr))) {
> +		map_errors += 1;
> +		map_errors_not_checked += 1;
>  		return;
> +	}
>  
>  	entry = dma_entry_alloc();
>  	if (!entry)
> @@ -1035,6 +1107,7 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
>  	entry->dev_addr  = dma_addr;
>  	entry->size      = size;
>  	entry->direction = direction;
> +	entry->map_err_type = MAP_ERR_NOT_CHECKED;
>  
>  	if (map_single)
>  		entry->type = dma_debug_single;
> @@ -1050,6 +1123,35 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
>  }
>  EXPORT_SYMBOL(debug_dma_map_page);
>  
> +void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
> +{
> +	struct dma_debug_entry ref;
> +	struct dma_debug_entry *entry;
> +	struct hash_bucket *bucket;
> +	unsigned long flags;
> +
> +	if (unlikely(global_disable))
> +		return;
> +
> +	ref.dev = dev;
> +	ref.dev_addr = dma_addr;
> +	bucket = get_hash_bucket(&ref, &flags);
> +	entry = bucket_find_exact(bucket, &ref);
> +
> +	if (!entry) {
> +		/* very likley dma-api didn't call debug_dma_map_page() or
> +		   debug_dma_map_page() detected mapping error */
> +		if (map_errors_not_checked)
> +			map_errors_not_checked -= 1;
> +		goto out;
> +	}
> +
> +	entry->map_err_type = MAP_ERR_CHECKED;
> +out:
> +	put_hash_bucket(bucket, &flags);
> +}
> +EXPORT_SYMBOL(debug_dma_mapping_error);
> +
>  void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
>  			  size_t size, int direction, bool map_single)
>  {
> -- 
> 1.7.9.5
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3] dma-debug: New interfaces to debug dma mapping errors
  2012-10-04 14:01       ` Konrad Rzeszutek Wilk
@ 2012-10-04 22:16         ` Shuah Khan
  0 siblings, 0 replies; 30+ messages in thread
From: Shuah Khan @ 2012-10-04 22:16 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, akpm
  Cc: konrad.wilk, tglx, mingo, hpa, rob, stern, joerg.roedel,
	bhelgaas, LKML, linux-doc, devel, x86, shuahkhan

On Thu, 2012-10-04 at 10:01 -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Oct 03, 2012 at 02:45:11PM -0700, Andrew Morton wrote:
> > On Wed, 03 Oct 2012 08:55:59 -0600
> > Shuah Khan <shuah.khan@hp.com> wrote:
> > 
> > > A recent dma mapping error analysis effort showed that a large percentage
> > > of dma_map_single() and dma_map_page() returns are not checked for mapping
> > > errors.
> > > 
> > > Reference:
> > > http://linuxdriverproject.org/mediawiki/index.php/DMA_Mapping_Error_Analysis
> > > 
> > > Adding support for tracking dma mapping and unmapping errors to help assess
> > > the following:
> > > 
> > > When do dma mapping errors get detected?
> > > How often do these errors occur?
> > > Why don't we see failures related to missing dma mapping error checks?
> > > Are they silent failures?
> > 
> > This seems to be a strange way of addressing kernel programming errors.
> > Instead of fixing them up, we generate lots of statistics about how
> > often they happen!
> 
> And by using this we can fix the drivers.

Sorry I was out sick for a bit and catching up. Several drivers are
missing checks for mapping errors and in several cases in addition to
missing mapping error checks, drivers are not doing unmap as they
should. Is is very likely the result of cut and paste as you pointed
out. After the analysis, I realized it is worth while spending time to
provide debug infrastructure driver writers can use to find problems and
continue to use it when new mapping code gets added to an existing
driver and/or a new driver is written.

> > 
> > Would it not be better to find and fix the buggy code sites?  A
> > coccinelle script wold probably help here.
> 
> That is the end goal (fixing the buggy code sites). Shuah has
> identified the bad culprits.

I compiled a list and documented it for review. We have to start fixing
them.

> 
> > 
> > And let's also look at *why* we keep doing this.  Partly it's because
> > these things are self-propagating - people copy-n-paste bad code so we
> > get more bad code.
> 
> 
> And this patch will now tell the poor engineers that write new code that
> they pasted bad code.
> 
> > 
> > 
> > Another reason surely is the poor documentation.  Suppose our diligent
> > programmer goes to the dma_map_single() definition site:
> > 
> > #define dma_map_single(d, a, s, r) dma_map_single_attrs(d, a, s, r, NULL)
> > 
> > No documentation at all.  Because it's a stupid macro it doesn't even
> > give the types and names of the arguments or the type of the return
> > value.
> > 
> > So he goes to dma_map_single_attrs() and finds that is altogether
> > undocmented.
> > 
> > So he goes into Documentation/DMA-API-HOWTO.txt, searches for
> > "dma_map_single" and finds
> > 
> > : To map a single region, you do:
> > : 
> > : 	struct device *dev = &my_dev->dev;
> > : 	dma_addr_t dma_handle;
> > : 	void *addr = buffer->ptr;
> > : 	size_t size = buffer->len;
> > : 
> > : 	dma_handle = dma_map_single(dev, addr, size, direction);
> > : 
> > : and to unmap it:
> > : 
> > : 	dma_unmap_single(dev, dma_handle, size, direction);
> > 
> > 
> > So it is hardly surprising that we keep screwing this up!
> 
> Right, so that should be also modified (Thank you for looking at that)!
> 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> > 



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3] dma-debug: New interfaces to debug dma mapping errors
  2012-10-04 17:38     ` Konrad Rzeszutek Wilk
@ 2012-10-04 22:19       ` Shuah Khan
  0 siblings, 0 replies; 30+ messages in thread
From: Shuah Khan @ 2012-10-04 22:19 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: konrad.wilk, tglx, mingo, hpa, rob, akpm, stern, joerg.roedel,
	bhelgaas, LKML, linux-doc, devel, x86, shuahkhan

On Thu, 2012-10-04 at 13:38 -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Oct 03, 2012 at 08:55:59AM -0600, Shuah Khan wrote:
> > A recent dma mapping error analysis effort showed that a large percentage
> > of dma_map_single() and dma_map_page() returns are not checked for mapping
> > errors.
> > 
> > Reference:
> > http://linuxdriverproject.org/mediawiki/index.php/DMA_Mapping_Error_Analysis
> > 
> > Adding support for tracking dma mapping and unmapping errors to help assess
> > the following:
> > 
> > When do dma mapping errors get detected?
> > How often do these errors occur?
> > Why don't we see failures related to missing dma mapping error checks?
> > Are they silent failures?
> > 
> > Patch v3: Addresses review and design comments from Patch v2.
> > Patch v2: Addressed design issues from Patch v1.
> > 
> > Enhance dma-debug infrastructure to track dma mapping, and unmapping errors.
> > 
> > map_errors: (system wide counter)
> >   Total number of dma mapping errors returned by the dma mapping interfaces,
> >   in response to mapping requests from all devices in the system.
> > map_errors_not_checked: (system wide counter)
> >   Total number of dma mapping errors devices failed to check before using
> >   the returned address.
> > unmap_errors: (system wide counter)
> >   Total number of times devices tried to unmap or free an invalid dma
> >   address.
> > map_err_type: (new field added to dma_debug_entry structure)
> >   New field to maintain dma mapping error check status. This error type
> >   is applicable to the dma map page and dma map single entries tracked by
> >   dma-debug api. This status indicates whether or not a good mapping is
> >   checked by the device before its use. dma_map_single() and dma_map_page()
> >   could fail to create a mapping in some cases, and drivers are expected to
> >   call dma_mapping_error() to check for errors. Please note that this is not
> >   counter.
> > 
> > Enhancements to dma-debug api are made to add new debugfs interfaces to
> > report total dma errors, dma errors that are not checked, and unmap errors
> > for the entire system. Please note that these are system wide counters for
> > all devices in the system.
> > 
> > The following new dma-debug interface is added:
> > 
> > debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
> > 	Sets dma map error checked status for the dma map entry if one is
> > 	found. Decrements the system wide dma_map_errors_not_checked counter
> > 	that is incremented by debug_dma_map_page() when it checks for
> > 	mapping error before adding it to the dma debug entry table.
> > 
> > New dma-debug internal interface:
> > check_mapping_error(struct device *dev, dma_addr_t dma_addr)
> > Calling dma_mapping_error() from dma-debug api will result in dma mapping
> > check when it shouldn't. This internal routine checks dma mapping error
> > without any debug checks. 
> > 
> > The following existing dma-debug api are changed to support this feature:
> > debug_dma_map_page()
> > 	Increments dma_map_errors and dma_map_errors_not_checked errors totals
> > 	for the system, dma-debug api keeps track of, when dma_addr is invalid.
> > 	This routine now calls internal check_mapping_error() interface to
> > 	avoid doing dma mapping debug checks from dma-debug internal mapping
> > 	error checks.
> > check_unmap()
> > 	This is an existing internal routines that checks for various mapping
> > 	errors. Changed to increment system wide dma_unmap_errors, when a
> > 	device requests an invalid address to be unmapped. This routine now
> > 	calls internal check_mapping_error() interface to avoid doing dma
> > 	mapping debug checks from dma-debug internal mapping error checks.
> > 
> > Changed arch/x86/include/asm/dma-mapping.h to call debug_dma_mapping_error()
> > to validate these new interfaces on x86_64. Other architectures will be
> > changed in a subsequent patch.
> > 
> > Tested: Intel iommu and swiotlb (iommu=soft) on x86-64 with
> >         CONFIG_DMA_API_DEBUG enabled and disabled.
> > 
> > Signed-off-by: Shuah Khan <shuah.khan@hp.com>
> > ---
> >  Documentation/DMA-API.txt          |   13 ++++
> >  arch/x86/include/asm/dma-mapping.h |    1 +
> >  include/linux/dma-debug.h          |    7 ++
> >  lib/dma-debug.c                    |  128 ++++++++++++++++++++++++++++++++----
> >  4 files changed, 136 insertions(+), 13 deletions(-)
> > 
> > diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
> > index 66bd97a..886aaf9 100644
> > --- a/Documentation/DMA-API.txt
> > +++ b/Documentation/DMA-API.txt
> > @@ -638,6 +638,19 @@ this directory the following files can currently be found:
> >  	dma-api/error_count	This file is read-only and shows the total
> >  				numbers of errors found.
> >  
> > +	dma-api/map_errors	This file is read-only and shows the total
> > +				number of dma mapping errors detected.
> > +
> > +	dma-api/map_errors_not_checked
> > +				This file is read-only and shows the total
> > +				number of dma mapping errors, device drivers
> > +				failed to check prior to using the returned
> > +				address.
> > +
> > +	dma-api/unmap_errors
> > +				This file is read-only and shows the total
> > +				number of invalid dma unmapping attempts.
> > +
> >  	dma-api/num_errors	The number in this file shows how many
> >  				warnings will be printed to the kernel log
> >  				before it stops. This number is initialized to
> > diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
> > index f7b4c79..808dae6 100644
> > --- a/arch/x86/include/asm/dma-mapping.h
> > +++ b/arch/x86/include/asm/dma-mapping.h
> > @@ -47,6 +47,7 @@ static inline struct dma_map_ops *get_dma_ops(struct device *dev)
> >  static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
> >  {
> >  	struct dma_map_ops *ops = get_dma_ops(dev);
> > +	debug_dma_mapping_error(dev, dma_addr);
> >  	if (ops->mapping_error)
> >  		return ops->mapping_error(dev, dma_addr);
> >  
> > diff --git a/include/linux/dma-debug.h b/include/linux/dma-debug.h
> > index 171ad8a..fc0e34c 100644
> > --- a/include/linux/dma-debug.h
> > +++ b/include/linux/dma-debug.h
> > @@ -39,6 +39,8 @@ extern void debug_dma_map_page(struct device *dev, struct page *page,
> >  			       int direction, dma_addr_t dma_addr,
> >  			       bool map_single);
> >  
> > +extern void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr);
> > +
> >  extern void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
> >  				 size_t size, int direction, bool map_single);
> >  
> > @@ -105,6 +107,11 @@ static inline void debug_dma_map_page(struct device *dev, struct page *page,
> >  {
> >  }
> >  
> > +static inline void debug_dma_mapping_error(struct device *dev,
> > +					  dma_addr_t dma_addr)
> > +{
> > +}
> > +
> >  static inline void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
> >  					size_t size, int direction,
> >  					bool map_single)
> > diff --git a/lib/dma-debug.c b/lib/dma-debug.c
> > index 66ce414..1708a10 100644
> > --- a/lib/dma-debug.c
> > +++ b/lib/dma-debug.c
> > @@ -45,18 +45,25 @@ enum {
> >  	dma_debug_coherent,
> >  };
> >  
> > +enum map_err_types {
> > +	MAP_ERR_CHECK_NOT_APPLICABLE,
> > +	MAP_ERR_NOT_CHECKED,
> > +	MAP_ERR_CHECKED,
> > +};
> > +
> >  #define DMA_DEBUG_STACKTRACE_ENTRIES 5
> >  
> >  struct dma_debug_entry {
> > -	struct list_head list;
> > -	struct device    *dev;
> > -	int              type;
> > -	phys_addr_t      paddr;
> > -	u64              dev_addr;
> > -	u64              size;
> > -	int              direction;
> > -	int		 sg_call_ents;
> > -	int		 sg_mapped_ents;
> > +	struct list_head    list;
> > +	struct device       *dev;
> > +	int                 type;
> > +	phys_addr_t         paddr;
> > +	u64                 dev_addr;
> > +	u64                 size;
> > +	int                 direction;
> > +	int		    sg_call_ents;
> > +	int		    sg_mapped_ents;
> > +	enum map_err_types  map_err_type;
> 
> Something gone wild with your tabs/space. It should be only
> one extra field added.
I tabbed the existing fields over for readability. Guess I should have
avoided the churn and left it alone. Will fix the tabs and send v4.

> 
> Otherwise:
> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Is it ok to add your reviewed by tag when I send v4?

-- Shuah
> 
> >  #ifdef CONFIG_STACKTRACE
> >  	struct		 stack_trace stacktrace;
> >  	unsigned long	 st_entries[DMA_DEBUG_STACKTRACE_ENTRIES];
> > @@ -83,6 +90,11 @@ static u32 global_disable __read_mostly;
> >  /* Global error count */
> >  static u32 error_count;
> >  
> > +/* dma mapping error counts */
> > +static u32 map_errors;
> > +static u32 map_errors_not_checked;
> > +static u32 unmap_errors;
> > +
> >  /* Global error show enable*/
> >  static u32 show_all_errors __read_mostly;
> >  /* Number of errors to show */
> > @@ -104,6 +116,9 @@ static struct dentry *show_num_errors_dent  __read_mostly;
> >  static struct dentry *num_free_entries_dent __read_mostly;
> >  static struct dentry *min_free_entries_dent __read_mostly;
> >  static struct dentry *filter_dent           __read_mostly;
> > +static struct dentry *map_errors_dent   __read_mostly;
> > +static struct dentry *map_errors_not_checked_dent   __read_mostly;
> > +static struct dentry *unmap_errors_dent __read_mostly;
> >  
> >  /* per-driver filter related state */
> >  
> > @@ -114,6 +129,12 @@ static struct device_driver *current_driver                    __read_mostly;
> >  
> >  static DEFINE_RWLOCK(driver_name_lock);
> >  
> > +static const char *const maperr2str[] = {
> > +	[MAP_ERR_CHECK_NOT_APPLICABLE] = "dma map error check not applicable",
> > +	[MAP_ERR_NOT_CHECKED] = "dma map error not checked",
> > +	[MAP_ERR_CHECKED] = "dma map error checked",
> > +};
> > +
> >  static const char *type2name[4] = { "single", "page",
> >  				    "scather-gather", "coherent" };
> >  
> > @@ -381,11 +402,12 @@ void debug_dma_dump_mappings(struct device *dev)
> >  		list_for_each_entry(entry, &bucket->list, list) {
> >  			if (!dev || dev == entry->dev) {
> >  				dev_info(entry->dev,
> > -					 "%s idx %d P=%Lx D=%Lx L=%Lx %s\n",
> > +					 "%s idx %d P=%Lx D=%Lx L=%Lx %s %s\n",
> >  					 type2name[entry->type], idx,
> >  					 (unsigned long long)entry->paddr,
> >  					 entry->dev_addr, entry->size,
> > -					 dir2name[entry->direction]);
> > +					 dir2name[entry->direction],
> > +					 maperr2str[entry->map_err_type]);
> >  			}
> >  		}
> >  
> > @@ -695,6 +717,28 @@ static int dma_debug_fs_init(void)
> >  	if (!filter_dent)
> >  		goto out_err;
> >  
> > +	map_errors_dent = debugfs_create_u32("map_errors", 0444,
> > +			dma_debug_dent,
> > +			&map_errors);
> > +
> > +	if (!map_errors_dent)
> > +		goto out_err;
> > +
> > +	map_errors_not_checked_dent = debugfs_create_u32(
> > +			"map_errors_not_checked",
> > +			0444,
> > +			dma_debug_dent,
> > +			&map_errors_not_checked);
> > +
> > +	if (!map_errors_not_checked_dent)
> > +		goto out_err;
> > +
> > +	unmap_errors_dent = debugfs_create_u32("unmap_errors", 0444,
> > +			dma_debug_dent,
> > +			&unmap_errors);
> > +	if (!unmap_errors_dent)
> > +		goto out_err;
> > +
> >  	return 0;
> >  
> >  out_err:
> > @@ -843,13 +887,29 @@ static __init int dma_debug_entries_cmdline(char *str)
> >  __setup("dma_debug=", dma_debug_cmdline);
> >  __setup("dma_debug_entries=", dma_debug_entries_cmdline);
> >  
> > +/* Calling dma_mapping_error() from dma-debug api will result in calling
> > +   debug_dma_mapping_error() - need internal mapping error routine to
> > +   avoid debug checks */
> > +#ifndef DMA_ERROR_CODE
> > +#define DMA_ERROR_CODE 0
> > +#endif
> > +static inline int check_mapping_error(struct device *dev, dma_addr_t dma_addr)
> > +{
> > +	const struct dma_map_ops *ops = get_dma_ops(dev);
> > +	if (ops->mapping_error)
> > +		return ops->mapping_error(dev, dma_addr);
> > +
> > +	return (dma_addr == DMA_ERROR_CODE);
> > +}
> > +
> >  static void check_unmap(struct dma_debug_entry *ref)
> >  {
> >  	struct dma_debug_entry *entry;
> >  	struct hash_bucket *bucket;
> >  	unsigned long flags;
> >  
> > -	if (dma_mapping_error(ref->dev, ref->dev_addr)) {
> > +	if (unlikely(check_mapping_error(ref->dev, ref->dev_addr))) {
> > +		unmap_errors += 1;
> >  		err_printk(ref->dev, NULL, "DMA-API: device driver tries "
> >  			   "to free an invalid DMA memory address\n");
> >  		return;
> > @@ -915,6 +975,15 @@ static void check_unmap(struct dma_debug_entry *ref)
> >  			   dir2name[ref->direction]);
> >  	}
> >  
> > +	if (entry->map_err_type == MAP_ERR_NOT_CHECKED) {
> > +		err_printk(ref->dev, entry,
> > +			   "DMA-API: device driver failed to check map error"
> > +			   "[device address=0x%016llx] [size=%llu bytes] "
> > +			   "[mapped as %s]",
> > +			   ref->dev_addr, ref->size,
> > +			   type2name[entry->type]);
> > +	}
> > +
> >  	hash_bucket_del(entry);
> >  	dma_entry_free(entry);
> >  
> > @@ -1022,8 +1091,11 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
> >  	if (unlikely(global_disable))
> >  		return;
> >  
> > -	if (unlikely(dma_mapping_error(dev, dma_addr)))
> > +	if (unlikely(check_mapping_error(dev, dma_addr))) {
> > +		map_errors += 1;
> > +		map_errors_not_checked += 1;
> >  		return;
> > +	}
> >  
> >  	entry = dma_entry_alloc();
> >  	if (!entry)
> > @@ -1035,6 +1107,7 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
> >  	entry->dev_addr  = dma_addr;
> >  	entry->size      = size;
> >  	entry->direction = direction;
> > +	entry->map_err_type = MAP_ERR_NOT_CHECKED;
> >  
> >  	if (map_single)
> >  		entry->type = dma_debug_single;
> > @@ -1050,6 +1123,35 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
> >  }
> >  EXPORT_SYMBOL(debug_dma_map_page);
> >  
> > +void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
> > +{
> > +	struct dma_debug_entry ref;
> > +	struct dma_debug_entry *entry;
> > +	struct hash_bucket *bucket;
> > +	unsigned long flags;
> > +
> > +	if (unlikely(global_disable))
> > +		return;
> > +
> > +	ref.dev = dev;
> > +	ref.dev_addr = dma_addr;
> > +	bucket = get_hash_bucket(&ref, &flags);
> > +	entry = bucket_find_exact(bucket, &ref);
> > +
> > +	if (!entry) {
> > +		/* very likley dma-api didn't call debug_dma_map_page() or
> > +		   debug_dma_map_page() detected mapping error */
> > +		if (map_errors_not_checked)
> > +			map_errors_not_checked -= 1;
> > +		goto out;
> > +	}
> > +
> > +	entry->map_err_type = MAP_ERR_CHECKED;
> > +out:
> > +	put_hash_bucket(bucket, &flags);
> > +}
> > +EXPORT_SYMBOL(debug_dma_mapping_error);
> > +
> >  void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
> >  			  size_t size, int direction, bool map_single)
> >  {
> > -- 
> > 1.7.9.5
> > 
> > 
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> > 



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v4] dma-debug: New interfaces to debug dma mapping errors
  2012-10-03 14:55   ` [PATCH v3] " Shuah Khan
  2012-10-03 21:45     ` Andrew Morton
  2012-10-04 17:38     ` Konrad Rzeszutek Wilk
@ 2012-10-05  1:23     ` Shuah Khan
  2012-10-05 22:51       ` Andrew Morton
  2 siblings, 1 reply; 30+ messages in thread
From: Shuah Khan @ 2012-10-05  1:23 UTC (permalink / raw)
  To: konrad.wilk, tglx, mingo, hpa, rob, akpm, stern, joerg.roedel, bhelgaas
  Cc: LKML, linux-doc, devel, x86, shuahkhan

A recent dma mapping error analysis effort showed that a large percentage
of dma_map_single() and dma_map_page() returns are not checked for mapping
errors.

Reference:
http://linuxdriverproject.org/mediawiki/index.php/DMA_Mapping_Error_Analysis

Adding support for tracking dma mapping and unmapping errors to help assess
the following:

When do dma mapping errors get detected?
How often do these errors occur?
Why don't we see failures related to missing dma mapping error checks?
Are they silent failures?

Patch v4: Addresses extra tab review comments from Patch v3.
Patch v3: Addresses review and design comments from Patch v2.
Patch v2: Addressed design issues from Patch v1.

Enhance dma-debug infrastructure to track dma mapping, and unmapping errors.

map_errors: (system wide counter)
  Total number of dma mapping errors returned by the dma mapping interfaces,
  in response to mapping requests from all devices in the system.
map_errors_not_checked: (system wide counter)
  Total number of dma mapping errors devices failed to check before using
  the returned address.
unmap_errors: (system wide counter)
  Total number of times devices tried to unmap or free an invalid dma
  address.
map_err_type: (new field added to dma_debug_entry structure)
  New field to maintain dma mapping error check status. This error type
  is applicable to the dma map page and dma map single entries tracked by
  dma-debug api. This status indicates whether or not a good mapping is
  checked by the device before its use. dma_map_single() and dma_map_page()
  could fail to create a mapping in some cases, and drivers are expected to
  call dma_mapping_error() to check for errors. Please note that this is not
  counter.

Enhancements to dma-debug api are made to add new debugfs interfaces to
report total dma errors, dma errors that are not checked, and unmap errors
for the entire system. Please note that these are system wide counters for
all devices in the system.

The following new dma-debug interface is added:

debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
	Sets dma map error checked status for the dma map entry if one is
	found. Decrements the system wide dma_map_errors_not_checked counter
	that is incremented by debug_dma_map_page() when it checks for
	mapping error before adding it to the dma debug entry table.

New dma-debug internal interface:
check_mapping_error(struct device *dev, dma_addr_t dma_addr)
Calling dma_mapping_error() from dma-debug api will result in dma mapping
check when it shouldn't. This internal routine checks dma mapping error
without any debug checks. 

The following existing dma-debug api are changed to support this feature:
debug_dma_map_page()
	Increments dma_map_errors and dma_map_errors_not_checked errors totals
	for the system, dma-debug api keeps track of, when dma_addr is invalid.
	This routine now calls internal check_mapping_error() interface to
	avoid doing dma mapping debug checks from dma-debug internal mapping
	error checks.
check_unmap()
	This is an existing internal routines that checks for various mapping
	errors. Changed to increment system wide dma_unmap_errors, when a
	device requests an invalid address to be unmapped. This routine now
	calls internal check_mapping_error() interface to avoid doing dma
	mapping debug checks from dma-debug internal mapping error checks.

Changed arch/x86/include/asm/dma-mapping.h to call debug_dma_mapping_error()
to validate these new interfaces on x86_64. Other architectures will be
changed in a subsequent patch.

Tested: Intel iommu and swiotlb (iommu=soft) on x86-64 with
        CONFIG_DMA_API_DEBUG enabled and disabled.

Signed-off-by: Shuah Khan <shuah.khan@hp.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 Documentation/DMA-API.txt          |   13 +++++
 arch/x86/include/asm/dma-mapping.h |    1 +
 include/linux/dma-debug.h          |    7 +++
 lib/dma-debug.c                    |  110 ++++++++++++++++++++++++++++++++++--
 4 files changed, 127 insertions(+), 4 deletions(-)

diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
index 66bd97a..886aaf9 100644
--- a/Documentation/DMA-API.txt
+++ b/Documentation/DMA-API.txt
@@ -638,6 +638,19 @@ this directory the following files can currently be found:
 	dma-api/error_count	This file is read-only and shows the total
 				numbers of errors found.
 
+	dma-api/map_errors	This file is read-only and shows the total
+				number of dma mapping errors detected.
+
+	dma-api/map_errors_not_checked
+				This file is read-only and shows the total
+				number of dma mapping errors, device drivers
+				failed to check prior to using the returned
+				address.
+
+	dma-api/unmap_errors
+				This file is read-only and shows the total
+				number of invalid dma unmapping attempts.
+
 	dma-api/num_errors	The number in this file shows how many
 				warnings will be printed to the kernel log
 				before it stops. This number is initialized to
diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
index f7b4c79..808dae6 100644
--- a/arch/x86/include/asm/dma-mapping.h
+++ b/arch/x86/include/asm/dma-mapping.h
@@ -47,6 +47,7 @@ static inline struct dma_map_ops *get_dma_ops(struct device *dev)
 static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
 {
 	struct dma_map_ops *ops = get_dma_ops(dev);
+	debug_dma_mapping_error(dev, dma_addr);
 	if (ops->mapping_error)
 		return ops->mapping_error(dev, dma_addr);
 
diff --git a/include/linux/dma-debug.h b/include/linux/dma-debug.h
index 171ad8a..fc0e34c 100644
--- a/include/linux/dma-debug.h
+++ b/include/linux/dma-debug.h
@@ -39,6 +39,8 @@ extern void debug_dma_map_page(struct device *dev, struct page *page,
 			       int direction, dma_addr_t dma_addr,
 			       bool map_single);
 
+extern void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr);
+
 extern void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
 				 size_t size, int direction, bool map_single);
 
@@ -105,6 +107,11 @@ static inline void debug_dma_map_page(struct device *dev, struct page *page,
 {
 }
 
+static inline void debug_dma_mapping_error(struct device *dev,
+					  dma_addr_t dma_addr)
+{
+}
+
 static inline void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
 					size_t size, int direction,
 					bool map_single)
diff --git a/lib/dma-debug.c b/lib/dma-debug.c
index 66ce414..74a7da9 100644
--- a/lib/dma-debug.c
+++ b/lib/dma-debug.c
@@ -45,6 +45,12 @@ enum {
 	dma_debug_coherent,
 };
 
+enum map_err_types {
+	MAP_ERR_CHECK_NOT_APPLICABLE,
+	MAP_ERR_NOT_CHECKED,
+	MAP_ERR_CHECKED,
+};
+
 #define DMA_DEBUG_STACKTRACE_ENTRIES 5
 
 struct dma_debug_entry {
@@ -57,6 +63,7 @@ struct dma_debug_entry {
 	int              direction;
 	int		 sg_call_ents;
 	int		 sg_mapped_ents;
+	enum map_err_types  map_err_type;
 #ifdef CONFIG_STACKTRACE
 	struct		 stack_trace stacktrace;
 	unsigned long	 st_entries[DMA_DEBUG_STACKTRACE_ENTRIES];
@@ -83,6 +90,11 @@ static u32 global_disable __read_mostly;
 /* Global error count */
 static u32 error_count;
 
+/* dma mapping error counts */
+static u32 map_errors;
+static u32 map_errors_not_checked;
+static u32 unmap_errors;
+
 /* Global error show enable*/
 static u32 show_all_errors __read_mostly;
 /* Number of errors to show */
@@ -104,6 +116,9 @@ static struct dentry *show_num_errors_dent  __read_mostly;
 static struct dentry *num_free_entries_dent __read_mostly;
 static struct dentry *min_free_entries_dent __read_mostly;
 static struct dentry *filter_dent           __read_mostly;
+static struct dentry *map_errors_dent   __read_mostly;
+static struct dentry *map_errors_not_checked_dent   __read_mostly;
+static struct dentry *unmap_errors_dent __read_mostly;
 
 /* per-driver filter related state */
 
@@ -114,6 +129,12 @@ static struct device_driver *current_driver                    __read_mostly;
 
 static DEFINE_RWLOCK(driver_name_lock);
 
+static const char *const maperr2str[] = {
+	[MAP_ERR_CHECK_NOT_APPLICABLE] = "dma map error check not applicable",
+	[MAP_ERR_NOT_CHECKED] = "dma map error not checked",
+	[MAP_ERR_CHECKED] = "dma map error checked",
+};
+
 static const char *type2name[4] = { "single", "page",
 				    "scather-gather", "coherent" };
 
@@ -381,11 +402,12 @@ void debug_dma_dump_mappings(struct device *dev)
 		list_for_each_entry(entry, &bucket->list, list) {
 			if (!dev || dev == entry->dev) {
 				dev_info(entry->dev,
-					 "%s idx %d P=%Lx D=%Lx L=%Lx %s\n",
+					 "%s idx %d P=%Lx D=%Lx L=%Lx %s %s\n",
 					 type2name[entry->type], idx,
 					 (unsigned long long)entry->paddr,
 					 entry->dev_addr, entry->size,
-					 dir2name[entry->direction]);
+					 dir2name[entry->direction],
+					 maperr2str[entry->map_err_type]);
 			}
 		}
 
@@ -695,6 +717,28 @@ static int dma_debug_fs_init(void)
 	if (!filter_dent)
 		goto out_err;
 
+	map_errors_dent = debugfs_create_u32("map_errors", 0444,
+			dma_debug_dent,
+			&map_errors);
+
+	if (!map_errors_dent)
+		goto out_err;
+
+	map_errors_not_checked_dent = debugfs_create_u32(
+			"map_errors_not_checked",
+			0444,
+			dma_debug_dent,
+			&map_errors_not_checked);
+
+	if (!map_errors_not_checked_dent)
+		goto out_err;
+
+	unmap_errors_dent = debugfs_create_u32("unmap_errors", 0444,
+			dma_debug_dent,
+			&unmap_errors);
+	if (!unmap_errors_dent)
+		goto out_err;
+
 	return 0;
 
 out_err:
@@ -843,13 +887,29 @@ static __init int dma_debug_entries_cmdline(char *str)
 __setup("dma_debug=", dma_debug_cmdline);
 __setup("dma_debug_entries=", dma_debug_entries_cmdline);
 
+/* Calling dma_mapping_error() from dma-debug api will result in calling
+   debug_dma_mapping_error() - need internal mapping error routine to
+   avoid debug checks */
+#ifndef DMA_ERROR_CODE
+#define DMA_ERROR_CODE 0
+#endif
+static inline int check_mapping_error(struct device *dev, dma_addr_t dma_addr)
+{
+	const struct dma_map_ops *ops = get_dma_ops(dev);
+	if (ops->mapping_error)
+		return ops->mapping_error(dev, dma_addr);
+
+	return (dma_addr == DMA_ERROR_CODE);
+}
+
 static void check_unmap(struct dma_debug_entry *ref)
 {
 	struct dma_debug_entry *entry;
 	struct hash_bucket *bucket;
 	unsigned long flags;
 
-	if (dma_mapping_error(ref->dev, ref->dev_addr)) {
+	if (unlikely(check_mapping_error(ref->dev, ref->dev_addr))) {
+		unmap_errors += 1;
 		err_printk(ref->dev, NULL, "DMA-API: device driver tries "
 			   "to free an invalid DMA memory address\n");
 		return;
@@ -915,6 +975,15 @@ static void check_unmap(struct dma_debug_entry *ref)
 			   dir2name[ref->direction]);
 	}
 
+	if (entry->map_err_type == MAP_ERR_NOT_CHECKED) {
+		err_printk(ref->dev, entry,
+			   "DMA-API: device driver failed to check map error"
+			   "[device address=0x%016llx] [size=%llu bytes] "
+			   "[mapped as %s]",
+			   ref->dev_addr, ref->size,
+			   type2name[entry->type]);
+	}
+
 	hash_bucket_del(entry);
 	dma_entry_free(entry);
 
@@ -1022,8 +1091,11 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
 	if (unlikely(global_disable))
 		return;
 
-	if (unlikely(dma_mapping_error(dev, dma_addr)))
+	if (unlikely(check_mapping_error(dev, dma_addr))) {
+		map_errors += 1;
+		map_errors_not_checked += 1;
 		return;
+	}
 
 	entry = dma_entry_alloc();
 	if (!entry)
@@ -1035,6 +1107,7 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
 	entry->dev_addr  = dma_addr;
 	entry->size      = size;
 	entry->direction = direction;
+	entry->map_err_type = MAP_ERR_NOT_CHECKED;
 
 	if (map_single)
 		entry->type = dma_debug_single;
@@ -1050,6 +1123,35 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
 }
 EXPORT_SYMBOL(debug_dma_map_page);
 
+void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
+{
+	struct dma_debug_entry ref;
+	struct dma_debug_entry *entry;
+	struct hash_bucket *bucket;
+	unsigned long flags;
+
+	if (unlikely(global_disable))
+		return;
+
+	ref.dev = dev;
+	ref.dev_addr = dma_addr;
+	bucket = get_hash_bucket(&ref, &flags);
+	entry = bucket_find_exact(bucket, &ref);
+
+	if (!entry) {
+		/* very likley dma-api didn't call debug_dma_map_page() or
+		   debug_dma_map_page() detected mapping error */
+		if (map_errors_not_checked)
+			map_errors_not_checked -= 1;
+		goto out;
+	}
+
+	entry->map_err_type = MAP_ERR_CHECKED;
+out:
+	put_hash_bucket(bucket, &flags);
+}
+EXPORT_SYMBOL(debug_dma_mapping_error);
+
 void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
 			  size_t size, int direction, bool map_single)
 {
-- 
1.7.9.5




^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v4] dma-debug: New interfaces to debug dma mapping errors
  2012-10-05  1:23     ` [PATCH v4] " Shuah Khan
@ 2012-10-05 22:51       ` Andrew Morton
  2012-10-08 17:07         ` Shuah Khan
  0 siblings, 1 reply; 30+ messages in thread
From: Andrew Morton @ 2012-10-05 22:51 UTC (permalink / raw)
  To: shuah.khan
  Cc: konrad.wilk, tglx, mingo, hpa, rob, stern, joerg.roedel,
	bhelgaas, LKML, linux-doc, devel, x86, shuahkhan

On Thu, 04 Oct 2012 19:23:13 -0600
Shuah Khan <shuah.khan@hp.com> wrote:

> A recent dma mapping error analysis effort showed that a large percentage
> of dma_map_single() and dma_map_page() returns are not checked for mapping
> errors.
> 
> Reference:
> http://linuxdriverproject.org/mediawiki/index.php/DMA_Mapping_Error_Analysis
> 
> Adding support for tracking dma mapping and unmapping errors to help assess
> the following:
> 
> When do dma mapping errors get detected?
> How often do these errors occur?
> Why don't we see failures related to missing dma mapping error checks?
> Are they silent failures?
> 
> Patch v4: Addresses extra tab review comments from Patch v3.
> Patch v3: Addresses review and design comments from Patch v2.
> Patch v2: Addressed design issues from Patch v1.
> 
> Enhance dma-debug infrastructure to track dma mapping, and unmapping errors.
> 
> map_errors: (system wide counter)
>   Total number of dma mapping errors returned by the dma mapping interfaces,
>   in response to mapping requests from all devices in the system.
> map_errors_not_checked: (system wide counter)
>   Total number of dma mapping errors devices failed to check before using
>   the returned address.
> unmap_errors: (system wide counter)
>   Total number of times devices tried to unmap or free an invalid dma
>   address.
> map_err_type: (new field added to dma_debug_entry structure)
>   New field to maintain dma mapping error check status. This error type
>   is applicable to the dma map page and dma map single entries tracked by
>   dma-debug api. This status indicates whether or not a good mapping is
>   checked by the device before its use. dma_map_single() and dma_map_page()
>   could fail to create a mapping in some cases, and drivers are expected to
>   call dma_mapping_error() to check for errors. Please note that this is not
>   counter.
> 
> Enhancements to dma-debug api are made to add new debugfs interfaces to
> report total dma errors, dma errors that are not checked, and unmap errors
> for the entire system. Please note that these are system wide counters for
> all devices in the system.
> 
> The following new dma-debug interface is added:
> 
> debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
> 	Sets dma map error checked status for the dma map entry if one is
> 	found. Decrements the system wide dma_map_errors_not_checked counter
> 	that is incremented by debug_dma_map_page() when it checks for
> 	mapping error before adding it to the dma debug entry table.
> 
> New dma-debug internal interface:
> check_mapping_error(struct device *dev, dma_addr_t dma_addr)
> Calling dma_mapping_error() from dma-debug api will result in dma mapping
> check when it shouldn't. This internal routine checks dma mapping error
> without any debug checks. 
> 
> The following existing dma-debug api are changed to support this feature:
> debug_dma_map_page()
> 	Increments dma_map_errors and dma_map_errors_not_checked errors totals
> 	for the system, dma-debug api keeps track of, when dma_addr is invalid.
> 	This routine now calls internal check_mapping_error() interface to
> 	avoid doing dma mapping debug checks from dma-debug internal mapping
> 	error checks.
> check_unmap()
> 	This is an existing internal routines that checks for various mapping
> 	errors. Changed to increment system wide dma_unmap_errors, when a
> 	device requests an invalid address to be unmapped. This routine now
> 	calls internal check_mapping_error() interface to avoid doing dma
> 	mapping debug checks from dma-debug internal mapping error checks.
> 
> Changed arch/x86/include/asm/dma-mapping.h to call debug_dma_mapping_error()
> to validate these new interfaces on x86_64. Other architectures will be
> changed in a subsequent patch.
> 
> Tested: Intel iommu and swiotlb (iommu=soft) on x86-64 with
>         CONFIG_DMA_API_DEBUG enabled and disabled.

Still seems overly complicated to me, but whatev.

I think the way to handle this is pretty simple: set a flag in the dma
entry when someone runs dma_mapping_error() and, if that flag wasn't
set at unmap time, emit a loud warning.

>From my reading of the code, this patch indeed does that, along with a
bunch of other (unnecessary?) stuff.  But boy, the changelog conceals
this information well!

>  Documentation/DMA-API.txt          |   13 +++++
>  arch/x86/include/asm/dma-mapping.h |    1 +
>  include/linux/dma-debug.h          |    7 +++
>  lib/dma-debug.c                    |  110 ++++++++++++++++++++++++++++++++++--

Please, go through Documentation/DMA-API-HOWTO.txt with a toothcomb and
ensure that all this is appropriately covered and that the examples are
completed for error checking.

>
> ...
>
> +static inline int check_mapping_error(struct device *dev, dma_addr_t dma_addr)
> +{
> +	const struct dma_map_ops *ops = get_dma_ops(dev);
> +	if (ops->mapping_error)
> +		return ops->mapping_error(dev, dma_addr);
> +
> +	return (dma_addr == DMA_ERROR_CODE);
> +}

I'm not a fan of functions called check_foo() because I never know
whether their return value means "foo is true" or "foo is false".  It
doesn't help that the return value here is undocumented.  Names such as
has_mapping_error() or mapping_has_error() will remove this question,
thereby making the call sites more readable.

>  static void check_unmap(struct dma_debug_entry *ref)
>  {
>  	struct dma_debug_entry *entry;
>  	struct hash_bucket *bucket;
>  	unsigned long flags;
>  
> -	if (dma_mapping_error(ref->dev, ref->dev_addr)) {
> +	if (unlikely(check_mapping_error(ref->dev, ref->dev_addr))) {
> +		unmap_errors += 1;
>  		err_printk(ref->dev, NULL, "DMA-API: device driver tries "
>  			   "to free an invalid DMA memory address\n");
>  		return;
> @@ -915,6 +975,15 @@ static void check_unmap(struct dma_debug_entry *ref)
>  			   dir2name[ref->direction]);
>  	}
>  
> +	if (entry->map_err_type == MAP_ERR_NOT_CHECKED) {
> +		err_printk(ref->dev, entry,
> +			   "DMA-API: device driver failed to check map error"
> +			   "[device address=0x%016llx] [size=%llu bytes] "
> +			   "[mapped as %s]",
> +			   ref->dev_addr, ref->size,
> +			   type2name[entry->type]);
> +	}

It's important that this warning be associated with a backtrace so we
can identify the offending call site in the usual fashion. 
err_printk() does include a backtrace under some circumstances, but
those circumstances hurt my brain.

Is it guaranteed that we'll have that backtrace?  If not, I'd suggest
making it so.

>  	hash_bucket_del(entry);
>  	dma_entry_free(entry);
>  
>
> ...
>
> +void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
> +{
> +	struct dma_debug_entry ref;
> +	struct dma_debug_entry *entry;
> +	struct hash_bucket *bucket;
> +	unsigned long flags;
> +
> +	if (unlikely(global_disable))
> +		return;
> +
> +	ref.dev = dev;
> +	ref.dev_addr = dma_addr;
> +	bucket = get_hash_bucket(&ref, &flags);
> +	entry = bucket_find_exact(bucket, &ref);
> +
> +	if (!entry) {
> +		/* very likley dma-api didn't call debug_dma_map_page() or
> +		   debug_dma_map_page() detected mapping error */
> +		if (map_errors_not_checked)
> +			map_errors_not_checked -= 1;
> +		goto out;
> +	}
> +
> +	entry->map_err_type = MAP_ERR_CHECKED;
> +out:
> +	put_hash_bucket(bucket, &flags);
> +}
> +EXPORT_SYMBOL(debug_dma_mapping_error);

Well, it's a global, exported-to-modules symbol.  Some formal
documentation is appropriate for such things.

>
> ...
>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4] dma-debug: New interfaces to debug dma mapping errors
  2012-10-05 22:51       ` Andrew Morton
@ 2012-10-08 17:07         ` Shuah Khan
  2012-10-09 21:02           ` Andrew Morton
  0 siblings, 1 reply; 30+ messages in thread
From: Shuah Khan @ 2012-10-08 17:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: konrad.wilk, tglx, mingo, hpa, rob, stern, joerg.roedel,
	bhelgaas, LKML, linux-doc, devel, x86, shuahkhan

On Fri, 2012-10-05 at 15:51 -0700, Andrew Morton wrote:

> 
> Still seems overly complicated to me, but whatev.
> 
> I think the way to handle this is pretty simple: set a flag in the dma
> entry when someone runs dma_mapping_error() and, if that flag wasn't
> set at unmap time, emit a loud warning.
> 
> From my reading of the code, this patch indeed does that, along with a
> bunch of other (unnecessary?) stuff.  But boy, the changelog conceals
> this information well!

Are you referring to the system wide error counters when you say
unnecessary stuff. The reason I added those was to catch errors when
drivers don't do unmap. Several drivers fail to do unmap. Checking flag
from unmap debug interfaces, doesn't cover these cases. However, I think
the value of system wide counters is limited in the sense that they
don't really identify the driver that needs fixing. In that sense it can
be deemed unnecessary. I dropped them in v5 patch, which I am sending
out.

I also changed the changelog to be clear.

> 
> >  Documentation/DMA-API.txt          |   13 +++++
> >  arch/x86/include/asm/dma-mapping.h |    1 +
> >  include/linux/dma-debug.h          |    7 +++
> >  lib/dma-debug.c                    |  110 ++++++++++++++++++++++++++++++++++--
> 
> Please, go through Documentation/DMA-API-HOWTO.txt with a toothcomb and
> ensure that all this is appropriately covered and that the examples are
> completed for error checking.
> 
> >
> > ...
> >
> > +static inline int check_mapping_error(struct device *dev, dma_addr_t dma_addr)
> > +{
> > +	const struct dma_map_ops *ops = get_dma_ops(dev);
> > +	if (ops->mapping_error)
> > +		return ops->mapping_error(dev, dma_addr);
> > +
> > +	return (dma_addr == DMA_ERROR_CODE);
> > +}
> 
> I'm not a fan of functions called check_foo() because I never know
> whether their return value means "foo is true" or "foo is false".  It
> doesn't help that the return value here is undocumented.  Names such as
> has_mapping_error() or mapping_has_error() will remove this question,
> thereby making the call sites more readable.

Changed the name to has_mapping_error().

> 
> >  static void check_unmap(struct dma_debug_entry *ref)
> >  {
> >  	struct dma_debug_entry *entry;
> >  	struct hash_bucket *bucket;
> >  	unsigned long flags;
> >  
> > -	if (dma_mapping_error(ref->dev, ref->dev_addr)) {
> > +	if (unlikely(check_mapping_error(ref->dev, ref->dev_addr))) {
> > +		unmap_errors += 1;
> >  		err_printk(ref->dev, NULL, "DMA-API: device driver tries "
> >  			   "to free an invalid DMA memory address\n");
> >  		return;
> > @@ -915,6 +975,15 @@ static void check_unmap(struct dma_debug_entry *ref)
> >  			   dir2name[ref->direction]);
> >  	}
> >  
> > +	if (entry->map_err_type == MAP_ERR_NOT_CHECKED) {
> > +		err_printk(ref->dev, entry,
> > +			   "DMA-API: device driver failed to check map error"
> > +			   "[device address=0x%016llx] [size=%llu bytes] "
> > +			   "[mapped as %s]",
> > +			   ref->dev_addr, ref->size,
> > +			   type2name[entry->type]);
> > +	}
> 
> It's important that this warning be associated with a backtrace so we
> can identify the offending call site in the usual fashion. 
> err_printk() does include a backtrace under some circumstances, but
> those circumstances hurt my brain.
> 
> Is it guaranteed that we'll have that backtrace?  If not, I'd suggest
> making it so.

Yes it does. Here is one from my test run:

[ 5232.368453] e1000e 0000:00:19.0: DMA-API: device driver failed to
check map error[device address=0x00000000ffe4e1c0] [size=1522 bytes]
[mapped as single]

[ 5232.368546] Pid: 0, comm: swapper/1 Not tainted 3.6.0-next-20121002+
#2
[ 5232.368549] Call Trace:
[ 5232.368553]  <IRQ>  [<ffffffff81055faf>] warn_slowpath_common
+0x7f/0xc0
[ 5232.368570]  [<ffffffff810560a6>] warn_slowpath_fmt+0x46/0x50
[ 5232.368579]  [<ffffffff81341fbe>] check_unmap+0x48e/0x860
[ 5232.368588]  [<ffffffff81092ee9>] ? check_preempt_wakeup+0x1c9/0x270
[ 5232.368595]  [<ffffffff8134251c>] debug_dma_unmap_page+0x5c/0x60
[ 5232.368616]  [<ffffffffa0016e4e>] e1000_clean_rx_irq+0x1be/0x410
[e1000e]
[ 5232.368626]  [<ffffffff8108c885>] ? wake_up_process+0x15/0x20
[ 5232.368644]  [<ffffffffa001b35c>] e1000e_poll+0x7c/0x400 [e1000e]
[ 5232.368653]  [<ffffffff8107e539>] ? enqueue_hrtimer+0x39/0xc0
[ 5232.368662]  [<ffffffff815756c4>] net_rx_action+0x134/0x240
[ 5232.368670]  [<ffffffff813b15c3>] ? arch_local_irq_enable+0x8/0xd
[ 5232.368678]  [<ffffffff8105e9d0>] __do_softirq+0xc0/0x240
[ 5232.368688]  [<ffffffff816856ce>] ? _raw_spin_lock+0xe/0x20
[ 5232.368697]  [<ffffffff8168f31c>] call_softirq+0x1c/0x30
[ 5232.368706]  [<ffffffff81016315>] do_softirq+0x65/0xa0
[ 5232.368713]  [<ffffffff8105ecae>] irq_exit+0x8e/0xb0
[ 5232.368721]  [<ffffffff8168fba3>] do_IRQ+0x63/0xe0
[ 5232.368728]  [<ffffffff81685cad>] common_interrupt+0x6d/0x6d
[ 5232.368731]  <EOI>  [<ffffffff813b15c3>] ? arch_local_irq_enable
+0x8/0xd
[ 5232.368744]  [<ffffffff813b2159>] acpi_idle_enter_c1+0x94/0xb9
[ 5232.368752]  [<ffffffff81533fd9>] cpuidle_enter+0x19/0x20
[ 5232.368758]  [<ffffffff815346ac>] cpuidle_idle_call+0xac/0x290
[ 5232.368767]  [<ffffffff8101d59f>] cpu_idle+0xcf/0x120
[ 5232.368775]  [<ffffffff81670bc6>] start_secondary+0x1ec/0x1f3
[ 5232.368780] ---[ end trace 58eeb6eaa1eca88f ]---
[ 5232.368782] Mapped at:
[ 5232.368785]  [<ffffffff81340fcb>] debug_dma_map_page+0x8b/0x150
[ 5232.368792]  [<ffffffffa001823f>] e1000_alloc_rx_buffers+0x15f/0x2d0
[e1000e]
[ 5232.368808]  [<ffffffffa0016f8b>] e1000_clean_rx_irq+0x2fb/0x410
[e1000e]
[ 5232.368823]  [<ffffffffa001b35c>] e1000e_poll+0x7c/0x400 [e1000e]
[ 5232.368838]  [<ffffffff815756c4>] net_rx_action+0x134/0x240

> 
> >  	hash_bucket_del(entry);
> >  	dma_entry_free(entry);
> >  
> >
> > ...
> >
> > +void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
> > +{
> > +	struct dma_debug_entry ref;
> > +	struct dma_debug_entry *entry;
> > +	struct hash_bucket *bucket;
> > +	unsigned long flags;
> > +
> > +	if (unlikely(global_disable))
> > +		return;
> > +
> > +	ref.dev = dev;
> > +	ref.dev_addr = dma_addr;
> > +	bucket = get_hash_bucket(&ref, &flags);
> > +	entry = bucket_find_exact(bucket, &ref);
> > +
> > +	if (!entry) {
> > +		/* very likley dma-api didn't call debug_dma_map_page() or
> > +		   debug_dma_map_page() detected mapping error */
> > +		if (map_errors_not_checked)
> > +			map_errors_not_checked -= 1;
> > +		goto out;
> > +	}
> > +
> > +	entry->map_err_type = MAP_ERR_CHECKED;
> > +out:
> > +	put_hash_bucket(bucket, &flags);
> > +}
> > +EXPORT_SYMBOL(debug_dma_mapping_error);
> 
> Well, it's a global, exported-to-modules symbol.  Some formal
> documentation is appropriate for such things.

I added information on what it does and who can/should call to
DMA-API.txt
> 
> >
> > ...
> >
> 



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4] dma-debug: New interfaces to debug dma mapping errors
  2012-10-08 17:07         ` Shuah Khan
@ 2012-10-09 21:02           ` Andrew Morton
  2012-10-10 21:50             ` Shuah Khan
  0 siblings, 1 reply; 30+ messages in thread
From: Andrew Morton @ 2012-10-09 21:02 UTC (permalink / raw)
  To: shuah.khan
  Cc: konrad.wilk, tglx, mingo, hpa, rob, stern, joerg.roedel,
	bhelgaas, LKML, linux-doc, devel, x86, shuahkhan

On Mon, 08 Oct 2012 11:07:20 -0600
Shuah Khan <shuah.khan@hp.com> wrote:

> > 
> > Still seems overly complicated to me, but whatev.
> > 
> > I think the way to handle this is pretty simple: set a flag in the dma
> > entry when someone runs dma_mapping_error() and, if that flag wasn't
> > set at unmap time, emit a loud warning.
> > 
> > From my reading of the code, this patch indeed does that, along with a
> > bunch of other (unnecessary?) stuff.  But boy, the changelog conceals
> > this information well!
> 
> Are you referring to the system wide error counters when you say
> unnecessary stuff. The reason I added those was to catch errors when
> drivers don't do unmap. Several drivers fail to do unmap. Checking flag
> from unmap debug interfaces, doesn't cover these cases. However, I think
> the value of system wide counters is limited in the sense that they
> don't really identify the driver that needs fixing. In that sense it can
> be deemed unnecessary. I dropped them in v5 patch, which I am sending
> out.

hm.  Could we keep a counter of the number of map/unmap calls within
the dma object and then emit a warning if that is non-zero at teardown
time?  That should identify the offending driver.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4] dma-debug: New interfaces to debug dma mapping errors
  2012-10-09 21:02           ` Andrew Morton
@ 2012-10-10 21:50             ` Shuah Khan
  0 siblings, 0 replies; 30+ messages in thread
From: Shuah Khan @ 2012-10-10 21:50 UTC (permalink / raw)
  To: Andrew Morton
  Cc: konrad.wilk, tglx, mingo, hpa, rob, stern, joerg.roedel,
	bhelgaas, LKML, linux-doc, devel, x86, shuahkhan

On Tue, 2012-10-09 at 14:02 -0700, Andrew Morton wrote:
> On Mon, 08 Oct 2012 11:07:20 -0600
> Shuah Khan <shuah.khan@hp.com> wrote:
> 
> > > 
> > > Still seems overly complicated to me, but whatev.
> > > 
> > > I think the way to handle this is pretty simple: set a flag in the dma
> > > entry when someone runs dma_mapping_error() and, if that flag wasn't
> > > set at unmap time, emit a loud warning.
> > > 
> > > From my reading of the code, this patch indeed does that, along with a
> > > bunch of other (unnecessary?) stuff.  But boy, the changelog conceals
> > > this information well!
> > 
> > Are you referring to the system wide error counters when you say
> > unnecessary stuff. The reason I added those was to catch errors when
> > drivers don't do unmap. Several drivers fail to do unmap. Checking flag
> > from unmap debug interfaces, doesn't cover these cases. However, I think
> > the value of system wide counters is limited in the sense that they
> > don't really identify the driver that needs fixing. In that sense it can
> > be deemed unnecessary. I dropped them in v5 patch, which I am sending
> > out.
> 
> hm.  Could we keep a counter of the number of map/unmap calls within
> the dma object and then emit a warning if that is non-zero at teardown
> time?  That should identify the offending driver.
> 

Thanks. That would work. Will take a look and see what it takes to
implement.

-- Shuah




^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2012-10-10 21:50 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-09-17  0:52 [PATCH] dma-debug: New interfaces to debug dma mapping errors Shuah Khan
2012-09-17  2:07 ` Greg KH
2012-09-17 14:45   ` Shuah Khan
2012-09-17 15:25     ` Greg KH
2012-09-17 13:39 ` Konrad Rzeszutek Wilk
2012-09-17 15:52   ` Shuah Khan
2012-09-17 17:23     ` Konrad Rzeszutek Wilk
2012-09-17 22:45       ` Shuah Khan
2012-09-18 13:34         ` Joerg Roedel
2012-09-18 19:42           ` Shuah Khan
2012-09-18 19:45             ` Konrad Rzeszutek Wilk
2012-09-18 20:34               ` Shuah Khan
2012-09-19 13:08             ` Joerg Roedel
2012-09-19 19:16               ` Shuah Khan
2012-09-26  1:05 ` [PATCH v2] " Shuah Khan
2012-09-26 13:12   ` Konrad Rzeszutek Wilk
2012-09-26 16:23     ` Shuah Khan
2012-09-27 10:20   ` Joerg Roedel
2012-09-27 14:13     ` Shuah Khan
2012-10-03 14:55   ` [PATCH v3] " Shuah Khan
2012-10-03 21:45     ` Andrew Morton
2012-10-04 14:01       ` Konrad Rzeszutek Wilk
2012-10-04 22:16         ` Shuah Khan
2012-10-04 17:38     ` Konrad Rzeszutek Wilk
2012-10-04 22:19       ` Shuah Khan
2012-10-05  1:23     ` [PATCH v4] " Shuah Khan
2012-10-05 22:51       ` Andrew Morton
2012-10-08 17:07         ` Shuah Khan
2012-10-09 21:02           ` Andrew Morton
2012-10-10 21:50             ` Shuah Khan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.