From: Russell King - ARM Linux admin <linux@armlinux.org.uk>
To: Embedded Engineer <embed786@gmail.com>
Cc: Andrew Lunn <andrew@lunn.ch>,
	Vladimir Murzin <vladimir.murzin@arm.com>,
	Jon Hunter <jonathanh@nvidia.com>,
	Thierry Reding <thierry.reding@gmail.com>,
	linux-tegra@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org
Subject: Re: Unstable Kernel behavior on an ARM based board
Date: Tue, 5 Mar 2019 14:23:51 +0000
Message-ID: <20190305142351.ktciqkj5kycdwilr@shell.armlinux.org.uk>
In-Reply-To: <CA+_ZnZS7wimpnH+TnvM_ztnX3wbADpf5FUBd_8igqeg+FnFr5g@mail.gmail.com>

On Tue, Mar 05, 2019 at 06:32:19PM +0500, Embedded Engineer wrote:
> On Tue, Mar 5, 2019 at 6:23 PM Russell King - ARM Linux admin
> <linux@armlinux.org.uk> wrote:
> >
> > Is there no later u-boot you can use to rule that out?
> 
> This u-boot was working just fine with our board so didn't try
> updating it to some newer version. Also the downstream u-boot has
> different text base addresses than mainline ones I guess so didn't put
> any effort in that.

As it is also suffering from the "hanging" issue, it seems that the
problem is not specific to the kernel.

It leaves only a few possible causes:

1. The board firmware (including u-boot) is enabling some DMA that is
   causing corruption of some RAM.

2. You really do have an issue between the CPU and RAM causing
   random-ish data corruption.

It may be worth getting mm/dmapool.c to print the hexdump a number of
times to see whether the data read from the corrupted region changes.
Around line 372, there is a call to print_hex_dump().  Just replicate
that a number of times.
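
As a user-space illustration of that repeated-read idea (a sketch only,
not kernel code; count_unstable_reads() and hexdump() are names made up
here, and the 256-byte snapshot cap is arbitrary), the region is read
several times, each snapshot is dumped, and later snapshots are compared
against the first:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Print one hexdump of buf, 16 bytes per row, prefixed by the offset. */
static void hexdump(const uint8_t *buf, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		if (i % 16 == 0)
			printf("%08zx:", i);
		printf(" %02x", buf[i]);
		if (i % 16 == 15 || i == len - 1)
			printf("\n");
	}
}

/*
 * Read the region `passes` times through a volatile pointer, dump each
 * snapshot, and return how many snapshots differ from the first one.
 * Hypothetical helper; at most 256 bytes are examined.
 */
static int count_unstable_reads(const volatile uint8_t *region, size_t len,
				int passes)
{
	uint8_t first[256], snap[256];
	int pass, unstable = 0;
	size_t i;

	if (len > sizeof(first))
		len = sizeof(first);

	for (pass = 0; pass < passes; pass++) {
		/* byte-wise copy so every pass really re-reads the region */
		for (i = 0; i < len; i++)
			snap[i] = region[i];
		printf("pass %d:\n", pass);
		hexdump(snap, len);
		if (pass == 0)
			memcpy(first, snap, len);
		else if (memcmp(first, snap, len) != 0)
			unstable++;
	}
	return unstable;
}
```

If the dumps differ from pass to pass, that would suggest either the read
path is unreliable or something is still writing to the region; identical
dumps would suggest the bad data was written once and then left alone.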

Another idea would be to print a hexdump of each object as it's
allocated and the next object.

Maybe something like this (untested, may need tweaks to get it to build,
you'll also need to revert Thierry's patch):

diff --git a/mm/dmapool.c b/mm/dmapool.c
index 6d4b97e7e9e9..3db1e9b63809 100644
--- a/mm/dmapool.c
+++ b/mm/dmapool.c
@@ -219,6 +219,47 @@ static void pool_initialise_page(struct dma_pool *pool, struct dma_page *page)
 	} while (offset < pool->allocation);
 }
 
+#ifdef	DMAPOOL_DEBUG
+static int verify_one(struct dma_pool *pool, struct dma_page *page,
+		      unsigned int offset, const char *desc)
+{
+	dma_addr_t handle = page->dma + offset;
+	u8 *data = page->vaddr + offset;
+	int i;
+
+	for (i = sizeof(page->offset); i < pool->size; i++) {
+		if (data[i] == POOL_POISON_FREED)
+			continue;
+		if (pool->dev)
+			dev_err(pool->dev,
+				"%s %s, %pad (corrupted)\n",
+				desc, pool->name, &handle);
+		else
+			pr_err("%s %s, %pad (corrupted)\n",
+				desc, pool->name, &handle);
+
+		/*
+		 * Dump the first 4 bytes even if they are not
+		 * POOL_POISON_FREED
+		 */
+		print_hex_dump(KERN_ERR, "", DUMP_PREFIX_OFFSET, 16, 1,
+				data, pool->size, 1);
+		return 1;
+	}
+	return 0;
+}
+
+static void verify_free(struct dma_pool *pool, struct dma_page *page, const char *desc)
+{
+	unsigned int offset;
+
+	for (offset = page->offset; offset < pool->allocation;
+	     offset = *(int *)(page->vaddr + offset))
+		if (verify_one(pool, page, offset, desc))
+			break;
+}
+#endif
+
 static struct dma_page *pool_alloc_page(struct dma_pool *pool, gfp_t mem_flags)
 {
 	struct dma_page *page;
@@ -235,6 +276,9 @@ static struct dma_page *pool_alloc_page(struct dma_pool *pool, gfp_t mem_flags)
 		pool_initialise_page(pool, page);
 		page->in_use = 0;
 		page->offset = 0;
+#ifdef	DMAPOOL_DEBUG
+		verify_free(pool, page, "pool_alloc_page");
+#endif
 	} else {
 		kfree(page);
 		page = NULL;
@@ -345,35 +389,17 @@ void *dma_pool_alloc(struct dma_pool *pool, gfp_t mem_flags,
 	list_add(&page->page_list, &pool->page_list);
  ready:
 	page->in_use++;
+#ifdef	DMAPOOL_DEBUG
+	verify_free(pool, page, "dma_pool_alloc pre");
+#endif
 	offset = page->offset;
 	page->offset = *(int *)(page->vaddr + offset);
 	retval = offset + page->vaddr;
 	*handle = offset + page->dma;
 #ifdef	DMAPOOL_DEBUG
-	{
-		int i;
-		u8 *data = retval;
-		/* page->offset is stored in first 4 bytes */
-		for (i = sizeof(page->offset); i < pool->size; i++) {
-			if (data[i] == POOL_POISON_FREED)
-				continue;
-			if (pool->dev)
-				dev_err(pool->dev,
-					"dma_pool_alloc %s, %p (corrupted)\n",
-					pool->name, retval);
-			else
-				pr_err("dma_pool_alloc %s, %p (corrupted)\n",
-					pool->name, retval);
-
-			/*
-			 * Dump the first 4 bytes even if they are not
-			 * POOL_POISON_FREED
-			 */
-			print_hex_dump(KERN_ERR, "", DUMP_PREFIX_OFFSET, 16, 1,
-					data, pool->size, 1);
-			break;
-		}
-	}
+	verify_one(pool, page, offset, "dma_pool_alloc");
+	if (page->offset < pool->allocation)
+		verify_one(pool, page, page->offset, "dma_pool_alloc next");
 	if (!(mem_flags & __GFP_ZERO))
 		memset(retval, POOL_POISON_ALLOCATED, pool->size);
 #endif
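
The free-list walk that verify_free() performs can be tried out in user
space with a small model of one pool page. This is a sketch under stated
assumptions: struct model_page and the model_*() functions are invented
for illustration, 0xa7 is assumed to match POOL_POISON_FREED, the pool
boundary handling is left out, and unlike the patch above the walk counts
every corrupted block instead of stopping at the first:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define POISON_FREED 0xa7	/* assumed to match POOL_POISON_FREED */

/* Hypothetical stand-in for one dma_pool page. */
struct model_page {
	uint8_t *vaddr;			/* backing memory */
	unsigned int allocation;	/* total bytes in the page */
	unsigned int size;		/* bytes per block */
	unsigned int offset;		/* offset of the first free block */
};

/* Poison the page and chain all blocks into a free list. */
static void model_init(struct model_page *p)
{
	unsigned int offset = 0;

	memset(p->vaddr, POISON_FREED, p->allocation);
	do {
		unsigned int next = offset + p->size;

		/* the last block that fits links past the end of the page */
		if (next + p->size > p->allocation)
			next = p->allocation;
		memcpy(p->vaddr + offset, &next, sizeof(next));
		offset = next;
	} while (offset < p->allocation);
	p->offset = 0;
}

/* Return 1 if any byte after the link word is not the free poison. */
static int model_verify_one(const struct model_page *p, unsigned int offset)
{
	unsigned int i;

	for (i = sizeof(unsigned int); i < p->size; i++) {
		if (p->vaddr[offset + i] != POISON_FREED) {
			printf("block at offset %u corrupted at byte %u\n",
			       offset, i);
			return 1;
		}
	}
	return 0;
}

/* Walk the free list and return the number of corrupted free blocks. */
static int model_verify_free(const struct model_page *p)
{
	unsigned int offset = p->offset, next;
	unsigned int steps = 0, max_steps = p->allocation / p->size;
	int bad = 0;

	while (offset < p->allocation) {
		/* a link that overruns the page or loops is itself corrupt */
		if (offset + p->size > p->allocation || steps++ >= max_steps)
			return bad + 1;
		bad += model_verify_one(p, offset);
		memcpy(&next, p->vaddr + offset, sizeof(next));
		offset = next;
	}
	return bad;
}
```

The bounds and step-count guards are there because the link word sits in
the same memory that may be getting corrupted, so a damaged link could
otherwise send the walk outside the page or around in a loop.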

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
