All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/7] mmc: core: cleanup and locking patches description
@ 2013-09-26 19:22 Grant Grundler
  2013-09-26 19:22 ` [PATCH 1/7] mmc: core: rename "data" to saved_areq Grant Grundler
                   ` (7 more replies)
  0 siblings, 8 replies; 13+ messages in thread
From: Grant Grundler @ 2013-09-26 19:22 UTC (permalink / raw)
  To: Chris Ball, Ulf Hansson, Seungwon Jeon
  Cc: linux-mmc, linux-kernel, Grant Grundler

Following 7 patches are mostly cleanup with one key patch around host->areq
locking.  The host->areq locking problem description is here:
	http://www.spinics.net/lists/linux-mmc/msg21644.html

I do believe this preposed host->areq locking patch is a complete fix.
But it appears to fix the problem and is better than nothing.

This patch sequence applies clean to linus' 3.12-rc2 branch and only compile
tested in this form. This is a forward port (and cleanup) of the
same patch I've been testing on ChromeOS-3.4 tree using Exynos5250 chipset.

cheers,
grant

0000-mmc-core-cleanups-and-locking-description (this email)
0001-mmc-core-rename-data-to-saved_areq.patch
0002-mmc-core-rename-local-var-err-to-saved_err.patch
0003-mmc-core-restructure-error-handling-for-start-req.patch
0004-mmc-core-use-common-code-path-to-return-error.patch
0005-mmc-core-handling-polling-more-gracefully.patch
0006-mmc-core-protect-references-to-host-areq-with-host-l.patch
0007-mmc-core-mmc_start_req-is-a-misnomer-mmc_process_are.patch


 drivers/mmc/card/block.c    |   8 ++--
 drivers/mmc/card/mmc_test.c |   4 +-
 drivers/mmc/core/core.c     | 103 +++++++++++++++++++++++++-------------------
 include/linux/mmc/core.h    |   2 +-
 4 files changed, 66 insertions(+), 51 deletions(-)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/7] mmc: core: rename "data" to saved_areq
  2013-09-26 19:22 [PATCH 0/7] mmc: core: cleanup and locking patches description Grant Grundler
@ 2013-09-26 19:22 ` Grant Grundler
  2013-09-26 19:22 ` [PATCH 2/7] mmc: core: rename local var err to saved_err Grant Grundler
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Grant Grundler @ 2013-09-26 19:22 UTC (permalink / raw)
  To: Chris Ball, Ulf Hansson, Seungwon Jeon
  Cc: linux-mmc, linux-kernel, Grant Grundler

Replace "data" with a more descriptive name.

Using a local variable (ie a register) makes explicit what the
compiler is doing under the covers anyway: the function is
dereferencing one pointer value the whole time.

Signed-off-by: Grant Grundler <grundler@chromium.org>
---
 drivers/mmc/core/core.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 006ead2..e5a40ee 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -529,14 +529,14 @@ struct mmc_async_req *mmc_start_req(struct mmc_host *host,
 {
 	int err = 0;
 	int start_err = 0;
-	struct mmc_async_req *data = host->areq;
+	struct mmc_async_req *saved_areq = host->areq;
 
 	/* Prepare a new request */
 	if (areq)
-		mmc_pre_req(host, areq->mrq, !host->areq);
+		mmc_pre_req(host, areq->mrq, !saved_areq);
 
-	if (host->areq) {
-		err = mmc_wait_for_data_req_done(host, host->areq->mrq,	areq);
+	if (saved_areq) {
+		err = mmc_wait_for_data_req_done(host, saved_areq->mrq,	areq);
 		if (err == MMC_BLK_NEW_REQUEST) {
 			if (error)
 				*error = err;
@@ -550,17 +550,17 @@ struct mmc_async_req *mmc_start_req(struct mmc_host *host,
 		 * Check BKOPS urgency for each R1 response
 		 */
 		if (host->card && mmc_card_mmc(host->card) &&
-		    ((mmc_resp_type(host->areq->mrq->cmd) == MMC_RSP_R1) ||
-		     (mmc_resp_type(host->areq->mrq->cmd) == MMC_RSP_R1B)) &&
-		    (host->areq->mrq->cmd->resp[0] & R1_EXCEPTION_EVENT))
+		    ((mmc_resp_type(saved_areq->mrq->cmd) == MMC_RSP_R1) ||
+		     (mmc_resp_type(saved_areq->mrq->cmd) == MMC_RSP_R1B)) &&
+		    (saved_areq->mrq->cmd->resp[0] & R1_EXCEPTION_EVENT))
 			mmc_start_bkops(host->card, true);
 	}
 
 	if (!err && areq)
 		start_err = __mmc_start_data_req(host, areq->mrq);
 
-	if (host->areq)
-		mmc_post_req(host, host->areq->mrq, 0);
+	if (saved_areq)
+		mmc_post_req(host, saved_areq->mrq, 0);
 
 	 /* Cancel a prepared request if it was not started. */
 	if ((err || start_err) && areq)
@@ -573,7 +573,7 @@ struct mmc_async_req *mmc_start_req(struct mmc_host *host,
 
 	if (error)
 		*error = err;
-	return data;
+	return saved_areq;
 }
 EXPORT_SYMBOL(mmc_start_req);
 
-- 
1.8.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/7] mmc: core: rename local var err to saved_err
  2013-09-26 19:22 [PATCH 0/7] mmc: core: cleanup and locking patches description Grant Grundler
  2013-09-26 19:22 ` [PATCH 1/7] mmc: core: rename "data" to saved_areq Grant Grundler
@ 2013-09-26 19:22 ` Grant Grundler
  2013-09-26 19:22 ` [PATCH 3/7] mmc: core: restructure error handling for start req Grant Grundler
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Grant Grundler @ 2013-09-26 19:22 UTC (permalink / raw)
  To: Chris Ball, Ulf Hansson, Seungwon Jeon
  Cc: linux-mmc, linux-kernel, Grant Grundler

Just making it more obvious 'err' refers to the status of the in flight
request (aka saved_areq) and not the new async request we might start.

Signed-off-by: Grant Grundler <grundler@chromium.org>
---
 drivers/mmc/core/core.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index e5a40ee..675139e 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -527,7 +527,7 @@ static void mmc_post_req(struct mmc_host *host, struct mmc_request *mrq,
 struct mmc_async_req *mmc_start_req(struct mmc_host *host,
 				    struct mmc_async_req *areq, int *error)
 {
-	int err = 0;
+	int saved_err = 0;
 	int start_err = 0;
 	struct mmc_async_req *saved_areq = host->areq;
 
@@ -536,10 +536,10 @@ struct mmc_async_req *mmc_start_req(struct mmc_host *host,
 		mmc_pre_req(host, areq->mrq, !saved_areq);
 
 	if (saved_areq) {
-		err = mmc_wait_for_data_req_done(host, saved_areq->mrq,	areq);
-		if (err == MMC_BLK_NEW_REQUEST) {
+		saved_err = mmc_wait_for_data_req_done(host, saved_areq->mrq,	areq);
+		if (saved_err == MMC_BLK_NEW_REQUEST) {
 			if (error)
-				*error = err;
+				*error = saved_err;
 			/*
 			 * The previous request was not completed,
 			 * nothing to return
@@ -556,23 +556,23 @@ struct mmc_async_req *mmc_start_req(struct mmc_host *host,
 			mmc_start_bkops(host->card, true);
 	}
 
-	if (!err && areq)
+	if (!saved_err && areq)
 		start_err = __mmc_start_data_req(host, areq->mrq);
 
 	if (saved_areq)
 		mmc_post_req(host, saved_areq->mrq, 0);
 
 	 /* Cancel a prepared request if it was not started. */
-	if ((err || start_err) && areq)
+	if ((saved_err || start_err) && areq)
 		mmc_post_req(host, areq->mrq, -EINVAL);
 
-	if (err)
+	if (saved_err)
 		host->areq = NULL;
 	else
 		host->areq = areq;
 
 	if (error)
-		*error = err;
+		*error = saved_err;
 	return saved_areq;
 }
 EXPORT_SYMBOL(mmc_start_req);
-- 
1.8.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/7] mmc: core: restructure error handling for start req
  2013-09-26 19:22 [PATCH 0/7] mmc: core: cleanup and locking patches description Grant Grundler
  2013-09-26 19:22 ` [PATCH 1/7] mmc: core: rename "data" to saved_areq Grant Grundler
  2013-09-26 19:22 ` [PATCH 2/7] mmc: core: rename local var err to saved_err Grant Grundler
@ 2013-09-26 19:22 ` Grant Grundler
  2013-09-26 19:22 ` [PATCH 4/7] mmc: core: use common code path to return error Grant Grundler
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Grant Grundler @ 2013-09-26 19:22 UTC (permalink / raw)
  To: Chris Ball, Ulf Hansson, Seungwon Jeon
  Cc: linux-mmc, linux-kernel, Grant Grundler

This is an intermediate step to fixing the locking around the
access to host->areq. Reduce the number of "if (saved_areq) vs
"if (areq)" tests in the main code path since I'm going to add
references to "host->lock" in the next patch.

Signed-off-by: Grant Grundler <grundler@chromium.org>
---
 drivers/mmc/core/core.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 675139e..4d5de98 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -556,21 +556,20 @@ struct mmc_async_req *mmc_start_req(struct mmc_host *host,
 			mmc_start_bkops(host->card, true);
 	}
 
-	if (!saved_err && areq)
+	/* Don't start something new if previous one failed. */
+	if (!saved_err && areq) {
 		start_err = __mmc_start_data_req(host, areq->mrq);
+		/* Cancel a prepared request if it was not started. */
+		if (start_err) {
+			mmc_post_req(host, areq->mrq, -EINVAL);
+			host->areq = NULL;
+		} else 
+			host->areq = areq;
+	}
 
 	if (saved_areq)
 		mmc_post_req(host, saved_areq->mrq, 0);
 
-	 /* Cancel a prepared request if it was not started. */
-	if ((saved_err || start_err) && areq)
-		mmc_post_req(host, areq->mrq, -EINVAL);
-
-	if (saved_err)
-		host->areq = NULL;
-	else
-		host->areq = areq;
-
 	if (error)
 		*error = saved_err;
 	return saved_areq;
-- 
1.8.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/7] mmc: core: use common code path to return error
  2013-09-26 19:22 [PATCH 0/7] mmc: core: cleanup and locking patches description Grant Grundler
                   ` (2 preceding siblings ...)
  2013-09-26 19:22 ` [PATCH 3/7] mmc: core: restructure error handling for start req Grant Grundler
@ 2013-09-26 19:22 ` Grant Grundler
  2013-09-26 19:22 ` [PATCH 5/7] mmc: core: handling polling more gracefully Grant Grundler
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Grant Grundler @ 2013-09-26 19:22 UTC (permalink / raw)
  To: Chris Ball, Ulf Hansson, Seungwon Jeon
  Cc: linux-mmc, linux-kernel, Grant Grundler

Don't replicate how the *error is returned by mmc_start_req().

Signed-off-by: Grant Grundler <grundler@chromium.org>
---
 drivers/mmc/core/core.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 4d5de98..deb0ee5 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -538,13 +538,12 @@ struct mmc_async_req *mmc_start_req(struct mmc_host *host,
 	if (saved_areq) {
 		saved_err = mmc_wait_for_data_req_done(host, saved_areq->mrq,	areq);
 		if (saved_err == MMC_BLK_NEW_REQUEST) {
-			if (error)
-				*error = saved_err;
 			/*
 			 * The previous request was not completed,
 			 * nothing to return
 			 */
-			return NULL;
+			saved_areq = NULL;
+			goto set_error;
 		}
 		/*
 		 * Check BKOPS urgency for each R1 response
@@ -570,8 +569,10 @@ struct mmc_async_req *mmc_start_req(struct mmc_host *host,
 	if (saved_areq)
 		mmc_post_req(host, saved_areq->mrq, 0);
 
+set_error:
 	if (error)
 		*error = saved_err;
+
 	return saved_areq;
 }
 EXPORT_SYMBOL(mmc_start_req);
-- 
1.8.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 5/7] mmc: core: handling polling more gracefully
  2013-09-26 19:22 [PATCH 0/7] mmc: core: cleanup and locking patches description Grant Grundler
                   ` (3 preceding siblings ...)
  2013-09-26 19:22 ` [PATCH 4/7] mmc: core: use common code path to return error Grant Grundler
@ 2013-09-26 19:22 ` Grant Grundler
  2013-09-26 19:22 ` [PATCH 6/7] mmc: core: protect references to host->areq with host->lock Grant Grundler
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Grant Grundler @ 2013-09-26 19:22 UTC (permalink / raw)
  To: Chris Ball, Ulf Hansson, Seungwon Jeon
  Cc: linux-mmc, linux-kernel, Grant Grundler

In case threads "race" to harvest async req completions, just return.

Signed-off-by: Grant Grundler <grundler@chromium.org>
---
 drivers/mmc/core/core.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index deb0ee5..36cfe91 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -531,6 +531,10 @@ struct mmc_async_req *mmc_start_req(struct mmc_host *host,
 	int start_err = 0;
 	struct mmc_async_req *saved_areq = host->areq;
 
+	if (!saved_areq && !areq)
+		/* Nothing to do...some code is polling. */
+		goto set_error;
+
 	/* Prepare a new request */
 	if (areq)
 		mmc_pre_req(host, areq->mrq, !saved_areq);
-- 
1.8.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 6/7] mmc: core: protect references to host->areq with host->lock
  2013-09-26 19:22 [PATCH 0/7] mmc: core: cleanup and locking patches description Grant Grundler
                   ` (4 preceding siblings ...)
  2013-09-26 19:22 ` [PATCH 5/7] mmc: core: handling polling more gracefully Grant Grundler
@ 2013-09-26 19:22 ` Grant Grundler
  2013-10-09  0:48   ` Grant Grundler
  2013-09-26 19:23 ` [PATCH 7/7] mmc: core: mmc_start_req is a misnomer -> mmc_process_areq Grant Grundler
  2013-09-26 19:57 ` [PATCH 0/7] mmc: core: cleanup and locking patches description Grant Grundler
  7 siblings, 1 reply; 13+ messages in thread
From: Grant Grundler @ 2013-09-26 19:22 UTC (permalink / raw)
  To: Chris Ball, Ulf Hansson, Seungwon Jeon
  Cc: linux-mmc, linux-kernel, Grant Grundler

Races between host->areq being NULL or not are resulting in mmcqd
hung_task panics. Like this one:

<3>[  240.501202] INFO: task mmcqd/1:85 blocked for more than 120 seconds.
<3>[  240.501213] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6>[  240.501223] mmcqd/1         D 80528020     0    85      2 0x00000000
<5>[  240.501254] [<80528020>] (__schedule+0x614/0x780) from [<80528550>] (schedule+0x94/0x98)
<5>[  240.501269] [<80528550>] (schedule+0x94/0x98) from [<80526270>] (schedule_timeout+0x38/0x2d0)
<5>[  240.501284] [<80526270>] (schedule_timeout+0x38/0x2d0) from [<805283a4>] (wait_for_common+0x164/0x1a0)
<5>[  240.501298] [<805283a4>] (wait_for_common+0x164/0x1a0) from [<805284b8>] (wait_for_completion+0x20/0x24)
<5>[  240.501313] [<805284b8>] (wait_for_completion+0x20/0x24) from [<803d7068>] (mmc_wait_for_req_done+0x2c/0x84)
<5>[  240.501327] [<803d7068>] (mmc_wait_for_req_done+0x2c/0x84) from [<803d81c0>] (mmc_start_req+0x60/0x120)
<5>[  240.501342] [<803d81c0>] (mmc_start_req+0x60/0x120) from [<803e402c>] (mmc_blk_issue_rw_rq+0xa0/0x3a8)
<5>[  240.501355] [<803e402c>] (mmc_blk_issue_rw_rq+0xa0/0x3a8) from [<803e4758>] (mmc_blk_issue_rq+0x424/0x478)
<5>[  240.501368] [<803e4758>] (mmc_blk_issue_rq+0x424/0x478) from [<803e587c>] (mmc_queue_thread+0xb0/0x118)
<5>[  240.501382] [<803e587c>] (mmc_queue_thread+0xb0/0x118) from [<8004d61c>] (kthread+0xa8/0xbc)
<5>[  240.501396] [<8004d61c>] (kthread+0xa8/0xbc) from [<8000f1c8>] (kernel_thread_exit+0x0/0x8)
<0>[  240.501407] Kernel panic - not syncing: hung_task: blocked tasks
<5>[  240.501421] [<800150a4>] (unwind_backtrace+0x0/0x114) from [<80521920>] (dump_stack+0x20/0x24)
<5>[  240.501434] [<80521920>] (dump_stack+0x20/0x24) from [<80521a90>] (panic+0xa8/0x1f4)
<5>[  240.501447] [<80521a90>] (panic+0xa8/0x1f4) from [<80086d3c>] (watchdog+0x1f4/0x25c)
<5>[  240.501459] [<80086d3c>] (watchdog+0x1f4/0x25c) from [<8004d61c>] (kthread+0xa8/0xbc)
<5>[  240.501471] [<8004d61c>] (kthread+0xa8/0xbc) from [<8000f1c8>] (kernel_thread_exit+0x0/0x8)

I was able to reproduce the mmcqd "hung task" timeout consistently
with this fio command line on an Exynos5250 system with Toshiba HS200
eMMC running in HS200 mode:
	fio --name=short_randwrite --size=2G --time_based --runtime=3m \
		--readwrite=randwrite --numjobs=2 --bs=4k --norandommap \
		--ioengine=psync --direct=0 --filename=/dev/mmcblk0p5

I believe the key parameters are "--numjobs=2" (or more) and "randwrite"
workload.  Then the completions are happening around the same time as
mmc_start_req() is referencing and/or updating host->areq.

I was NOT able to consistently reproduce the problem on a similar
Exynos 5250 system which had "engineering samples" of Samsung HS200
capable eMMC installed. Just my clue that the timing is different
(and the fio performance numbers are different too).

Signed-off-by: Grant Grundler <grundler@chromium.org>
---
 drivers/mmc/core/core.c | 34 +++++++++++++++++++++++-----------
 1 file changed, 23 insertions(+), 11 deletions(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 36cfe91..e5a9599 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -529,29 +529,40 @@ struct mmc_async_req *mmc_start_req(struct mmc_host *host,
 {
 	int saved_err = 0;
 	int start_err = 0;
-	struct mmc_async_req *saved_areq = host->areq;
+	struct mmc_async_req *saved_areq;
+	unsigned long flags;
 
-	if (!saved_areq && !areq)
-		/* Nothing to do...some code is polling. */
+	spin_lock_irqsave(&host->lock, flags);
+	saved_areq = host->areq;
+	if (!saved_areq && !areq) {
+		/* Nothing? Code is racing to harvest a completion. */
+		spin_unlock_irqrestore(&host->lock, flags);
 		goto set_error;
+	}
 
 	/* Prepare a new request */
 	if (areq)
 		mmc_pre_req(host, areq->mrq, !saved_areq);
 
 	if (saved_areq) {
+		/* This thread owns this IO (saved_areq) for now. */
+		host->areq = NULL;
+		spin_unlock_irqrestore(&host->lock, flags);
+
 		saved_err = mmc_wait_for_data_req_done(host, saved_areq->mrq,	areq);
 		if (saved_err == MMC_BLK_NEW_REQUEST) {
-			/*
-			 * The previous request was not completed,
-			 * nothing to return
-			 */
+			spin_lock_irqsave(&host->lock, flags);
+			BUG_ON(host->areq != NULL);
+
+			/* Not completed. Don't report it. */
+			host->areq = saved_areq;
 			saved_areq = NULL;
+			saved_err = 0;
+			spin_unlock_irqrestore(&host->lock, flags);
 			goto set_error;
 		}
-		/*
-		 * Check BKOPS urgency for each R1 response
-		 */
+
+		/* Check BKOPS urgency for each R1 response */
 		if (host->card && mmc_card_mmc(host->card) &&
 		    ((mmc_resp_type(saved_areq->mrq->cmd) == MMC_RSP_R1) ||
 		     (mmc_resp_type(saved_areq->mrq->cmd) == MMC_RSP_R1B)) &&
@@ -562,11 +573,12 @@ struct mmc_async_req *mmc_start_req(struct mmc_host *host,
 	/* Don't start something new if previous one failed. */
 	if (!saved_err && areq) {
 		start_err = __mmc_start_data_req(host, areq->mrq);
+
 		/* Cancel a prepared request if it was not started. */
 		if (start_err) {
 			mmc_post_req(host, areq->mrq, -EINVAL);
 			host->areq = NULL;
-		} else 
+		} else
 			host->areq = areq;
 	}
 
-- 
1.8.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 7/7] mmc: core: mmc_start_req is a misnomer -> mmc_process_areq
  2013-09-26 19:22 [PATCH 0/7] mmc: core: cleanup and locking patches description Grant Grundler
                   ` (5 preceding siblings ...)
  2013-09-26 19:22 ` [PATCH 6/7] mmc: core: protect references to host->areq with host->lock Grant Grundler
@ 2013-09-26 19:23 ` Grant Grundler
  2013-09-26 19:57 ` [PATCH 0/7] mmc: core: cleanup and locking patches description Grant Grundler
  7 siblings, 0 replies; 13+ messages in thread
From: Grant Grundler @ 2013-09-26 19:23 UTC (permalink / raw)
  To: Chris Ball, Ulf Hansson, Seungwon Jeon
  Cc: linux-mmc, linux-kernel, Grant Grundler

mmc_start_req() only handles *asynchronous* requests and is easily
confused with mmc_start_request and __mmc_start_req() (both of which
only handle synchronous requests). Renaming should hopefully make
it clearer this function is used to harvest completions and start
new requests.

Signed-off-by: Grant Grundler <grundler@chromium.org>
---
 drivers/mmc/card/block.c    |  8 ++++----
 drivers/mmc/card/mmc_test.c |  4 ++--
 drivers/mmc/core/core.c     | 23 +++++++++++------------
 include/linux/mmc/core.h    |  2 +-
 4 files changed, 18 insertions(+), 19 deletions(-)

diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
index 1a3163f..91a5dae 100644
--- a/drivers/mmc/card/block.c
+++ b/drivers/mmc/card/block.c
@@ -1805,7 +1805,7 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc)
 			areq = &mq->mqrq_cur->mmc_active;
 		} else
 			areq = NULL;
-		areq = mmc_start_req(card->host, areq, (int *) &status);
+		areq = mmc_process_areq(card->host, areq, (int *) &status);
 		if (!areq) {
 			if (status == MMC_BLK_NEW_REQUEST)
 				mq->flags |= MMC_QUEUE_NEW_REQUEST;
@@ -1902,7 +1902,7 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc)
 				if (!mq_rq->packed->retries)
 					goto cmd_abort;
 				mmc_blk_packed_hdr_wrq_prep(mq_rq, card, mq);
-				mmc_start_req(card->host,
+				mmc_process_areq(card->host,
 					      &mq_rq->mmc_active, NULL);
 			} else {
 
@@ -1912,7 +1912,7 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc)
 				 */
 				mmc_blk_rw_rq_prep(mq_rq, card,
 						disable_multi, mq);
-				mmc_start_req(card->host,
+				mmc_process_areq(card->host,
 						&mq_rq->mmc_active, NULL);
 			}
 		}
@@ -1944,7 +1944,7 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc)
 				mmc_blk_revert_packed_req(mq, mq->mqrq_cur);
 
 			mmc_blk_rw_rq_prep(mq->mqrq_cur, card, 0, mq);
-			mmc_start_req(card->host,
+			mmc_process_areq(card->host,
 				      &mq->mqrq_cur->mmc_active, NULL);
 		}
 	}
diff --git a/drivers/mmc/card/mmc_test.c b/drivers/mmc/card/mmc_test.c
index 0c0fc52..c1e2e09 100644
--- a/drivers/mmc/card/mmc_test.c
+++ b/drivers/mmc/card/mmc_test.c
@@ -811,7 +811,7 @@ static int mmc_test_nonblock_transfer(struct mmc_test_card *test,
 	for (i = 0; i < count; i++) {
 		mmc_test_prepare_mrq(test, cur_areq->mrq, sg, sg_len, dev_addr,
 				     blocks, blksz, write);
-		done_areq = mmc_start_req(test->card->host, cur_areq, &ret);
+		done_areq = mmc_process_areq(test->card->host, cur_areq, &ret);
 
 		if (ret || (!done_areq && i > 0))
 			goto err;
@@ -830,7 +830,7 @@ static int mmc_test_nonblock_transfer(struct mmc_test_card *test,
 		dev_addr += blocks;
 	}
 
-	done_areq = mmc_start_req(test->card->host, NULL, &ret);
+	done_areq = mmc_process_areq(test->card->host, NULL, &ret);
 
 	return ret;
 err:
diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index e5a9599..e824ad9 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -475,7 +475,7 @@ static void mmc_wait_for_req_done(struct mmc_host *host,
  *	@is_first_req: true if there is no previous started request
  *                     that may run in parellel to this call, otherwise false
  *
- *	mmc_pre_req() is called in prior to mmc_start_req() to let
+ *	mmc_pre_req() is called in prior to mmc_process_req() to let
  *	host prepare for the new request. Preparation of a request may be
  *	performed while another request is running on the host.
  */
@@ -509,22 +509,21 @@ static void mmc_post_req(struct mmc_host *host, struct mmc_request *mrq,
 }
 
 /**
- *	mmc_start_req - start a non-blocking request
+ *	mmc_process_areq - start a non-blocking IO and handle completions
  *	@host: MMC host to start command
  *	@areq: async request to start
- *	@error: out parameter returns 0 for success, otherwise non zero
+ *	@error: out parameter returns 0 (success) or non-zero (failure)
+ *		 of completed IO (not the IO we tried to start).
  *
- *	Start a new MMC custom command request for a host.
- *	If there is on ongoing async request wait for completion
- *	of that request and start the new one and return.
+ *	If a host has an in-flight async request, wait for completion
+ *	of that request, then start the new MMC custom command request.
  *	Does not wait for the new request to complete.
  *
- *      Returns the completed request, NULL in case of none completed.
- *	Wait for the an ongoing request (previoulsy started) to complete and
- *	return the completed request. If there is no ongoing request, NULL
- *	is returned without waiting. NULL is not an error condition.
+ *	Returns the pointer to the completed request; NULL in case of
+ *	none completed or no in-flight request.  NULL is not an error
+ *	condition.
  */
-struct mmc_async_req *mmc_start_req(struct mmc_host *host,
+struct mmc_async_req *mmc_process_areq(struct mmc_host *host,
 				    struct mmc_async_req *areq, int *error)
 {
 	int saved_err = 0;
@@ -591,7 +590,7 @@ set_error:
 
 	return saved_areq;
 }
-EXPORT_SYMBOL(mmc_start_req);
+EXPORT_SYMBOL(mmc_process_areq);
 
 /**
  *	mmc_wait_for_req - start a request and wait for completion
diff --git a/include/linux/mmc/core.h b/include/linux/mmc/core.h
index a00fc49..773cabb 100644
--- a/include/linux/mmc/core.h
+++ b/include/linux/mmc/core.h
@@ -142,7 +142,7 @@ struct mmc_async_req;
 
 extern int mmc_stop_bkops(struct mmc_card *);
 extern int mmc_read_bkops_status(struct mmc_card *);
-extern struct mmc_async_req *mmc_start_req(struct mmc_host *,
+extern struct mmc_async_req *mmc_process_areq(struct mmc_host *,
 					   struct mmc_async_req *, int *);
 extern int mmc_interrupt_hpi(struct mmc_card *);
 extern void mmc_wait_for_req(struct mmc_host *, struct mmc_request *);
-- 
1.8.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/7] mmc: core: cleanup and locking patches description
  2013-09-26 19:22 [PATCH 0/7] mmc: core: cleanup and locking patches description Grant Grundler
                   ` (6 preceding siblings ...)
  2013-09-26 19:23 ` [PATCH 7/7] mmc: core: mmc_start_req is a misnomer -> mmc_process_areq Grant Grundler
@ 2013-09-26 19:57 ` Grant Grundler
  2013-10-03 20:35   ` Grant Grundler
  7 siblings, 1 reply; 13+ messages in thread
From: Grant Grundler @ 2013-09-26 19:57 UTC (permalink / raw)
  To: Grant Grundler; +Cc: Chris Ball, Ulf Hansson, Seungwon Jeon, linux-mmc, LKML

Argh...too much wordsmithing...

On Thu, Sep 26, 2013 at 12:22 PM, Grant Grundler <grundler@chromium.org> wrote:
> Following 7 patches are mostly cleanup with one key patch around host->areq
> locking.  The host->areq locking problem description is here:
>         http://www.spinics.net/lists/linux-mmc/msg21644.html
>
> I do believe this preposed host->areq locking patch is a complete fix.

... is NOT a complete fix.

> But it appears to fix the problem and is better than nothing.

Still true.

cheers,
grant

>
> This patch sequence applies clean to linus' 3.12-rc2 branch and only compile
> tested in this form. This is a forward port (and cleanup) of the
> same patch I've been testing on ChromeOS-3.4 tree using Exynos5250 chipset.
>
> cheers,
> grant
>
> 0000-mmc-core-cleanups-and-locking-description (this email)
> 0001-mmc-core-rename-data-to-saved_areq.patch
> 0002-mmc-core-rename-local-var-err-to-saved_err.patch
> 0003-mmc-core-restructure-error-handling-for-start-req.patch
> 0004-mmc-core-use-common-code-path-to-return-error.patch
> 0005-mmc-core-handling-polling-more-gracefully.patch
> 0006-mmc-core-protect-references-to-host-areq-with-host-l.patch
> 0007-mmc-core-mmc_start_req-is-a-misnomer-mmc_process_are.patch
>
>
>  drivers/mmc/card/block.c    |   8 ++--
>  drivers/mmc/card/mmc_test.c |   4 +-
>  drivers/mmc/core/core.c     | 103 +++++++++++++++++++++++++-------------------
>  include/linux/mmc/core.h    |   2 +-
>  4 files changed, 66 insertions(+), 51 deletions(-)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/7] mmc: core: cleanup and locking patches description
  2013-09-26 19:57 ` [PATCH 0/7] mmc: core: cleanup and locking patches description Grant Grundler
@ 2013-10-03 20:35   ` Grant Grundler
       [not found]     ` <CAPDyKFqZHkDAKoaRM1vrUXCMVsAZBEs-PanjyoR=E=Lrp1aU7Q@mail.gmail.com>
  0 siblings, 1 reply; 13+ messages in thread
From: Grant Grundler @ 2013-10-03 20:35 UTC (permalink / raw)
  To: Seungwon Jeon, ALIM AKHTAR, Jaehoon Chung
  Cc: Chris Ball, Ulf Hansson, linux-mmc, LKML

Ping?

Has anyone had a chance to review/test this series and/or be willing
to do so this week?

grundler@firesword:~$ pwclient list -w 'grundler@chromium.org' -p LKML
Patches submitted by Grant Grundler <grundler@chromium.org>:
ID      State        Name
--      -----        ----
...
2950961 New          [1/7] mmc: core: rename "data" to saved_areq
2951081 New          [2/7] mmc: core: rename local var err to saved_err
2950981 New          [3/7] mmc: core: restructure error handling for start req
2951001 New          [4/7] mmc: core: use common code path to return error
2951061 New          [5/7] mmc: core: handling polling more gracefully
2951041 New          [6/7] mmc: core: protect references to host->areq
with host->lock
2951021 New          [7/7] mmc: core: mmc_start_req is a misnomer ->
mmc_process_areq


To remind, this is the fio command line that results in mmcqd hangs on
the exynos 5250 system (uses dw_mmc storage controller):

$FIO --name=short_randwrite --size=2G --time_based --runtime=90 \
        --readwrite=randwrite --numjobs=2 --bs=4k --norandommap
--ioengine=psync \
        --direct=0 --invalidate=1 --filename=/dev/mmcblk0p5

thanks,
grant

On Thu, Sep 26, 2013 at 12:57 PM, Grant Grundler <grundler@chromium.org> wrote:
> Argh...too much wordsmithing...
>
> On Thu, Sep 26, 2013 at 12:22 PM, Grant Grundler <grundler@chromium.org> wrote:
>> Following 7 patches are mostly cleanup with one key patch around host->areq
>> locking.  The host->areq locking problem description is here:
>>         http://www.spinics.net/lists/linux-mmc/msg21644.html
>>
>> I do believe this preposed host->areq locking patch is a complete fix.
>
> ... is NOT a complete fix.
>
>> But it appears to fix the problem and is better than nothing.
>
> Still true.
>
> cheers,
> grant
>
>>
>> This patch sequence applies clean to linus' 3.12-rc2 branch and only compile
>> tested in this form. This is a forward port (and cleanup) of the
>> same patch I've been testing on ChromeOS-3.4 tree using Exynos5250 chipset.
>>
>> cheers,
>> grant
>>
>> 0000-mmc-core-cleanups-and-locking-description (this email)
>> 0001-mmc-core-rename-data-to-saved_areq.patch
>> 0002-mmc-core-rename-local-var-err-to-saved_err.patch
>> 0003-mmc-core-restructure-error-handling-for-start-req.patch
>> 0004-mmc-core-use-common-code-path-to-return-error.patch
>> 0005-mmc-core-handling-polling-more-gracefully.patch
>> 0006-mmc-core-protect-references-to-host-areq-with-host-l.patch
>> 0007-mmc-core-mmc_start_req-is-a-misnomer-mmc_process_are.patch
>>
>>
>>  drivers/mmc/card/block.c    |   8 ++--
>>  drivers/mmc/card/mmc_test.c |   4 +-
>>  drivers/mmc/core/core.c     | 103 +++++++++++++++++++++++++-------------------
>>  include/linux/mmc/core.h    |   2 +-
>>  4 files changed, 66 insertions(+), 51 deletions(-)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/7] mmc: core: cleanup and locking patches description
       [not found]     ` <CAPDyKFqZHkDAKoaRM1vrUXCMVsAZBEs-PanjyoR=E=Lrp1aU7Q@mail.gmail.com>
@ 2013-10-03 23:48       ` Grant Grundler
  0 siblings, 0 replies; 13+ messages in thread
From: Grant Grundler @ 2013-10-03 23:48 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Grant Grundler, linux-mmc, Seungwon Jeon, LKML, Chris Ball,
	Jaehoon Chung, ALIM AKHTAR

On Thu, Oct 3, 2013 at 2:35 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> Den 3 okt 2013 22:35 skrev "Grant Grundler" <grundler@chromium.org>:
>
>
>>
>> Ping?
>>
>> Has anyone had a chance to review/test this series and/or be willing
>> to do so this week?
>
> Hi Grant,
>
> It is on my todo list. My plan is to check the behavior on ux500 board,
> which uses mmci and which has been one of the first drivers that utilizes
> the async request mechanism.

Thanks Ulf for the update.

dw_mmc is the only driver that I can see that uses a tasklet (Soft
Interrupt context) to handle completions. So it's "timing" will be
unique and nothing it calls can acquire semaphores in that context. :/

> I am not that keen of the patches that renames stuff though, not sure it
> actually improves anything.

I do agree the name changes are secondary to the locking changes.

Names matter. "data" doesn't say anything about how the variable is
expected to be used. And "err" doesn't suggest this variable is ONLY
related to the completion, not the starting of a new async request.
mmc_start_req() is very confusing since it's very different from
mmc_start_request. At a minimum, mmc_start_req() should be renamed to
"<mumble>_areq" to be clear it's not dealing with synchronous requests
at all. So yes, renaming stuff can improve what developers expect from
the code.

> This week is not possible, but likely the next.

Ok! Thanks for letting me know you plan on looking at it. I can wait.
If necessary, I will poke you again next week. :)

thank you!
grant

>
> Kind regards
> Ulf Hanson
>
>>
>> grundler@firesword:~$ pwclient list -w 'grundler@chromium.org' -p LKML
>> Patches submitted by Grant Grundler <grundler@chromium.org>:
>> ID      State        Name
>> --      -----        ----
>> ...
>> 2950961 New          [1/7] mmc: core: rename "data" to saved_areq
>> 2951081 New          [2/7] mmc: core: rename local var err to saved_err
>> 2950981 New          [3/7] mmc: core: restructure error handling for start
>> req
>> 2951001 New          [4/7] mmc: core: use common code path to return error
>> 2951061 New          [5/7] mmc: core: handling polling more gracefully
>> 2951041 New          [6/7] mmc: core: protect references to host->areq
>> with host->lock
>> 2951021 New          [7/7] mmc: core: mmc_start_req is a misnomer ->
>> mmc_process_areq
>>
>>
>> To remind, this is the fio command line that results in mmcqd hangs on
>> the exynos 5250 system (uses dw_mmc storage controller):
>>
>> $FIO --name=short_randwrite --size=2G --time_based --runtime=90 \
>>         --readwrite=randwrite --numjobs=2 --bs=4k --norandommap
>> --ioengine=psync \
>>         --direct=0 --invalidate=1 --filename=/dev/mmcblk0p5
>>
>> thanks,
>> grant
>>
>> On Thu, Sep 26, 2013 at 12:57 PM, Grant Grundler <grundler@chromium.org>
>> wrote:
>> > Argh...too much wordsmithing...
>> >
>> > On Thu, Sep 26, 2013 at 12:22 PM, Grant Grundler <grundler@chromium.org>
>> > wrote:
>> >> Following 7 patches are mostly cleanup with one key patch around
>> >> host->areq
>> >> locking.  The host->areq locking problem description is here:
>> >>         http://www.spinics.net/lists/linux-mmc/msg21644.html
>> >>
>> >> I do believe this preposed host->areq locking patch is a complete fix.
>> >
>> > ... is NOT a complete fix.
>> >
>> >> But it appears to fix the problem and is better than nothing.
>> >
>> > Still true.
>> >
>> > cheers,
>> > grant
>> >
>> >>
>> >> This patch sequence applies clean to linus' 3.12-rc2 branch and only
>> >> compile
>> >> tested in this form. This is a forward port (and cleanup) of the
>> >> same patch I've been testing on ChromeOS-3.4 tree using Exynos5250
>> >> chipset.
>> >>
>> >> cheers,
>> >> grant
>> >>
>> >> 0000-mmc-core-cleanups-and-locking-description (this email)
>> >> 0001-mmc-core-rename-data-to-saved_areq.patch
>> >> 0002-mmc-core-rename-local-var-err-to-saved_err.patch
>> >> 0003-mmc-core-restructure-error-handling-for-start-req.patch
>> >> 0004-mmc-core-use-common-code-path-to-return-error.patch
>> >> 0005-mmc-core-handling-polling-more-gracefully.patch
>> >> 0006-mmc-core-protect-references-to-host-areq-with-host-l.patch
>> >> 0007-mmc-core-mmc_start_req-is-a-misnomer-mmc_process_are.patch
>> >>
>> >>
>> >>  drivers/mmc/card/block.c    |   8 ++--
>> >>  drivers/mmc/card/mmc_test.c |   4 +-
>> >>  drivers/mmc/core/core.c     | 103
>> >> +++++++++++++++++++++++++-------------------
>> >>  include/linux/mmc/core.h    |   2 +-
>> >>  4 files changed, 66 insertions(+), 51 deletions(-)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 6/7] mmc: core: protect references to host->areq with host->lock
  2013-09-26 19:22 ` [PATCH 6/7] mmc: core: protect references to host->areq with host->lock Grant Grundler
@ 2013-10-09  0:48   ` Grant Grundler
  2013-10-09  9:07     ` Ulf Hansson
  0 siblings, 1 reply; 13+ messages in thread
From: Grant Grundler @ 2013-10-09  0:48 UTC (permalink / raw)
  To: Grant Grundler; +Cc: Chris Ball, Ulf Hansson, Seungwon Jeon, linux-mmc, LKML

Ulf,
While this patch might be correct, it's not solving the problem I
claimed and my explanation was wrong. See comments in this code
review:
    https://chromium-review.googlesource.com/#/c/170880/1//COMMIT_MSG

While I no longer see the same crash with this change in our "ToT
tree", I'm able to reproduce the original mmcqd crash on a different
kernel variant (also based on chromeos-3.4 kernel).

I think I need to review references to mqrq_prev and mqrq_cur since
those appear to be protected by mq->thread_sem and I suspect
references are happening from dw_mmc tasklet without holding this
semaphore.

apologies,
grant


On Thu, Sep 26, 2013 at 12:22 PM, Grant Grundler <grundler@chromium.org> wrote:
> Races between host->areq being NULL or not are resulting in mmcqd
> hung_task panics. Like this one:
>
> <3>[  240.501202] INFO: task mmcqd/1:85 blocked for more than 120 seconds.
> <3>[  240.501213] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> <6>[  240.501223] mmcqd/1         D 80528020     0    85      2 0x00000000
> <5>[  240.501254] [<80528020>] (__schedule+0x614/0x780) from [<80528550>] (schedule+0x94/0x98)
> <5>[  240.501269] [<80528550>] (schedule+0x94/0x98) from [<80526270>] (schedule_timeout+0x38/0x2d0)
> <5>[  240.501284] [<80526270>] (schedule_timeout+0x38/0x2d0) from [<805283a4>] (wait_for_common+0x164/0x1a0)
> <5>[  240.501298] [<805283a4>] (wait_for_common+0x164/0x1a0) from [<805284b8>] (wait_for_completion+0x20/0x24)
> <5>[  240.501313] [<805284b8>] (wait_for_completion+0x20/0x24) from [<803d7068>] (mmc_wait_for_req_done+0x2c/0x84)
> <5>[  240.501327] [<803d7068>] (mmc_wait_for_req_done+0x2c/0x84) from [<803d81c0>] (mmc_start_req+0x60/0x120)
> <5>[  240.501342] [<803d81c0>] (mmc_start_req+0x60/0x120) from [<803e402c>] (mmc_blk_issue_rw_rq+0xa0/0x3a8)
> <5>[  240.501355] [<803e402c>] (mmc_blk_issue_rw_rq+0xa0/0x3a8) from [<803e4758>] (mmc_blk_issue_rq+0x424/0x478)
> <5>[  240.501368] [<803e4758>] (mmc_blk_issue_rq+0x424/0x478) from [<803e587c>] (mmc_queue_thread+0xb0/0x118)
> <5>[  240.501382] [<803e587c>] (mmc_queue_thread+0xb0/0x118) from [<8004d61c>] (kthread+0xa8/0xbc)
> <5>[  240.501396] [<8004d61c>] (kthread+0xa8/0xbc) from [<8000f1c8>] (kernel_thread_exit+0x0/0x8)
> <0>[  240.501407] Kernel panic - not syncing: hung_task: blocked tasks
> <5>[  240.501421] [<800150a4>] (unwind_backtrace+0x0/0x114) from [<80521920>] (dump_stack+0x20/0x24)
> <5>[  240.501434] [<80521920>] (dump_stack+0x20/0x24) from [<80521a90>] (panic+0xa8/0x1f4)
> <5>[  240.501447] [<80521a90>] (panic+0xa8/0x1f4) from [<80086d3c>] (watchdog+0x1f4/0x25c)
> <5>[  240.501459] [<80086d3c>] (watchdog+0x1f4/0x25c) from [<8004d61c>] (kthread+0xa8/0xbc)
> <5>[  240.501471] [<8004d61c>] (kthread+0xa8/0xbc) from [<8000f1c8>] (kernel_thread_exit+0x0/0x8)
>
> I was able to reproduce the mmcqd "hung task" timeout consistently
> with this fio command line on an Exynos5250 system with Toshiba HS200
> eMMC running in HS200 mode:
>         fio --name=short_randwrite --size=2G --time_based --runtime=3m \
>                 --readwrite=randwrite --numjobs=2 --bs=4k --norandommap \
>                 --ioengine=psync --direct=0 --filename=/dev/mmcblk0p5
>
> I believe the key parameters are "--numjobs=2" (or more) and "randwrite"
> workload.  Then the completions are happening around the same time as
> mmc_start_req() is referencing and/or updating host->areq.
>
> I was NOT able to consistently reproduce the problem on a similar
> Exynos 5250 system which had "engineering samples" of Samsung HS200
> capable eMMC installed. Just my clue that the timing is different
> (and the fio performance numbers are different too).
>
> Signed-off-by: Grant Grundler <grundler@chromium.org>
> ---
>  drivers/mmc/core/core.c | 34 +++++++++++++++++++++++-----------
>  1 file changed, 23 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> index 36cfe91..e5a9599 100644
> --- a/drivers/mmc/core/core.c
> +++ b/drivers/mmc/core/core.c
> @@ -529,29 +529,40 @@ struct mmc_async_req *mmc_start_req(struct mmc_host *host,
>  {
>         int saved_err = 0;
>         int start_err = 0;
> -       struct mmc_async_req *saved_areq = host->areq;
> +       struct mmc_async_req *saved_areq;
> +       unsigned long flags;
>
> -       if (!saved_areq && !areq)
> -               /* Nothing to do...some code is polling. */
> +       spin_lock_irqsave(&host->lock, flags);
> +       saved_areq = host->areq;
> +       if (!saved_areq && !areq) {
> +               /* Nothing? Code is racing to harvest a completion. */
> +               spin_unlock_irqrestore(&host->lock, flags);
>                 goto set_error;
> +       }
>
>         /* Prepare a new request */
>         if (areq)
>                 mmc_pre_req(host, areq->mrq, !saved_areq);
>
>         if (saved_areq) {
> +               /* This thread owns this IO (saved_areq) for now. */
> +               host->areq = NULL;
> +               spin_unlock_irqrestore(&host->lock, flags);
> +
>                 saved_err = mmc_wait_for_data_req_done(host, saved_areq->mrq,   areq);
>                 if (saved_err == MMC_BLK_NEW_REQUEST) {
> -                       /*
> -                        * The previous request was not completed,
> -                        * nothing to return
> -                        */
> +                       spin_lock_irqsave(&host->lock, flags);
> +                       BUG_ON(host->areq != NULL);
> +
> +                       /* Not completed. Don't report it. */
> +                       host->areq = saved_areq;
>                         saved_areq = NULL;
> +                       saved_err = 0;
> +                       spin_unlock_irqrestore(&host->lock, flags);
>                         goto set_error;
>                 }
> -               /*
> -                * Check BKOPS urgency for each R1 response
> -                */
> +
> +               /* Check BKOPS urgency for each R1 response */
>                 if (host->card && mmc_card_mmc(host->card) &&
>                     ((mmc_resp_type(saved_areq->mrq->cmd) == MMC_RSP_R1) ||
>                      (mmc_resp_type(saved_areq->mrq->cmd) == MMC_RSP_R1B)) &&
> @@ -562,11 +573,12 @@ struct mmc_async_req *mmc_start_req(struct mmc_host *host,
>         /* Don't start something new if previous one failed. */
>         if (!saved_err && areq) {
>                 start_err = __mmc_start_data_req(host, areq->mrq);
> +
>                 /* Cancel a prepared request if it was not started. */
>                 if (start_err) {
>                         mmc_post_req(host, areq->mrq, -EINVAL);
>                         host->areq = NULL;
> -               } else
> +               } else
>                         host->areq = areq;
>         }
>
> --
> 1.8.4
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 6/7] mmc: core: protect references to host->areq with host->lock
  2013-10-09  0:48   ` Grant Grundler
@ 2013-10-09  9:07     ` Ulf Hansson
  0 siblings, 0 replies; 13+ messages in thread
From: Ulf Hansson @ 2013-10-09  9:07 UTC (permalink / raw)
  To: Grant Grundler; +Cc: Chris Ball, Seungwon Jeon, linux-mmc, LKML

On 9 October 2013 02:48, Grant Grundler <grundler@chromium.org> wrote:
> Ulf,
> While this patch might be correct, it's not solving the problem I
> claimed and my explanation was wrong. See comments in this code
> review:
>     https://chromium-review.googlesource.com/#/c/170880/1//COMMIT_MSG
>
> While I no longer see the same crash with this change in our "ToT
> tree", I'm able to reproduce the original mmcqd crash on a different
> kernel variant (also based on chromeos-3.4 kernel).
>
> I think I need to review references to mqrq_prev and mqrq_cur since
> those appear to be protected by mq->thread_sem and I suspect
> references are happening from dw_mmc tasklet without holding this
> semaphore.
>
> apologies,
> grant
>

No worries Grant. Feel free to add me on Cc if you send a patch for
dw_mmc to fix the problem, maybe I can help out reviewing.

Kind regards
Ulf Hansson



>
> On Thu, Sep 26, 2013 at 12:22 PM, Grant Grundler <grundler@chromium.org> wrote:
>> Races between host->areq being NULL or not are resulting in mmcqd
>> hung_task panics. Like this one:
>>
>> <3>[  240.501202] INFO: task mmcqd/1:85 blocked for more than 120 seconds.
>> <3>[  240.501213] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> <6>[  240.501223] mmcqd/1         D 80528020     0    85      2 0x00000000
>> <5>[  240.501254] [<80528020>] (__schedule+0x614/0x780) from [<80528550>] (schedule+0x94/0x98)
>> <5>[  240.501269] [<80528550>] (schedule+0x94/0x98) from [<80526270>] (schedule_timeout+0x38/0x2d0)
>> <5>[  240.501284] [<80526270>] (schedule_timeout+0x38/0x2d0) from [<805283a4>] (wait_for_common+0x164/0x1a0)
>> <5>[  240.501298] [<805283a4>] (wait_for_common+0x164/0x1a0) from [<805284b8>] (wait_for_completion+0x20/0x24)
>> <5>[  240.501313] [<805284b8>] (wait_for_completion+0x20/0x24) from [<803d7068>] (mmc_wait_for_req_done+0x2c/0x84)
>> <5>[  240.501327] [<803d7068>] (mmc_wait_for_req_done+0x2c/0x84) from [<803d81c0>] (mmc_start_req+0x60/0x120)
>> <5>[  240.501342] [<803d81c0>] (mmc_start_req+0x60/0x120) from [<803e402c>] (mmc_blk_issue_rw_rq+0xa0/0x3a8)
>> <5>[  240.501355] [<803e402c>] (mmc_blk_issue_rw_rq+0xa0/0x3a8) from [<803e4758>] (mmc_blk_issue_rq+0x424/0x478)
>> <5>[  240.501368] [<803e4758>] (mmc_blk_issue_rq+0x424/0x478) from [<803e587c>] (mmc_queue_thread+0xb0/0x118)
>> <5>[  240.501382] [<803e587c>] (mmc_queue_thread+0xb0/0x118) from [<8004d61c>] (kthread+0xa8/0xbc)
>> <5>[  240.501396] [<8004d61c>] (kthread+0xa8/0xbc) from [<8000f1c8>] (kernel_thread_exit+0x0/0x8)
>> <0>[  240.501407] Kernel panic - not syncing: hung_task: blocked tasks
>> <5>[  240.501421] [<800150a4>] (unwind_backtrace+0x0/0x114) from [<80521920>] (dump_stack+0x20/0x24)
>> <5>[  240.501434] [<80521920>] (dump_stack+0x20/0x24) from [<80521a90>] (panic+0xa8/0x1f4)
>> <5>[  240.501447] [<80521a90>] (panic+0xa8/0x1f4) from [<80086d3c>] (watchdog+0x1f4/0x25c)
>> <5>[  240.501459] [<80086d3c>] (watchdog+0x1f4/0x25c) from [<8004d61c>] (kthread+0xa8/0xbc)
>> <5>[  240.501471] [<8004d61c>] (kthread+0xa8/0xbc) from [<8000f1c8>] (kernel_thread_exit+0x0/0x8)
>>
>> I was able to reproduce the mmcqd "hung task" timeout consistently
>> with this fio command line on an Exynos5250 system with Toshiba HS200
>> eMMC running in HS200 mode:
>>         fio --name=short_randwrite --size=2G --time_based --runtime=3m \
>>                 --readwrite=randwrite --numjobs=2 --bs=4k --norandommap \
>>                 --ioengine=psync --direct=0 --filename=/dev/mmcblk0p5
>>
>> I believe the key parameters are "--numjobs=2" (or more) and "randwrite"
>> workload.  Then the completions are happening around the same time as
>> mmc_start_req() is referencing and/or updating host->areq.
>>
>> I was NOT able to consistently reproduce the problem on a similar
>> Exynos 5250 system which had "engineering samples" of Samsung HS200
>> capable eMMC installed. Just my clue that the timing is different
>> (and the fio performance numbers are different too).
>>
>> Signed-off-by: Grant Grundler <grundler@chromium.org>
>> ---
>>  drivers/mmc/core/core.c | 34 +++++++++++++++++++++++-----------
>>  1 file changed, 23 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
>> index 36cfe91..e5a9599 100644
>> --- a/drivers/mmc/core/core.c
>> +++ b/drivers/mmc/core/core.c
>> @@ -529,29 +529,40 @@ struct mmc_async_req *mmc_start_req(struct mmc_host *host,
>>  {
>>         int saved_err = 0;
>>         int start_err = 0;
>> -       struct mmc_async_req *saved_areq = host->areq;
>> +       struct mmc_async_req *saved_areq;
>> +       unsigned long flags;
>>
>> -       if (!saved_areq && !areq)
>> -               /* Nothing to do...some code is polling. */
>> +       spin_lock_irqsave(&host->lock, flags);
>> +       saved_areq = host->areq;
>> +       if (!saved_areq && !areq) {
>> +               /* Nothing? Code is racing to harvest a completion. */
>> +               spin_unlock_irqrestore(&host->lock, flags);
>>                 goto set_error;
>> +       }
>>
>>         /* Prepare a new request */
>>         if (areq)
>>                 mmc_pre_req(host, areq->mrq, !saved_areq);
>>
>>         if (saved_areq) {
>> +               /* This thread owns this IO (saved_areq) for now. */
>> +               host->areq = NULL;
>> +               spin_unlock_irqrestore(&host->lock, flags);
>> +
>>                 saved_err = mmc_wait_for_data_req_done(host, saved_areq->mrq,   areq);
>>                 if (saved_err == MMC_BLK_NEW_REQUEST) {
>> -                       /*
>> -                        * The previous request was not completed,
>> -                        * nothing to return
>> -                        */
>> +                       spin_lock_irqsave(&host->lock, flags);
>> +                       BUG_ON(host->areq != NULL);
>> +
>> +                       /* Not completed. Don't report it. */
>> +                       host->areq = saved_areq;
>>                         saved_areq = NULL;
>> +                       saved_err = 0;
>> +                       spin_unlock_irqrestore(&host->lock, flags);
>>                         goto set_error;
>>                 }
>> -               /*
>> -                * Check BKOPS urgency for each R1 response
>> -                */
>> +
>> +               /* Check BKOPS urgency for each R1 response */
>>                 if (host->card && mmc_card_mmc(host->card) &&
>>                     ((mmc_resp_type(saved_areq->mrq->cmd) == MMC_RSP_R1) ||
>>                      (mmc_resp_type(saved_areq->mrq->cmd) == MMC_RSP_R1B)) &&
>> @@ -562,11 +573,12 @@ struct mmc_async_req *mmc_start_req(struct mmc_host *host,
>>         /* Don't start something new if previous one failed. */
>>         if (!saved_err && areq) {
>>                 start_err = __mmc_start_data_req(host, areq->mrq);
>> +
>>                 /* Cancel a prepared request if it was not started. */
>>                 if (start_err) {
>>                         mmc_post_req(host, areq->mrq, -EINVAL);
>>                         host->areq = NULL;
>> -               } else
>> +               } else
>>                         host->areq = areq;
>>         }
>>
>> --
>> 1.8.4
>>

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2013-10-09  9:07 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-09-26 19:22 [PATCH 0/7] mmc: core: cleanup and locking patches description Grant Grundler
2013-09-26 19:22 ` [PATCH 1/7] mmc: core: rename "data" to saved_areq Grant Grundler
2013-09-26 19:22 ` [PATCH 2/7] mmc: core: rename local var err to saved_err Grant Grundler
2013-09-26 19:22 ` [PATCH 3/7] mmc: core: restructure error handling for start req Grant Grundler
2013-09-26 19:22 ` [PATCH 4/7] mmc: core: use common code path to return error Grant Grundler
2013-09-26 19:22 ` [PATCH 5/7] mmc: core: handling polling more gracefully Grant Grundler
2013-09-26 19:22 ` [PATCH 6/7] mmc: core: protect references to host->areq with host->lock Grant Grundler
2013-10-09  0:48   ` Grant Grundler
2013-10-09  9:07     ` Ulf Hansson
2013-09-26 19:23 ` [PATCH 7/7] mmc: core: mmc_start_req is a misnomer -> mmc_process_areq Grant Grundler
2013-09-26 19:57 ` [PATCH 0/7] mmc: core: cleanup and locking patches description Grant Grundler
2013-10-03 20:35   ` Grant Grundler
     [not found]     ` <CAPDyKFqZHkDAKoaRM1vrUXCMVsAZBEs-PanjyoR=E=Lrp1aU7Q@mail.gmail.com>
2013-10-03 23:48       ` Grant Grundler

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.