[PATCH v2 0/5] mmc: dw_mmc: Fix the CTO timer patch, plus the DTO timer

* [PATCH v2 0/5] mmc: dw_mmc: Fix the CTO timer patch, plus the DTO timer
@ 2017-10-12 20:11 Douglas Anderson
  2017-10-12 20:11 ` [PATCH v2 1/5] mmc: dw_mmc: cancel the CTO timer after a voltage switch Douglas Anderson
                   ` (5 more replies)
  0 siblings, 6 replies; 20+ messages in thread
From: Douglas Anderson @ 2017-10-12 20:11 UTC (permalink / raw)
  To: jh80.chung, ulf.hansson, shawn.lin
  Cc: xzy.xu, amstan, linux-rockchip, briannorris, linux-samsung-soc,
	kernel, Douglas Anderson, linux-mmc, linux-kernel

Recently we landed 03de19212ea3 ("mmc: dw_mmc: introduce timer for
broken command transfer over scheme").  I found a bunch of problems
with that patch, so this series attempts to solve some of them.

This also fixes the DTO timer in some of the same ways even though I
haven't personally seen problems with the DTO timer.

NOTE: this series has only been lightly tested so far.  I can at least
reproduce the need for the CTO timer on one of my devices and so I can
confirm that part still works.  As mentioned in the 3rd patch I also
ran the mmc_test kernel module on this and did manage to see the 3rd
patch doing something useful.

Changes in v2:
- Removed extra "int i"
- Fix the DTO timeout calculation new for v2
- Cleanup the DTO timer new for v2

Douglas Anderson (5):
  mmc: dw_mmc: cancel the CTO timer after a voltage switch
  mmc: dw_mmc: Fix the CTO timeout calculation
  mmc: dw_mmc: Add locking to the CTO timer
  mmc: dw_mmc: Fix the DTO timeout calculation
  mmc: dw_mmc: Cleanup the DTO timer like the CTO one

 drivers/mmc/host/dw_mmc.c | 162 +++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 146 insertions(+), 16 deletions(-)

-- 
2.15.0.rc0.271.g36b669edcc-goog

* [PATCH v2 1/5] mmc: dw_mmc: cancel the CTO timer after a voltage switch
  2017-10-12 20:11 [PATCH v2 0/5] mmc: dw_mmc: Fix the CTO timer patch, plus the DTO timer Douglas Anderson
@ 2017-10-12 20:11 ` Douglas Anderson
  2017-10-12 20:11 ` [PATCH v2 2/5] mmc: dw_mmc: Fix the CTO timeout calculation Douglas Anderson
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 20+ messages in thread
From: Douglas Anderson @ 2017-10-12 20:11 UTC (permalink / raw)
  To: jh80.chung, ulf.hansson, shawn.lin
  Cc: xzy.xu, amstan, linux-rockchip, briannorris, linux-samsung-soc,
	kernel, Douglas Anderson, linux-mmc, linux-kernel

When running with the commit 03de19212ea3 ("mmc: dw_mmc: introduce
timer for broken command transfer over scheme") I found this message
in the log:
  Unexpected command timeout, state 7

It turns out that we weren't properly cancelling the new CTO timer in
the case that a voltage switch was done.  Let's promote the cancel
into the dw_mci_cmd_interrupt() function to fix this.

Fixes: 03de19212ea3 ("mmc: dw_mmc: introduce timer for broken command transfer over scheme")
Tested-by: Emil Renner Berthing <kernel@esmil.dk>
Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

Changes in v2: None

 drivers/mmc/host/dw_mmc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
index 860313bd952a..f5b2bb4b4d98 100644
--- a/drivers/mmc/host/dw_mmc.c
+++ b/drivers/mmc/host/dw_mmc.c
@@ -2570,6 +2570,8 @@ static void dw_mci_write_data_pio(struct dw_mci *host)
 
 static void dw_mci_cmd_interrupt(struct dw_mci *host, u32 status)
 {
+	del_timer(&host->cto_timer);
+
 	if (!host->cmd_status)
 		host->cmd_status = status;
 
@@ -2662,7 +2664,6 @@ static irqreturn_t dw_mci_interrupt(int irq, void *dev_id)
 		}
 
 		if (pending & SDMMC_INT_CMD_DONE) {
-			del_timer(&host->cto_timer);
 			mci_writel(host, RINTSTS, SDMMC_INT_CMD_DONE);
 			dw_mci_cmd_interrupt(host, pending);
 		}
-- 
2.15.0.rc0.271.g36b669edcc-goog

* [PATCH v2 2/5] mmc: dw_mmc: Fix the CTO timeout calculation
  2017-10-12 20:11 [PATCH v2 0/5] mmc: dw_mmc: Fix the CTO timer patch, plus the DTO timer Douglas Anderson
  2017-10-12 20:11 ` [PATCH v2 1/5] mmc: dw_mmc: cancel the CTO timer after a voltage switch Douglas Anderson
@ 2017-10-12 20:11 ` Douglas Anderson
  2017-10-12 20:11 ` [PATCH v2 3/5] mmc: dw_mmc: Add locking to the CTO timer Douglas Anderson
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 20+ messages in thread
From: Douglas Anderson @ 2017-10-12 20:11 UTC (permalink / raw)
  To: jh80.chung, ulf.hansson, shawn.lin
  Cc: xzy.xu, amstan, linux-rockchip, briannorris, linux-samsung-soc,
	kernel, Douglas Anderson, linux-mmc, linux-kernel

In the commit 03de19212ea3 ("mmc: dw_mmc: introduce timer for broken
command transfer over scheme") we tried to calculate the expected
hardware command timeout value.  Unfortunately that calculation isn't
quite correct in all cases.  It used "bus_hz" but, as far as I can
tell, it's supposed to use the card clock.  Let's account for the div
value, which is documented as 2x the value stored in the register, or
1 if the register is 0.

NOTE: It's not expected that this will actually fix anything important
since the 10 ms margin added by the function will pretty much dwarf
any calculations.  The card clock should be 100 kHz at minimum and:
  1000 ms/s * (255 * 2) / 100000 Hz.
Gives us 5.1 ms.
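
As a rough userspace illustration of the calculation described above (a
sketch only, not the driver code), the corrected formula can be written as
below.  The function name cto_timeout_ms() and the plain integer parameters
are assumptions made for this example; the driver reads TMOUT and CLKDIV
through mci_readl() and uses DIV_ROUND_UP().

```c
#include <assert.h>
#include <stdint.h>

#define MSEC_PER_SEC 1000u
#define CTO_SPARE_MS 10u	/* the fixed margin the driver adds */

/*
 * Hypothetical helper: command timeout in ms from raw register values.
 * tmout_reg and clkdiv_reg stand in for the TMOUT and CLKDIV registers;
 * bus_hz is the controller input clock before the divider.
 */
static uint32_t cto_timeout_ms(uint32_t tmout_reg, uint32_t clkdiv_reg,
			       uint32_t bus_hz)
{
	uint64_t cto_clks = tmout_reg & 0xff;	/* 8-bit response timeout */
	uint64_t cto_div = (uint64_t)(clkdiv_reg & 0xff) * 2;

	assert(bus_hz != 0);

	/* A register value of 0 means the clock is not divided. */
	if (cto_div == 0)
		cto_div = 1;

	/* Round up so the software timer never fires early. */
	return (uint32_t)((MSEC_PER_SEC * cto_clks * cto_div + bus_hz - 1) /
			  bus_hz) + CTO_SPARE_MS;
}
```

With tmout_reg = 0xff, clkdiv_reg = 1 and bus_hz = 100000 the hardware part
is 5.1 ms, which rounds up to 6 ms before the 10 ms margin is added.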

...so really the point of this patch is just to make the code more
"correct" in case anyone ever tries to remove the 10 ms buffer.

Fixes: 03de19212ea3 ("mmc: dw_mmc: introduce timer for broken command transfer over scheme")
Tested-by: Emil Renner Berthing <kernel@esmil.dk>
Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

Changes in v2: None

 drivers/mmc/host/dw_mmc.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
index f5b2bb4b4d98..16516c528a88 100644
--- a/drivers/mmc/host/dw_mmc.c
+++ b/drivers/mmc/host/dw_mmc.c
@@ -401,10 +401,14 @@ static u32 dw_mci_prep_stop_abort(struct dw_mci *host, struct mmc_command *cmd)
 static inline void dw_mci_set_cto(struct dw_mci *host)
 {
 	unsigned int cto_clks;
+	unsigned int cto_div;
 	unsigned int cto_ms;

 	cto_clks = mci_readl(host, TMOUT) & 0xff;
-	cto_ms = DIV_ROUND_UP(cto_clks, host->bus_hz / 1000);
+	cto_div = (mci_readl(host, CLKDIV) & 0xff) * 2;
+	if (cto_div == 0)
+		cto_div = 1;
+	cto_ms = DIV_ROUND_UP(MSEC_PER_SEC * cto_clks * cto_div, host->bus_hz);

 	/* add a bit spare time */
 	cto_ms += 10;
-- 
2.15.0.rc0.271.g36b669edcc-goog

* [PATCH v2 3/5] mmc: dw_mmc: Add locking to the CTO timer
  2017-10-12 20:11 [PATCH v2 0/5] mmc: dw_mmc: Fix the CTO timer patch, plus the DTO timer Douglas Anderson
  2017-10-12 20:11 ` [PATCH v2 1/5] mmc: dw_mmc: cancel the CTO timer after a voltage switch Douglas Anderson
  2017-10-12 20:11 ` [PATCH v2 2/5] mmc: dw_mmc: Fix the CTO timeout calculation Douglas Anderson
@ 2017-10-12 20:11 ` Douglas Anderson
  2017-10-13  1:32   ` Shawn Lin
  2017-10-12 20:11 ` [PATCH v2 4/5] mmc: dw_mmc: Fix the DTO timeout calculation Douglas Anderson
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: Douglas Anderson @ 2017-10-12 20:11 UTC (permalink / raw)
  To: jh80.chung, ulf.hansson, shawn.lin
  Cc: xzy.xu, amstan, linux-rockchip, briannorris, linux-samsung-soc,
	kernel, Douglas Anderson, linux-mmc, linux-kernel

This attempts to instill a bit of paranoia to the code dealing with
the CTO timer.  It's believed that this will make the CTO timer more
robust in the case that we're having very long interrupt latencies.

Note that I originally thought that perhaps this patch was being
overly paranoid and wasn't really needed, but then while I was running
mmc_test on an rk3399 board I saw one instance of the message:
  dwmmc_rockchip fe320000.dwmmc: Unexpected interrupt latency

I had debug prints in the CTO timer code and I found that it was
running CMD 13 at the time.

...so even though this patch seems like it might be overly paranoid,
maybe it really isn't?

Presumably the bad interrupt latency experienced was due to the fact
that I had serial console enabled as serial console is typically where
I place blame when I see absurdly large interrupt latencies.  In this
particular case there was an (unrelated) printout to the serial
console just before I saw the "Unexpected interrupt latency" printout.

...and actually, I managed to even reproduce the problems by running
"iw mlan0 scan > /dev/null" while mmc_test was running.  That not only
does a bunch of PCIe traffic but it also (on my system) outputs some
SELinux log spam.
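
To make the intended ordering easier to follow, here is a deliberately
simplified, single-threaded model of the three paths involved.  It is a
sketch under stated assumptions, not the driver: struct host_model and the
model_*() functions are hypothetical names, the two booleans stand in for
EVENT_CMD_COMPLETE and a pending cto_timer, and the locking is omitted
(in the driver each of these paths runs with host->irq_lock held).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the bits of struct dw_mci that matter here. */
struct host_model {
	bool cmd_complete;	/* models EVENT_CMD_COMPLETE */
	bool timer_armed;	/* models a pending cto_timer */
};

/* Interrupt path: cancel the timer, then record the completion. */
static void model_cmd_irq(struct host_model *h)
{
	assert(h != NULL);
	h->timer_armed = false;
	h->cmd_complete = true;
}

/* Arm path, called after the command was started: skip if already done. */
static void model_set_cto(struct host_model *h)
{
	assert(h != NULL);
	if (!h->cmd_complete)
		h->timer_armed = true;
}

/*
 * Timer path: returns true only when the expiry is treated as a real
 * command timeout.  irq_pending_in_hw models a command interrupt bit
 * that is already set in MINTSTS but has not been serviced yet.
 */
static bool model_cto_timer(struct host_model *h, bool irq_pending_in_hw)
{
	assert(h != NULL);
	h->timer_armed = false;		/* the timer has fired */

	if (irq_pending_in_hw)
		return false;		/* the interrupt will handle it */
	if (h->cmd_complete)
		return false;		/* the interrupt already handled it */
	return true;
}
```

In this model, a completion that arrives between starting the command and
arming the timer leaves the timer unarmed, and a timer that fires while the
interrupt is pending or already handled does nothing.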

Fixes: 03de19212ea3 ("mmc: dw_mmc: introduce timer for broken command transfer over scheme")
Tested-by: Emil Renner Berthing <kernel@esmil.dk>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

Changes in v2:
- Removed extra "int i"

 drivers/mmc/host/dw_mmc.c | 91 +++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 81 insertions(+), 10 deletions(-)

diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
index 16516c528a88..50148991f30e 100644
--- a/drivers/mmc/host/dw_mmc.c
+++ b/drivers/mmc/host/dw_mmc.c
@@ -403,6 +403,7 @@ static inline void dw_mci_set_cto(struct dw_mci *host)
 	unsigned int cto_clks;
 	unsigned int cto_div;
 	unsigned int cto_ms;
+	unsigned long irqflags;
 
 	cto_clks = mci_readl(host, TMOUT) & 0xff;
 	cto_div = (mci_readl(host, CLKDIV) & 0xff) * 2;
@@ -413,8 +414,24 @@ static inline void dw_mci_set_cto(struct dw_mci *host)
 	/* add a bit spare time */
 	cto_ms += 10;
 
-	mod_timer(&host->cto_timer,
-		  jiffies + msecs_to_jiffies(cto_ms) + 1);
+	/*
+	 * The durations we're working with are fairly short so we have to be
+	 * extra careful about synchronization here.  Specifically in hardware a
+	 * command timeout is _at most_ 5.1 ms, so that means we expect an
+	 * interrupt (either command done or timeout) to come rather quickly
+	 * after the mci_writel.  ...but just in case we have a long interrupt
+	 * latency let's add a bit of paranoia.
+	 *
+	 * In general we'll assume that at least an interrupt will be asserted
+	 * in hardware by the time the cto_timer runs.  ...and if it hasn't
+	 * been asserted in hardware by that time then we'll assume it'll never
+	 * come.
+	 */
+	spin_lock_irqsave(&host->irq_lock, irqflags);
+	if (!test_bit(EVENT_CMD_COMPLETE, &host->pending_events))
+		mod_timer(&host->cto_timer,
+			jiffies + msecs_to_jiffies(cto_ms) + 1);
+	spin_unlock_irqrestore(&host->irq_lock, irqflags);
 }
 
 static void dw_mci_start_command(struct dw_mci *host,
@@ -429,11 +446,11 @@ static void dw_mci_start_command(struct dw_mci *host,
 	wmb(); /* drain writebuffer */
 	dw_mci_wait_while_busy(host, cmd_flags);
 
+	mci_writel(host, CMD, cmd_flags | SDMMC_CMD_START);
+
 	/* response expected command only */
 	if (cmd_flags & SDMMC_CMD_RESP_EXP)
 		dw_mci_set_cto(host);
-
-	mci_writel(host, CMD, cmd_flags | SDMMC_CMD_START);
 }
 
 static inline void send_stop_abort(struct dw_mci *host, struct mmc_data *data)
@@ -1930,6 +1947,24 @@ static void dw_mci_set_drto(struct dw_mci *host)
 	mod_timer(&host->dto_timer, jiffies + msecs_to_jiffies(drto_ms));
 }
 
+static bool dw_mci_clear_pending_cmd_complete(struct dw_mci *host)
+{
+	if (!test_bit(EVENT_CMD_COMPLETE, &host->pending_events))
+		return false;
+
+	/*
+	 * Really be certain that the timer has stopped.  This is a bit of
+	 * paranoia and could only really happen if we had really bad
+	 * interrupt latency and the interrupt routine and timeout were
+	 * running concurrently so that the del_timer() in the interrupt
+	 * handler couldn't run.
+	 */
+	WARN_ON(del_timer_sync(&host->cto_timer));
+	clear_bit(EVENT_CMD_COMPLETE, &host->pending_events);
+
+	return true;
+}
+
 static void dw_mci_tasklet_func(unsigned long priv)
 {
 	struct dw_mci *host = (struct dw_mci *)priv;
@@ -1956,8 +1991,7 @@ static void dw_mci_tasklet_func(unsigned long priv)
 
 		case STATE_SENDING_CMD11:
 		case STATE_SENDING_CMD:
-			if (!test_and_clear_bit(EVENT_CMD_COMPLETE,
-						&host->pending_events))
+			if (!dw_mci_clear_pending_cmd_complete(host))
 				break;
 
 			cmd = host->cmd;
@@ -2126,8 +2160,7 @@ static void dw_mci_tasklet_func(unsigned long priv)
 			/* fall through */
 
 		case STATE_SENDING_STOP:
-			if (!test_and_clear_bit(EVENT_CMD_COMPLETE,
-						&host->pending_events))
+			if (!dw_mci_clear_pending_cmd_complete(host))
 				break;
 
 			/* CMD error in data command */
@@ -2600,6 +2633,7 @@ static irqreturn_t dw_mci_interrupt(int irq, void *dev_id)
 	struct dw_mci *host = dev_id;
 	u32 pending;
 	struct dw_mci_slot *slot = host->slot;
+	unsigned long irqflags;
 
 	pending = mci_readl(host, MINTSTS); /* read-only mask reg */
 
@@ -2607,8 +2641,6 @@ static irqreturn_t dw_mci_interrupt(int irq, void *dev_id)
 		/* Check volt switch first, since it can look like an error */
 		if ((host->state == STATE_SENDING_CMD11) &&
 		    (pending & SDMMC_INT_VOLT_SWITCH)) {
-			unsigned long irqflags;
-
 			mci_writel(host, RINTSTS, SDMMC_INT_VOLT_SWITCH);
 			pending &= ~SDMMC_INT_VOLT_SWITCH;
 
@@ -2624,11 +2656,15 @@ static irqreturn_t dw_mci_interrupt(int irq, void *dev_id)
 		}
 
 		if (pending & DW_MCI_CMD_ERROR_FLAGS) {
+			spin_lock_irqsave(&host->irq_lock, irqflags);
+
 			del_timer(&host->cto_timer);
 			mci_writel(host, RINTSTS, DW_MCI_CMD_ERROR_FLAGS);
 			host->cmd_status = pending;
 			smp_wmb(); /* drain writebuffer */
 			set_bit(EVENT_CMD_COMPLETE, &host->pending_events);
+
+			spin_unlock_irqrestore(&host->irq_lock, irqflags);
 		}
 
 		if (pending & DW_MCI_DATA_ERROR_FLAGS) {
@@ -2668,8 +2704,12 @@ static irqreturn_t dw_mci_interrupt(int irq, void *dev_id)
 		}
 
 		if (pending & SDMMC_INT_CMD_DONE) {
+			spin_lock_irqsave(&host->irq_lock, irqflags);
+
 			mci_writel(host, RINTSTS, SDMMC_INT_CMD_DONE);
 			dw_mci_cmd_interrupt(host, pending);
+
+			spin_unlock_irqrestore(&host->irq_lock, irqflags);
 		}
 
 		if (pending & SDMMC_INT_CD) {
@@ -2943,7 +2983,35 @@ static void dw_mci_cmd11_timer(unsigned long arg)
 static void dw_mci_cto_timer(unsigned long arg)
 {
 	struct dw_mci *host = (struct dw_mci *)arg;
+	unsigned long irqflags;
+	u32 pending;
+
+	spin_lock_irqsave(&host->irq_lock, irqflags);
 
+	/*
+	 * If somehow we have very bad interrupt latency it's remotely possible
+	 * that the timer could fire while the interrupt is still pending or
+	 * while the interrupt is midway through running.  Let's be paranoid
+	 * and detect those two cases.  Note that this paranoia is somewhat
+	 * justified because in this function we don't actually cancel the
+	 * pending command in the controller--we just assume it will never come.
+	 */
+	pending = mci_readl(host, MINTSTS); /* read-only mask reg */
+	if (pending & (DW_MCI_CMD_ERROR_FLAGS | SDMMC_INT_CMD_DONE)) {
+		/* The interrupt should fire; no need to act but we can warn */
+		dev_warn(host->dev, "Unexpected interrupt latency\n");
+		goto exit;
+	}
+	if (test_bit(EVENT_CMD_COMPLETE, &host->pending_events)) {
+		/* Presumably interrupt handler couldn't delete the timer */
+		dev_warn(host->dev, "CTO timeout when already completed\n");
+		goto exit;
+	}
+
+	/*
+	 * Continued paranoia to make sure we're in the state we expect.
+	 * This paranoia isn't really justified but it seems good to be safe.
+	 */
 	switch (host->state) {
 	case STATE_SENDING_CMD11:
 	case STATE_SENDING_CMD:
@@ -2962,6 +3030,9 @@ static void dw_mci_cto_timer(unsigned long arg)
 			 host->state);
 		break;
 	}
+
+exit:
+	spin_unlock_irqrestore(&host->irq_lock, irqflags);
 }
 
 static void dw_mci_dto_timer(unsigned long arg)
-- 
2.15.0.rc0.271.g36b669edcc-goog

* [PATCH v2 4/5] mmc: dw_mmc: Fix the DTO timeout calculation
  2017-10-12 20:11 [PATCH v2 0/5] mmc: dw_mmc: Fix the CTO timer patch, plus the DTO timer Douglas Anderson
                   ` (2 preceding siblings ...)
  2017-10-12 20:11 ` [PATCH v2 3/5] mmc: dw_mmc: Add locking to the CTO timer Douglas Anderson
@ 2017-10-12 20:11 ` Douglas Anderson
  2017-10-13  1:02   ` Shawn Lin
  2017-10-12 20:11 ` [PATCH v2 5/5] mmc: dw_mmc: Cleanup the DTO timer like the CTO one Douglas Anderson
  2017-10-30 11:40 ` [PATCH v2 0/5] mmc: dw_mmc: Fix the CTO timer patch, plus the DTO timer Ulf Hansson
  5 siblings, 1 reply; 20+ messages in thread
From: Douglas Anderson @ 2017-10-12 20:11 UTC (permalink / raw)
  To: jh80.chung, ulf.hansson, shawn.lin
  Cc: xzy.xu, amstan, linux-rockchip, briannorris, linux-samsung-soc,
	kernel, Douglas Anderson, linux-mmc, linux-kernel

Just like the CTO timeout calculation introduced recently, the DTO
timeout calculation was incorrect.  It used "bus_hz" but, as far as I
can tell, it's supposed to use the card clock.  Let's account for the
div value, which is documented as 2x the value stored in the register,
or 1 if the register is 0.

NOTE: This was likely not terribly important until commit 16a34574c6ca
("mmc: dw_mmc: remove the quirks flags") landed because "DIV" is
documented on Rockchip SoCs (the ones that used to define the quirk)
to always be 0 or 1.  ...and, in fact, it's documented to only be 1
with EMMC in 8-bit DDR52 mode.  Thus before the quirk was applied to
everyone it was mostly OK to ignore the DIV value.

I haven't personally observed any problems that are fixed by this
patch but I also haven't tested this anywhere with a DIV other than 0.
AKA: this problem was found simply by code inspection and I have no
failing test cases that are fixed by it.  Presumably this could fix
real bugs for someone out there, though.

Fixes: 16a34574c6ca ("mmc: dw_mmc: remove the quirks flags")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

Changes in v2:
- Fix the DTO timeout calculation new for v2

 drivers/mmc/host/dw_mmc.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
index 50148991f30e..6bc87b1385a9 100644
--- a/drivers/mmc/host/dw_mmc.c
+++ b/drivers/mmc/host/dw_mmc.c
@@ -1936,10 +1936,16 @@ static int dw_mci_data_complete(struct dw_mci *host, struct mmc_data *data)
 static void dw_mci_set_drto(struct dw_mci *host)
 {
 	unsigned int drto_clks;
+	unsigned int drto_div;
 	unsigned int drto_ms;
+	unsigned long irqflags;
 
 	drto_clks = mci_readl(host, TMOUT) >> 8;
-	drto_ms = DIV_ROUND_UP(drto_clks, host->bus_hz / 1000);
+	drto_div = (mci_readl(host, CLKDIV) & 0xff) * 2;
+	if (drto_div == 0)
+		drto_div = 1;
+	drto_ms = DIV_ROUND_UP(MSEC_PER_SEC * drto_clks * drto_div,
+			       host->bus_hz);
 
 	/* add a bit spare time */
 	drto_ms += 10;
-- 
2.15.0.rc0.271.g36b669edcc-goog

* [PATCH v2 5/5] mmc: dw_mmc: Cleanup the DTO timer like the CTO one
  2017-10-12 20:11 [PATCH v2 0/5] mmc: dw_mmc: Fix the CTO timer patch, plus the DTO timer Douglas Anderson
                   ` (3 preceding siblings ...)
  2017-10-12 20:11 ` [PATCH v2 4/5] mmc: dw_mmc: Fix the DTO timeout calculation Douglas Anderson
@ 2017-10-12 20:11 ` Douglas Anderson
  2017-10-17  1:17   ` Shawn Lin
  2017-10-30 11:40 ` [PATCH v2 0/5] mmc: dw_mmc: Fix the CTO timer patch, plus the DTO timer Ulf Hansson
  5 siblings, 1 reply; 20+ messages in thread
From: Douglas Anderson @ 2017-10-12 20:11 UTC (permalink / raw)
  To: jh80.chung, ulf.hansson, shawn.lin
  Cc: xzy.xu, amstan, linux-rockchip, briannorris, linux-samsung-soc,
	kernel, Douglas Anderson, linux-mmc, linux-kernel

The recent CTO timer introduced in commit 03de19212ea3 ("mmc: dw_mmc:
introduce timer for broken command transfer over scheme") was causing
observable problems due to race conditions.  Previous patches have
fixed those race conditions.

It can be observed that these same race conditions ought to be
theoretically possible with the DTO timer too though they are
massively less likely to happen because the data timeout is always set
to 0xffffff right now.  That means even at a 200 MHz card clock we
were arming the DTO timer for 94 ms:
  >>> (0xffffff * 1000. / 200000000) + 10
  93.886075

We were also always setting the DTO timer _after_ starting the
transfer, unlike how the old code was setting the CTO timer.

In any case, even though the DTO timer is much less likely to have
races, it still makes sense to add code to handle it _just in case_.
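
The 94 ms figure above can be reproduced with a small userspace sketch of
the data timeout arithmetic.  This is an illustration only: drto_timeout_ms()
and its plain integer parameters are names assumed for the example, and the
sketch uses a 64-bit intermediate because 1000 * 0xffffff already exceeds
what a 32-bit value can hold.

```c
#include <assert.h>
#include <stdint.h>

#define MSEC_PER_SEC 1000u
#define DRTO_SPARE_MS 10u	/* the fixed margin the driver adds */

/*
 * Hypothetical helper: data timeout in ms from raw register values.
 * The data timeout count lives in the upper 24 bits of TMOUT.
 */
static uint32_t drto_timeout_ms(uint32_t tmout_reg, uint32_t clkdiv_reg,
				uint32_t bus_hz)
{
	uint64_t drto_clks = tmout_reg >> 8;
	uint64_t drto_div = (uint64_t)(clkdiv_reg & 0xff) * 2;
	uint64_t num;

	assert(bus_hz != 0);

	/* A register value of 0 means the clock is not divided. */
	if (drto_div == 0)
		drto_div = 1;

	/* 64-bit product: up to 1000 * 0xffffff * 510 must fit. */
	num = MSEC_PER_SEC * drto_clks * drto_div;

	return (uint32_t)((num + bus_hz - 1) / bus_hz) + DRTO_SPARE_MS;
}
```

With the count at 0xffffff, no divider and a 200 MHz clock this gives 84 ms
of hardware timeout (83.886075 rounded up) plus the 10 ms margin; with a
CLKDIV value of 1 the same inputs give 168 ms plus the margin.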

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

Changes in v2:
- Cleanup the DTO timer new for v2

 drivers/mmc/host/dw_mmc.c | 54 ++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 51 insertions(+), 3 deletions(-)

diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
index 6bc87b1385a9..bc0808615431 100644
--- a/drivers/mmc/host/dw_mmc.c
+++ b/drivers/mmc/host/dw_mmc.c
@@ -1950,7 +1950,11 @@ static void dw_mci_set_drto(struct dw_mci *host)
 	/* add a bit spare time */
 	drto_ms += 10;
 
-	mod_timer(&host->dto_timer, jiffies + msecs_to_jiffies(drto_ms));
+	spin_lock_irqsave(&host->irq_lock, irqflags);
+	if (!test_bit(EVENT_DATA_COMPLETE, &host->pending_events))
+		mod_timer(&host->dto_timer,
+			  jiffies + msecs_to_jiffies(drto_ms));
+	spin_unlock_irqrestore(&host->irq_lock, irqflags);
 }
 
 static bool dw_mci_clear_pending_cmd_complete(struct dw_mci *host)
@@ -1971,6 +1975,18 @@ static bool dw_mci_clear_pending_cmd_complete(struct dw_mci *host)
 	return true;
 }
 
+static bool dw_mci_clear_pending_data_complete(struct dw_mci *host)
+{
+	if (!test_bit(EVENT_DATA_COMPLETE, &host->pending_events))
+		return false;
+
+	/* Extra paranoia just like dw_mci_clear_pending_cmd_complete() */
+	WARN_ON(del_timer_sync(&host->dto_timer));
+	clear_bit(EVENT_DATA_COMPLETE, &host->pending_events);
+
+	return true;
+}
+
 static void dw_mci_tasklet_func(unsigned long priv)
 {
 	struct dw_mci *host = (struct dw_mci *)priv;
@@ -2112,8 +2128,7 @@ static void dw_mci_tasklet_func(unsigned long priv)
 			/* fall through */
 
 		case STATE_DATA_BUSY:
-			if (!test_and_clear_bit(EVENT_DATA_COMPLETE,
-						&host->pending_events)) {
+			if (!dw_mci_clear_pending_data_complete(host)) {
 				/*
 				 * If data error interrupt comes but data over
 				 * interrupt doesn't come within the given time.
@@ -2683,6 +2698,8 @@ static irqreturn_t dw_mci_interrupt(int irq, void *dev_id)
 		}
 
 		if (pending & SDMMC_INT_DATA_OVER) {
+			spin_lock_irqsave(&host->irq_lock, irqflags);
+
 			del_timer(&host->dto_timer);
 
 			mci_writel(host, RINTSTS, SDMMC_INT_DATA_OVER);
@@ -2695,6 +2712,8 @@ static irqreturn_t dw_mci_interrupt(int irq, void *dev_id)
 			}
 			set_bit(EVENT_DATA_COMPLETE, &host->pending_events);
 			tasklet_schedule(&host->tasklet);
+
+			spin_unlock_irqrestore(&host->irq_lock, irqflags);
 		}
 
 		if (pending & SDMMC_INT_RXDR) {
@@ -3044,7 +3063,31 @@ static void dw_mci_cto_timer(unsigned long arg)
 static void dw_mci_dto_timer(unsigned long arg)
 {
 	struct dw_mci *host = (struct dw_mci *)arg;
+	unsigned long irqflags;
+	u32 pending;
+
+	spin_lock_irqsave(&host->irq_lock, irqflags);
 
+	/*
+	 * The DTO timer is much longer than the CTO timer, so it's even less
+	 * likely that we'll see these cases, but it pays to be paranoid.
+	 */
+	pending = mci_readl(host, MINTSTS); /* read-only mask reg */
+	if (pending & SDMMC_INT_DATA_OVER) {
+		/* The interrupt should fire; no need to act but we can warn */
+		dev_warn(host->dev, "Unexpected data interrupt latency\n");
+		goto exit;
+	}
+	if (test_bit(EVENT_DATA_COMPLETE, &host->pending_events)) {
+		/* Presumably interrupt handler couldn't delete the timer */
+		dev_warn(host->dev, "DTO timeout when already completed\n");
+		goto exit;
+	}
+
+	/*
+	 * Continued paranoia to make sure we're in the state we expect.
+	 * This paranoia isn't really justified but it seems good to be safe.
+	 */
 	switch (host->state) {
 	case STATE_SENDING_DATA:
 	case STATE_DATA_BUSY:
@@ -3059,8 +3102,13 @@ static void dw_mci_dto_timer(unsigned long arg)
 		tasklet_schedule(&host->tasklet);
 		break;
 	default:
+		dev_warn(host->dev, "Unexpected data timeout, state %d\n",
+			 host->state);
 		break;
 	}
+
+exit:
+	spin_unlock_irqrestore(&host->irq_lock, irqflags);
 }
 
 #ifdef CONFIG_OF
-- 
2.15.0.rc0.271.g36b669edcc-goog

* Re: [PATCH v2 4/5] mmc: dw_mmc: Fix the DTO timeout calculation
  2017-10-12 20:11 ` [PATCH v2 4/5] mmc: dw_mmc: Fix the DTO timeout calculation Douglas Anderson
@ 2017-10-13  1:02   ` Shawn Lin
  0 siblings, 0 replies; 20+ messages in thread
From: Shawn Lin @ 2017-10-13  1:02 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: jh80.chung, ulf.hansson, shawn.lin, xzy.xu, amstan,
	linux-rockchip, briannorris, linux-samsung-soc, kernel,
	linux-mmc, linux-kernel


On 2017/10/13 4:11, Douglas Anderson wrote:
> Just like the CTO timeout calculation introduced recently, the DTO
> timeout calculation was incorrect.  It used "bus_hz" but, as far as I
> can tell, it's supposed to use the card clock.  Let's account for the
> div value, which is documented as 2x the value stored in the register,
> or 1 if the register is 0.
> 
> NOTE: This was likely not terribly important until commit 16a34574c6ca
> ("mmc: dw_mmc: remove the quirks flags") landed because "DIV" is
> documented on Rockchip SoCs (the ones that used to define the quirk)
> to always be 0 or 1.  ...and, in fact, it's documented to only be 1
> with EMMC in 8-bit DDR52 mode.  Thus before the quirk was applied to
> everyone it was mostly OK to ignore the DIV value.
> 
> I haven't personally observed any problems that are fixed by this
> patch but I also haven't tested this anywhere with a DIV other than 0.
> AKA: this problem was found simply by code inspection and I have no
> failing test cases that are fixed by it.  Presumably this could fix
> real bugs for someone out there, though.
> 
> Fixes: 16a34574c6ca ("mmc: dw_mmc: remove the quirks flags")
> Signed-off-by: Douglas Anderson <dianders@chromium.org>

Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com>

> ---
> 
> Changes in v2:
> - Fix the DTO timeout calculation new for v2
> 
>   drivers/mmc/host/dw_mmc.c | 8 +++++++-
>   1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
> index 50148991f30e..6bc87b1385a9 100644
> --- a/drivers/mmc/host/dw_mmc.c
> +++ b/drivers/mmc/host/dw_mmc.c
> @@ -1936,10 +1936,16 @@ static int dw_mci_data_complete(struct dw_mci *host, struct mmc_data *data)
>   static void dw_mci_set_drto(struct dw_mci *host)
>   {
>   	unsigned int drto_clks;
> +	unsigned int drto_div;
>   	unsigned int drto_ms;
> +	unsigned long irqflags;
>   
>   	drto_clks = mci_readl(host, TMOUT) >> 8;
> -	drto_ms = DIV_ROUND_UP(drto_clks, host->bus_hz / 1000);
> +	drto_div = (mci_readl(host, CLKDIV) & 0xff) * 2;
> +	if (drto_div == 0)
> +		drto_div = 1;
> +	drto_ms = DIV_ROUND_UP(MSEC_PER_SEC * drto_clks * drto_div,
> +			       host->bus_hz);
>   
>   	/* add a bit spare time */
>   	drto_ms += 10;
> 

* Re: [PATCH v2 3/5] mmc: dw_mmc: Add locking to the CTO timer
  2017-10-12 20:11 ` [PATCH v2 3/5] mmc: dw_mmc: Add locking to the CTO timer Douglas Anderson
@ 2017-10-13  1:32   ` Shawn Lin
  2017-10-13  4:20     ` Doug Anderson
  0 siblings, 1 reply; 20+ messages in thread
From: Shawn Lin @ 2017-10-13  1:32 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: jh80.chung, ulf.hansson, shawn.lin, xzy.xu, amstan,
	linux-rockchip, briannorris, linux-samsung-soc, kernel,
	linux-mmc, linux-kernel


On 2017/10/13 4:11, Douglas Anderson wrote:
> This attempts to instill a bit of paranoia to the code dealing with
> the CTO timer.  It's believed that this will make the CTO timer more
> robust in the case that we're having very long interrupt latencies.
> 

Ack. It could help fix some problems observed.

> Note that I originally thought that perhaps this patch was being
> overly paranoid and wasn't really needed, but then while I was running
> mmc_test on an rk3399 board I saw one instance of the message:
>    dwmmc_rockchip fe320000.dwmmc: Unexpected interrupt latency
> 
> I had debug prints in the CTO timer code and I found that it was
> running CMD 13 at the time.
> 
> ...so even though this patch seems like it might be overly paranoid,
> maybe it really isn't?
> 
> Presumably the bad interrupt latency experienced was due to the fact
> that I had serial console enabled as serial console is typically where
> I place blame when I see absurdly large interrupt latencies.  In this
> particular case there was an (unrelated) printout to the serial
> console just before I saw the "Unexpected interrupt latency" printout.
> 
> ...and actually, I managed to even reproduce the problems by running
> "iw mlan0 scan > /dev/null" while mmc_test was running.  That not only
> does a bunch of PCIe traffic but it also (on my system) outputs some
> SELinux log spam.
> 
> Fixes: 03de19212ea3 ("mmc: dw_mmc: introduce timer for broken command transfer over scheme")
> Tested-by: Emil Renner Berthing <kernel@esmil.dk>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
> 
> Changes in v2:
> - Removed extra "int i"
> 
>   drivers/mmc/host/dw_mmc.c | 91 +++++++++++++++++++++++++++++++++++++++++------
>   1 file changed, 81 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
> index 16516c528a88..50148991f30e 100644
> --- a/drivers/mmc/host/dw_mmc.c
> +++ b/drivers/mmc/host/dw_mmc.c
> @@ -403,6 +403,7 @@ static inline void dw_mci_set_cto(struct dw_mci *host)
>   	unsigned int cto_clks;
>   	unsigned int cto_div;
>   	unsigned int cto_ms;
> +	unsigned long irqflags;
>   
>   	cto_clks = mci_readl(host, TMOUT) & 0xff;
>   	cto_div = (mci_readl(host, CLKDIV) & 0xff) * 2;
> @@ -413,8 +414,24 @@ static inline void dw_mci_set_cto(struct dw_mci *host)
>   	/* add a bit spare time */
>   	cto_ms += 10;
>   
> -	mod_timer(&host->cto_timer,
> -		  jiffies + msecs_to_jiffies(cto_ms) + 1);
> +	/*
> +	 * The durations we're working with are fairly short so we have to be
> +	 * extra careful about synchronization here.  Specifically in hardware a
> +	 * command timeout is _at most_ 5.1 ms, so that means we expect an
> +	 * interrupt (either command done or timeout) to come rather quickly
> +	 * after the mci_writel.  ...but just in case we have a long interrupt
> +	 * latency let's add a bit of paranoia.
> +	 *
> +	 * In general we'll assume that at least an interrupt will be asserted
> +	 * in hardware by the time the cto_timer runs.  ...and if it hasn't
> +	 * been asserted in hardware by that time then we'll assume it'll never
> +	 * come.
> +	 */
> +	spin_lock_irqsave(&host->irq_lock, irqflags);
> +	if (!test_bit(EVENT_CMD_COMPLETE, &host->pending_events))
> +		mod_timer(&host->cto_timer,
> +			jiffies + msecs_to_jiffies(cto_ms) + 1);
> +	spin_unlock_irqrestore(&host->irq_lock, irqflags);

IIUC, this change is because you moved
mci_writel(host, CMD, cmd_flags | SDMMC_CMD_START) before
setting up the timer, so there is a timing gap in which cmd_done
may already have arrived and been handled by
dw_mci_interrupt->dw_mci_cmd_interrupt.
At that point, we don't need the CTO timer at all.

>   }
>   
>   static void dw_mci_start_command(struct dw_mci *host,
> @@ -429,11 +446,11 @@ static void dw_mci_start_command(struct dw_mci *host,
>   	wmb(); /* drain writebuffer */
>   	dw_mci_wait_while_busy(host, cmd_flags);
>   
> +	mci_writel(host, CMD, cmd_flags | SDMMC_CMD_START);
> +
>   	/* response expected command only */
>   	if (cmd_flags & SDMMC_CMD_RESP_EXP)
>   		dw_mci_set_cto(host);
> -
> -	mci_writel(host, CMD, cmd_flags | SDMMC_CMD_START);


But why? If we keep the original logic, it's always the case
that cmd_done comes after setting up the CTO timer. So could you
elaborate a bit more to help me understand the real intention here?

>   }
>   
>   static inline void send_stop_abort(struct dw_mci *host, struct mmc_data *data)
> @@ -1930,6 +1947,24 @@ static void dw_mci_set_drto(struct dw_mci *host)
>   	mod_timer(&host->dto_timer, jiffies + msecs_to_jiffies(drto_ms));
>   }
>   
> +static bool dw_mci_clear_pending_cmd_complete(struct dw_mci *host)
> +{
> +	if (!test_bit(EVENT_CMD_COMPLETE, &host->pending_events))
> +		return false;
> +
> +	/*
> +	 * Really be certain that the timer has stopped.  This is a bit of
> +	 * paranoia and could only really happen if we had really bad
> +	 * interrupt latency and the interrupt routine and timeout were
> +	 * running concurrently so that the del_timer() in the interrupt
> +	 * handler couldn't run.
> +	 */
> +	WARN_ON(del_timer_sync(&host->cto_timer));
> +	clear_bit(EVENT_CMD_COMPLETE, &host->pending_events);
> +
> +	return true;
> +}
> +
>   static void dw_mci_tasklet_func(unsigned long priv)
>   {
>   	struct dw_mci *host = (struct dw_mci *)priv;
> @@ -1956,8 +1991,7 @@ static void dw_mci_tasklet_func(unsigned long priv)
>   
>   		case STATE_SENDING_CMD11:
>   		case STATE_SENDING_CMD:
> -			if (!test_and_clear_bit(EVENT_CMD_COMPLETE,
> -						&host->pending_events))
> +			if (!dw_mci_clear_pending_cmd_complete(host))
>   				break;
>   
>   			cmd = host->cmd;
> @@ -2126,8 +2160,7 @@ static void dw_mci_tasklet_func(unsigned long priv)
>   			/* fall through */
>   
>   		case STATE_SENDING_STOP:
> -			if (!test_and_clear_bit(EVENT_CMD_COMPLETE,
> -						&host->pending_events))
> +			if (!dw_mci_clear_pending_cmd_complete(host))
>   				break;
>   
>   			/* CMD error in data command */
> @@ -2600,6 +2633,7 @@ static irqreturn_t dw_mci_interrupt(int irq, void *dev_id)
>   	struct dw_mci *host = dev_id;
>   	u32 pending;
>   	struct dw_mci_slot *slot = host->slot;
> +	unsigned long irqflags;
>   
>   	pending = mci_readl(host, MINTSTS); /* read-only mask reg */
>   
> @@ -2607,8 +2641,6 @@ static irqreturn_t dw_mci_interrupt(int irq, void *dev_id)
>   		/* Check volt switch first, since it can look like an error */
>   		if ((host->state == STATE_SENDING_CMD11) &&
>   		    (pending & SDMMC_INT_VOLT_SWITCH)) {
> -			unsigned long irqflags;
> -
>   			mci_writel(host, RINTSTS, SDMMC_INT_VOLT_SWITCH);
>   			pending &= ~SDMMC_INT_VOLT_SWITCH;
>   
> @@ -2624,11 +2656,15 @@ static irqreturn_t dw_mci_interrupt(int irq, void *dev_id)
>   		}
>   
>   		if (pending & DW_MCI_CMD_ERROR_FLAGS) {
> +			spin_lock_irqsave(&host->irq_lock, irqflags);
> +
>   			del_timer(&host->cto_timer);
>   			mci_writel(host, RINTSTS, DW_MCI_CMD_ERROR_FLAGS);
>   			host->cmd_status = pending;
>   			smp_wmb(); /* drain writebuffer */
>   			set_bit(EVENT_CMD_COMPLETE, &host->pending_events);
> +
> +			spin_unlock_irqrestore(&host->irq_lock, irqflags);
>   		}
>   
>   		if (pending & DW_MCI_DATA_ERROR_FLAGS) {
> @@ -2668,8 +2704,12 @@ static irqreturn_t dw_mci_interrupt(int irq, void *dev_id)
>   		}
>   
>   		if (pending & SDMMC_INT_CMD_DONE) {
> +			spin_lock_irqsave(&host->irq_lock, irqflags);
> +
>   			mci_writel(host, RINTSTS, SDMMC_INT_CMD_DONE);
>   			dw_mci_cmd_interrupt(host, pending);
> +
> +			spin_unlock_irqrestore(&host->irq_lock, irqflags);
>   		}
>   
>   		if (pending & SDMMC_INT_CD) {
> @@ -2943,7 +2983,35 @@ static void dw_mci_cmd11_timer(unsigned long arg)
>   static void dw_mci_cto_timer(unsigned long arg)
>   {
>   	struct dw_mci *host = (struct dw_mci *)arg;
> +	unsigned long irqflags;
> +	u32 pending;
> +
> +	spin_lock_irqsave(&host->irq_lock, irqflags);
>   
> +	/*
> +	 * If somehow we have very bad interrupt latency it's remotely possible
> +	 * that the timer could fire while the interrupt is still pending or
> +	 * while the interrupt is midway through running.  Let's be paranoid
> +	 * and detect those two cases.  Note that this is paranoia is somewhat
> +	 * justified because in this function we don't actually cancel the
> +	 * pending command in the controller--we just assume it will never come.
> +	 */
> +	pending = mci_readl(host, MINTSTS); /* read-only mask reg */
> +	if (pending & (DW_MCI_CMD_ERROR_FLAGS | SDMMC_INT_CMD_DONE)) {
> +		/* The interrupt should fire; no need to act but we can warn */
> +		dev_warn(host->dev, "Unexpected interrupt latency\n");
> +		goto exit;
> +	}
> +	if (test_bit(EVENT_CMD_COMPLETE, &host->pending_events)) {
> +		/* Presumably interrupt handler couldn't delete the timer */
> +		dev_warn(host->dev, "CTO timeout when already completed\n");
> +		goto exit;
> +	}
> +
> +	/*
> +	 * Continued paranoia to make sure we're in the state we expect.
> +	 * This paranoia isn't really justified but it seems good to be safe.
> +	 */
>   	switch (host->state) {
>   	case STATE_SENDING_CMD11:
>   	case STATE_SENDING_CMD:
> @@ -2962,6 +3030,9 @@ static void dw_mci_cto_timer(unsigned long arg)
>   			 host->state);
>   		break;
>   	}
> +
> +exit:
> +	spin_unlock_irqrestore(&host->irq_lock, irqflags);
>   }
>   
>   static void dw_mci_dto_timer(unsigned long arg)
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/5] mmc: dw_mmc: Add locking to the CTO timer
  2017-10-13  1:32   ` Shawn Lin
@ 2017-10-13  4:20     ` Doug Anderson
  2017-10-17  0:54       ` Shawn Lin
  0 siblings, 1 reply; 20+ messages in thread
From: Doug Anderson @ 2017-10-13  4:20 UTC (permalink / raw)
  To: Shawn Lin
  Cc: Jaehoon Chung, Ulf Hansson, Ziyuan Xu, Alexandru M Stan,
	open list:ARM/Rockchip SoC...,
	Brian Norris, linux-samsung-soc, Emil Renner Berthing, linux-mmc,
	linux-kernel

Shawn,

On Thu, Oct 12, 2017 at 6:32 PM, Shawn Lin <shawn.lin@rock-chips.com> wrote:
>
> On 2017/10/13 4:11, Douglas Anderson wrote:
>>
>> This attempts to instill a bit of paranoia to the code dealing with
>> the CTO timer.  It's believed that this will make the CTO timer more
>> robust in the case that we're having very long interrupt latencies.
>>
>
> Ack. It could help fix some problems observed.
>
>
>> Note that I originally thought that perhaps this patch was being
>> overly paranoid and wasn't really needed, but then while I was running
>> mmc_test on an rk3399 board I saw one instance of the message:
>>    dwmmc_rockchip fe320000.dwmmc: Unexpected interrupt latency
>>
>> I had debug prints in the CTO timer code and I found that it was
>> running CMD 13 at the time.
>>
>> ...so even though this patch seems like it might be overly paranoid,
>> maybe it really isn't?
>>
>> Presumably the bad interrupt latency experienced was due to the fact
>> that I had serial console enabled as serial console is typically where
>> I place blame when I see absurdly large interrupt latencies.  In this
>> particular case there was an (unrelated) printout to the serial
>> console just before I saw the "Unexpected interrupt latency" printout.
>>
>> ...and actually, I managed to even reproduce the problems by running
>> "iw mlan0 scan > /dev/null" while mmc_test was running.  That not only
>> does a bunch of PCIe traffic but it also (on my system) outputs some
>> SELinux log spam.
>>
>> Fixes: 03de19212ea3 ("mmc: dw_mmc: introduce timer for broken command
>> transfer over scheme")
>>
>> Tested-by: Emil Renner Berthing <kernel@esmil.dk>
>> Signed-off-by: Douglas Anderson <dianders@chromium.org>
>> ---
>>
>> Changes in v2:
>> - Removed extra "int i"
>>
>>   drivers/mmc/host/dw_mmc.c | 91
>> +++++++++++++++++++++++++++++++++++++++++------
>>   1 file changed, 81 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
>> index 16516c528a88..50148991f30e 100644
>> --- a/drivers/mmc/host/dw_mmc.c
>> +++ b/drivers/mmc/host/dw_mmc.c
>> @@ -403,6 +403,7 @@ static inline void dw_mci_set_cto(struct dw_mci *host)
>>         unsigned int cto_clks;
>>         unsigned int cto_div;
>>         unsigned int cto_ms;
>> +       unsigned long irqflags;
>>         cto_clks = mci_readl(host, TMOUT) & 0xff;
>>         cto_div = (mci_readl(host, CLKDIV) & 0xff) * 2;
>> @@ -413,8 +414,24 @@ static inline void dw_mci_set_cto(struct dw_mci
>> *host)
>>         /* add a bit spare time */
>>         cto_ms += 10;
>>   -     mod_timer(&host->cto_timer,
>> -                 jiffies + msecs_to_jiffies(cto_ms) + 1);
>> +       /*
>> +        * The durations we're working with are fairly short so we have to
>> be
>> +        * extra careful about synchronization here.  Specifically in
>> hardware a
>> +        * command timeout is _at most_ 5.1 ms, so that means we expect an
>> +        * interrupt (either command done or timeout) to come rather
>> quickly
>> +        * after the mci_writel.  ...but just in case we have a long
>> interrupt
>> +        * latency let's add a bit of paranoia.
>> +        *
>> +        * In general we'll assume that at least an interrupt will be
>> asserted
>> +        * in hardware by the time the cto_timer runs.  ...and if it
>> hasn't
>> +        * been asserted in hardware by that time then we'll assume it'll
>> never
>> +        * come.
>> +        */
>> +       spin_lock_irqsave(&host->irq_lock, irqflags);
>> +       if (!test_bit(EVENT_CMD_COMPLETE, &host->pending_events))
>> +               mod_timer(&host->cto_timer,
>> +                       jiffies + msecs_to_jiffies(cto_ms) + 1);
>> +       spin_unlock_irqrestore(&host->irq_lock, irqflags);
>
>
> IIUC, this change is because you move
> mci_writel(host, CMD, cmd_flags | SDMMC_CMD_START) before
> setting up the timer, so there is a timing gap that the cmd_done
> already comes and handled by dw_mci_interrupt->dw_mci_cmd_interrupt.
> At this point, we don't need the cto timer at all.

As per below, if I don't move the mci_writel() before setting up the
timer then there's still a race.  ...and actually that race was harder
for me to write code for, but I invite you to try to see if it's
somehow cleaner.


>>   }
>>     static void dw_mci_start_command(struct dw_mci *host,
>> @@ -429,11 +446,11 @@ static void dw_mci_start_command(struct dw_mci
>> *host,
>>         wmb(); /* drain writebuffer */
>>         dw_mci_wait_while_busy(host, cmd_flags);
>>   +     mci_writel(host, CMD, cmd_flags | SDMMC_CMD_START);
>> +
>>         /* response expected command only */
>>         if (cmd_flags & SDMMC_CMD_RESP_EXP)
>>                 dw_mci_set_cto(host);
>> -
>> -       mci_writel(host, CMD, cmd_flags | SDMMC_CMD_START);
>
>
>
> But why? If we still keep the original logic, it's always correct
> that cmd_done comes after setting up the cto timer. So could you
> elaborate a bit more to help me understand the real intention here?

No matter which order you put things, there's a race one way or the
other.  You need a lock.

Let's think about the old code you wrote.  You did this:
1. Start the CTO timer.
2. Start the command.

Now if you (somehow) take 20 ms to handle the interrupt, then this happens:

1. Start the CTO timer.
2. Start the command.
3. Interrupt is pending, but interrupt handler doesn't run yet.
4. CTO timer fires and enqueues CTO timeout.
5. Interrupt finally fires.
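
The five steps above can be replayed in a small user-space model. This is only a sketch under stated assumptions: struct model_host and every model_* function are hypothetical stand-ins rather than driver code, and the replay is single-threaded, so the locking is left out. It compares a timer callback that blindly reports a timeout with one that first checks whether the interrupt is pending or already handled.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the host; not the real struct dw_mci. */
struct model_host {
	bool irq_asserted;     /* hardware has raised the command interrupt */
	bool cmd_complete;     /* stands in for EVENT_CMD_COMPLETE */
	bool timeout_enqueued; /* the CTO timer reported a timeout */
};

/* Unchecked timer callback: assumes no interrupt can still be outstanding. */
static void model_cto_timer_unchecked(struct model_host *h)
{
	h->timeout_enqueued = true;
}

/* Checked timer callback: back off if the interrupt is pending or handled. */
static void model_cto_timer_checked(struct model_host *h)
{
	if (h->irq_asserted || h->cmd_complete)
		return;
	h->timeout_enqueued = true;
}

/* Interrupt handler: acknowledge the interrupt and record completion. */
static void model_interrupt(struct model_host *h)
{
	h->irq_asserted = false;
	h->cmd_complete = true;
}

/*
 * Replay steps 1-5 with a late interrupt handler.  Returns true when the
 * same command ends up reported both as timed out and as completed.
 */
bool model_replay_late_irq(bool checked)
{
	struct model_host h = { false, false, false };

	/* Steps 1-2: timer armed and command started (nothing to model). */
	h.irq_asserted = true;          /* step 3: pending, handler not run */
	if (checked)                    /* step 4: the timer fires first */
		model_cto_timer_checked(&h);
	else
		model_cto_timer_unchecked(&h);
	model_interrupt(&h);            /* step 5: the handler finally runs */

	return h.timeout_enqueued && h.cmd_complete;
}
```

In this model, model_replay_late_irq(false) reports the double completion and model_replay_late_irq(true) does not. In the real driver the check and the handler would also have to be serialized, which is what the lock is for.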


Now normally things are pretty bad if you've got an interrupt latency
of 20 ms.  ...and, in fact, I originally wrote up a commit that simply
explained why the race didn't matter and was thinking of posting that
instead of this one.  I wrote up:

     * Start a timer to detect missing cmd timeout if we expect a response.
     *
     * Note that we need to be a little careful about race conditions here
     * since our timer will be racing with the actual hardware interrupt
     * and things would get confused if both of them happened.
     *
     * We end up avoiding races here mostly because of our 10 ms "spare
     * time" buffer above.  That's probably reliable enough because:
     * - There's "guaranteed" "very little" time between setting the timer
     *   and starting the command.  We're holding a spinlock (host->lock)
     *   in all calls to this function so we won't get preempted.  Possibly
     *   we could get interrupts still, but that shouldn't add up to
     *   anything like the 10 ms spare time.
     * - We expect that when the actual interrupt fires that our interrupt
     *   routine should get called "relatively quickly" (compared to the
     *   10 ms buffer) and will be able to cancel this timer.

...but then I ran a whole bunch of tests and I found that, as far as I
could tell, we actually _were_ getting a super long interrupt latency.
Specifically I saw the printout "Unexpected interrupt latency" in my
patch.  In order to see that printout in my patch (which even starts
the command _before_ the CTO timer), the only explanation is bad
interrupt latency, right?  Also: based on my past experience I believe
it is possible to get upwards of 100 ms interrupt latency if you've
got serial console enabled.  printk, especially printk from an
interrupt context, can do some funny things.
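
For the arming side, the ordering in the patch (start the command, then arm the timer only if completion has not been flagged yet) can be sketched the same way. Again this is a hypothetical single-threaded model, not driver code: struct model_host2 and the model_* names are made up for illustration, and the comments mark where the patch holds irq_lock.

```c
#include <stdbool.h>

/* Hypothetical, simplified model; not the real struct dw_mci. */
struct model_host2 {
	bool cmd_complete; /* stands in for EVENT_CMD_COMPLETE */
	bool timer_armed;  /* stands in for a pending cto_timer */
};

/* Interrupt handler: cancel the timer, then flag completion. */
static void model_cmd_irq(struct model_host2 *h)
{
	/* The patch takes irq_lock around these two steps. */
	h->timer_armed = false;  /* models del_timer() */
	h->cmd_complete = true;  /* models set_bit(EVENT_CMD_COMPLETE) */
}

/* Arm the timer only if the command has not already completed. */
static void model_set_cto(struct model_host2 *h)
{
	/* The patch takes irq_lock around the test and the mod_timer(). */
	if (!h->cmd_complete)
		h->timer_armed = true;  /* models mod_timer() */
}

/*
 * Start a command, then run the interrupt either before or after the
 * timer is armed.  Returns true if a timer is left armed for a command
 * that has already completed.
 */
bool model_stale_timer(bool irq_before_arm)
{
	struct model_host2 h = { false, false };

	/* The command is started here, before the timer is armed. */
	if (irq_before_arm) {
		model_cmd_irq(&h);
		model_set_cto(&h);
	} else {
		model_set_cto(&h);
		model_cmd_irq(&h);
	}

	return h.timer_armed;
}
```

Neither ordering leaves a stale timer in this model. The interleaving it cannot express, where the interrupt lands between the test and the arm, is the case the lock is meant to rule out.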


...but this stuff is always hard to get right, so if I messed up the
above please let me know!  I tried to think of all of the cases so it
would work no matter if delays happened in any random place but
concurrency is hard.


-Doug

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/5] mmc: dw_mmc: Add locking to the CTO timer
  2017-10-13  4:20     ` Doug Anderson
@ 2017-10-17  0:54       ` Shawn Lin
  2017-10-17 16:40         ` Doug Anderson
  0 siblings, 1 reply; 20+ messages in thread
From: Shawn Lin @ 2017-10-17  0:54 UTC (permalink / raw)
  To: Doug Anderson
  Cc: shawn.lin, Jaehoon Chung, Ulf Hansson, Ziyuan Xu,
	Alexandru M Stan, open list:ARM/Rockchip SoC...,
	Brian Norris, linux-samsung-soc, Emil Renner Berthing, linux-mmc,
	linux-kernel

Hi Doug

On 2017/10/13 12:20, Doug Anderson wrote:
> Shawn,
> 
> On Thu, Oct 12, 2017 at 6:32 PM, Shawn Lin <shawn.lin@rock-chips.com> wrote:
>>
>> On 2017/10/13 4:11, Douglas Anderson wrote:
>>>
>>> This attempts to instill a bit of paranoia to the code dealing with
>>> the CTO timer.  It's believed that this will make the CTO timer more
>>> robust in the case that we're having very long interrupt latencies.
>>>
>>
>> Ack. It could help fix some problems observed.
>>
>>
>>> Note that I originally thought that perhaps this patch was being
>>> overly paranoid and wasn't really needed, but then while I was running
>>> mmc_test on an rk3399 board I saw one instance of the message:
>>>     dwmmc_rockchip fe320000.dwmmc: Unexpected interrupt latency
>>>
>>> I had debug prints in the CTO timer code and I found that it was
>>> running CMD 13 at the time.
>>>
>>> ...so even though this patch seems like it might be overly paranoid,
>>> maybe it really isn't?
>>>
>>> Presumably the bad interrupt latency experienced was due to the fact
>>> that I had serial console enabled as serial console is typically where
>>> I place blame when I see absurdly large interrupt latencies.  In this
>>> particular case there was an (unrelated) printout to the serial
>>> console just before I saw the "Unexpected interrupt latency" printout.
>>>
>>> ...and actually, I managed to even reproduce the problems by running
>>> "iw mlan0 scan > /dev/null" while mmc_test was running.  That not only
>>> does a bunch of PCIe traffic but it also (on my system) outputs some
>>> SELinux log spam.
>>>
>>> Fixes: 03de19212ea3 ("mmc: dw_mmc: introduce timer for broken command
>>> transfer over scheme")
>>>
>>> Tested-by: Emil Renner Berthing <kernel@esmil.dk>
>>> Signed-off-by: Douglas Anderson <dianders@chromium.org>
>>> ---
>>>
>>> Changes in v2:
>>> - Removed extra "int i"
>>>
>>>    drivers/mmc/host/dw_mmc.c | 91
>>> +++++++++++++++++++++++++++++++++++++++++------
>>>    1 file changed, 81 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
>>> index 16516c528a88..50148991f30e 100644
>>> --- a/drivers/mmc/host/dw_mmc.c
>>> +++ b/drivers/mmc/host/dw_mmc.c
>>> @@ -403,6 +403,7 @@ static inline void dw_mci_set_cto(struct dw_mci *host)
>>>          unsigned int cto_clks;
>>>          unsigned int cto_div;
>>>          unsigned int cto_ms;
>>> +       unsigned long irqflags;
>>>          cto_clks = mci_readl(host, TMOUT) & 0xff;
>>>          cto_div = (mci_readl(host, CLKDIV) & 0xff) * 2;
>>> @@ -413,8 +414,24 @@ static inline void dw_mci_set_cto(struct dw_mci
>>> *host)
>>>          /* add a bit spare time */
>>>          cto_ms += 10;
>>>    -     mod_timer(&host->cto_timer,
>>> -                 jiffies + msecs_to_jiffies(cto_ms) + 1);
>>> +       /*
>>> +        * The durations we're working with are fairly short so we have to
>>> be
>>> +        * extra careful about synchronization here.  Specifically in
>>> hardware a
>>> +        * command timeout is _at most_ 5.1 ms, so that means we expect an
>>> +        * interrupt (either command done or timeout) to come rather
>>> quickly
>>> +        * after the mci_writel.  ...but just in case we have a long
>>> interrupt
>>> +        * latency let's add a bit of paranoia.
>>> +        *
>>> +        * In general we'll assume that at least an interrupt will be
>>> asserted
>>> +        * in hardware by the time the cto_timer runs.  ...and if it
>>> hasn't
>>> +        * been asserted in hardware by that time then we'll assume it'll
>>> never
>>> +        * come.
>>> +        */
>>> +       spin_lock_irqsave(&host->irq_lock, irqflags);
>>> +       if (!test_bit(EVENT_CMD_COMPLETE, &host->pending_events))
>>> +               mod_timer(&host->cto_timer,
>>> +                       jiffies + msecs_to_jiffies(cto_ms) + 1);
>>> +       spin_unlock_irqrestore(&host->irq_lock, irqflags);
>>
>>
>> IIUC, this change is because you move
>> mci_writel(host, CMD, cmd_flags | SDMMC_CMD_START) before
>> setting up the timer, so there is a timing gap that the cmd_done
>> already comes and handled by dw_mci_interrupt->dw_mci_cmd_interrupt.
>> At this point, we don't need the cto timer at all.
> 
> As per below, if I don't move the mci_writel() before setting up the
> timer then there's still a race.  ...and actually that race was harder
> for me to write code for, but I invite you to try to see if it's
> somehow cleaner.
> 
> 
>>>    }
>>>      static void dw_mci_start_command(struct dw_mci *host,
>>> @@ -429,11 +446,11 @@ static void dw_mci_start_command(struct dw_mci
>>> *host,
>>>          wmb(); /* drain writebuffer */
>>>          dw_mci_wait_while_busy(host, cmd_flags);
>>>    +     mci_writel(host, CMD, cmd_flags | SDMMC_CMD_START);
>>> +
>>>          /* response expected command only */
>>>          if (cmd_flags & SDMMC_CMD_RESP_EXP)
>>>                  dw_mci_set_cto(host);
>>> -
>>> -       mci_writel(host, CMD, cmd_flags | SDMMC_CMD_START);
>>
>>
>>
>> But why? If we still keep the original logic, it's always correct
>> that cmd_done comes after setting up the cto timer. So could you
>> elaborate a bit more to help me understand the real intention here?
> 
> No matter which order you put things, there's a race one way or the
> other.  You need a lock.
> 
> Let's think about the old code you wrote.  You did this:
> 1. Start the CTO timer.
> 2. Start the command.
> 
> Now if you (somehow) take 20 ms to handle the interrupt, then this happens:
> 
> 1. Start the CTO timer.
> 2. Start the command.
> 3. Interrupt is pending, but interrupt handler doesn't run yet.
> 4. CTO timer fires and enqueues CTO timeout.
> 5. Interrupt finally fires.

OK, got it.

> 
> 
> Now normally things are pretty bad if you've got an interrupt latency
> of 20 ms.  ...and, in fact, I originally wrote up a commit that simply
> explained why the race didn't matter and was thinking of posting that
> instead of this one.  I wrote up:
> 
>       * Start a timer to detect missing cmd timeout if we expect a response.
>       *
>       * Note that we need to be a little careful about race conditions here
>       * since our timer will be racing with the actual hardware interrupt
>       * and things would get confused if both of them happened.
>       *
>       * We end up avoiding races here mostly because of our 10 ms "spare
>       * time" buffer above.  That's probably reliable enough because:
>       * - There's "guaranteed" "very little" time between setting the timer
>       *   and starting the command.  We're holding a spinlock (host->lock)
>       *   in all calls to this function so we won't get preempted.  Possibly
>       *   we could get interrupts still, but that shouldn't add up to
>       *   anything like the 10 ms spare time.
>       * - We expect that when the actual interrupt fires that our interrupt
>       *   routine should get called "relatively quickly" (compared to the
>       *   10 ms buffer) and will be able to cancel this timer.
> 
> ...but then I ran a whole bunch of tests and I found that, as far as I
> could tell, we actually _were_ getting a super long interrupt latency.
> Specifically I saw the printout "Unexpected interrupt latency" in my
> patch.  In order to see that printout in my patch (which even starts
> the command _before_ the CTO timer), the only explanation is bad
> interrupt latency, right?  Also: based on my past experience I believe
> it is possible to get upwards of 100 ms interrupt latency if you've
> got serial console enabled.  printk, especially printk from an
> interrupt context, can do some funny things.
> 

Right! It makes sense to me now.

> 
> ...but this stuff is always hard to get right, so if I messed up the
> above please let me know!  I tried to think of all of the cases so it
> would work no matter if delays happened in any random place but
> concurrency is hard.

Yes, concurrency looks hard to get right. I have a comment on your
DRTO case (patch 5). Let's brainstorm there.

> 
> 
> -Doug
> 
> 
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 5/5] mmc: dw_mmc: Cleanup the DTO timer like the CTO one
  2017-10-12 20:11 ` [PATCH v2 5/5] mmc: dw_mmc: Cleanup the DTO timer like the CTO one Douglas Anderson
@ 2017-10-17  1:17   ` Shawn Lin
  2017-10-17  5:05     ` Doug Anderson
  0 siblings, 1 reply; 20+ messages in thread
From: Shawn Lin @ 2017-10-17  1:17 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: jh80.chung, ulf.hansson, shawn.lin, xzy.xu, amstan,
	linux-rockchip, briannorris, linux-samsung-soc, kernel,
	linux-mmc, linux-kernel

Hi Doug

On 2017/10/13 4:11, Douglas Anderson wrote:
> The recent CTO timer introduced in commit 03de19212ea3 ("mmc: dw_mmc:
> introduce timer for broken command transfer over scheme") was causing
> observable problems due to race conditions.  Previous patches have
> fixed those race conditions.
> 
> It can be observed that these same race conditions ought to be
> theoretically possible with the DTO timer too though they are
> massively less likely to happen because the data timeout is always set
> to 0xffffff right now.  That means even at a 200 MHz card clock we
> were arming the DTO timer for 94 ms:
>    >>> (0xffffff * 1000. / 200000000) + 10
>    93.886075
> 
> We always also were setting the DTO timer _after_ starting the
> transfer, unlike how the old code was seting the CTO timer.
> 
> In any case, even though the DTO timer is much less likely to have
> races, it still makes sense to add code to handle it _just in case_.
> 
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
> 
> Changes in v2:
> - Cleanup the DTO timer new for v2
> 
>   drivers/mmc/host/dw_mmc.c | 54 ++++++++++++++++++++++++++++++++++++++++++++---
>   1 file changed, 51 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
> index 6bc87b1385a9..bc0808615431 100644
> --- a/drivers/mmc/host/dw_mmc.c
> +++ b/drivers/mmc/host/dw_mmc.c
> @@ -1950,7 +1950,11 @@ static void dw_mci_set_drto(struct dw_mci *host)
>   	/* add a bit spare time */
>   	drto_ms += 10;
>   
> -	mod_timer(&host->dto_timer, jiffies + msecs_to_jiffies(drto_ms));
> +	spin_lock_irqsave(&host->irq_lock, irqflags);
> +	if (!test_bit(EVENT_DATA_COMPLETE, &host->pending_events))
> +		mod_timer(&host->dto_timer,
> +			  jiffies + msecs_to_jiffies(drto_ms));
> +	spin_unlock_irqrestore(&host->irq_lock, irqflags);
>   }
>   
>   static bool dw_mci_clear_pending_cmd_complete(struct dw_mci *host)
> @@ -1971,6 +1975,18 @@ static bool dw_mci_clear_pending_cmd_complete(struct dw_mci *host)
>   	return true;
>   }
>   
> +static bool dw_mci_clear_pending_data_complete(struct dw_mci *host)
> +{
> +	if (!test_bit(EVENT_DATA_COMPLETE, &host->pending_events))
> +		return false;
> +
> +	/* Extra paranoia just like dw_mci_clear_pending_cmd_complete() */
> +	WARN_ON(del_timer_sync(&host->dto_timer));
> +	clear_bit(EVENT_DATA_COMPLETE, &host->pending_events);
> +
> +	return true;
> +}
> +
>   static void dw_mci_tasklet_func(unsigned long priv)
>   {
>   	struct dw_mci *host = (struct dw_mci *)priv;
> @@ -2112,8 +2128,7 @@ static void dw_mci_tasklet_func(unsigned long priv)
>   			/* fall through */
>   
>   		case STATE_DATA_BUSY:
> -			if (!test_and_clear_bit(EVENT_DATA_COMPLETE,
> -						&host->pending_events)) {
> +			if (!dw_mci_clear_pending_data_complete(host)) {
>   				/*
>   				 * If data error interrupt comes but data over
>   				 * interrupt doesn't come within the given time.
> @@ -2683,6 +2698,8 @@ static irqreturn_t dw_mci_interrupt(int irq, void *dev_id)
>   		}
>   
>   		if (pending & SDMMC_INT_DATA_OVER) {
> +			spin_lock_irqsave(&host->irq_lock, irqflags);
> +
>   			del_timer(&host->dto_timer);
>   
>   			mci_writel(host, RINTSTS, SDMMC_INT_DATA_OVER);
> @@ -2695,6 +2712,8 @@ static irqreturn_t dw_mci_interrupt(int irq, void *dev_id)
>   			}
>   			set_bit(EVENT_DATA_COMPLETE, &host->pending_events);
>   			tasklet_schedule(&host->tasklet);
> +
> +			spin_unlock_irqrestore(&host->irq_lock, irqflags);
>   		}
>   
>   		if (pending & SDMMC_INT_RXDR) {
> @@ -3044,7 +3063,31 @@ static void dw_mci_cto_timer(unsigned long arg)
>   static void dw_mci_dto_timer(unsigned long arg)
>   {
>   	struct dw_mci *host = (struct dw_mci *)arg;
> +	unsigned long irqflags;
> +	u32 pending;
> +
> +	spin_lock_irqsave(&host->irq_lock, irqflags);
>   
> +	/*
> +	 * The DTO timer is much longer than the CTO timer, so it's even less
> +	 * likely that we'll these cases, but it pays to be paranoid.
> +	 */
> +	pending = mci_readl(host, MINTSTS); /* read-only mask reg */
> +	if (pending & SDMMC_INT_DATA_OVER) {
> +		/* The interrupt should fire; no need to act but we can warn */
> +		dev_warn(host->dev, "Unexpected data interrupt latency\n");
> +		goto exit;

I was checking a problem like this:

(1) Start a CTO timer
(2) Start a command
(3) Get the CMD_DONE interrupt and cancel the CTO timer
(4) Start a DRTO timer
(5) Start external DMA to get the data from the FIFO
(6) The system bus/DRAM port is idle for a very long time, for
whatever reason.
(7) The DRTO timer fires, but DTO was set because the card has already
sent all of the data to the FIFO.
(8) Now your patch bails out early and notifies the mmc core that this
data transfer finished successfully.
(9) The mmc core propagates the successful state to the block layer,
and maybe a critical reader in the file system uses the data right
away but runs into trouble because the data is incomplete.


The problem comes from step 6 and step 7. To quote a bit from the dwmmc
databook, V270a, section 7.1:

"While using the external DMA interface for reading from a card, the DTO
interrupt occurs only after all the data is flushed to memory by the DMA
Interface unit. A Busy Clear Interrupt is asserted after the DTO."

So DTO isn't reliable in practice for the case where the delay is on
the external DMA side. That is hard to reproduce, but it was the reason
I came up with the immature idea of adding a sufficiently long
catch-all timer. Alternatively, we could just set a sufficiently long
timeout value for the CTO and DRTO timers and blindly assume that the
hardware is in trouble for HW reasons whenever one fires, which seems
to make the change simpler. Looking forward to your opinion. :)
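
To make the branch discussed in step (8) concrete, the early-exit checks that the patch adds to dw_mci_dto_timer() can be modeled as a pure decision function. This is a minimal sketch, not the driver code: the enum, the function name and the MODEL_INT_DATA_OVER bit value are placeholders chosen for illustration, and the locking and the host->state switch are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit for illustration; stands in for SDMMC_INT_DATA_OVER. */
#define MODEL_INT_DATA_OVER (1u << 3)

/* What the modeled DTO timer callback decides to do. */
enum model_dto_action {
	MODEL_DTO_WAIT_FOR_IRQ,   /* interrupt is pending; let the handler run */
	MODEL_DTO_ALREADY_DONE,   /* completion was already flagged */
	MODEL_DTO_REPORT_TIMEOUT, /* no interrupt seen; treat as a timeout */
};

/*
 * Mirrors the order of the checks in the patch: first the masked
 * interrupt status, then the pending-events bit, then the timeout path.
 */
enum model_dto_action model_dto_timer(uint32_t pending, bool data_complete)
{
	if (pending & MODEL_INT_DATA_OVER)
		return MODEL_DTO_WAIT_FOR_IRQ;
	if (data_complete)
		return MODEL_DTO_ALREADY_DONE;
	return MODEL_DTO_REPORT_TIMEOUT;
}
```

In this model the first branch does not report anything itself; it leaves completion to the interrupt handler. Whether DATA_OVER can already be visible in step (7) while the external DMA is stalled depends on the hardware behaviour described in the databook quote above.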





> +	}
> +	if (test_bit(EVENT_DATA_COMPLETE, &host->pending_events)) {
> +		/* Presumably interrupt handler couldn't delete the timer */
> +		dev_warn(host->dev, "DTO timeout when already completed\n");
> +		goto exit;
> +	}
> +
> +	/*
> +	 * Continued paranoia to make sure we're in the state we expect.
> +	 * This paranoia isn't really justified but it seems good to be safe.
> +	 */
>   	switch (host->state) {
>   	case STATE_SENDING_DATA:
>   	case STATE_DATA_BUSY:
> @@ -3059,8 +3102,13 @@ static void dw_mci_dto_timer(unsigned long arg)
>   		tasklet_schedule(&host->tasklet);
>   		break;
>   	default:
> +		dev_warn(host->dev, "Unexpected data timeout, state %d\n",
> +			 host->state);
>   		break;
>   	}
> +
> +exit:
> +	spin_unlock_irqrestore(&host->irq_lock, irqflags);
>   }
>   
>   #ifdef CONFIG_OF
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 5/5] mmc: dw_mmc: Cleanup the DTO timer like the CTO one
  2017-10-17  1:17   ` Shawn Lin
@ 2017-10-17  5:05     ` Doug Anderson
  2017-10-17  6:33       ` Shawn Lin
  0 siblings, 1 reply; 20+ messages in thread
From: Doug Anderson @ 2017-10-17  5:05 UTC (permalink / raw)
  To: Shawn Lin
  Cc: Jaehoon Chung, Ulf Hansson, Ziyuan Xu, Alexandru M Stan,
	open list:ARM/Rockchip SoC...,
	Brian Norris, linux-samsung-soc, Emil Renner Berthing, linux-mmc,
	linux-kernel

Hi,

On Mon, Oct 16, 2017 at 6:17 PM, Shawn Lin <shawn.lin@rock-chips.com> wrote:
> Hi Doug
>
>
> On 2017/10/13 4:11, Douglas Anderson wrote:
>>
>> The recent CTO timer introduced in commit 03de19212ea3 ("mmc: dw_mmc:
>> introduce timer for broken command transfer over scheme") was causing
>> observable problems due to race conditions.  Previous patches have
>> fixed those race conditions.
>>
>> It can be observed that these same race conditions ought to be
>> theoretically possible with the DTO timer too though they are
>> massively less likely to happen because the data timeout is always set
>> to 0xffffff right now.  That means even at a 200 MHz card clock we
>> were arming the DTO timer for 94 ms:
>>    >>> (0xffffff * 1000. / 200000000) + 10
>>    93.886075
>>
>> We always also were setting the DTO timer _after_ starting the
>> transfer, unlike how the old code was seting the CTO timer.
>>
>> In any case, even though the DTO timer is much less likely to have
>> races, it still makes sense to add code to handle it _just in case_.
>>
>> Signed-off-by: Douglas Anderson <dianders@chromium.org>
>> ---
>>
>> Changes in v2:
>> - Cleanup the DTO timer new for v2
>>
>>   drivers/mmc/host/dw_mmc.c | 54
>> ++++++++++++++++++++++++++++++++++++++++++++---
>>   1 file changed, 51 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
>> index 6bc87b1385a9..bc0808615431 100644
>> --- a/drivers/mmc/host/dw_mmc.c
>> +++ b/drivers/mmc/host/dw_mmc.c
>> @@ -1950,7 +1950,11 @@ static void dw_mci_set_drto(struct dw_mci *host)
>>         /* add a bit spare time */
>>         drto_ms += 10;
>>   -     mod_timer(&host->dto_timer, jiffies + msecs_to_jiffies(drto_ms));
>> +       spin_lock_irqsave(&host->irq_lock, irqflags);
>> +       if (!test_bit(EVENT_DATA_COMPLETE, &host->pending_events))
>> +               mod_timer(&host->dto_timer,
>> +                         jiffies + msecs_to_jiffies(drto_ms));
>> +       spin_unlock_irqrestore(&host->irq_lock, irqflags);
>>   }
>>     static bool dw_mci_clear_pending_cmd_complete(struct dw_mci *host)
>> @@ -1971,6 +1975,18 @@ static bool
>> dw_mci_clear_pending_cmd_complete(struct dw_mci *host)
>>         return true;
>>   }
>>   +static bool dw_mci_clear_pending_data_complete(struct dw_mci *host)
>> +{
>> +       if (!test_bit(EVENT_DATA_COMPLETE, &host->pending_events))
>> +               return false;
>> +
>> +       /* Extra paranoia just like dw_mci_clear_pending_cmd_complete() */
>> +       WARN_ON(del_timer_sync(&host->dto_timer));
>> +       clear_bit(EVENT_DATA_COMPLETE, &host->pending_events);
>> +
>> +       return true;
>> +}
>> +
>>   static void dw_mci_tasklet_func(unsigned long priv)
>>   {
>>         struct dw_mci *host = (struct dw_mci *)priv;
>> @@ -2112,8 +2128,7 @@ static void dw_mci_tasklet_func(unsigned long priv)
>>                         /* fall through */
>>                 case STATE_DATA_BUSY:
>> -                       if (!test_and_clear_bit(EVENT_DATA_COMPLETE,
>> -                                               &host->pending_events)) {
>> +                       if (!dw_mci_clear_pending_data_complete(host)) {
>>                                 /*
>>                                  * If data error interrupt comes but data
>> over
>>                                  * interrupt doesn't come within the given
>> time.
>> @@ -2683,6 +2698,8 @@ static irqreturn_t dw_mci_interrupt(int irq, void
>> *dev_id)
>>                 }
>>                 if (pending & SDMMC_INT_DATA_OVER) {
>> +                       spin_lock_irqsave(&host->irq_lock, irqflags);
>> +
>>                         del_timer(&host->dto_timer);
>>                         mci_writel(host, RINTSTS, SDMMC_INT_DATA_OVER);
>> @@ -2695,6 +2712,8 @@ static irqreturn_t dw_mci_interrupt(int irq, void
>> *dev_id)
>>                         }
>>                         set_bit(EVENT_DATA_COMPLETE,
>> &host->pending_events);
>>                         tasklet_schedule(&host->tasklet);
>> +
>> +                       spin_unlock_irqrestore(&host->irq_lock, irqflags);
>>                 }
>>                 if (pending & SDMMC_INT_RXDR) {
>> @@ -3044,7 +3063,31 @@ static void dw_mci_cto_timer(unsigned long arg)
>>   static void dw_mci_dto_timer(unsigned long arg)
>>   {
>>         struct dw_mci *host = (struct dw_mci *)arg;
>> +       unsigned long irqflags;
>> +       u32 pending;
>> +
>> +       spin_lock_irqsave(&host->irq_lock, irqflags);
>>   +     /*
>> +        * The DTO timer is much longer than the CTO timer, so it's even
>> less
>> +        * likely that we'll these cases, but it pays to be paranoid.
>> +        */
>> +       pending = mci_readl(host, MINTSTS); /* read-only mask reg */
>> +       if (pending & SDMMC_INT_DATA_OVER) {
>> +               /* The interrupt should fire; no need to act but we can
>> warn */
>> +               dev_warn(host->dev, "Unexpected data interrupt
>> latency\n");
>> +               goto exit;
>
>
> I was checking a problem like this:
>
> (1) Start a CTO timer
> (2) Start a command
> (3) Got CMD_DONE interrupt and cancel the CTO timer
> (4) Start a DRTO timer
> (5) Start external dma to get the data from fifo
> (6) The system bus/DRAM port is idle for a very long time for no
> matter what happen.
> (7) DRTO timer fires but DTO was set as the card have already
> sent all data to the fifo.
> (8) Now you patch bails out earlier  and notify the mmc core that this
> data transfer was finished successfully.

I don't understand how you're saying that my patch will notify the mmc
core that the data transfer was finished successfully.  Two things:

A) My patch should only be fixing race conditions here and not
introducing anything new.  In other words if we are somehow
accidentally telling the MMC core that we have a successful transfer
then I don't believe that's something new that my patch introduced.

B) If the dw_mci_dto_timer function gets called then we always
indicate an error.

Specifically the _only_ action that the dw_mci_dto_timer() function
can take is this:

                host->data_status = SDMMC_INT_DRTO;
                set_bit(EVENT_DATA_ERROR, &host->pending_events);
                set_bit(EVENT_DATA_COMPLETE, &host->pending_events);
                tasklet_schedule(&host->tasklet);

That sets "EVENT_DATA_ERROR" and thus can't tell the mmc core
that this data transfer finished "successfully".
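
To make that concrete, here is a toy Python model of the decision the
timer function makes.  It is only a sketch, not driver code: the
function name and the string event names are made up for illustration,
the bit values are assumed to match the driver's header, and only the
two outcomes visible in the hunk above are modelled.

```python
# Toy model of dw_mci_dto_timer(); hypothetical names, not kernel code.
SDMMC_INT_DATA_OVER = 1 << 3   # assumed bit position, for illustration
SDMMC_INT_DRTO = 1 << 9        # assumed bit position, for illustration


def dto_timer_fired(mintsts, pending_events):
    """Return (pending_events, data_status) after the software timer runs."""
    events = set(pending_events)
    if mintsts & SDMMC_INT_DATA_OVER:
        # The real interrupt is already asserted: warn and bail out.
        # Nothing is reported to the core from this path.
        return events, None
    # Otherwise the only thing the timer can do is report an error.
    events |= {"EVENT_DATA_ERROR", "EVENT_DATA_COMPLETE"}
    return events, SDMMC_INT_DRTO


if __name__ == "__main__":
    print(dto_timer_fired(SDMMC_INT_DATA_OVER, set()))
    print(dto_timer_fired(0, set()))
```

In neither branch does the timer set EVENT_DATA_COMPLETE without also
setting EVENT_DATA_ERROR, which is the point being made above.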

> (9) mmc core propagates the successful state to block layer and maybe
> a critical reader in file system will use the data right now but it
> falls into trouble due to the incomplete data.
>
>
> The problem comes from step 6 and step 7. Quote some bit from dwmmc
> databook, V270a, section 7.1,
>
> "While using the external DMA interface for reading from a card, the DTO
> interrupt occurs only after all the data is flushed to memory by the DMA
> Interface unit. A Busy Clear Interrupt is asserted after the DTO."

Ugh.  Not your fault, but terrible terms.  I keep getting "DTO" and
"DRTO" confused, especially since in the code the "drto" timer is
called the "dto" timer.

DTO = Data Transfer Over
DRTO = Data Read Time Out

NOTE: it seems the bit you're quoting from the databook says that the
DTO is expected to be delayed with external DMA.  This doesn't seem to
match what you said above: "(7) DRTO timer fires but DTO was set
as the card have already sent all data to the fifo."  If the databook
is saying that "DTO" will be delayed then how could DTO already be set
when the timer fires?

> So the DTO isn't reliable or perfectly good in practice for that case
> that the delay is in external DMA side.

So just to restate to make sure I'm understanding you properly:

If you're using external DMA then it's possible that you'll get a Data
Transfer Over (DTO) interrupt at some point _later_ than the roughly
94 ms that we're waiting, because the DTO interrupt can't be asserted
until all the DMA is flushed.  Actually, on Rockchip you can't run
faster than 150 MHz, so the wait is actually about 121 ms.  It seems a
little bit hard to believe that DMA for a transfer is taking more than
121 ms to flush, but I guess it could happen?
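
For reference, both numbers come from the same arithmetic as in the
commit message.  A quick sketch, assuming the data timeout is the
0xffffff the driver currently programs, the clock divider is 1, and the
10 ms of spare time from dw_mci_set_drto() is added on top (drto_ms()
here is just a helper name for this example):

```python
DATA_TIMEOUT_CLKS = 0xffffff   # data timeout currently programmed, in card clocks
SPARE_MS = 10                  # "a bit spare time" added by dw_mci_set_drto()


def drto_ms(card_clock_hz, clks=DATA_TIMEOUT_CLKS, spare_ms=SPARE_MS):
    """Software DRTO timer length in milliseconds for a given card clock."""
    return clks * 1000.0 / card_clock_hz + spare_ms


if __name__ == "__main__":
    for hz in (200_000_000, 150_000_000):
        print(f"{hz // 1_000_000} MHz: {drto_ms(hz):.3f} ms")
    # 200 MHz: 93.886 ms
    # 150 MHz: 121.848 ms
```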

It seems even harder to believe that it's taking > 121 ms to flush and
the system is running well enough that it was able to get to the dto
timer function all without the DTO interrupt bit even being set.

In any case, if the DMA transfer really is taking more than 121 ms to
flush then we'll assert a DRTO interrupt and report an error to the
MMC core.  I suppose (by chance) we could somehow get confused when
the DTO interrupt finally fires and then we could think that a 2nd
transfer finished, but I'm not even sure that would happen...

> That is hard to reproduced but
> it was the reason for me to come up with the immature idea of adding
> a longer enough and catch-all timer. Or we only set a longer enough
> timeout value for CTO and DRTO timer and we could blindly believe the
> hardware falls into trouble for HW reason and seems that makes the change
> simpler. Looking forward to your opinion. :)

If you're running into the problem you describe, it kinda sounds like
it's more reason _not_ to use the same code for the CTO and DRTO
timers.  As I understand it, the CTO timer _doesn't_ suffer from the
problems above, so we shouldn't make it suffer any workarounds we
need for the DRTO.  Also the CTO timer is _very_ fast.  We expect a
normal CTO within 1 ms whereas the DRTO timer is much, much longer.
If it's been 10 ms and we haven't seen a command finish and haven't
seen a real CTO then there's no reason to delay further.

As for blindly setting a longer timeout for CTO / DRTO I'm not sure
that's a great idea.  We routinely get these timeouts in tuning and we
really don't want to make tuning even slower than it already is by
lengthening any of these timeouts too much.

Overall: if you're having weird trouble with external DMA as you
describe, I suppose you could just have an even longer DRTO delay only
for external DMA?

NOTE also: the DesignWare Manual that I have (2.80a) actually even
suggests that for long data timeouts (like 100 ms) a software
timeout is appropriate.  They even suggest that in that case you could
rely only on the software timeout.  They say:

> Note: The software timer should be used if the timeout value is in the order
> of 100 ms. In this case, read data timeout interrupt needs to be disabled.

Presumably they are saying that because you can't really express much
more than 100 ms in the TMOUT register?

---

Just to summarize:

* I don't think my patch introduces any new problems like the one you
describe.  I think if there are problems like that, they are
pre-existing.

* I don't see a reason to use a catchall timer where we use the same
timeout for CTO and DRTO.  We could try to save a few bytes of storage
space and have a single "struct timer" at the expense of a little
extra logic to try to disambiguate, but I'm not terribly interested in
writing or reviewing that patch.

---

As always, please let me know if I got mixed up somewhere.

-Doug

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 5/5] mmc: dw_mmc: Cleanup the DTO timer like the CTO one
  2017-10-17  5:05     ` Doug Anderson
@ 2017-10-17  6:33       ` Shawn Lin
  0 siblings, 0 replies; 20+ messages in thread
From: Shawn Lin @ 2017-10-17  6:33 UTC (permalink / raw)
  To: Doug Anderson
  Cc: shawn.lin, Jaehoon Chung, Ulf Hansson, Ziyuan Xu,
	Alexandru M Stan, open list:ARM/Rockchip SoC...,
	Brian Norris, linux-samsung-soc, Emil Renner Berthing, linux-mmc,
	linux-kernel



On 2017/10/17 13:05, Doug Anderson wrote:
> Hi,
> 
> On Mon, Oct 16, 2017 at 6:17 PM, Shawn Lin <shawn.lin@rock-chips.com> wrote:
>> Hi Doug
>>
>>
>> On 2017/10/13 4:11, Douglas Anderson wrote:
>>>
>>> The recent CTO timer introduced in commit 03de19212ea3 ("mmc: dw_mmc:
>>> introduce timer for broken command transfer over scheme") was causing
>>> observable problems due to race conditions.  Previous patches have
>>> fixed those race conditions.
>>>
>>> It can be observed that these same race conditions ought to be
>>> theoretically possible with the DTO timer too though they are
>>> massively less likely to happen because the data timeout is always set
>>> to 0xffffff right now.  That means even at a 200 MHz card clock we
>>> were arming the DTO timer for 94 ms:
>>>     >>> (0xffffff * 1000. / 200000000) + 10
>>>     93.886075
>>>
>>> We always also were setting the DTO timer _after_ starting the
>>> transfer, unlike how the old code was setting the CTO timer.
>>>
>>> In any case, even though the DTO timer is much less likely to have
>>> races, it still makes sense to add code to handle it _just in case_.
>>>
>>> Signed-off-by: Douglas Anderson <dianders@chromium.org>
>>> ---
>>>
>>> Changes in v2:
>>> - Cleanup the DTO timer new for v2
>>>
>>>    drivers/mmc/host/dw_mmc.c | 54
>>> ++++++++++++++++++++++++++++++++++++++++++++---
>>>    1 file changed, 51 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
>>> index 6bc87b1385a9..bc0808615431 100644
>>> --- a/drivers/mmc/host/dw_mmc.c
>>> +++ b/drivers/mmc/host/dw_mmc.c
>>> @@ -1950,7 +1950,11 @@ static void dw_mci_set_drto(struct dw_mci *host)
>>>          /* add a bit spare time */
>>>          drto_ms += 10;
>>>    -     mod_timer(&host->dto_timer, jiffies + msecs_to_jiffies(drto_ms));
>>> +       spin_lock_irqsave(&host->irq_lock, irqflags);
>>> +       if (!test_bit(EVENT_DATA_COMPLETE, &host->pending_events))
>>> +               mod_timer(&host->dto_timer,
>>> +                         jiffies + msecs_to_jiffies(drto_ms));
>>> +       spin_unlock_irqrestore(&host->irq_lock, irqflags);
>>>    }
>>>      static bool dw_mci_clear_pending_cmd_complete(struct dw_mci *host)
>>> @@ -1971,6 +1975,18 @@ static bool
>>> dw_mci_clear_pending_cmd_complete(struct dw_mci *host)
>>>          return true;
>>>    }
>>>    +static bool dw_mci_clear_pending_data_complete(struct dw_mci *host)
>>> +{
>>> +       if (!test_bit(EVENT_DATA_COMPLETE, &host->pending_events))
>>> +               return false;
>>> +
>>> +       /* Extra paranoia just like dw_mci_clear_pending_cmd_complete() */
>>> +       WARN_ON(del_timer_sync(&host->dto_timer));
>>> +       clear_bit(EVENT_DATA_COMPLETE, &host->pending_events);
>>> +
>>> +       return true;
>>> +}
>>> +
>>>    static void dw_mci_tasklet_func(unsigned long priv)
>>>    {
>>>          struct dw_mci *host = (struct dw_mci *)priv;
>>> @@ -2112,8 +2128,7 @@ static void dw_mci_tasklet_func(unsigned long priv)
>>>                          /* fall through */
>>>                  case STATE_DATA_BUSY:
>>> -                       if (!test_and_clear_bit(EVENT_DATA_COMPLETE,
>>> -                                               &host->pending_events)) {
>>> +                       if (!dw_mci_clear_pending_data_complete(host)) {
>>>                                  /*
>>>                                   * If data error interrupt comes but data
>>> over
>>>                                   * interrupt doesn't come within the given
>>> time.
>>> @@ -2683,6 +2698,8 @@ static irqreturn_t dw_mci_interrupt(int irq, void
>>> *dev_id)
>>>                  }
>>>                  if (pending & SDMMC_INT_DATA_OVER) {
>>> +                       spin_lock_irqsave(&host->irq_lock, irqflags);
>>> +
>>>                          del_timer(&host->dto_timer);
>>>                          mci_writel(host, RINTSTS, SDMMC_INT_DATA_OVER);
>>> @@ -2695,6 +2712,8 @@ static irqreturn_t dw_mci_interrupt(int irq, void
>>> *dev_id)
>>>                          }
>>>                          set_bit(EVENT_DATA_COMPLETE,
>>> &host->pending_events);
>>>                          tasklet_schedule(&host->tasklet);
>>> +
>>> +                       spin_unlock_irqrestore(&host->irq_lock, irqflags);
>>>                  }
>>>                  if (pending & SDMMC_INT_RXDR) {
>>> @@ -3044,7 +3063,31 @@ static void dw_mci_cto_timer(unsigned long arg)
>>>    static void dw_mci_dto_timer(unsigned long arg)
>>>    {
>>>          struct dw_mci *host = (struct dw_mci *)arg;
>>> +       unsigned long irqflags;
>>> +       u32 pending;
>>> +
>>> +       spin_lock_irqsave(&host->irq_lock, irqflags);
>>>    +     /*
>>> +        * The DTO timer is much longer than the CTO timer, so it's even
>>> less
>>> +        * likely that we'll these cases, but it pays to be paranoid.
>>> +        */
>>> +       pending = mci_readl(host, MINTSTS); /* read-only mask reg */
>>> +       if (pending & SDMMC_INT_DATA_OVER) {
>>> +               /* The interrupt should fire; no need to act but we can
>>> warn */
>>> +               dev_warn(host->dev, "Unexpected data interrupt
>>> latency\n");
>>> +               goto exit;
>>
>>
>> I was checking a problem like this:
>>
>> (1) Start a CTO timer
>> (2) Start a command
>> (3) Got CMD_DONE interrupt and cancel the CTO timer
>> (4) Start a DRTO timer
>> (5) Start external dma to get the data from fifo
>> (6) The system bus/DRAM port is idle for a very long time for no
>> matter what happen.
>> (7) DRTO timer fires but DTO was set as the card have already
>> sent all data to the fifo.
>> (8) Now you patch bails out earlier  and notify the mmc core that this
>> data transfer was finished successfully.
> 
> I don't understand how you're saying that my patch will notify the mmc
> core that the data transfer was finished successfully.  Two things:
> 
> A) My patch should only be fixing race conditions here and not
> introducing anything new.  In other words if we are somehow
> accidentally telling the MMC core that we have a successful transfer
> then I don't believe that's something new that my patch introduced.
> 
> 
> B) If the dw_mci_dto_timer function gets called then we always
> indicate an error.

Oh, yes it is. I overlooked the "goto exit;".

> 
> Specifically the _only_ action that the dw_mci_dto_timer() function
> can take is this:
> 
>                  host->data_status = SDMMC_INT_DRTO;
>                  set_bit(EVENT_DATA_ERROR, &host->pending_events);
>                  set_bit(EVENT_DATA_COMPLETE, &host->pending_events);
>                  tasklet_schedule(&host->tasklet);
> 
> Which sets the "EVENT_DATA_ERROR" and thus can't tell the mmc core
> that this data transfer was finished "successfully"
> 
>> (9) mmc core propagates the successful state to block layer and maybe
>> a critical reader in file system will use the data right now but it
>> falls into trouble due to the incomplete data.
>>
>>
>> The problem comes from step 6 and step 7. Quote some bit from dwmmc
>> databook, V270a, section 7.1,
>>
>> "While using the external DMA interface for reading from a card, the DTO
>> interrupt occurs only after all the data is flushed to memory by the DMA
>> Interface unit. A Busy Clear Interrupt is asserted after the DTO."
> 
> Ugh.  Not your fault, but terrible terms.  I keep getting "DTO" and
> "DRTO" confused, especially since in the code the "drto" timer is
> called the "dto" timer.
> 
> DTO = Data Transfer Over
> DRTO = Data Read Time Out
> 
> NOTE: it seems the bit you're quoting from the databook say that the
> DTO is expected to be delayed with external DMA.  This doesn't seem to
> match what you said above that  "(7) DRTO timer fires but DTO was set
> as the card have already sent all data to the fifo.".  If the databook
> is saying that "DTO" will be delayed then how could DTO already be set
> when the timer fires??
> 
> 
>> So the DTO isn't reliable or perfectly good in practice for that case
>> that the delay is in external DMA side.
> 
> So just to restate to make sure I'm understanding you properly:
> 
> If you're using external DMA then it's possible that you'll get a Data
> Transfer Over (DTO) interrupt at some point in time _later_ than the
> more than 94 ms that we're waiting because the DTO timer can't be
> asserted until all the DMA is flushed.  Actually, on Rockchip you
> can't run faster than 150 MHz, so it's actually 121 ms.  It seems a
> little bit hard to believe that DMA for a transfer is taking more than
> 121 ms to flush, but I guess it could happen?
> 

In theory it could, but I didn't see it.

> It seems even harder to believe that it's taking > 121 ms to flush and
> the system is running well enough that it was able to get to the dto
> timer function all without the DTO interrupt bit even being set.
> 
> In any case, if the DMA transfer really is taking more than 121 ms to
> flush then we'll assert a DRTO interrupt and report an error to the
> MMC core.  I suppose (by chance) we could somehow get confused when
> the DTO interrupt finally fires and then we could think that a 2nd
> transfer finished, but I'm not even sure that would happen...

So there is no problem now for me as I missed some code above.

> 
> 
>> That is hard to reproduced but
>> it was the reason for me to come up with the immature idea of adding
>> a longer enough and catch-all timer. Or we only set a longer enough
>> timeout value for CTO and DRTO timer and we could blindly believe the
>> hardware falls into trouble for HW reason and seems that makes the change
>> simpler. Looking forward to your opinion. :)
> 
> If you're running into the problem you describe, it kinda sounds like
> it's more reason _not_ to use the same code for the CTO and DRTO
> timers.  As I understand it, the CTO timer _doesn't_ suffer from the
> problems above., so we shouldn't make it suffer any workarounds we
> need for the DRTO.  Also the CTO timer is _very_ fast.  We expect a
> normal CTO within 1 ms whereas the DRTO timer is much, much longer.
> If it's been 10 ms and we haven't seen a command finish and haven't
> seen a real CTO then there's no reason to delay further.

Ok, now I agree with you.

> 
> 
> As for blindly setting a longer timeout for CTO / DRTO I'm not sure
> that's a great idea.  We routinely get these timeouts in tuning and we
> really don't want to make tuning even slower than it already is by
> lengthening any of these timeouts too much.
> 
> 
> Overall: if you're having weird trouble with external DMA as you
> describe, I suppose you could just have an even longer DRTO delay only
> for external DMA?
> 
> 
> NOTE also: the DesignWare Manual that I have (2.80a) actually even
> suggests that for long data timeouts (like 100ms) that a software
> timeout is appropriate.  They even suggest that in that case you could
> rely only on the software timeout.  They say:
> 
>> Note: The software timer should be used if the timeout value is in the order
>> of 100 ms. In this case, read data timeout interrupt needs to be disabled.
> 
> Presumably they are saying that because you can't really express much
> more than 100 ms in the TMOUT register?

I'm not sure, as it says the timeout value is in the *order* of 100ms.

> 
> ---
> 
> Just to summarize:
> 
> * I don't think my patch introduces any new problems like the one you
> describe.  I think if there are problems like that, they are
> pre-existing.
> 
> * I don't see a reason to use a catchall timer where we use the same
> timeout for CTO and DRTO.  We could try to save a few bytes of storage
> space and have a single "struct timer" at the expense of a little
> extra logic to try to disambiguate, but I'm not terribly interested in
> writing or reviewing that patch.

No need to do that.

> 
> ---
> 
> As always, please let me know if I got mixed up somewhere.

Thanks for helping me understand the solution correctly.

FWIW:

Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com>

> 
> -Doug
> --
> To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/5] mmc: dw_mmc: Add locking to the CTO timer
  2017-10-17  0:54       ` Shawn Lin
@ 2017-10-17 16:40         ` Doug Anderson
  2017-10-23 17:59           ` Doug Anderson
  0 siblings, 1 reply; 20+ messages in thread
From: Doug Anderson @ 2017-10-17 16:40 UTC (permalink / raw)
  To: Shawn Lin
  Cc: Jaehoon Chung, Ulf Hansson, Ziyuan Xu, Alexandru M Stan,
	open list:ARM/Rockchip SoC...,
	Brian Norris, linux-samsung-soc, Emil Renner Berthing, linux-mmc,
	linux-kernel

Hi,

On Mon, Oct 16, 2017 at 5:54 PM, Shawn Lin <shawn.lin@rock-chips.com> wrote:
> Hi Doug
>
>
> On 2017/10/13 12:20, Doug Anderson wrote:
>>
>> Shawn,
>>
>> On Thu, Oct 12, 2017 at 6:32 PM, Shawn Lin <shawn.lin@rock-chips.com>
>> wrote:
>>>
>>>
>>> On 2017/10/13 4:11, Douglas Anderson wrote:
>>>>
>>>>
>>>> This attempts to instill a bit of paranoia to the code dealing with
>>>> the CTO timer.  It's believed that this will make the CTO timer more
>>>> robust in the case that we're having very long interrupt latencies.
>>>>
>>>
>>> Ack. It could help fix some problems observed.
>>>
>>>
>>>> Note that I originally thought that perhaps this patch was being
>>>> overly paranoid and wasn't really needed, but then while I was running
>>>> mmc_test on an rk3399 board I saw one instance of the message:
>>>>     dwmmc_rockchip fe320000.dwmmc: Unexpected interrupt latency
>>>>
>>>> I had debug prints in the CTO timer code and I found that it was
>>>> running CMD 13 at the time.
>>>>
>>>> ...so even though this patch seems like it might be overly paranoid,
>>>> maybe it really isn't?
>>>>
>>>> Presumably the bad interrupt latency experienced was due to the fact
>>>> that I had serial console enabled as serial console is typically where
>>>> I place blame when I see absurdly large interrupt latencies.  In this
>>>> particular case there was an (unrelated) printout to the serial
>>>> console just before I saw the "Unexpected interrupt latency" printout.
>>>>
>>>> ...and actually, I managed to even reproduce the problems by running
>>>> "iw mlan0 scan > /dev/null" while mmc_test was running.  That not only
>>>> does a bunch of PCIe traffic but it also (on my system) outputs some
>>>> SELinux log spam.
>>>>
>>>> Fixes: 03de19212ea3 ("mmc: dw_mmc: introduce timer for broken command
>>>> transfer over scheme")
>>>>
>>>>
>>>> Tested-by: Emil Renner Berthing <kernel@esmil.dk>
>>>> Signed-off-by: Douglas Anderson <dianders@chromium.org>
>>>> ---
>>>>
>>>> Changes in v2:
>>>> - Removed extra "int i"
>>>>
>>>>    drivers/mmc/host/dw_mmc.c | 91
>>>> +++++++++++++++++++++++++++++++++++++++++------
>>>>    1 file changed, 81 insertions(+), 10 deletions(-)
>>>>
>>>> diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
>>>> index 16516c528a88..50148991f30e 100644
>>>> --- a/drivers/mmc/host/dw_mmc.c
>>>> +++ b/drivers/mmc/host/dw_mmc.c
>>>> @@ -403,6 +403,7 @@ static inline void dw_mci_set_cto(struct dw_mci
>>>> *host)
>>>>          unsigned int cto_clks;
>>>>          unsigned int cto_div;
>>>>          unsigned int cto_ms;
>>>> +       unsigned long irqflags;
>>>>          cto_clks = mci_readl(host, TMOUT) & 0xff;
>>>>          cto_div = (mci_readl(host, CLKDIV) & 0xff) * 2;
>>>> @@ -413,8 +414,24 @@ static inline void dw_mci_set_cto(struct dw_mci
>>>> *host)
>>>>          /* add a bit spare time */
>>>>          cto_ms += 10;
>>>>    -     mod_timer(&host->cto_timer,
>>>> -                 jiffies + msecs_to_jiffies(cto_ms) + 1);
>>>> +       /*
>>>> +        * The durations we're working with are fairly short so we have
>>>> to
>>>> be
>>>> +        * extra careful about synchronization here.  Specifically in
>>>> hardware a
>>>> +        * command timeout is _at most_ 5.1 ms, so that means we expect
>>>> an
>>>> +        * interrupt (either command done or timeout) to come rather
>>>> quickly
>>>> +        * after the mci_writel.  ...but just in case we have a long
>>>> interrupt
>>>> +        * latency let's add a bit of paranoia.
>>>> +        *
>>>> +        * In general we'll assume that at least an interrupt will be
>>>> asserted
>>>> +        * in hardware by the time the cto_timer runs.  ...and if it
>>>> hasn't
>>>> +        * been asserted in hardware by that time then we'll assume
>>>> it'll
>>>> never
>>>> +        * come.
>>>> +        */
>>>> +       spin_lock_irqsave(&host->irq_lock, irqflags);
>>>> +       if (!test_bit(EVENT_CMD_COMPLETE, &host->pending_events))
>>>> +               mod_timer(&host->cto_timer,
>>>> +                       jiffies + msecs_to_jiffies(cto_ms) + 1);
>>>> +       spin_unlock_irqrestore(&host->irq_lock, irqflags);
>>>
>>>
>>>
>>> IIUC, this change is because you move
>>> mci_writel(host, CMD, cmd_flags | SDMMC_CMD_START) before
>>> setting up the timer, so there is a timing gap that the cmd_done
>>> already comes and handled by dw_mci_interrupt->dw_mci_cmd_interrupt.
>>> At this point, we don't need the cto timer at all.
>>
>>
>> As per below, if I don't move the mci_writel() before setting up the
>> timer then there's still a race.  ...and actually that race was harder
>> for me to write code for, but I invite you to try to see if it's
>> somehow cleaner.
>>
>>
>>>>    }
>>>>      static void dw_mci_start_command(struct dw_mci *host,
>>>> @@ -429,11 +446,11 @@ static void dw_mci_start_command(struct dw_mci
>>>> *host,
>>>>          wmb(); /* drain writebuffer */
>>>>          dw_mci_wait_while_busy(host, cmd_flags);
>>>>    +     mci_writel(host, CMD, cmd_flags | SDMMC_CMD_START);
>>>> +
>>>>          /* response expected command only */
>>>>          if (cmd_flags & SDMMC_CMD_RESP_EXP)
>>>>                  dw_mci_set_cto(host);
>>>> -
>>>> -       mci_writel(host, CMD, cmd_flags | SDMMC_CMD_START);
>>>
>>>
>>>
>>>
>>> But why? If we still keep the original logic, it's always correct
>>> that cmd_done comes after setting up the cto timer. So could you
>>> elaborate a bit more to help me understand the real intention here?
>>
>>
>> No matter which order you put things, there's a race one way or the
>> other.  You need a lock.
>>
>> Let's think about the old code you wrote.  You did this:
>> 1. Start the CTO timer.
>> 2. Start the command.
>>
>> Now if you (somehow) take 20 ms to handle the interrupt, then this
>> happens:
>>
>> 1. Start the CTO timer.
>> 2. Start the command.
>> 3. Interrupt is pending, but interrupt handler doesn't run yet.
>> 4. CTO timer fires and enqueues CTO timeout.
>> 5. Interrupt finally fires.
>
>
> OK, got it.
>
>
>>
>>
>> Now normally things are pretty bad if you've got an interrupt latency
>> of 20 ms.  ...and, in fact, I originally wrote up a commit that simply
>> explained why the race didn't matter and was thinking of posting that
>> instead of this one.  I wrote up:
>>
>>       * Start a timer to detect missing cmd timeout if we expect a
>> response.
>>       *
>>       * Note that we need to be a little careful about race conditions
>> here
>>       * since our timer will be racing with the actual hardware interrupt
>>       * and things would get confused if both of them happened.
>>       *
>>       * We end up avoiding races here mostly because of our 10 ms "spare
>>       * time" buffer above.  That's probably reliable enough because:
>>       * - There's "guaranteed" "very little" time between setting the
>> timer
>>       *   and starting the command.  We're holding a spinlock (host->lock)
>>       *   in all calls to this function so we won't get preempted.
>> Possibly
>>       *   we could get interrupts still, but that shouldn't add up to
>>       *   anything like the 10 ms spare time.
>>       * - We expect that when the actual interrupt fires that our
>> interrupt
>>       *   routine should get called "relatively quickly" (compared to the
>>       *   10 ms buffer) and will be able to cancel this timer.
>>
>> ...but then I ran a whole bunch of tests and I found that, as far as I
>> could tell, we actually _were_ getting a super long interrupt latency.
>> Specifically I saw the printout "Unexpected interrupt latency" in my
>> patch.  In order to see that printout in my patch (which even starts
>> the command _before_ the CTO timer), the only explanation is bad
>> interrupt latency, right?  Also: based on my past experience I believe
>> it is possible to get upwards of 100 ms interrupt latency if you've
>> got serial console enabled.  printk, especially printk from an
>> interrupt context, can do some funny things.
>>
>
> Right! It makes sense to me now.
>
>>
>> ...but this stuff is always hard to get right, so if I messed up the
>> above please let me know!  I tried to think of all of the cases so it
>> would work no matter if delays happened in any random place but
>> concurrency is hard.
>
>
> Yes, it looks hard to get concurrency right. I have a comment for your
> DRTO case(patch 5). Let's do some brainstorm there.
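
As a recap of the ordering argument quoted above, here is a toy Python
model of the locking scheme.  It is a sketch under stated assumptions,
not driver code: ToyHost and its fields are hypothetical stand-ins, each
method represents one critical section run under host->irq_lock (so
calling them one after another models a legal interleaving), and the
timer callback is assumed to bail out on a pending interrupt the same
way the DTO timer hunk in patch 5 does.

```python
# Toy model of the CTO race; every name here is hypothetical.

class ToyHost:
    def __init__(self):
        self.cmd_complete = False   # stands in for EVENT_CMD_COMPLETE
        self.timer_armed = False    # stands in for host->cto_timer being pending
        self.timeouts = 0           # software timeouts reported to the core

    def set_cto(self):
        # Mirrors dw_mci_set_cto(): don't arm if the interrupt already ran.
        if not self.cmd_complete:
            self.timer_armed = True

    def cmd_interrupt(self):
        # Interrupt handler: cancel the timer, then flag completion.
        self.timer_armed = False
        self.cmd_complete = True

    def cto_timer(self, irq_pending):
        # Timer callback (assumed behaviour, see the lead-in).
        if not self.timer_armed:
            return
        self.timer_armed = False
        if irq_pending or self.cmd_complete:
            return                  # late interrupt, not a real timeout
        self.timeouts += 1
        self.cmd_complete = True


def fast_interrupt():
    h = ToyHost()
    h.cmd_interrupt()               # command done before set_cto() gets the lock
    h.set_cto()
    return h


def slow_interrupt():
    h = ToyHost()
    h.set_cto()
    h.cto_timer(irq_pending=True)   # timer wins, but the interrupt is asserted
    h.cmd_interrupt()
    return h


def lost_interrupt():
    h = ToyHost()
    h.set_cto()
    h.cto_timer(irq_pending=False)  # nothing asserted: genuine timeout
    return h
```

Only lost_interrupt() ends with a timeout reported; the other two
orderings finish with the command complete and no stale timer left armed.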

Since your comments in this patch are positive and you've now added
your Reviewed-by to patch #5, I'm going to assume that you'd also like
your Reviewed-by on this patch?


Jaehoon: I think I have Shawn's review on all this series.  It would
be great if you could review them yourself and/or pick them up in your
tree.  Since they fix a regression on 4.14 we really don't want to
delay too long.  If you're busy, please yell and we can figure out a
way to get these in (either through Ulf directly or we should find
someone else to make a git tree and send a pull request).

Thanks!

-Doug

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/5] mmc: dw_mmc: Add locking to the CTO timer
  2017-10-17 16:40         ` Doug Anderson
@ 2017-10-23 17:59           ` Doug Anderson
  2017-10-24  1:41             ` Jaehoon Chung
  0 siblings, 1 reply; 20+ messages in thread
From: Doug Anderson @ 2017-10-23 17:59 UTC (permalink / raw)
  To: Ulf Hansson, Jaehoon Chung
  Cc: Ziyuan Xu, Alexandru M Stan, open list:ARM/Rockchip SoC...,
	Brian Norris, linux-samsung-soc, Emil Renner Berthing, linux-mmc,
	linux-kernel, Shawn Lin

Hi,

On Tue, Oct 17, 2017 at 9:40 AM, Doug Anderson <dianders@chromium.org> wrote:
> ...
> ...
>> Yes, it looks hard to get concurrency right. I have a comment for your
>> DRTO case(patch 5). Let's do some brainstorm there.
>
> Since your comments in this patch are positive and you've now added
> your Reviewed-by to patch #5, I'm going to assume that you'd also like
> your Reviewed-by on this patch?
>
>
> Jaehoon: I think I have Shawn's review on all this series.  It would
> be great if you could review them yourself and/or pick them up in your
> tree.  Since they fix a regression on 4.14 we really don't want to
> delay too long.  If you're busy, please yell and we can figure out a
> way to get these in (either through Ulf directly or we should find
> someone else to make a git tree and send a pull request).

Ulf: I still haven't heard anything from Jaehoon.  Do you have any
interest in landing this series directly to your tree?  I think the
whole series has been reviewed by Shawn.  I'm happy to re-post with
collected tags or anything else you'd like.  It would be nice to get
the regression fixed sooner rather than later...

Thanks!  :)

-Doug


* Re: [PATCH v2 3/5] mmc: dw_mmc: Add locking to the CTO timer
  2017-10-23 17:59           ` Doug Anderson
@ 2017-10-24  1:41             ` Jaehoon Chung
  0 siblings, 0 replies; 20+ messages in thread
From: Jaehoon Chung @ 2017-10-24  1:41 UTC (permalink / raw)
  To: Doug Anderson, Ulf Hansson
  Cc: Ziyuan Xu, Alexandru M Stan, open list:ARM/Rockchip SoC...,
	Brian Norris, linux-samsung-soc, Emil Renner Berthing, linux-mmc,
	linux-kernel, Shawn Lin

On 10/24/2017 02:59 AM, Doug Anderson wrote:
> Hi,
> 
> On Tue, Oct 17, 2017 at 9:40 AM, Doug Anderson <dianders@chromium.org> wrote:
>> ...
>> ...
>>> Yes, it looks hard to get concurrency right. I have a comment for your
>>> DRTO case(patch 5). Let's do some brainstorm there.
>>
>> Since your comments in this patch are positive and you've now added
>> your Reviewed-by to patch #5, I'm going to assume that you'd also like
>> your Reviewed-by on this patch?
>>
>>
>> Jaehoon: I think I have Shawn's review on all this series.  It would
>> be great if you could review them yourself and/or pick them up in your
>> tree.  Since they fix a regression on 4.14 we really don't want to
>> delay too long.  If you're busy, please yell and we can figure out a
>> way to get these in (either through Ulf directly or we should find
>> someone else to make a git tree and send a pull request).
> 
> Ulf: I still haven't heard anything from Jaehoon.  Do you have any
> interest in landing this series directly to your tree?  I think the
> whole series has been reviewed by Shawn.  I'm happy to re-post with
> collected tags or anything else you'd like.  It would be nice to get
> the regression fixed sooner rather than later...

Sorry, I didn't see this email in my mailbox, so I missed it.
I'm currently reading through the comment history. Sorry for the late reply.

Best Regards,
Jaehoon Chung

> 
> Thanks!  :)
> 
> -Doug
> 
> 
> 


* Re: [PATCH v2 0/5] mmc: dw_mmc: Fix the CTO timer patch, plus the DTO timer
  2017-10-12 20:11 [PATCH v2 0/5] mmc: dw_mmc: Fix the CTO timer patch, plus the DTO timer Douglas Anderson
                   ` (4 preceding siblings ...)
  2017-10-12 20:11 ` [PATCH v2 5/5] mmc: dw_mmc: Cleanup the DTO timer like the CTO one Douglas Anderson
@ 2017-10-30 11:40 ` Ulf Hansson
  2017-10-31  7:05   ` Shawn Lin
  5 siblings, 1 reply; 20+ messages in thread
From: Ulf Hansson @ 2017-10-30 11:40 UTC (permalink / raw)
  To: Douglas Anderson, Jaehoon Chung
  Cc: Shawn Lin, Ziyuan Xu, Alexandru Stan,
	open list:ARM/Rockchip SoC...,
	Brian Norris, linux-samsung-soc, kernel, linux-mmc, linux-kernel

On 12 October 2017 at 22:11, Douglas Anderson <dianders@chromium.org> wrote:
> Recently we landed 03de19212ea3 ("mmc: dw_mmc: introduce timer for
> broken command transfer over scheme").  I found a bunch of problems
> with that patch, so this series attempts to solve some of them.
>
> This also fixes the DTO timer in some of the same ways even though I
> haven't personally seen problems with the DTO timer.
>
> NOTE: this series has only been lightly tested so far.  I can at least
> reproduce the need for the CTO timer on one of my devices and so I can
> confirm that part still works.  As mentioned in the 3rd patch I also
> ran the mmc_test kernel module on this and did manage to see the 3rd
> patch doing something useful.
>
> Changes in v2:
> - Removed extra "int i"
> - Fix the DTO timeout calculation new for v2
> - Cleanup the DTO timer new for v2
>
> Douglas Anderson (5):
>   mmc: dw_mmc: cancel the CTO timer after a voltage switch
>   mmc: dw_mmc: Fix the CTO timeout calculation
>   mmc: dw_mmc: Add locking to the CTO timer
>   mmc: dw_mmc: Fix the DTO timeout calculation
>   mmc: dw_mmc: Cleanup the DTO timer like the CTO one
>
>  drivers/mmc/host/dw_mmc.c | 162 +++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 146 insertions(+), 16 deletions(-)
>

Douglas, Jaehoon,

I decided to pick patches 1-4 for fixes and patch 5 for next. That
should help us get them more tested while Jaehoon is still
catching up.

I can still add acks or drop patches for another couple of days this week.

Kind regards
Uffe


* Re: [PATCH v2 0/5] mmc: dw_mmc: Fix the CTO timer patch, plus the DTO timer
  2017-10-30 11:40 ` [PATCH v2 0/5] mmc: dw_mmc: Fix the CTO timer patch, plus the DTO timer Ulf Hansson
@ 2017-10-31  7:05   ` Shawn Lin
  2017-10-31 18:14     ` Doug Anderson
  0 siblings, 1 reply; 20+ messages in thread
From: Shawn Lin @ 2017-10-31  7:05 UTC (permalink / raw)
  To: Ulf Hansson, Douglas Anderson
  Cc: Jaehoon Chung, shawn.lin, Ziyuan Xu, Alexandru Stan,
	open list:ARM/Rockchip SoC...,
	Brian Norris, linux-samsung-soc, kernel, linux-mmc, linux-kernel

Hi Ulf,

On 2017/10/30 19:40, Ulf Hansson wrote:
> On 12 October 2017 at 22:11, Douglas Anderson <dianders@chromium.org> wrote:
>> Recently we landed 03de19212ea3 ("mmc: dw_mmc: introduce timer for
>> broken command transfer over scheme").  I found a bunch of problems
>> with that patch, so this series attempts to solve some of them.
>>
>> This also fixes the DTO timer in some of the same ways even though I
>> haven't personally seen problems with the DTO timer.
>>
>> NOTE: this series has only been lightly tested so far.  I can at least
>> reproduce the need for the CTO timer on one of my devices and so I can
>> confirm that part still works.  As mentioned in the 3rd patch I also
>> ran the mmc_test kernel module on this and did manage to see the 3rd
>> patch doing something useful.
>>
>> Changes in v2:
>> - Removed extra "int i"
>> - Fix the DTO timeout calculation new for v2
>> - Cleanup the DTO timer new for v2
>>
>> Douglas Anderson (5):
>>    mmc: dw_mmc: cancel the CTO timer after a voltage switch
>>    mmc: dw_mmc: Fix the CTO timeout calculation
>>    mmc: dw_mmc: Add locking to the CTO timer
>>    mmc: dw_mmc: Fix the DTO timeout calculation
>>    mmc: dw_mmc: Cleanup the DTO timer like the CTO one
>>
>>   drivers/mmc/host/dw_mmc.c | 162 +++++++++++++++++++++++++++++++++++++++++-----
>>   1 file changed, 146 insertions(+), 16 deletions(-)
>>
> 
> Douglas, Jaehoon,
> 
> I decided to pick patch 1->4 for fixes and the patch 5 for next, that
> should help us to get them more tested, while Jaehoon is still
> catching up.
> 
> I can add ack/drop patches for yet a couple of days this week.

Patch 4 introduces a warning:

warning: unused variable ‘irqflags’ [-Wunused-variable]

irqflags should instead be introduced in patch 5, in the same place.
Since it seems patch 5 will be a candidate for 4.15, could you please
help fix patches 4 and 5 manually? Or does Doug need to resend them?



> 
> Kind regards
> Uffe
> 
> 


* Re: [PATCH v2 0/5] mmc: dw_mmc: Fix the CTO timer patch, plus the DTO timer
  2017-10-31  7:05   ` Shawn Lin
@ 2017-10-31 18:14     ` Doug Anderson
  2017-11-01 14:18       ` Ulf Hansson
  0 siblings, 1 reply; 20+ messages in thread
From: Doug Anderson @ 2017-10-31 18:14 UTC (permalink / raw)
  To: Shawn Lin
  Cc: Ulf Hansson, Jaehoon Chung, Ziyuan Xu, Alexandru Stan,
	open list:ARM/Rockchip SoC...,
	Brian Norris, linux-samsung-soc, Emil Renner Berthing, linux-mmc,
	linux-kernel

Hi,

On Tue, Oct 31, 2017 at 12:05 AM, Shawn Lin <shawn.lin@rock-chips.com> wrote:
> Hi Ulf,
>
>
> On 2017/10/30 19:40, Ulf Hansson wrote:
>>
>> On 12 October 2017 at 22:11, Douglas Anderson <dianders@chromium.org>
>> wrote:
>>>
>>> Recently we landed 03de19212ea3 ("mmc: dw_mmc: introduce timer for
>>> broken command transfer over scheme").  I found a bunch of problems
>>> with that patch, so this series attempts to solve some of them.
>>>
>>> This also fixes the DTO timer in some of the same ways even though I
>>> haven't personally seen problems with the DTO timer.
>>>
>>> NOTE: this series has only been lightly tested so far.  I can at least
>>> reproduce the need for the CTO timer on one of my devices and so I can
>>> confirm that part still works.  As mentioned in the 3rd patch I also
>>> ran the mmc_test kernel module on this and did manage to see the 3rd
>>> patch doing something useful.
>>>
>>> Changes in v2:
>>> - Removed extra "int i"
>>> - Fix the DTO timeout calculation new for v2
>>> - Cleanup the DTO timer new for v2
>>>
>>> Douglas Anderson (5):
>>>    mmc: dw_mmc: cancel the CTO timer after a voltage switch
>>>    mmc: dw_mmc: Fix the CTO timeout calculation
>>>    mmc: dw_mmc: Add locking to the CTO timer
>>>    mmc: dw_mmc: Fix the DTO timeout calculation
>>>    mmc: dw_mmc: Cleanup the DTO timer like the CTO one
>>>
>>>   drivers/mmc/host/dw_mmc.c | 162
>>> +++++++++++++++++++++++++++++++++++++++++-----
>>>   1 file changed, 146 insertions(+), 16 deletions(-)
>>>
>>
>> Douglas, Jaehoon,
>>
>> I decided to pick patch 1->4 for fixes and the patch 5 for next, that
>> should help us to get them more tested, while Jaehoon is still
>> catching up.
>>
>> I can add ack/drop patches for yet a couple of days this week.
>
>
> Patch 4 introduces a warning:
>
> warning: unused variable ‘irqflags’ [-Wunused-variable]
>
> irqflags should be introduced in patch 5 in the same place.
> As it seems patch 5 will be candidate for 4.15, so could you please
> help fix patch 4 and 5 manually? Or Doug need to resend patch 4 and 5?

Thanks for catching this and sorry for the dumb screwup.  Ulf: I'm
happy to do whatever makes it easiest for you.  If you want me to
re-post, it should be very quick.

-Doug


* Re: [PATCH v2 0/5] mmc: dw_mmc: Fix the CTO timer patch, plus the DTO timer
  2017-10-31 18:14     ` Doug Anderson
@ 2017-11-01 14:18       ` Ulf Hansson
  0 siblings, 0 replies; 20+ messages in thread
From: Ulf Hansson @ 2017-11-01 14:18 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Shawn Lin, Jaehoon Chung, Ziyuan Xu, Alexandru Stan,
	open list:ARM/Rockchip SoC...,
	Brian Norris, linux-samsung-soc, Emil Renner Berthing, linux-mmc,
	linux-kernel

On 31 October 2017 at 19:14, Doug Anderson <dianders@chromium.org> wrote:
> Hi,
>
> On Tue, Oct 31, 2017 at 12:05 AM, Shawn Lin <shawn.lin@rock-chips.com> wrote:
>> Hi Ulf,
>>
>>
>> On 2017/10/30 19:40, Ulf Hansson wrote:
>>>
>>> On 12 October 2017 at 22:11, Douglas Anderson <dianders@chromium.org>
>>> wrote:
>>>>
>>>> Recently we landed 03de19212ea3 ("mmc: dw_mmc: introduce timer for
>>>> broken command transfer over scheme").  I found a bunch of problems
>>>> with that patch, so this series attempts to solve some of them.
>>>>
>>>> This also fixes the DTO timer in some of the same ways even though I
>>>> haven't personally seen problems with the DTO timer.
>>>>
>>>> NOTE: this series has only been lightly tested so far.  I can at least
>>>> reproduce the need for the CTO timer on one of my devices and so I can
>>>> confirm that part still works.  As mentioned in the 3rd patch I also
>>>> ran the mmc_test kernel module on this and did manage to see the 3rd
>>>> patch doing something useful.
>>>>
>>>> Changes in v2:
>>>> - Removed extra "int i"
>>>> - Fix the DTO timeout calculation new for v2
>>>> - Cleanup the DTO timer new for v2
>>>>
>>>> Douglas Anderson (5):
>>>>    mmc: dw_mmc: cancel the CTO timer after a voltage switch
>>>>    mmc: dw_mmc: Fix the CTO timeout calculation
>>>>    mmc: dw_mmc: Add locking to the CTO timer
>>>>    mmc: dw_mmc: Fix the DTO timeout calculation
>>>>    mmc: dw_mmc: Cleanup the DTO timer like the CTO one
>>>>
>>>>   drivers/mmc/host/dw_mmc.c | 162
>>>> +++++++++++++++++++++++++++++++++++++++++-----
>>>>   1 file changed, 146 insertions(+), 16 deletions(-)
>>>>
>>>
>>> Douglas, Jaehoon,
>>>
>>> I decided to pick patch 1->4 for fixes and the patch 5 for next, that
>>> should help us to get them more tested, while Jaehoon is still
>>> catching up.
>>>
>>> I can add ack/drop patches for yet a couple of days this week.
>>
>>
>> Patch 4 introduces a warning:
>>
>> warning: unused variable ‘irqflags’ [-Wunused-variable]
>>
>> irqflags should be introduced in patch 5 in the same place.
>> As it seems patch 5 will be candidate for 4.15, so could you please
>> help fix patch 4 and 5 manually? Or Doug need to resend patch 4 and 5?
>
> Thanks for catching this and sorry for the dumb screwup.  Ulf: I'm
> happy to do whatever makes it easiest for you.  If you want me to
> re-post it should be very quick.

I fixed it myself by amending the commits, so no worries this time!

Kind regards
Uffe


Thread overview: 20+ messages
2017-10-12 20:11 [PATCH v2 0/5] mmc: dw_mmc: Fix the CTO timer patch, plus the DTO timer Douglas Anderson
2017-10-12 20:11 ` [PATCH v2 1/5] mmc: dw_mmc: cancel the CTO timer after a voltage switch Douglas Anderson
2017-10-12 20:11 ` [PATCH v2 2/5] mmc: dw_mmc: Fix the CTO timeout calculation Douglas Anderson
2017-10-12 20:11 ` [PATCH v2 3/5] mmc: dw_mmc: Add locking to the CTO timer Douglas Anderson
2017-10-13  1:32   ` Shawn Lin
2017-10-13  4:20     ` Doug Anderson
2017-10-17  0:54       ` Shawn Lin
2017-10-17 16:40         ` Doug Anderson
2017-10-23 17:59           ` Doug Anderson
2017-10-24  1:41             ` Jaehoon Chung
2017-10-12 20:11 ` [PATCH v2 4/5] mmc: dw_mmc: Fix the DTO timeout calculation Douglas Anderson
2017-10-13  1:02   ` Shawn Lin
2017-10-12 20:11 ` [PATCH v2 5/5] mmc: dw_mmc: Cleanup the DTO timer like the CTO one Douglas Anderson
2017-10-17  1:17   ` Shawn Lin
2017-10-17  5:05     ` Doug Anderson
2017-10-17  6:33       ` Shawn Lin
2017-10-30 11:40 ` [PATCH v2 0/5] mmc: dw_mmc: Fix the CTO timer patch, plus the DTO timer Ulf Hansson
2017-10-31  7:05   ` Shawn Lin
2017-10-31 18:14     ` Doug Anderson
2017-11-01 14:18       ` Ulf Hansson
