[PATCH v2 0/3] brcmfmac: sdio: Deal better w/ transmission errors waking from idle

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v2 0/3] brcmfmac: sdio: Deal better w/ transmission errors waking from idle
@ 2019-06-03 18:37 Douglas Anderson
  2019-06-03 18:37 ` [PATCH v2 1/3] Revert "brcmfmac: disable command decode in sdio_aos" Douglas Anderson
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Douglas Anderson @ 2019-06-03 18:37 UTC (permalink / raw)
  To: Ulf Hansson, Kalle Valo, Adrian Hunter, Arend van Spriel
  Cc: brcm80211-dev-list.pdl, linux-rockchip, Double Lo, briannorris,
	linux-wireless, Naveen Gupta, Madhan Mohan R, mka, Wright Feng,
	Chi-Hsien Lin, netdev, brcm80211-dev-list, Douglas Anderson,
	linux-mmc, Shawn Lin, YueHaibing, Hans de Goede, Hante Meuleman,
	Michael Trimarchi, Wolfram Sang, Franky Lin, David S. Miller,
	linux-kernel, Avri Altman

This series attempts to deal better with the expected transmission
errors that we get when waking up (from idle) the SDIO-based WiFi on
rk3288-veyron-minnie, rk3288-veyron-speedy, and rk3288-veyron-mickey.

Some details about those errors can be found in
<https://crbug.com/960222>, but to summarize it here: if we try to
send the wakeup command to the WiFi card at the same time it has
decided to wake up itself then it will behave badly on the SDIO bus.
This can cause timeouts or CRC errors.

When I tested on 4.19 and 4.20 these CRC errors can be seen to cause
re-tuning.  Since I am currently developing on 4.19 this was the
original problem I attempted to solve.

On mainline it turns out that you don't see the retuning errors but
you see tons of spam about timeouts trying to wakeup from sleep.  I
tracked down the commit that was causing that and have partially
reverted it here.  I have no real knowledge about Broadcom WiFi, but
the commit that was causing problems sounds (from the descriptioin) to
be a hack commit penalizing all Broadcom WiFi users because of a bug
in a Cypress SD controller.  I will let others comment if this is
truly the case and, if so, what the right solution should be.

There wasn't a good resolution on v1 and it's been a while, so I'm
sending out a v2.  Other than changing patch #1 to a full revert, the
only other changes here are just to the patch descriptions.

Changes in v2:
- A full revert, not just a partial one (Arend).  ...with explicit Cc.
- Updated commit message to clarify based on discussion of v1.

Douglas Anderson (3):
  Revert "brcmfmac: disable command decode in sdio_aos"
  mmc: core: API for temporarily disabling auto-retuning due to errors
  brcmfmac: sdio: Disable auto-tuning around commands expected to fail

 drivers/mmc/core/core.c                       | 27 +++++++++++++++++--
 .../broadcom/brcm80211/brcmfmac/sdio.c        |  9 +++----
 include/linux/mmc/core.h                      |  2 ++
 include/linux/mmc/host.h                      |  1 +
 4 files changed, 32 insertions(+), 7 deletions(-)

-- 
2.22.0.rc1.311.g5d7573a151-goog

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 1/3] Revert "brcmfmac: disable command decode in sdio_aos"
  2019-06-03 18:37 [PATCH v2 0/3] brcmfmac: sdio: Deal better w/ transmission errors waking from idle Douglas Anderson
@ 2019-06-03 18:37 ` Douglas Anderson
  2019-06-03 18:37 ` [PATCH v2 2/3] mmc: core: API for temporarily disabling auto-retuning due to errors Douglas Anderson
  2019-06-03 18:37 ` [PATCH v2 3/3] brcmfmac: sdio: Disable auto-tuning around commands expected to fail Douglas Anderson
  2 siblings, 0 replies; 15+ messages in thread
From: Douglas Anderson @ 2019-06-03 18:37 UTC (permalink / raw)
  To: Ulf Hansson, Kalle Valo, Adrian Hunter, Arend van Spriel
  Cc: brcm80211-dev-list.pdl, linux-rockchip, Double Lo, briannorris,
	linux-wireless, Naveen Gupta, Madhan Mohan R, mka, Wright Feng,
	Chi-Hsien Lin, netdev, brcm80211-dev-list, Douglas Anderson,
	Franky Lin, linux-kernel, Hans de Goede, Hante Meuleman,
	YueHaibing, David S. Miller

This reverts commit 29f6589140a10ece8c1d73f58043ea5b3473ab3e.

After that patch landed I find that my kernel log on
rk3288-veyron-minnie and rk3288-veyron-speedy is filled with:
brcmfmac: brcmf_sdio_bus_sleep: error while changing bus sleep state -110

This seems to happen every time the Broadcom WiFi transitions out of
sleep mode.  Reverting the commit fixes the problem for me, so that's
what this patch does.

Note that, in general, the justification in the original commit seemed
a little weak.  It looked like someone was testing on a SD card
controller that would sometimes die if there were CRC errors on the
bus.  This used to happen back in early days of dw_mmc (the controller
on my boards), but we fixed it.  Disabling a feature on all boards
just because one SD card controller is broken seems bad.

Fixes: 29f6589140a1 ("brcmfmac: disable command decode in sdio_aos")
Cc: Wright Feng <wright.feng@cypress.com>
Cc: Double Lo <double.lo@cypress.com>
Cc: Madhan Mohan R <madhanmohan.r@cypress.com>
Cc: Chi-Hsien Lin <chi-hsien.lin@cypress.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

Changes in v2:
- A full revert, not just a partial one (Arend).  ...with explicit Cc.

 drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
index 4e15ea57d4f5..4a750838d8cd 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
@@ -3364,11 +3364,7 @@ static int brcmf_sdio_download_firmware(struct brcmf_sdio *bus,
 
 static bool brcmf_sdio_aos_no_decode(struct brcmf_sdio *bus)
 {
-	if (bus->ci->chip == CY_CC_43012_CHIP_ID ||
-	    bus->ci->chip == CY_CC_4373_CHIP_ID ||
-	    bus->ci->chip == BRCM_CC_4339_CHIP_ID ||
-	    bus->ci->chip == BRCM_CC_4345_CHIP_ID ||
-	    bus->ci->chip == BRCM_CC_4354_CHIP_ID)
+	if (bus->ci->chip == CY_CC_43012_CHIP_ID)
 		return true;
 	else
 		return false;
-- 
2.22.0.rc1.311.g5d7573a151-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 2/3] mmc: core: API for temporarily disabling auto-retuning due to errors
  2019-06-03 18:37 [PATCH v2 0/3] brcmfmac: sdio: Deal better w/ transmission errors waking from idle Douglas Anderson
  2019-06-03 18:37 ` [PATCH v2 1/3] Revert "brcmfmac: disable command decode in sdio_aos" Douglas Anderson
@ 2019-06-03 18:37 ` Douglas Anderson
  2019-06-05  7:54   ` Arend Van Spriel
  2019-06-03 18:37 ` [PATCH v2 3/3] brcmfmac: sdio: Disable auto-tuning around commands expected to fail Douglas Anderson
  2 siblings, 1 reply; 15+ messages in thread
From: Douglas Anderson @ 2019-06-03 18:37 UTC (permalink / raw)
  To: Ulf Hansson, Kalle Valo, Adrian Hunter, Arend van Spriel
  Cc: brcm80211-dev-list.pdl, linux-rockchip, Double Lo, briannorris,
	linux-wireless, Naveen Gupta, Madhan Mohan R, mka, Wright Feng,
	Chi-Hsien Lin, netdev, brcm80211-dev-list, Douglas Anderson,
	linux-mmc, linux-kernel, Shawn Lin, Wolfram Sang, Avri Altman

Normally when the MMC core sees an "-EILSEQ" error returned by a host
controller then it will trigger a retuning of the card.  This is
generally a good idea.

However, if a command is expected to sometimes cause transfer errors
then these transfer errors shouldn't cause a re-tuning.  This
re-tuning will be a needless waste of time.  One example case where a
transfer is expected to cause errors is when transitioning between
idle (sometimes referred to as "sleep" in Broadcom code) and active
state on certain Broadcom WiFi cards.  Specifically if the card was
already transitioning between states when the command was sent it
could cause an error on the SDIO bus.

Let's add an API that the SDIO card drivers can call that will
temporarily disable the auto-tuning functionality.  Then we can add a
call to this in the Broadcom WiFi driver and any other driver that
might have similar needs.

NOTE: this makes the assumption that the card is already tuned well
enough that it's OK to disable the auto-retuning during one of these
error-prone situations.  Presumably the driver code performing the
error-prone transfer knows how to recover / retry from errors.  ...and
after we can get back to a state where transfers are no longer
error-prone then we can enable the auto-retuning again.  If we truly
find ourselves in a case where the card needs to be retuned sometimes
to handle one of these error-prone transfers then we can always try a
few transfers first without auto-retuning and then re-try with
auto-retuning if the first few fail.

Without this change on rk3288-veyron-minnie I periodically see this in
the logs of a machine just sitting there idle:
  dwmmc_rockchip ff0d0000.dwmmc: Successfully tuned phase to XYZ

Fixes: bd11e8bd03ca ("mmc: core: Flag re-tuning is needed on CRC errors")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---
Note that are are a whole boatload of different ways that we could
provide an API for the Broadcom WiFi SDIO driver.  This patch
illustrates one way but if maintainers feel strongly that this is too
ugly and have a better idea then I can give it a shot too.  From a
purist point of view I kinda felt that the "expect errors" really
belonged as part of the mmc_request structure, but getting it into
there meant changing a whole pile of core SD/MMC APIs.  Simply adding
it to the host seemed to match the current style better and was a less
intrusive change.

Changes in v2:
- Updated commit message to clarify based on discussion of v1.

 drivers/mmc/core/core.c  | 27 +++++++++++++++++++++++++--
 include/linux/mmc/core.h |  2 ++
 include/linux/mmc/host.h |  1 +
 3 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 6db36dc870b5..ba4f71aa8cd9 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -144,8 +144,9 @@ void mmc_request_done(struct mmc_host *host, struct mmc_request *mrq)
 	int err = cmd->error;
 
 	/* Flag re-tuning needed on CRC errors */
-	if ((cmd->opcode != MMC_SEND_TUNING_BLOCK &&
-	    cmd->opcode != MMC_SEND_TUNING_BLOCK_HS200) &&
+	if (cmd->opcode != MMC_SEND_TUNING_BLOCK &&
+	    cmd->opcode != MMC_SEND_TUNING_BLOCK_HS200 &&
+	    !host->expect_errors &&
 	    (err == -EILSEQ || (mrq->sbc && mrq->sbc->error == -EILSEQ) ||
 	    (mrq->data && mrq->data->error == -EILSEQ) ||
 	    (mrq->stop && mrq->stop->error == -EILSEQ)))
@@ -2163,6 +2164,28 @@ int mmc_sw_reset(struct mmc_host *host)
 }
 EXPORT_SYMBOL(mmc_sw_reset);
 
+void mmc_expect_errors_begin(struct mmc_host *host)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&host->lock, flags);
+	WARN_ON(host->expect_errors);
+	host->expect_errors = true;
+	spin_unlock_irqrestore(&host->lock, flags);
+}
+EXPORT_SYMBOL_GPL(mmc_expect_errors_begin);
+
+void mmc_expect_errors_end(struct mmc_host *host)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&host->lock, flags);
+	WARN_ON(!host->expect_errors);
+	host->expect_errors = false;
+	spin_unlock_irqrestore(&host->lock, flags);
+}
+EXPORT_SYMBOL_GPL(mmc_expect_errors_end);
+
 static int mmc_rescan_try_freq(struct mmc_host *host, unsigned freq)
 {
 	host->f_init = freq;
diff --git a/include/linux/mmc/core.h b/include/linux/mmc/core.h
index 134a6483347a..02a13abf0cda 100644
--- a/include/linux/mmc/core.h
+++ b/include/linux/mmc/core.h
@@ -178,6 +178,8 @@ int mmc_wait_for_cmd(struct mmc_host *host, struct mmc_command *cmd,
 
 int mmc_hw_reset(struct mmc_host *host);
 int mmc_sw_reset(struct mmc_host *host);
+void mmc_expect_errors_begin(struct mmc_host *host);
+void mmc_expect_errors_end(struct mmc_host *host);
 void mmc_set_data_timeout(struct mmc_data *data, const struct mmc_card *card);
 
 #endif /* LINUX_MMC_CORE_H */
diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index 43d0f0c496f6..8d553fb8c834 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -398,6 +398,7 @@ struct mmc_host {
 	unsigned int		retune_now:1;	/* do re-tuning at next req */
 	unsigned int		retune_paused:1; /* re-tuning is temporarily disabled */
 	unsigned int		use_blk_mq:1;	/* use blk-mq */
+	unsigned int		expect_errors:1; /* don't trigger retune upon errors */
 
 	int			rescan_disable;	/* disable card detection */
 	int			rescan_entered;	/* used with nonremovable devices */
-- 
2.22.0.rc1.311.g5d7573a151-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 3/3] brcmfmac: sdio: Disable auto-tuning around commands expected to fail
  2019-06-03 18:37 [PATCH v2 0/3] brcmfmac: sdio: Deal better w/ transmission errors waking from idle Douglas Anderson
  2019-06-03 18:37 ` [PATCH v2 1/3] Revert "brcmfmac: disable command decode in sdio_aos" Douglas Anderson
  2019-06-03 18:37 ` [PATCH v2 2/3] mmc: core: API for temporarily disabling auto-retuning due to errors Douglas Anderson
@ 2019-06-03 18:37 ` Douglas Anderson
  2019-06-06 13:59   ` Adrian Hunter
  2 siblings, 1 reply; 15+ messages in thread
From: Douglas Anderson @ 2019-06-03 18:37 UTC (permalink / raw)
  To: Ulf Hansson, Kalle Valo, Adrian Hunter, Arend van Spriel
  Cc: brcm80211-dev-list.pdl, linux-rockchip, Double Lo, briannorris,
	linux-wireless, Naveen Gupta, Madhan Mohan R, mka, Wright Feng,
	Chi-Hsien Lin, netdev, brcm80211-dev-list, Douglas Anderson,
	David S. Miller, Franky Lin, linux-kernel, Hante Meuleman,
	YueHaibing, Michael Trimarchi

There are certain cases, notably when transitioning between sleep and
active state, when Broadcom SDIO WiFi cards will produce errors on the
SDIO bus.  This is evident from the source code where you can see that
we try commands in a loop until we either get success or we've tried
too many times.  The comment in the code reinforces this by saying
"just one write attempt may fail"

Unfortunately these failures sometimes end up causing an "-EILSEQ"
back to the core which triggers a retuning of the SDIO card and that
blocks all traffic to the card until it's done.

Let's disable retuning around the commands we expect might fail.

Fixes: bd11e8bd03ca ("mmc: core: Flag re-tuning is needed on CRC errors")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

Changes in v2: None

 drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
index 4a750838d8cd..e0cfcc078a54 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
@@ -16,6 +16,7 @@
 #include <linux/mmc/sdio_ids.h>
 #include <linux/mmc/sdio_func.h>
 #include <linux/mmc/card.h>
+#include <linux/mmc/core.h>
 #include <linux/semaphore.h>
 #include <linux/firmware.h>
 #include <linux/module.h>
@@ -697,6 +698,7 @@ brcmf_sdio_kso_control(struct brcmf_sdio *bus, bool on)
 		bmask = SBSDIO_FUNC1_SLEEPCSR_KSO_MASK;
 	}
 
+	mmc_expect_errors_begin(bus->sdiodev->func1->card->host);
 	do {
 		/* reliable KSO bit set/clr:
 		 * the sdiod sleep write access is synced to PMU 32khz clk
@@ -719,6 +721,7 @@ brcmf_sdio_kso_control(struct brcmf_sdio *bus, bool on)
 				   &err);
 
 	} while (try_cnt++ < MAX_KSO_ATTEMPTS);
+	mmc_expect_errors_end(bus->sdiodev->func1->card->host);
 
 	if (try_cnt > 2)
 		brcmf_dbg(SDIO, "try_cnt=%d rd_val=0x%x err=%d\n", try_cnt,
-- 
2.22.0.rc1.311.g5d7573a151-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/3] mmc: core: API for temporarily disabling auto-retuning due to errors
  2019-06-03 18:37 ` [PATCH v2 2/3] mmc: core: API for temporarily disabling auto-retuning due to errors Douglas Anderson
@ 2019-06-05  7:54   ` Arend Van Spriel
  2019-06-05 22:51     ` Doug Anderson
  0 siblings, 1 reply; 15+ messages in thread
From: Arend Van Spriel @ 2019-06-05  7:54 UTC (permalink / raw)
  To: Douglas Anderson, Ulf Hansson, Kalle Valo, Adrian Hunter
  Cc: brcm80211-dev-list.pdl, linux-rockchip, Double Lo, briannorris,
	linux-wireless, Naveen Gupta, Madhan Mohan R, mka, Wright Feng,
	Chi-Hsien Lin, netdev, brcm80211-dev-list, linux-mmc,
	linux-kernel, Shawn Lin, Wolfram Sang, Avri Altman

On 6/3/2019 8:37 PM, Douglas Anderson wrote:
> Normally when the MMC core sees an "-EILSEQ" error returned by a host
> controller then it will trigger a retuning of the card.  This is
> generally a good idea.
> 
> However, if a command is expected to sometimes cause transfer errors
> then these transfer errors shouldn't cause a re-tuning.  This
> re-tuning will be a needless waste of time.  One example case where a
> transfer is expected to cause errors is when transitioning between
> idle (sometimes referred to as "sleep" in Broadcom code) and active
> state on certain Broadcom WiFi cards.  Specifically if the card was
> already transitioning between states when the command was sent it
> could cause an error on the SDIO bus.
> 
> Let's add an API that the SDIO card drivers can call that will
> temporarily disable the auto-tuning functionality.  Then we can add a
> call to this in the Broadcom WiFi driver and any other driver that
> might have similar needs.
> 
> NOTE: this makes the assumption that the card is already tuned well
> enough that it's OK to disable the auto-retuning during one of these
> error-prone situations.  Presumably the driver code performing the
> error-prone transfer knows how to recover / retry from errors.  ...and
> after we can get back to a state where transfers are no longer
> error-prone then we can enable the auto-retuning again.  If we truly
> find ourselves in a case where the card needs to be retuned sometimes
> to handle one of these error-prone transfers then we can always try a
> few transfers first without auto-retuning and then re-try with
> auto-retuning if the first few fail.
> 
> Without this change on rk3288-veyron-minnie I periodically see this in
> the logs of a machine just sitting there idle:
>    dwmmc_rockchip ff0d0000.dwmmc: Successfully tuned phase to XYZ
> 
> Fixes: bd11e8bd03ca ("mmc: core: Flag re-tuning is needed on CRC errors")
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
> Note that are are a whole boatload of different ways that we could
> provide an API for the Broadcom WiFi SDIO driver.  This patch
> illustrates one way but if maintainers feel strongly that this is too
> ugly and have a better idea then I can give it a shot too.  From a
> purist point of view I kinda felt that the "expect errors" really
> belonged as part of the mmc_request structure, but getting it into
> there meant changing a whole pile of core SD/MMC APIs.  Simply adding
> it to the host seemed to match the current style better and was a less
> intrusive change.

Hi Doug,

Sorry for bringing this up, but there used to be an issue with retuning 
in general, ie. the device handled tuning command 19 only once after 
startup. I guess that is no longer an issue given your results. I guess 
the problem goes away when you disable device sleep functionality. No 
what you want in terms of power consumption, but would be good to know. 
You can disable it with below patch.

Regards,
Arend
---
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c 
b/drivers
index 15a40fd..18e90bd 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
@@ -307,7 +307,7 @@ struct rte_console {
  #define BRCMF_IDLE_ACTIVE      0       /* Do not request any SD clock 
change
                                          * when idle
                                          */
-#define BRCMF_IDLE_INTERVAL    1
+#define BRCMF_IDLE_INTERVAL    0

  #define KSO_WAIT_US 50
  #define MAX_KSO_ATTEMPTS (PMU_MAX_TRANSITION_DLY/KSO_WAIT_US)


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/3] mmc: core: API for temporarily disabling auto-retuning due to errors
  2019-06-05  7:54   ` Arend Van Spriel
@ 2019-06-05 22:51     ` Doug Anderson
  0 siblings, 0 replies; 15+ messages in thread
From: Doug Anderson @ 2019-06-05 22:51 UTC (permalink / raw)
  To: Arend Van Spriel
  Cc: Ulf Hansson, Kalle Valo, Adrian Hunter, brcm80211-dev-list.pdl,
	open list:ARM/Rockchip SoC...,
	Double Lo, Brian Norris, linux-wireless, Naveen Gupta,
	Madhan Mohan R, Matthias Kaehlcke, Wright Feng, Chi-Hsien Lin,
	netdev, brcm80211-dev-list, Linux MMC List, LKML, Shawn Lin,
	Wolfram Sang, Avri Altman

Hi,

On Wed, Jun 5, 2019 at 12:54 AM Arend Van Spriel
<arend.vanspriel@broadcom.com> wrote:
>
> On 6/3/2019 8:37 PM, Douglas Anderson wrote:
> > Normally when the MMC core sees an "-EILSEQ" error returned by a host
> > controller then it will trigger a retuning of the card.  This is
> > generally a good idea.
> >
> > However, if a command is expected to sometimes cause transfer errors
> > then these transfer errors shouldn't cause a re-tuning.  This
> > re-tuning will be a needless waste of time.  One example case where a
> > transfer is expected to cause errors is when transitioning between
> > idle (sometimes referred to as "sleep" in Broadcom code) and active
> > state on certain Broadcom WiFi cards.  Specifically if the card was
> > already transitioning between states when the command was sent it
> > could cause an error on the SDIO bus.
> >
> > Let's add an API that the SDIO card drivers can call that will
> > temporarily disable the auto-tuning functionality.  Then we can add a
> > call to this in the Broadcom WiFi driver and any other driver that
> > might have similar needs.
> >
> > NOTE: this makes the assumption that the card is already tuned well
> > enough that it's OK to disable the auto-retuning during one of these
> > error-prone situations.  Presumably the driver code performing the
> > error-prone transfer knows how to recover / retry from errors.  ...and
> > after we can get back to a state where transfers are no longer
> > error-prone then we can enable the auto-retuning again.  If we truly
> > find ourselves in a case where the card needs to be retuned sometimes
> > to handle one of these error-prone transfers then we can always try a
> > few transfers first without auto-retuning and then re-try with
> > auto-retuning if the first few fail.
> >
> > Without this change on rk3288-veyron-minnie I periodically see this in
> > the logs of a machine just sitting there idle:
> >    dwmmc_rockchip ff0d0000.dwmmc: Successfully tuned phase to XYZ
> >
> > Fixes: bd11e8bd03ca ("mmc: core: Flag re-tuning is needed on CRC errors")
> > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > ---
> > Note that are are a whole boatload of different ways that we could
> > provide an API for the Broadcom WiFi SDIO driver.  This patch
> > illustrates one way but if maintainers feel strongly that this is too
> > ugly and have a better idea then I can give it a shot too.  From a
> > purist point of view I kinda felt that the "expect errors" really
> > belonged as part of the mmc_request structure, but getting it into
> > there meant changing a whole pile of core SD/MMC APIs.  Simply adding
> > it to the host seemed to match the current style better and was a less
> > intrusive change.
>
> Hi Doug,
>
> Sorry for bringing this up, but there used to be an issue with retuning
> in general, ie. the device handled tuning command 19 only once after
> startup. I guess that is no longer an issue given your results.

Right.  It definitely used to just happen once at bootup and you were
out of luck if that value was bad for some reason or if conditions
changed.  In cases in my own personal experience it was actually fine
to just tune once at bootup once all the tuning bugs in the controller
were fixed.  ...but I can imagine that some controllers could use
delay elements that drift more.  ...and in any case if you're getting
CRC errors trying a re-tuning seems a sensible thing to do anyway
(unless the CRC error was expected).

Looking at commit bd11e8bd03ca ("mmc: core: Flag re-tuning is needed
on CRC errors") you can definitely see evidence that tuning can happen
again after bootup.


> I guess
> the problem goes away when you disable device sleep functionality. No
> what you want in terms of power consumption, but would be good to know.
> You can disable it with below patch.

I can try testing this if it's useful, but I'm not sure what it will
prove.  I definitely don't want to disable device sleep, so it's not a
long term solution.  Are you just looking for extra evidence that this
is indeed my problem?  I don't think I need any extra evidence, do I?
The fact that patch #3 in this series fixes my problems (plus
debugging I did to arrive at that patch) means we absolutely know that
brcmf_sdio_kso_control() is responsible for the CRC errors that caused
the unneeded tuning.  Setting BRCMF_IDLE_INTERVAL to 0 will
effectively prevent brcmf_sdio_kso_control() from doing anything
useful (except in full system suspend, but that's not the case I was
testing anyway).

-Doug

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/3] brcmfmac: sdio: Disable auto-tuning around commands expected to fail
  2019-06-03 18:37 ` [PATCH v2 3/3] brcmfmac: sdio: Disable auto-tuning around commands expected to fail Douglas Anderson
@ 2019-06-06 13:59   ` Adrian Hunter
  2019-06-06 21:37     ` Doug Anderson
  0 siblings, 1 reply; 15+ messages in thread
From: Adrian Hunter @ 2019-06-06 13:59 UTC (permalink / raw)
  To: Douglas Anderson, Ulf Hansson, Kalle Valo, Arend van Spriel
  Cc: brcm80211-dev-list.pdl, linux-rockchip, Double Lo, briannorris,
	linux-wireless, Naveen Gupta, Madhan Mohan R, mka, Wright Feng,
	Chi-Hsien Lin, netdev, brcm80211-dev-list, David S. Miller,
	Franky Lin, linux-kernel, Hante Meuleman, YueHaibing,
	Michael Trimarchi

On 3/06/19 9:37 PM, Douglas Anderson wrote:
> There are certain cases, notably when transitioning between sleep and
> active state, when Broadcom SDIO WiFi cards will produce errors on the
> SDIO bus.  This is evident from the source code where you can see that
> we try commands in a loop until we either get success or we've tried
> too many times.  The comment in the code reinforces this by saying
> "just one write attempt may fail"
> 
> Unfortunately these failures sometimes end up causing an "-EILSEQ"
> back to the core which triggers a retuning of the SDIO card and that
> blocks all traffic to the card until it's done.
> 
> Let's disable retuning around the commands we expect might fail.

It seems to me that re-tuning needs to be prevented before the
first access otherwise it might be attempted there, and it needs
to continue to be prevented during the transition when it might
reasonably be expected to fail.

What about something along these lines:

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
index 4e15ea57d4f5..d932780ef56e 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
@@ -664,9 +664,18 @@ brcmf_sdio_kso_control(struct brcmf_sdio *bus, bool on)
 	int err = 0;
 	int err_cnt = 0;
 	int try_cnt = 0;
+	int need_retune = 0;
+	bool retune_release = false;
 
 	brcmf_dbg(TRACE, "Enter: on=%d\n", on);
 
+	/* Cannot re-tune if device is asleep */
+	if (on) {
+		need_retune = sdio_retune_get_needed(bus->sdiodev->func1); // TODO: host->can_retune ? host->need_retune : 0
+		sdio_retune_hold_now(bus->sdiodev->func1); // TODO: add sdio_retune_hold_now()
+		retune_release = true;
+	}
+
 	wr_val = (on << SBSDIO_FUNC1_SLEEPCSR_KSO_SHIFT);
 	/* 1st KSO write goes to AOS wake up core if device is asleep  */
 	brcmf_sdiod_writeb(bus->sdiodev, SBSDIO_FUNC1_SLEEPCSR, wr_val, &err);
@@ -711,8 +720,16 @@ brcmf_sdio_kso_control(struct brcmf_sdio *bus, bool on)
 			err_cnt = 0;
 		}
 		/* bail out upon subsequent access errors */
-		if (err && (err_cnt++ > BRCMF_SDIO_MAX_ACCESS_ERRORS))
-			break;
+		if (err && (err_cnt++ > BRCMF_SDIO_MAX_ACCESS_ERRORS)) {
+			if (!retune_release)
+				break;
+			/*
+			 * Allow one more retry with re-tuning released in case
+			 * it helps.
+			 */
+			sdio_retune_release(bus->sdiodev->func1);
+			retune_release = false;
+		}
 
 		udelay(KSO_WAIT_US);
 		brcmf_sdiod_writeb(bus->sdiodev, SBSDIO_FUNC1_SLEEPCSR, wr_val,
@@ -727,6 +744,18 @@ brcmf_sdio_kso_control(struct brcmf_sdio *bus, bool on)
 	if (try_cnt > MAX_KSO_ATTEMPTS)
 		brcmf_err("max tries: rd_val=0x%x err=%d\n", rd_val, err);
 
+	if (retune_release) {
+		/*
+		 * CRC errors are not unexpected during the transition but they
+		 * also trigger re-tuning. Clear that here to avoid an
+		 * unnecessary re-tune if it wasn't already triggered to start
+		 * with.
+		 */
+		if (!need_retune)
+			sdio_retune_clear_needed(bus->sdiodev->func1); // TODO: host->need_retune = 0
+		sdio_retune_release(bus->sdiodev->func1); // TODO: add sdio_retune_release()
+	}
+
 	return err;
 }
 

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/3] brcmfmac: sdio: Disable auto-tuning around commands expected to fail
  2019-06-06 13:59   ` Adrian Hunter
@ 2019-06-06 21:37     ` Doug Anderson
  2019-06-07  5:12       ` Arend Van Spriel
  2019-06-07 12:28       ` Adrian Hunter
  0 siblings, 2 replies; 15+ messages in thread
From: Doug Anderson @ 2019-06-06 21:37 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Ulf Hansson, Kalle Valo, Arend van Spriel,
	brcm80211-dev-list.pdl, open list:ARM/Rockchip SoC...,
	Double Lo, Brian Norris, linux-wireless, Naveen Gupta,
	Madhan Mohan R, Matthias Kaehlcke, Wright Feng, Chi-Hsien Lin,
	netdev, brcm80211-dev-list, David S. Miller, Franky Lin, LKML,
	Hante Meuleman, YueHaibing, Michael Trimarchi

Hi,

On Thu, Jun 6, 2019 at 7:00 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> On 3/06/19 9:37 PM, Douglas Anderson wrote:
> > There are certain cases, notably when transitioning between sleep and
> > active state, when Broadcom SDIO WiFi cards will produce errors on the
> > SDIO bus.  This is evident from the source code where you can see that
> > we try commands in a loop until we either get success or we've tried
> > too many times.  The comment in the code reinforces this by saying
> > "just one write attempt may fail"
> >
> > Unfortunately these failures sometimes end up causing an "-EILSEQ"
> > back to the core which triggers a retuning of the SDIO card and that
> > blocks all traffic to the card until it's done.
> >
> > Let's disable retuning around the commands we expect might fail.
>
> It seems to me that re-tuning needs to be prevented before the
> first access otherwise it might be attempted there,

By this I think you mean I wasn't starting my section early enough to
catch the "1st KSO write".  Oops.  Thanks!

> and it needs
> to continue to be prevented during the transition when it might
> reasonably be expected to fail.
>
> What about something along these lines:
>
> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
> index 4e15ea57d4f5..d932780ef56e 100644
> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
> @@ -664,9 +664,18 @@ brcmf_sdio_kso_control(struct brcmf_sdio *bus, bool on)
>         int err = 0;
>         int err_cnt = 0;
>         int try_cnt = 0;
> +       int need_retune = 0;
> +       bool retune_release = false;
>
>         brcmf_dbg(TRACE, "Enter: on=%d\n", on);
>
> +       /* Cannot re-tune if device is asleep */
> +       if (on) {
> +               need_retune = sdio_retune_get_needed(bus->sdiodev->func1); // TODO: host->can_retune ? host->need_retune : 0
> +               sdio_retune_hold_now(bus->sdiodev->func1); // TODO: add sdio_retune_hold_now()
> +               retune_release = true;
> +       }

The code below still has retries even for the "!on" case.  That
implies that you could still get CRC errors from the card in the "!on"
direction too.  Any reason why we shouldn't just hold retuning even
for the !on case?

> +
>         wr_val = (on << SBSDIO_FUNC1_SLEEPCSR_KSO_SHIFT);
>         /* 1st KSO write goes to AOS wake up core if device is asleep  */
>         brcmf_sdiod_writeb(bus->sdiodev, SBSDIO_FUNC1_SLEEPCSR, wr_val, &err);
> @@ -711,8 +720,16 @@ brcmf_sdio_kso_control(struct brcmf_sdio *bus, bool on)
>                         err_cnt = 0;
>                 }
>                 /* bail out upon subsequent access errors */
> -               if (err && (err_cnt++ > BRCMF_SDIO_MAX_ACCESS_ERRORS))
> -                       break;
> +               if (err && (err_cnt++ > BRCMF_SDIO_MAX_ACCESS_ERRORS)) {
> +                       if (!retune_release)
> +                               break;
> +                       /*
> +                        * Allow one more retry with re-tuning released in case
> +                        * it helps.
> +                        */
> +                       sdio_retune_release(bus->sdiodev->func1);
> +                       retune_release = false;

I would be tempted to wait before adding this logic until we actually
see that it's needed.  Sure, doing one more transfer probably won't
really hurt, but until we know that it actually helps it seems like
we're just adding extra complexity?

> +               }
>
>                 udelay(KSO_WAIT_US);
>                 brcmf_sdiod_writeb(bus->sdiodev, SBSDIO_FUNC1_SLEEPCSR, wr_val,
> @@ -727,6 +744,18 @@ brcmf_sdio_kso_control(struct brcmf_sdio *bus, bool on)
>         if (try_cnt > MAX_KSO_ATTEMPTS)
>                 brcmf_err("max tries: rd_val=0x%x err=%d\n", rd_val, err);
>
> +       if (retune_release) {
> +               /*
> +                * CRC errors are not unexpected during the transition but they
> +                * also trigger re-tuning. Clear that here to avoid an
> +                * unnecessary re-tune if it wasn't already triggered to start
> +                * with.
> +                */
> +               if (!need_retune)
> +                       sdio_retune_clear_needed(bus->sdiodev->func1); // TODO: host->need_retune = 0
> +               sdio_retune_release(bus->sdiodev->func1); // TODO: add sdio_retune_release()
> +       }

Every time I re-look at this I have to re-figure out all the subtle
differences between the variables and functions involved here.  Let me
see if I got everything right:

* need_retune: set to 1 if we can retune and some event happened that
makes us truly believe that we need to be retuned, like we got a CRC
error or a timer expired or our host controller told us to retune.

* retune_now: set to 1 it's an OK time to be retuning.  Specifically
if retune_now is false we won't send any retuning commands but we'll
still keep track of the need to retune.

* hold_retune: If this gets set to 1 by mmc_retune_hold_now() then a
future call to mmc_retune_hold() will _not_ schedule a retune by
setting retune_now (because mmc_retune_hold() will see that
hold_retune was already 1).  ...and a future call to
mmc_retune_recheck() between mmc_hold() and mmc_release() will also
not schedule a retune because hold_retune will be 2 (or generally >
1).

---

So overall trying to summarize what I think are the differences
between your patch and my patch.

1. If we needed to re-tune _before_ calling brcmf_sdio_kso_control(),
with your patch we'll make sure that we don't actually attempt to
retune until brcmf_sdio_kso_control() finishes.

2. If we needed to retune during brcmf_sdio_kso_control() (because a
timer expired?) then we wouldn't trigger that retune while
brcmf_sdio_kso_control() is running.

In the case of dw_mmc, which I'm most familiar with, we don't have any
sort of automated or timed-based retuning.  ...so we'll only re-tune
when we see the CRC error.  If I'm understanding things correctly then
that for dw_mmc my solution and yours behave the same.  That means the
difference is how we deal with other retuning requests, either ones
that come about because of an interrupt that the host controller
provided or because of a timer.  Did I get that right?

...and I guess the reason we have to deal specially with these cases
is because any time that SDIO card is "sleeping" we don't want to
retune because it won't work.  Right?  NOTE: the solution that would
come to my mind first to solve this would be to hold the retuning for
the whole time that the card was sleeping and then release it once the
card was awake again.  ...but I guess we don't truly need to do that
because tuning only happens as a side effect of sending a command to
the card and the only command we send to the card is the "wake up"
command.  That's why your solution to hold tuning while sending the
"wake up" command works, right?

---

OK, so assuming all the above is correct, I feel like we're actually
solving two problems and in fact I believe we actually need both our
approaches to solve everything correctly.  With just your patch in
place there's a problem because we will clobber any external retuning
requests that happened while we were waking up the card.  AKA, imagine
this:

A) brcmf_sdio_kso_control(on=True) gets called; need_retune starts as 0

B) We call sdio_retune_hold_now()

C) A retuning timer goes off or the SD Host controller tells us to retune

D) We get to the end of brcmf_sdio_kso_control() and clear the "retune
needed" since need_retune was 0 at the start.

...so we dropped the retuning request from C), right?

What we truly need is:

1. CRC errors shouldn't trigger a retuning request when we're in
brcmf_sdio_kso_control()

2. A separate patch that holds any retuning requests while the SDIO
card is off.  This patch _shouldn't_ do any clearing of retuning
requests, just defer them.

Does that make sense to you?  If so, I can try to code it up...

-Doug

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/3] brcmfmac: sdio: Disable auto-tuning around commands expected to fail
  2019-06-06 21:37     ` Doug Anderson
@ 2019-06-07  5:12       ` Arend Van Spriel
  2019-06-07 12:38         ` Adrian Hunter
  2019-06-07 12:28       ` Adrian Hunter
  1 sibling, 1 reply; 15+ messages in thread
From: Arend Van Spriel @ 2019-06-07  5:12 UTC (permalink / raw)
  To: Doug Anderson, Adrian Hunter
  Cc: Ulf Hansson, Kalle Valo, brcm80211-dev-list.pdl,
	open list:ARM/Rockchip SoC...,
	Double Lo, Brian Norris, linux-wireless, Naveen Gupta,
	Madhan Mohan R, Matthias Kaehlcke, Wright Feng, Chi-Hsien Lin,
	netdev, brcm80211-dev-list, David S. Miller, Franky Lin, LKML,
	Hante Meuleman, YueHaibing, Michael Trimarchi

On June 6, 2019 11:37:22 PM Doug Anderson <dianders@chromium.org> wrote:
>
> In the case of dw_mmc, which I'm most familiar with, we don't have any
> sort of automated or timed-based retuning.  ...so we'll only re-tune
> when we see the CRC error.  If I'm understanding things correctly then
> that for dw_mmc my solution and yours behave the same.  That means the
> difference is how we deal with other retuning requests, either ones
> that come about because of an interrupt that the host controller
> provided or because of a timer.  Did I get that right?

Right.

> ...and I guess the reason we have to deal specially with these cases
> is because any time that SDIO card is "sleeping" we don't want to
> retune because it won't work.  Right?  NOTE: the solution that would
> come to my mind first to solve this would be to hold the retuning for
> the whole time that the card was sleeping and then release it once the
> card was awake again.  ...but I guess we don't truly need to do that
> because tuning only happens as a side effect of sending a command to
> the card and the only command we send to the card is the "wake up"
> command.  That's why your solution to hold tuning while sending the
> "wake up" command works, right?

Yup.

> ---
>
> OK, so assuming all the above is correct, I feel like we're actually
> solving two problems and in fact I believe we actually need both our
> approaches to solve everything correctly.  With just your patch in
> place there's a problem because we will clobber any external retuning
> requests that happened while we were waking up the card.  AKA, imagine
> this:
>
> A) brcmf_sdio_kso_control(on=True) gets called; need_retune starts as 0
>
> B) We call sdio_retune_hold_now()
>
> C) A retuning timer goes off or the SD Host controller tells us to retune
>
> D) We get to the end of brcmf_sdio_kso_control() and clear the "retune
> needed" since need_retune was 0 at the start.
>
> ...so we dropped the retuning request from C), right?
>
>
> What we truly need is:
>
> 1. CRC errors shouldn't trigger a retuning request when we're in
> brcmf_sdio_kso_control()
>
> 2. A separate patch that holds any retuning requests while the SDIO
> card is off.  This patch _shouldn't_ do any clearing of retuning
> requests, just defer them.
>
>
> Does that make sense to you?  If so, I can try to code it up...

FWIW it does make sense to me. However, I am still not sure if our sdio 
hardware supports retuning. Have to track down an asic designer who can 
tell or dive into vhdl myself.

So I want to disable device sleep and trigger retuning through debugfs or 
some other hack.

Regards,
Arend



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/3] brcmfmac: sdio: Disable auto-tuning around commands expected to fail
  2019-06-06 21:37     ` Doug Anderson
  2019-06-07  5:12       ` Arend Van Spriel
@ 2019-06-07 12:28       ` Adrian Hunter
  2019-06-07 18:00         ` Doug Anderson
  1 sibling, 1 reply; 15+ messages in thread
From: Adrian Hunter @ 2019-06-07 12:28 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Ulf Hansson, Kalle Valo, Arend van Spriel,
	brcm80211-dev-list.pdl, open list:ARM/Rockchip SoC...,
	Double Lo, Brian Norris, linux-wireless, Naveen Gupta,
	Madhan Mohan R, Matthias Kaehlcke, Wright Feng, Chi-Hsien Lin,
	netdev, brcm80211-dev-list, David S. Miller, Franky Lin, LKML,
	Hante Meuleman, YueHaibing, Michael Trimarchi

On 7/06/19 12:37 AM, Doug Anderson wrote:
> Hi,
> 
> On Thu, Jun 6, 2019 at 7:00 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>>
>> On 3/06/19 9:37 PM, Douglas Anderson wrote:
>>> There are certain cases, notably when transitioning between sleep and
>>> active state, when Broadcom SDIO WiFi cards will produce errors on the
>>> SDIO bus.  This is evident from the source code where you can see that
>>> we try commands in a loop until we either get success or we've tried
>>> too many times.  The comment in the code reinforces this by saying
>>> "just one write attempt may fail"
>>>
>>> Unfortunately these failures sometimes end up causing an "-EILSEQ"
>>> back to the core which triggers a retuning of the SDIO card and that
>>> blocks all traffic to the card until it's done.
>>>
>>> Let's disable retuning around the commands we expect might fail.
>>
>> It seems to me that re-tuning needs to be prevented before the
>> first access otherwise it might be attempted there,
> 
> By this I think you mean I wasn't starting my section early enough to
> catch the "1st KSO write".  Oops.  Thanks!
> 
> 
>> and it needs
>> to continue to be prevented during the transition when it might
>> reasonably be expected to fail.
>>
>> What about something along these lines:
>>
>> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
>> index 4e15ea57d4f5..d932780ef56e 100644
>> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
>> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
>> @@ -664,9 +664,18 @@ brcmf_sdio_kso_control(struct brcmf_sdio *bus, bool on)
>>         int err = 0;
>>         int err_cnt = 0;
>>         int try_cnt = 0;
>> +       int need_retune = 0;
>> +       bool retune_release = false;
>>
>>         brcmf_dbg(TRACE, "Enter: on=%d\n", on);
>>
>> +       /* Cannot re-tune if device is asleep */
>> +       if (on) {
>> +               need_retune = sdio_retune_get_needed(bus->sdiodev->func1); // TODO: host->can_retune ? host->need_retune : 0
>> +               sdio_retune_hold_now(bus->sdiodev->func1); // TODO: add sdio_retune_hold_now()
>> +               retune_release = true;
>> +       }
> 
> The code below still has retries even for the "!on" case.  That
> implies that you could still get CRC errors from the card in the "!on"
> direction too.  Any reason why we shouldn't just hold retuning even
> for the !on case?

No

> 
> 
>> +
>>         wr_val = (on << SBSDIO_FUNC1_SLEEPCSR_KSO_SHIFT);
>>         /* 1st KSO write goes to AOS wake up core if device is asleep  */
>>         brcmf_sdiod_writeb(bus->sdiodev, SBSDIO_FUNC1_SLEEPCSR, wr_val, &err);
>> @@ -711,8 +720,16 @@ brcmf_sdio_kso_control(struct brcmf_sdio *bus, bool on)
>>                         err_cnt = 0;
>>                 }
>>                 /* bail out upon subsequent access errors */
>> -               if (err && (err_cnt++ > BRCMF_SDIO_MAX_ACCESS_ERRORS))
>> -                       break;
>> +               if (err && (err_cnt++ > BRCMF_SDIO_MAX_ACCESS_ERRORS)) {
>> +                       if (!retune_release)
>> +                               break;
>> +                       /*
>> +                        * Allow one more retry with re-tuning released in case
>> +                        * it helps.
>> +                        */
>> +                       sdio_retune_release(bus->sdiodev->func1);
>> +                       retune_release = false;
> 
> I would be tempted to wait before adding this logic until we actually
> see that it's needed.  Sure, doing one more transfer probably won't
> really hurt, but until we know that it actually helps it seems like
> we're just adding extra complexity?

Depends, what is the downside of unnecessarily returning an error from
brcmf_sdio_kso_control() in that case?

> 
> 
>> +               }
>>
>>                 udelay(KSO_WAIT_US);
>>                 brcmf_sdiod_writeb(bus->sdiodev, SBSDIO_FUNC1_SLEEPCSR, wr_val,
>> @@ -727,6 +744,18 @@ brcmf_sdio_kso_control(struct brcmf_sdio *bus, bool on)
>>         if (try_cnt > MAX_KSO_ATTEMPTS)
>>                 brcmf_err("max tries: rd_val=0x%x err=%d\n", rd_val, err);
>>
>> +       if (retune_release) {
>> +               /*
>> +                * CRC errors are not unexpected during the transition but they
>> +                * also trigger re-tuning. Clear that here to avoid an
>> +                * unnecessary re-tune if it wasn't already triggered to start
>> +                * with.
>> +                */
>> +               if (!need_retune)
>> +                       sdio_retune_clear_needed(bus->sdiodev->func1); // TODO: host->need_retune = 0
>> +               sdio_retune_release(bus->sdiodev->func1); // TODO: add sdio_retune_release()
>> +       }
> 
> Every time I re-look at this I have to re-figure out all the subtle
> differences between the variables and functions involved here.  Let me
> see if I got everything right:
> 
> * need_retune: set to 1 if we can retune and some event happened that
> makes us truly believe that we need to be retuned, like we got a CRC
> error or a timer expired or our host controller told us to retune.
> 
> * retune_now: set to 1 it's an OK time to be retuning.  Specifically
> if retune_now is false we won't send any retuning commands but we'll
> still keep track of the need to retune.
> 
> * hold_retune: If this gets set to 1 by mmc_retune_hold_now() then a
> future call to mmc_retune_hold() will _not_ schedule a retune by
> setting retune_now (because mmc_retune_hold() will see that
> hold_retune was already 1).  ...and a future call to
> mmc_retune_recheck() between mmc_hold() and mmc_release() will also
> not schedule a retune because hold_retune will be 2 (or generally >
> 1).
> 
> ---
> 
> So overall trying to summarize what I think are the differences
> between your patch and my patch.
> 
> 1. If we needed to re-tune _before_ calling brcmf_sdio_kso_control(),
> with your patch we'll make sure that we don't actually attempt to
> retune until brcmf_sdio_kso_control() finishes.
> 
> 2. If we needed to retune during brcmf_sdio_kso_control() (because a
> timer expired?) then we wouldn't trigger that retune while
> brcmf_sdio_kso_control() is running.
> 
> In the case of dw_mmc, which I'm most familiar with, we don't have any
> sort of automated or timed-based retuning.  ...so we'll only re-tune
> when we see the CRC error.  If I'm understanding things correctly then
> that for dw_mmc my solution and yours behave the same.  That means the
> difference is how we deal with other retuning requests, either ones
> that come about because of an interrupt that the host controller
> provided or because of a timer.  Did I get that right?
> 
> ...and I guess the reason we have to deal specially with these cases
> is because any time that SDIO card is "sleeping" we don't want to
> retune because it won't work.  Right?  NOTE: the solution that would
> come to my mind first to solve this would be to hold the retuning for
> the whole time that the card was sleeping and then release it once the
> card was awake again.  ...but I guess we don't truly need to do that
> because tuning only happens as a side effect of sending a command to
> the card and the only command we send to the card is the "wake up"
> command.  That's why your solution to hold tuning while sending the
> "wake up" command works, right?
> 
> ---
> 
> OK, so assuming all the above is correct, I feel like we're actually
> solving two problems and in fact I believe we actually need both our
> approaches to solve everything correctly.  With just your patch in
> place there's a problem because we will clobber any external retuning
> requests that happened while we were waking up the card.  AKA, imagine
> this:
> 
> A) brcmf_sdio_kso_control(on=True) gets called; need_retune starts as 0
> 
> B) We call sdio_retune_hold_now()
> 
> C) A retuning timer goes off or the SD Host controller tells us to retune
> 
> D) We get to the end of brcmf_sdio_kso_control() and clear the "retune
> needed" since need_retune was 0 at the start.
> 
> ...so we dropped the retuning request from C), right?

True

> 
> 
> What we truly need is:
> 
> 1. CRC errors shouldn't trigger a retuning request when we're in
> brcmf_sdio_kso_control()
> 
> 2. A separate patch that holds any retuning requests while the SDIO
> card is off.  This patch _shouldn't_ do any clearing of retuning
> requests, just defer them.
> 
> 
> Does that make sense to you?  If so, I can try to code it up...

Sounds good :-)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/3] brcmfmac: sdio: Disable auto-tuning around commands expected to fail
  2019-06-07  5:12       ` Arend Van Spriel
@ 2019-06-07 12:38         ` Adrian Hunter
  2019-06-07 13:32           ` Arend Van Spriel
  0 siblings, 1 reply; 15+ messages in thread
From: Adrian Hunter @ 2019-06-07 12:38 UTC (permalink / raw)
  To: Arend Van Spriel, Doug Anderson
  Cc: Ulf Hansson, Kalle Valo, brcm80211-dev-list.pdl,
	open list:ARM/Rockchip SoC...,
	Double Lo, Brian Norris, linux-wireless, Naveen Gupta,
	Madhan Mohan R, Matthias Kaehlcke, Wright Feng, Chi-Hsien Lin,
	netdev, brcm80211-dev-list, David S. Miller, Franky Lin, LKML,
	Hante Meuleman, YueHaibing, Michael Trimarchi

On 7/06/19 8:12 AM, Arend Van Spriel wrote:
> On June 6, 2019 11:37:22 PM Doug Anderson <dianders@chromium.org> wrote:
>>
>> In the case of dw_mmc, which I'm most familiar with, we don't have any
>> sort of automated or timed-based retuning.  ...so we'll only re-tune
>> when we see the CRC error.  If I'm understanding things correctly then
>> that for dw_mmc my solution and yours behave the same.  That means the
>> difference is how we deal with other retuning requests, either ones
>> that come about because of an interrupt that the host controller
>> provided or because of a timer.  Did I get that right?
> 
> Right.
> 
>> ...and I guess the reason we have to deal specially with these cases
>> is because any time that SDIO card is "sleeping" we don't want to
>> retune because it won't work.  Right?  NOTE: the solution that would
>> come to my mind first to solve this would be to hold the retuning for
>> the whole time that the card was sleeping and then release it once the
>> card was awake again.  ...but I guess we don't truly need to do that
>> because tuning only happens as a side effect of sending a command to
>> the card and the only command we send to the card is the "wake up"
>> command.  That's why your solution to hold tuning while sending the
>> "wake up" command works, right?
> 
> Yup.
> 
>> ---
>>
>> OK, so assuming all the above is correct, I feel like we're actually
>> solving two problems and in fact I believe we actually need both our
>> approaches to solve everything correctly.  With just your patch in
>> place there's a problem because we will clobber any external retuning
>> requests that happened while we were waking up the card.  AKA, imagine
>> this:
>>
>> A) brcmf_sdio_kso_control(on=True) gets called; need_retune starts as 0
>>
>> B) We call sdio_retune_hold_now()
>>
>> C) A retuning timer goes off or the SD Host controller tells us to retune
>>
>> D) We get to the end of brcmf_sdio_kso_control() and clear the "retune
>> needed" since need_retune was 0 at the start.
>>
>> ...so we dropped the retuning request from C), right?
>>
>>
>> What we truly need is:
>>
>> 1. CRC errors shouldn't trigger a retuning request when we're in
>> brcmf_sdio_kso_control()
>>
>> 2. A separate patch that holds any retuning requests while the SDIO
>> card is off.  This patch _shouldn't_ do any clearing of retuning
>> requests, just defer them.
>>
>>
>> Does that make sense to you?  If so, I can try to code it up...
> 
> FWIW it does make sense to me. However, I am still not sure if our sdio
> hardware supports retuning. Have to track down an asic designer who can tell
> or dive into vhdl myself.

The card supports re-tuning if is handles CMD19, which it does.  It is not
the card that does any tuning, only the host.  The card just helps by
providing a known data pattern in response to CMD19.  It can be that a card
provides good enough signals that the host should not need to re-tune.  I
don't know if that can be affected by the board design though.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/3] brcmfmac: sdio: Disable auto-tuning around commands expected to fail
  2019-06-07 12:38         ` Adrian Hunter
@ 2019-06-07 13:32           ` Arend Van Spriel
  2019-06-07 18:06             ` Doug Anderson
  0 siblings, 1 reply; 15+ messages in thread
From: Arend Van Spriel @ 2019-06-07 13:32 UTC (permalink / raw)
  To: Adrian Hunter, Doug Anderson
  Cc: Ulf Hansson, Kalle Valo, brcm80211-dev-list.pdl,
	open list:ARM/Rockchip SoC...,
	Double Lo, Brian Norris, linux-wireless, Naveen Gupta,
	Madhan Mohan R, Matthias Kaehlcke, Wright Feng, Chi-Hsien Lin,
	netdev, brcm80211-dev-list, David S. Miller, Franky Lin, LKML,
	Hante Meuleman, YueHaibing, Michael Trimarchi

On June 7, 2019 2:40:04 PM Adrian Hunter <adrian.hunter@intel.com> wrote:

> On 7/06/19 8:12 AM, Arend Van Spriel wrote:
>> On June 6, 2019 11:37:22 PM Doug Anderson <dianders@chromium.org> wrote:
>>>
>>> In the case of dw_mmc, which I'm most familiar with, we don't have any
>>> sort of automated or timed-based retuning.  ...so we'll only re-tune
>>> when we see the CRC error.  If I'm understanding things correctly then
>>> that for dw_mmc my solution and yours behave the same.  That means the
>>> difference is how we deal with other retuning requests, either ones
>>> that come about because of an interrupt that the host controller
>>> provided or because of a timer.  Did I get that right?
>> 
>> Right.
>> 
>>> ...and I guess the reason we have to deal specially with these cases
>>> is because any time that SDIO card is "sleeping" we don't want to
>>> retune because it won't work.  Right?  NOTE: the solution that would
>>> come to my mind first to solve this would be to hold the retuning for
>>> the whole time that the card was sleeping and then release it once the
>>> card was awake again.  ...but I guess we don't truly need to do that
>>> because tuning only happens as a side effect of sending a command to
>>> the card and the only command we send to the card is the "wake up"
>>> command.  That's why your solution to hold tuning while sending the
>>> "wake up" command works, right?
>> 
>> Yup.
>> 
>>> ---
>>>
>>> OK, so assuming all the above is correct, I feel like we're actually
>>> solving two problems and in fact I believe we actually need both our
>>> approaches to solve everything correctly.  With just your patch in
>>> place there's a problem because we will clobber any external retuning
>>> requests that happened while we were waking up the card.  AKA, imagine
>>> this:
>>>
>>> A) brcmf_sdio_kso_control(on=True) gets called; need_retune starts as 0
>>>
>>> B) We call sdio_retune_hold_now()
>>>
>>> C) A retuning timer goes off or the SD Host controller tells us to retune
>>>
>>> D) We get to the end of brcmf_sdio_kso_control() and clear the "retune
>>> needed" since need_retune was 0 at the start.
>>>
>>> ...so we dropped the retuning request from C), right?
>>>
>>>
>>> What we truly need is:
>>>
>>> 1. CRC errors shouldn't trigger a retuning request when we're in
>>> brcmf_sdio_kso_control()
>>>
>>> 2. A separate patch that holds any retuning requests while the SDIO
>>> card is off.  This patch _shouldn't_ do any clearing of retuning
>>> requests, just defer them.
>>>
>>>
>>> Does that make sense to you?  If so, I can try to code it up...
>> 
>> FWIW it does make sense to me. However, I am still not sure if our sdio
>> hardware supports retuning. Have to track down an asic designer who can tell
>> or dive into vhdl myself.
>
> The card supports re-tuning if is handles CMD19, which it does.  It is not
> the card that does any tuning, only the host.  The card just helps by
> providing a known data pattern in response to CMD19.  It can be that a card
> provides good enough signals that the host should not need to re-tune.  I
> don't know if that can be affected by the board design though.

Right. I know it supports initial tuning, but I'm not sure about subsequent 
retuning initiated by the host controller.

Regards,
Arend



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/3] brcmfmac: sdio: Disable auto-tuning around commands expected to fail
  2019-06-07 12:28       ` Adrian Hunter
@ 2019-06-07 18:00         ` Doug Anderson
  0 siblings, 0 replies; 15+ messages in thread
From: Doug Anderson @ 2019-06-07 18:00 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Ulf Hansson, Kalle Valo, Arend van Spriel,
	brcm80211-dev-list.pdl, open list:ARM/Rockchip SoC...,
	Double Lo, Brian Norris, linux-wireless, Naveen Gupta,
	Madhan Mohan R, Matthias Kaehlcke, Wright Feng, Chi-Hsien Lin,
	netdev, brcm80211-dev-list, David S. Miller, Franky Lin, LKML,
	Hante Meuleman, YueHaibing, Michael Trimarchi

Hi,

On Fri, Jun 7, 2019 at 5:29 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> >> @@ -711,8 +720,16 @@ brcmf_sdio_kso_control(struct brcmf_sdio *bus, bool on)
> >>                         err_cnt = 0;
> >>                 }
> >>                 /* bail out upon subsequent access errors */
> >> -               if (err && (err_cnt++ > BRCMF_SDIO_MAX_ACCESS_ERRORS))
> >> -                       break;
> >> +               if (err && (err_cnt++ > BRCMF_SDIO_MAX_ACCESS_ERRORS)) {
> >> +                       if (!retune_release)
> >> +                               break;
> >> +                       /*
> >> +                        * Allow one more retry with re-tuning released in case
> >> +                        * it helps.
> >> +                        */
> >> +                       sdio_retune_release(bus->sdiodev->func1);
> >> +                       retune_release = false;
> >
> > I would be tempted to wait before adding this logic until we actually
> > see that it's needed.  Sure, doing one more transfer probably won't
> > really hurt, but until we know that it actually helps it seems like
> > we're just adding extra complexity?
>
> Depends, what is the downside of unnecessarily returning an error from
> brcmf_sdio_kso_control() in that case?

I'm not aware of any downside, but without any example of it being
needed it's just untested code that might or might not fix a problem.
For now I'm going to leave it out of my patch and if someone later
finds it needed or if you're convinced that we really need it we can
add it as a patch atop.

-Doug

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/3] brcmfmac: sdio: Disable auto-tuning around commands expected to fail
  2019-06-07 13:32           ` Arend Van Spriel
@ 2019-06-07 18:06             ` Doug Anderson
  2019-06-07 18:56               ` Arend Van Spriel
  0 siblings, 1 reply; 15+ messages in thread
From: Doug Anderson @ 2019-06-07 18:06 UTC (permalink / raw)
  To: Arend Van Spriel
  Cc: Adrian Hunter, Ulf Hansson, Kalle Valo, brcm80211-dev-list.pdl,
	open list:ARM/Rockchip SoC...,
	Double Lo, Brian Norris, linux-wireless, Naveen Gupta,
	Madhan Mohan R, Matthias Kaehlcke, Wright Feng, Chi-Hsien Lin,
	netdev, brcm80211-dev-list, David S. Miller, Franky Lin, LKML,
	Hante Meuleman, YueHaibing, Michael Trimarchi

Hi,

On Fri, Jun 7, 2019 at 6:32 AM Arend Van Spriel
<arend.vanspriel@broadcom.com> wrote:
>
> On June 7, 2019 2:40:04 PM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> > On 7/06/19 8:12 AM, Arend Van Spriel wrote:
> >> On June 6, 2019 11:37:22 PM Doug Anderson <dianders@chromium.org> wrote:
> >>>
> >>> In the case of dw_mmc, which I'm most familiar with, we don't have any
> >>> sort of automated or timed-based retuning.  ...so we'll only re-tune
> >>> when we see the CRC error.  If I'm understanding things correctly then
> >>> that for dw_mmc my solution and yours behave the same.  That means the
> >>> difference is how we deal with other retuning requests, either ones
> >>> that come about because of an interrupt that the host controller
> >>> provided or because of a timer.  Did I get that right?
> >>
> >> Right.
> >>
> >>> ...and I guess the reason we have to deal specially with these cases
> >>> is because any time that SDIO card is "sleeping" we don't want to
> >>> retune because it won't work.  Right?  NOTE: the solution that would
> >>> come to my mind first to solve this would be to hold the retuning for
> >>> the whole time that the card was sleeping and then release it once the
> >>> card was awake again.  ...but I guess we don't truly need to do that
> >>> because tuning only happens as a side effect of sending a command to
> >>> the card and the only command we send to the card is the "wake up"
> >>> command.  That's why your solution to hold tuning while sending the
> >>> "wake up" command works, right?
> >>
> >> Yup.
> >>
> >>> ---
> >>>
> >>> OK, so assuming all the above is correct, I feel like we're actually
> >>> solving two problems and in fact I believe we actually need both our
> >>> approaches to solve everything correctly.  With just your patch in
> >>> place there's a problem because we will clobber any external retuning
> >>> requests that happened while we were waking up the card.  AKA, imagine
> >>> this:
> >>>
> >>> A) brcmf_sdio_kso_control(on=True) gets called; need_retune starts as 0
> >>>
> >>> B) We call sdio_retune_hold_now()
> >>>
> >>> C) A retuning timer goes off or the SD Host controller tells us to retune
> >>>
> >>> D) We get to the end of brcmf_sdio_kso_control() and clear the "retune
> >>> needed" since need_retune was 0 at the start.
> >>>
> >>> ...so we dropped the retuning request from C), right?
> >>>
> >>>
> >>> What we truly need is:
> >>>
> >>> 1. CRC errors shouldn't trigger a retuning request when we're in
> >>> brcmf_sdio_kso_control()
> >>>
> >>> 2. A separate patch that holds any retuning requests while the SDIO
> >>> card is off.  This patch _shouldn't_ do any clearing of retuning
> >>> requests, just defer them.
> >>>
> >>>
> >>> Does that make sense to you?  If so, I can try to code it up...
> >>
> >> FWIW it does make sense to me. However, I am still not sure if our sdio
> >> hardware supports retuning. Have to track down an asic designer who can tell
> >> or dive into vhdl myself.
> >
> > The card supports re-tuning if is handles CMD19, which it does.  It is not
> > the card that does any tuning, only the host.  The card just helps by
> > providing a known data pattern in response to CMD19.  It can be that a card
> > provides good enough signals that the host should not need to re-tune.  I
> > don't know if that can be affected by the board design though.
>
> Right. I know it supports initial tuning, but I'm not sure about subsequent
> retuning initiated by the host controller.

My evidence says that it supports subsequent tuning.  In fact, without
this series my logs would be filled with:

  dwmmc_rockchip ff0d0000.dwmmc: Successfully tuned phase to XYZ

...where the phase varied by a few degrees each time.  AKA: it was
retuning over and over again and getting sane results which implies
that the tuning was working just fine.

The whole point of this series is not that the retuning was actually
broken or anything it was just pointless and blocking the bus while it
happened.  On rk3288 dw_mmc ports we also currently do pretty
extensive tuning, trying _lots_ of phases.  Thus the re-tuning was
blocking the bus for a significant amount of time.

-Doug

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/3] brcmfmac: sdio: Disable auto-tuning around commands expected to fail
  2019-06-07 18:06             ` Doug Anderson
@ 2019-06-07 18:56               ` Arend Van Spriel
  0 siblings, 0 replies; 15+ messages in thread
From: Arend Van Spriel @ 2019-06-07 18:56 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Adrian Hunter, Ulf Hansson, Kalle Valo, brcm80211-dev-list.pdl,
	open list:ARM/Rockchip SoC...,
	Double Lo, Brian Norris, linux-wireless, Naveen Gupta,
	Madhan Mohan R, Matthias Kaehlcke, Wright Feng, Chi-Hsien Lin,
	netdev, brcm80211-dev-list, David S. Miller, Franky Lin, LKML,
	Hante Meuleman, YueHaibing, Michael Trimarchi

On June 7, 2019 8:06:30 PM Doug Anderson <dianders@chromium.org> wrote:

> Hi,
>
> On Fri, Jun 7, 2019 at 6:32 AM Arend Van Spriel
> <arend.vanspriel@broadcom.com> wrote:
>>
>> Right. I know it supports initial tuning, but I'm not sure about subsequent
>> retuning initiated by the host controller.
>
> My evidence says that it supports subsequent tuning.  In fact, without
> this series my logs would be filled with:
>
>   dwmmc_rockchip ff0d0000.dwmmc: Successfully tuned phase to XYZ
>
> ...where the phase varied by a few degrees each time.  AKA: it was
> retuning over and over again and getting sane results which implies
> that the tuning was working just fine.

Ok. Thanks for confirming this.

Regards,
Arend



^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2019-06-07 18:56 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-03 18:37 [PATCH v2 0/3] brcmfmac: sdio: Deal better w/ transmission errors waking from idle Douglas Anderson
2019-06-03 18:37 ` [PATCH v2 1/3] Revert "brcmfmac: disable command decode in sdio_aos" Douglas Anderson
2019-06-03 18:37 ` [PATCH v2 2/3] mmc: core: API for temporarily disabling auto-retuning due to errors Douglas Anderson
2019-06-05  7:54   ` Arend Van Spriel
2019-06-05 22:51     ` Doug Anderson
2019-06-03 18:37 ` [PATCH v2 3/3] brcmfmac: sdio: Disable auto-tuning around commands expected to fail Douglas Anderson
2019-06-06 13:59   ` Adrian Hunter
2019-06-06 21:37     ` Doug Anderson
2019-06-07  5:12       ` Arend Van Spriel
2019-06-07 12:38         ` Adrian Hunter
2019-06-07 13:32           ` Arend Van Spriel
2019-06-07 18:06             ` Doug Anderson
2019-06-07 18:56               ` Arend Van Spriel
2019-06-07 12:28       ` Adrian Hunter
2019-06-07 18:00         ` Doug Anderson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).