* [PATCH 0/3] ath10k: fix host corruption
From: Michal Kazior @ 2013-12-12 14:24 UTC
  To: ath10k; +Cc: linux-wireless, Michal Kazior

Hi,

This patchset aims to fix (or at least reduce
the frequency of) an apparently strange HW bug.

The bug occurs under some workloads, and its
frequency differs between hardware variants. It is
triggered by a cold reset after the HW/FW has been
exercised with some work.

On x86 this can show up as a hang; on AP135 it is
a data bus error.


Michal Kazior (3):
  ath10k: fix device initialization routine
  ath10k: zero device DRAM to avoid host hangs
  ath10k: zero CE config upon deinit

 drivers/net/wireless/ath/ath10k/ce.c  |  27 +++++++
 drivers/net/wireless/ath/ath10k/hw.h  |   7 ++
 drivers/net/wireless/ath/ath10k/pci.c | 143 ++++++++++++++++++++++++++++++----
 3 files changed, 164 insertions(+), 13 deletions(-)

-- 
1.8.4.rc3


* [PATCH 1/3] ath10k: fix device initialization routine
From: Michal Kazior @ 2013-12-12 14:24 UTC
  To: ath10k; +Cc: linux-wireless, Michal Kazior

Hardware revision 2 has some issues with cold
reset that lead to Data Bus Errors or system hangs
in some cases. It's safer to use warm reset when
possible as it shouldn't trigger the
aforementioned issues.

Prefer warm reset over cold reset. However, since
warm reset doesn't always work (e.g. after a FW
crash), make sure to fall back to cold reset.

This should generally reduce the frequency of the
problem.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
---
 drivers/net/wireless/ath/ath10k/hw.h  |   6 ++
 drivers/net/wireless/ath/ath10k/pci.c | 125 ++++++++++++++++++++++++++++++----
 2 files changed, 118 insertions(+), 13 deletions(-)
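
As a rough illustration of the policy described above (not the driver
code itself), the warm-first, cold-as-fallback ordering can be sketched
in plain C. The names struct fake_dev, fake_power_up() and
power_up_with_fallback() are hypothetical and exist only in this sketch;
only the order of the two attempts is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the device; not the real struct ath10k. */
struct fake_dev {
	bool warm_reset_works;	/* false models e.g. a crashed firmware */
	bool cold_reset_works;
	int warm_attempts;
	int cold_attempts;
};

/* Hypothetical power-up attempt: 0 on success, negative on failure. */
static int fake_power_up(struct fake_dev *dev, bool cold_reset)
{
	if (cold_reset) {
		dev->cold_attempts++;
		return dev->cold_reset_works ? 0 : -1;
	}

	dev->warm_attempts++;
	return dev->warm_reset_works ? 0 : -1;
}

/* Try the safer warm reset first; use cold reset only if that fails. */
static int power_up_with_fallback(struct fake_dev *dev)
{
	int ret;

	assert(dev != NULL);

	ret = fake_power_up(dev, false);
	if (ret == 0)
		return 0;

	fprintf(stderr, "warm reset failed (%d), trying cold reset\n", ret);
	return fake_power_up(dev, true);
}
```

With this shape the cold reset path is never entered on a healthy
device, which is the point of the change: the problematic reset is only
used when there is no alternative.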

diff --git a/drivers/net/wireless/ath/ath10k/hw.h b/drivers/net/wireless/ath/ath10k/hw.h
index 9535eaa..2032737 100644
--- a/drivers/net/wireless/ath/ath10k/hw.h
+++ b/drivers/net/wireless/ath/ath10k/hw.h
@@ -204,8 +204,11 @@ enum ath10k_mcast2ucast_mode {
 #define WLAN_ANALOG_INTF_PCIE_BASE_ADDRESS	0x0006c000
 #define PCIE_LOCAL_BASE_ADDRESS			0x00080000
 
+#define SOC_RESET_CONTROL_ADDRESS		0x00000000
 #define SOC_RESET_CONTROL_OFFSET		0x00000000
 #define SOC_RESET_CONTROL_SI0_RST_MASK		0x00000001
+#define SOC_RESET_CONTROL_CE_RST_MASK		0x00040000
+#define SOC_RESET_CONTROL_CPU_WARM_RST_MASK	0x00000040
 #define SOC_CPU_CLOCK_OFFSET			0x00000020
 #define SOC_CPU_CLOCK_STANDARD_LSB		0
 #define SOC_CPU_CLOCK_STANDARD_MASK		0x00000003
@@ -215,6 +218,8 @@ enum ath10k_mcast2ucast_mode {
 #define SOC_LPO_CAL_OFFSET			0x000000e0
 #define SOC_LPO_CAL_ENABLE_LSB			20
 #define SOC_LPO_CAL_ENABLE_MASK			0x00100000
+#define SOC_LF_TIMER_CONTROL0_ADDRESS		0x00000050
+#define SOC_LF_TIMER_CONTROL0_ENABLE_MASK	0x00000004
 
 #define SOC_CHIP_ID_ADDRESS			0x000000ec
 #define SOC_CHIP_ID_REV_LSB			8
@@ -272,6 +277,7 @@ enum ath10k_mcast2ucast_mode {
 #define PCIE_INTR_CAUSE_ADDRESS			0x000c
 #define PCIE_INTR_CLR_ADDRESS			0x0014
 #define SCRATCH_3_ADDRESS			0x0030
+#define CPU_INTR_ADDRESS			0x0010
 
 /* Firmware indications to the Host via SCRATCH_3 register. */
 #define FW_INDICATOR_ADDRESS	(SOC_CORE_BASE_ADDRESS + SCRATCH_3_ADDRESS)
diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
index 29fd197..475b4da 100644
--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -64,7 +64,8 @@ static int ath10k_pci_post_rx_pipe(struct ath10k_pci_pipe *pipe_info,
 					     int num);
 static void ath10k_pci_rx_pipe_cleanup(struct ath10k_pci_pipe *pipe_info);
 static void ath10k_pci_stop_ce(struct ath10k *ar);
-static int ath10k_pci_device_reset(struct ath10k *ar);
+static int ath10k_pci_cold_reset(struct ath10k *ar);
+static int ath10k_pci_warm_reset(struct ath10k *ar);
 static int ath10k_pci_wait_for_target_init(struct ath10k *ar);
 static int ath10k_pci_init_irq(struct ath10k *ar);
 static int ath10k_pci_deinit_irq(struct ath10k *ar);
@@ -1477,6 +1478,10 @@ static void ath10k_pci_hif_stop(struct ath10k *ar)
 
 	ath10k_dbg(ATH10K_DBG_PCI, "%s\n", __func__);
 
+	/* Make sure the device won't access any structures on the host by
+	 * resetting it. This should prevent host memory corruption and hangs. */
+	ath10k_pci_warm_reset(ar);
+
 	ret = ath10k_ce_disable_interrupts(ar);
 	if (ret)
 		ath10k_warn("failed to disable CE interrupts: %d\n", ret);
@@ -1497,13 +1502,6 @@ static void ath10k_pci_hif_stop(struct ath10k *ar)
 	ath10k_pci_cleanup_ce(ar);
 	ath10k_pci_buffer_cleanup(ar);
 
-	/* Make the sure the device won't access any structures on the host by
-	 * resetting it. The device was fed with PCI CE ringbuffer
-	 * configuration during init. If ringbuffers are freed and the device
-	 * were to access them this could lead to memory corruption on the
-	 * host. */
-	ath10k_pci_device_reset(ar);
-
 	ar_pci->started = 0;
 }
 
@@ -1993,7 +1991,73 @@ static void ath10k_pci_fw_interrupt_handler(struct ath10k *ar)
 	ath10k_pci_sleep(ar);
 }
 
-static int ath10k_pci_hif_power_up(struct ath10k *ar)
+static int ath10k_pci_warm_reset(struct ath10k *ar)
+{
+	struct ath10k_pci *ar_pci = ath10k_pci_priv(ar);
+	void __iomem *addr;
+	int ret = 0;
+	u32 val;
+
+	ath10k_dbg(ATH10K_DBG_BOOT, "performing warm chip reset\n");
+
+	ret = ath10k_do_pci_wake(ar);
+	if (ret)
+		return ret;
+
+	/* debug */
+	addr = ar_pci->mem + SOC_CORE_BASE_ADDRESS + PCIE_INTR_CAUSE_ADDRESS;
+	ath10k_dbg(ATH10K_DBG_BOOT, "host CPU intr cause: 0x%x\n", ioread32(addr));
+	addr = ar_pci->mem + SOC_CORE_BASE_ADDRESS + CPU_INTR_ADDRESS;
+	ath10k_dbg(ATH10K_DBG_BOOT, "target CPU intr cause: 0x%x\n", ioread32(addr));
+
+	/* disable pending irqs */
+	iowrite32(0, ar_pci->mem + SOC_CORE_BASE_ADDRESS + PCIE_INTR_ENABLE_ADDRESS);
+	iowrite32(~0, ar_pci->mem + SOC_CORE_BASE_ADDRESS + PCIE_INTR_CLR_ADDRESS);
+
+	msleep(100);
+
+	/* clear fw indicator */
+	iowrite32(0, ar_pci->mem + ar_pci->fw_indicator_address);
+
+	/* clear target LF timer interrupts */
+	addr = ar_pci->mem + RTC_SOC_BASE_ADDRESS + SOC_LF_TIMER_CONTROL0_ADDRESS;
+	iowrite32(ioread32(addr) & ~SOC_LF_TIMER_CONTROL0_ENABLE_MASK, addr);
+
+	/* reset CE */
+	addr = ar_pci->mem + RTC_SOC_BASE_ADDRESS + SOC_RESET_CONTROL_ADDRESS;
+	val = ioread32(addr);
+	val |= SOC_RESET_CONTROL_CE_RST_MASK;
+	iowrite32(val, addr);
+	val = ioread32(addr);
+	msleep(10);
+
+	/* unreset CE */
+	val &= ~SOC_RESET_CONTROL_CE_RST_MASK;
+	iowrite32(val, addr);
+	val = ioread32(addr);
+	msleep(10);
+
+	/* debug */
+	addr = ar_pci->mem + SOC_CORE_BASE_ADDRESS + PCIE_INTR_CAUSE_ADDRESS;
+	ath10k_dbg(ATH10K_DBG_BOOT, "host CPU intr cause: 0x%x\n", ioread32(addr));
+	addr = ar_pci->mem + SOC_CORE_BASE_ADDRESS + CPU_INTR_ADDRESS;
+	ath10k_dbg(ATH10K_DBG_BOOT, "target CPU intr cause: 0x%x\n", ioread32(addr));
+
+	/* CPU warm reset */
+	addr = ar_pci->mem + RTC_SOC_BASE_ADDRESS + SOC_RESET_CONTROL_ADDRESS;
+	iowrite32(ioread32(addr) | SOC_RESET_CONTROL_CPU_WARM_RST_MASK, addr);
+
+	ath10k_dbg(ATH10K_DBG_BOOT, "target reset state: 0x%x\n", ioread32(addr));
+
+	msleep(100);
+
+	ath10k_dbg(ATH10K_DBG_BOOT, "warm reset complete\n");
+
+	ath10k_do_pci_sleep(ar);
+	return ret;
+}
+
+static int __ath10k_pci_hif_power_up(struct ath10k *ar, bool cold_reset)
 {
 	struct ath10k_pci *ar_pci = ath10k_pci_priv(ar);
 	const char *irq_mode;
@@ -2009,7 +2073,11 @@ static int ath10k_pci_hif_power_up(struct ath10k *ar)
 	 * is in an unexpected state. We try to catch that here in order to
 	 * reset the Target and retry the probe.
 	 */
-	ret = ath10k_pci_device_reset(ar);
+	if (cold_reset)
+		ret = ath10k_pci_cold_reset(ar);
+	else
+		ret = ath10k_pci_warm_reset(ar);
+
 	if (ret) {
 		ath10k_err("failed to reset target: %d\n", ret);
 		goto err;
@@ -2078,8 +2146,8 @@ err_free_early_irq:
 err_deinit_irq:
 	ath10k_pci_deinit_irq(ar);
 err_ce:
+	ath10k_pci_warm_reset(ar);
 	ath10k_pci_ce_deinit(ar);
-	ath10k_pci_device_reset(ar);
 err_ps:
 	if (!test_bit(ATH10K_PCI_FEATURE_SOC_POWER_SAVE, ar_pci->features))
 		ath10k_do_pci_sleep(ar);
@@ -2087,14 +2155,45 @@ err:
 	return ret;
 }
 
+static int ath10k_pci_hif_power_up(struct ath10k *ar)
+{
+	int ret;
+
+	/*
+	 * Hardware revision 2 has some issues with cold reset and the
+	 * preferred (and safer) way to perform a device reset is through a
+	 * warm reset.
+	 *
+	 * Warm reset doesn't always work though (notably after a firmware
+	 * crash) so fall back to cold reset if necessary.
+	 */
+	ret = __ath10k_pci_hif_power_up(ar, 0);
+	if (ret) {
+		ath10k_warn("failed to power up target using warm reset (%d), trying cold reset\n",
+			    ret);
+
+		ret = __ath10k_pci_hif_power_up(ar, 1);
+		if (ret) {
+			ath10k_err("failed to power up target using cold reset too (%d)\n",
+				   ret);
+			return ret;
+		}
+	}
+
+	return 0;
+}
+
 static void ath10k_pci_hif_power_down(struct ath10k *ar)
 {
 	struct ath10k_pci *ar_pci = ath10k_pci_priv(ar);
 
+	/* Make sure the device won't access any structures on the host by
+	 * resetting it. This should prevent host memory corruption and hangs. */
+	ath10k_pci_warm_reset(ar);
+
 	ath10k_pci_free_early_irq(ar);
 	ath10k_pci_kill_tasklet(ar);
 	ath10k_pci_deinit_irq(ar);
-	ath10k_pci_device_reset(ar);
 
 	ath10k_pci_ce_deinit(ar);
 	if (!test_bit(ATH10K_PCI_FEATURE_SOC_POWER_SAVE, ar_pci->features))
@@ -2523,7 +2622,7 @@ out:
 	return ret;
 }
 
-static int ath10k_pci_device_reset(struct ath10k *ar)
+static int ath10k_pci_cold_reset(struct ath10k *ar)
 {
 	int i, ret;
 	u32 val;
-- 
1.8.4.rc3
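
The "reset CE" / "unreset CE" steps in the warm reset above follow a
common pattern: set one bit in a control register, read the register
back, wait, then clear the bit again while leaving the other bits
untouched. A minimal user-space sketch of that pattern follows. struct
fake_reg, reg_read(), reg_write() and pulse_reset_bit() are hypothetical
names for this sketch; the real code uses ioread32()/iowrite32() on
mapped device memory, and the read-back is presumably there to make sure
the write has reached the device before sleeping.

```c
#include <assert.h>
#include <stdint.h>

/* Same value as SOC_RESET_CONTROL_CE_RST_MASK in the patch. */
#define CE_RST_MASK 0x00040000u

/* Hypothetical register model standing in for a mapped device register. */
struct fake_reg {
	uint32_t value;
	int writes;
};

static uint32_t reg_read(const struct fake_reg *reg)
{
	return reg->value;
}

static void reg_write(struct fake_reg *reg, uint32_t v)
{
	reg->value = v;
	reg->writes++;
}

/* Pulse one reset bit: assert it, then deassert it, keeping other bits. */
static void pulse_reset_bit(struct fake_reg *reg, uint32_t mask)
{
	uint32_t val;

	assert(mask != 0);

	val = reg_read(reg);
	val |= mask;
	reg_write(reg, val);
	val = reg_read(reg);	/* read back after the write */
	/* a real driver would sleep here; the patch uses msleep(10) */

	val &= ~mask;
	reg_write(reg, val);
	(void)reg_read(reg);	/* read back again before the final delay */
}
```

The sketch performs exactly two writes per pulse and ends with the reset
bit cleared regardless of its initial state.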


* [PATCH 2/3] ath10k: zero device DRAM to avoid host hangs
From: Michal Kazior @ 2013-12-12 14:24 UTC
  To: ath10k; +Cc: linux-wireless, Michal Kazior

The hardware has a bug that can make it dereference
garbage from its DRAM and access host memory. This
can lead to host crashes, hangs, memory corruption
or data bus errors.

Apparently doing a cold reset in a tight loop
isn't enough to trigger the bug. The device must
be exercised with a regular workload (e.g. starting
an AP). After that there's a chance a cold reset
will break and hang the host.

A rough guess is that this is related to DRAM
contents. The patch tries to zero the DRAM when
tearing down the device so that a subsequent cold
reset doesn't break the host.

Ideally the DRAM should also be zeroed right before
a cold reset, but the current CE init code doesn't
allow that.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
---
 drivers/net/wireless/ath/ath10k/hw.h  |  1 +
 drivers/net/wireless/ath/ath10k/pci.c | 18 ++++++++++++++++++
 2 files changed, 19 insertions(+)
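
To give a feel for the cost and shape of the zeroing loop in this patch,
here is a small user-space sketch. With a 512 KiB region and 32-bit
writes the loop issues 512 * 1024 / 4 = 131072 writes. fake_dram,
fake_diag_write32() and zero_target_dram() are hypothetical names that
only model the diagnostic write path. Unlike the patch, which ignores
the return value of ath10k_pci_diag_write_access(), this variant stops
at the first failed write; that is an assumption of the sketch, not
something the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DRAM_BASE_ADDRESS 0x00400000u		/* from hw.h */
#define DRAM_BASE_SIZE    (512u * 1024u)	/* same size as in the patch */

/* Hypothetical target memory model, standing in for the diagnostic CE. */
static uint32_t fake_dram[DRAM_BASE_SIZE / sizeof(uint32_t)];
static size_t fake_write_count;

/* Hypothetical 32-bit diagnostic write: 0 on success, -1 on bad address. */
static int fake_diag_write32(uint32_t address, uint32_t data)
{
	if (address < DRAM_BASE_ADDRESS ||
	    address - DRAM_BASE_ADDRESS >= DRAM_BASE_SIZE ||
	    address % sizeof(uint32_t) != 0)
		return -1;

	fake_dram[(address - DRAM_BASE_ADDRESS) / sizeof(uint32_t)] = data;
	fake_write_count++;
	return 0;
}

/* Zero the whole region one word at a time, stopping at the first error. */
static int zero_target_dram(void)
{
	uint32_t offset;
	int ret;

	for (offset = 0; offset < DRAM_BASE_SIZE; offset += sizeof(uint32_t)) {
		ret = fake_diag_write32(DRAM_BASE_ADDRESS + offset, 0);
		if (ret)
			return ret;
	}

	return 0;
}
```

Because every word goes through a separate diagnostic write, the number
of writes grows linearly with the region size, which is worth keeping in
mind for teardown latency.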

diff --git a/drivers/net/wireless/ath/ath10k/hw.h b/drivers/net/wireless/ath/ath10k/hw.h
index 2032737..3c60476 100644
--- a/drivers/net/wireless/ath/ath10k/hw.h
+++ b/drivers/net/wireless/ath/ath10k/hw.h
@@ -289,6 +289,7 @@ enum ath10k_mcast2ucast_mode {
 #define PCIE_INTR_CE_MASK_ALL			0x0007f800
 
 #define DRAM_BASE_ADDRESS			0x00400000
+#define DRAM_BASE_SIZE				(512*1024)
 
 #define MISSING 0
 
diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
index 475b4da..2527004 100644
--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -617,6 +617,22 @@ static int ath10k_pci_diag_write_access(struct ath10k *ar, u32 address,
 	return 0;
 }
 
+static void ath10k_pci_zero_target_dram(struct ath10k *ar)
+{
+	int i;
+
+	/* Target device has a bug with cold reset. It can dereference garbage
+	 * and access host memory leading to data bus errors, memory corruption
+	 * on host and hangs.
+	 *
+	 * To avoid that try to zero target DRAM through the diagnostic CE. */
+
+	ath10k_dbg(ATH10K_DBG_BOOT, "zeroing device DRAM\n");
+
+	for (i = 0; i < DRAM_BASE_SIZE; i += sizeof(u32))
+		ath10k_pci_diag_write_access(ar, DRAM_BASE_ADDRESS + i, 0);
+}
+
 static bool ath10k_pci_target_is_awake(struct ath10k *ar)
 {
 	void __iomem *mem = ath10k_pci_priv(ar)->mem;
@@ -1461,6 +1477,8 @@ static void ath10k_pci_ce_deinit(struct ath10k *ar)
 	struct ath10k_pci_pipe *pipe_info;
 	int pipe_num;
 
+	ath10k_pci_zero_target_dram(ar);
+
 	for (pipe_num = 0; pipe_num < CE_COUNT; pipe_num++) {
 		pipe_info = &ar_pci->pipe_info[pipe_num];
 		if (pipe_info->ce_hdl) {
-- 
1.8.4.rc3


* [PATCH 3/3] ath10k: zero CE config upon deinit
From: Michal Kazior @ 2013-12-12 14:24 UTC
  To: ath10k; +Cc: linux-wireless, Michal Kazior

Make sure to reset the CE configuration in the
exposed device registers. This seems like a safe
plan since the CE configuration includes DMA
addresses shared between the host and the device,
which become invalid after teardown.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
---
 drivers/net/wireless/ath/ath10k/ce.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)
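
The ordering this patch relies on can be shown with a toy model: clear
the device-visible ring configuration first, and only then free the host
memory it pointed at, so the registers never hold the address of freed
memory. struct fake_ring, fake_ring_init() and fake_ring_deinit() are
hypothetical names for this sketch, and the "registers" are plain struct
fields rather than real hardware registers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of one ring: two "registers" plus host memory. */
struct fake_ring {
	uint32_t reg_base_addr;	/* device-visible: where the ring lives */
	uint32_t reg_size;	/* device-visible: number of entries */
	void *host_mem;		/* host allocation backing the ring */
};

static int fake_ring_init(struct fake_ring *ring, uint32_t nentries)
{
	assert(nentries > 0);

	ring->host_mem = calloc(nentries, sizeof(uint32_t));
	if (!ring->host_mem)
		return -1;

	/* A real driver would program a DMA address; any nonzero token
	 * is enough for this model. */
	ring->reg_base_addr = 0x1000;
	ring->reg_size = nentries;
	return 0;
}

static void fake_ring_deinit(struct fake_ring *ring)
{
	/* Clear what the device can see first... */
	ring->reg_base_addr = 0;
	ring->reg_size = 0;

	/* ...and only then release the memory it used to point at. */
	free(ring->host_mem);
	ring->host_mem = NULL;
}
```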

diff --git a/drivers/net/wireless/ath/ath10k/ce.c b/drivers/net/wireless/ath/ath10k/ce.c
index d44d618..ba82a03 100644
--- a/drivers/net/wireless/ath/ath10k/ce.c
+++ b/drivers/net/wireless/ath/ath10k/ce.c
@@ -1109,11 +1109,38 @@ out:
 	return ce_state;
 }
 
+void ath10k_ce_reset_src_ring(struct ath10k *ar, int num)
+{
+	u32 ctrl_addr = ath10k_ce_base_address(num);
+
+	ath10k_ce_src_ring_base_addr_set(ar, ctrl_addr, 0);
+	ath10k_ce_src_ring_size_set(ar, ctrl_addr, 0);
+	ath10k_ce_src_ring_dmax_set(ar, ctrl_addr, 0);
+	ath10k_ce_src_ring_byte_swap_set(ar, ctrl_addr, 0);
+	ath10k_ce_src_ring_lowmark_set(ar, ctrl_addr, 0);
+	ath10k_ce_src_ring_highmark_set(ar, ctrl_addr, 0);
+
+}
+
+void ath10k_ce_reset_dest_ring(struct ath10k *ar, int num)
+{
+	u32 ctrl_addr = ath10k_ce_base_address(num);
+
+	ath10k_ce_dest_ring_base_addr_set(ar, ctrl_addr, 0);
+	ath10k_ce_dest_ring_size_set(ar, ctrl_addr, 0);
+	ath10k_ce_dest_ring_byte_swap_set(ar, ctrl_addr, 0);
+	ath10k_ce_dest_ring_lowmark_set(ar, ctrl_addr, 0);
+	ath10k_ce_dest_ring_highmark_set(ar, ctrl_addr, 0);
+}
+
 void ath10k_ce_deinit(struct ath10k_ce_pipe *ce_state)
 {
 	struct ath10k *ar = ce_state->ar;
 	struct ath10k_pci *ar_pci = ath10k_pci_priv(ar);
 
+	ath10k_ce_reset_src_ring(ar, ce_state->id);
+	ath10k_ce_reset_dest_ring(ar, ce_state->id);
+
 	if (ce_state->src_ring) {
 		kfree(ce_state->src_ring->shadow_base_unaligned);
 		pci_free_consistent(ar_pci->pdev,
-- 
1.8.4.rc3


* Re: [PATCH 0/3] ath10k: fix host corruption
From: Kalle Valo @ 2013-12-16 15:43 UTC
  To: Michal Kazior; +Cc: ath10k, linux-wireless

Michal Kazior <michal.kazior@tieto.com> writes:

> This patchset aims at fixing (or at least
> reducing) the frequency of a (apparently) strange
> HW bug.
>
> The bug happens in some workloads with different
> frequency with different hardware spinoffs. It is
> triggered by a cold reset after HW/FW has been
> exercised with some work.
>
> On x86 this could be a hang, on AP135 it is a data
> bus error.
>
>
> Michal Kazior (3):
>   ath10k: fix device initialization routine
>   ath10k: zero device DRAM to avoid host hangs
>   ath10k: zero CE config upon deinit

Let's put these patches on hold for now while we investigate more about
this problem.

-- 
Kalle Valo

* Re: [PATCH 0/3] ath10k: fix host corruption
  2013-12-16 15:43   ` Kalle Valo
@ 2014-01-21 12:27     ` Marek Puzyniak
  -1 siblings, 0 replies; 12+ messages in thread
From: Marek Puzyniak @ 2014-01-21 12:27 UTC (permalink / raw)
  To: Kalle Valo; +Cc: Michal Kazior, linux-wireless, ath10k

On 16 December 2013 16:43, Kalle Valo <kvalo@qca.qualcomm.com> wrote:
> Michal Kazior <michal.kazior@tieto.com> writes:
>
>> This patchset aims at fixing (or at least
>> reducing) the frequency of a (apparently) strange
>> HW bug.
>>
>> The bug happens in some workloads with different
>> frequency with different hardware spinoffs. It is
>> triggered by a cold reset after HW/FW has been
>> exercised with some work.
>>
>> On x86 this could be a hang, on AP135 it is a data
>> bus error.
>>
>>
>> Michal Kazior (3):
>>   ath10k: fix device initialization routine
>>   ath10k: zero device DRAM to avoid host hangs
>>   ath10k: zero CE config upon deinit
>
> Let's put these patches on hold for now while we investigate more about
> this problem.
>
> --
> Kalle Valo

Kalle, please drop these patches. There will be a new version of this patch-set.

Marek

^ permalink raw reply	[flat|nested] 12+ messages in thread
