All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 0/7] ath10k: firmware crash dump
@ 2014-08-08 20:28 Kalle Valo
  2014-08-08 20:28 ` [PATCH v5 1/7] ath10k: add ath10k_pci_diag_* helpers Kalle Valo
                   ` (7 more replies)
  0 siblings, 8 replies; 17+ messages in thread
From: Kalle Valo @ 2014-08-08 20:28 UTC (permalink / raw)
  To: ath10k

Hi,

here's my reworked Ben's patchset adding firmware crash dump support to ath10k.
Unfortunately this crashes when reading the stack dump from the firmware but
time run out for me to fix that and I wanted to send this for comments anyway.

I did quite a lot of changes, basically to simplify the code, remove ifdefs and
so on. Here's some sort of list what I did:

* dump_data->tv_sec and tv_nsec to 64 bits (because long can be 32 bits
  on some platforms)

* fix long lines

* renamed ath10k_dbg_save_fw_dbg_buffer() to ath10k_debug_dbglog_add()

* add helpers for ath10k_pci_diag* functions

* refactor and rename ath10k_pci_hif_dump_area()

* latest crash dump is always stored (instead of the oldest unread)

* add ath10k_debug_get_fw_crash_data()

* move fw_r?m_bss_* fields to ar->fw

* struct ath10k_fw_crash_data is allocated with vmalloc()

* atomic allocation in ath10k_pci_dump_bss() is bad, fix that by using vmalloc
  in module initialisation

* separate FW IE entries for BSS regions

* don't use ath10k_err()

* simplify locking and memory allocation for FW IE handling

* add uuid

* move struct ath10k_dump_file_data and enum ath10k_fw_error_dump_type to debug.c

* function and variable naming, using ath10k_fw_crash_ prefix etc

* change warning and debug messages to follow ath10k style

* add ath10k_debug_get_new_fw_crash_data() to avoid ifdefs in pci.c

And I still have TODO:

* rename crashed_since_read to crashed?

* atomic allocation in ath10k_pci_dump_dbglog() is bad. Should we
  allocate a big buffer with vmalloc and use that?

* what should ath10k_fw_error_dump_open() do if firmware hasn't
  crashed? check crashed_since_read and return zero len file? or an
  error code? -ENOMSG?

* should the crash dump file actually be in little endian? would that
  be easier/simpler?

* should ath10k_pci_hif_dump_area() hold the lock all the time? That
  way we would guarantee that changes to ath10k_fw_crash_data are
  atomic.

---

Ben Greear (5):
      ath10k: provide firmware crash info via debugfs
      ath10k: save firmware debug log messages
      ath10k: save firmware stack upon firmware crash
      ath10k: dump exception stack contents on firmware crash
      ath10k: save firmware RAM and ROM BSS sections on crash

Kalle Valo (2):
      ath10k: add ath10k_pci_diag_* helpers
      ath10k: rename ath10k_pci_hif_dump_area() to ath10k_pci_firmware_crashed()


 drivers/net/wireless/ath/ath10k/core.c  |   54 +++++
 drivers/net/wireless/ath/ath10k/core.h  |   43 ++++
 drivers/net/wireless/ath/ath10k/debug.c |  314 +++++++++++++++++++++++++++++++
 drivers/net/wireless/ath/ath10k/debug.h |   22 ++
 drivers/net/wireless/ath/ath10k/hw.h    |   36 ++++
 drivers/net/wireless/ath/ath10k/pci.c   |  250 ++++++++++++++++++++++---
 drivers/net/wireless/ath/ath10k/pci.h   |    3 
 drivers/net/wireless/ath/ath10k/wmi.c   |    6 +
 8 files changed, 701 insertions(+), 27 deletions(-)


_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v5 1/7] ath10k: add ath10k_pci_diag_* helpers
  2014-08-08 20:28 [PATCH v5 0/7] ath10k: firmware crash dump Kalle Valo
@ 2014-08-08 20:28 ` Kalle Valo
  2014-08-08 20:28 ` [PATCH v5 2/7] ath10k: provide firmware crash info via debugfs Kalle Valo
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Kalle Valo @ 2014-08-08 20:28 UTC (permalink / raw)
  To: ath10k

ath10k_pci_diag_read32() is for reading u32 from a device and ath10k_pci_diag_read_hi()
is a helper for reading data using "host interest" table.

Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
---
 drivers/net/wireless/ath/ath10k/pci.c |   55 +++++++++++++++++++++++----------
 1 file changed, 38 insertions(+), 17 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
index 2d340cc522e3..96ce359349cb 100644
--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -466,13 +466,46 @@ done:
 	return ret;
 }
 
+static int ath10k_pci_diag_read32(struct ath10k *ar, u32 address, u32 *value)
+{
+	return ath10k_pci_diag_read_mem(ar, address, value, sizeof(u32));
+}
+
+static int __ath10k_pci_diag_read_hi(struct ath10k *ar, void *dest,
+				     u32 src, u32 len)
+{
+	u32 host_addr, addr;
+	int ret;
+
+	host_addr = host_interest_item_address(src);
+
+	ret = ath10k_pci_diag_read32(ar, host_addr, &addr);
+	if (ret != 0) {
+		ath10k_warn("failed to get memcpy hi address for firmware address %d: %d\n",
+			    src, ret);
+		return ret;
+	}
+
+	ret = ath10k_pci_diag_read_mem(ar, addr, dest, len);
+	if (ret != 0) {
+		ath10k_warn("failed to memcpy firmware memory from %d (%d B): %d\n",
+			    addr, len, ret);
+		return ret;
+	}
+
+	return 0;
+}
+
+#define ath10k_pci_diag_read_hi(ar, dest, src, len)		\
+	__ath10k_pci_diag_read_hi(ar, dest, HI_ITEM(src), len);
+
 /* Read 4-byte aligned data from Target memory or register */
 static int ath10k_pci_diag_read_access(struct ath10k *ar, u32 address,
 				       u32 *data)
 {
 	/* Assume range doesn't cross this boundary */
 	if (address >= DRAM_BASE_ADDRESS)
-		return ath10k_pci_diag_read_mem(ar, address, data, sizeof(u32));
+		return ath10k_pci_diag_read32(ar, address, data);
 
 	ath10k_pci_wake(ar);
 	*data = ath10k_pci_read32(ar, address);
@@ -836,9 +869,7 @@ static u16 ath10k_pci_hif_get_free_queue_number(struct ath10k *ar, u8 pipe)
 
 static void ath10k_pci_hif_dump_area(struct ath10k *ar)
 {
-	u32 reg_dump_area = 0;
 	u32 reg_dump_values[REG_DUMP_COUNT_QCA988X] = {};
-	u32 host_addr;
 	int ret;
 	u32 i;
 
@@ -847,21 +878,11 @@ static void ath10k_pci_hif_dump_area(struct ath10k *ar)
 		   ar->hw_params.name, ar->target_version);
 	ath10k_err("firmware version: %s\n", ar->hw->wiphy->fw_version);
 
-	host_addr = host_interest_item_address(HI_ITEM(hi_failure_state));
-	ret = ath10k_pci_diag_read_mem(ar, host_addr,
-				       &reg_dump_area, sizeof(u32));
+	ret = ath10k_pci_diag_read_hi(ar, &reg_dump_values[0],
+				      hi_failure_state,
+				      REG_DUMP_COUNT_QCA988X * sizeof(u32));
 	if (ret) {
-		ath10k_err("failed to read FW dump area address: %d\n", ret);
-		return;
-	}
-
-	ath10k_err("target register Dump Location: 0x%08X\n", reg_dump_area);
-
-	ret = ath10k_pci_diag_read_mem(ar, reg_dump_area,
-				       &reg_dump_values[0],
-				       REG_DUMP_COUNT_QCA988X * sizeof(u32));
-	if (ret != 0) {
-		ath10k_err("failed to read FW dump area: %d\n", ret);
+		ath10k_err("failed to read firmware dump area: %d\n", ret);
 		return;
 	}
 


_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v5 2/7] ath10k: provide firmware crash info via debugfs
  2014-08-08 20:28 [PATCH v5 0/7] ath10k: firmware crash dump Kalle Valo
  2014-08-08 20:28 ` [PATCH v5 1/7] ath10k: add ath10k_pci_diag_* helpers Kalle Valo
@ 2014-08-08 20:28 ` Kalle Valo
  2014-08-08 22:26   ` Ben Greear
  2014-08-08 20:28 ` [PATCH v5 3/7] ath10k: save firmware debug log messages Kalle Valo
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Kalle Valo @ 2014-08-08 20:28 UTC (permalink / raw)
  To: ath10k

From: Ben Greear <greearb@candelatech.com>

Store the firmware crash registers and last 128 or so
firmware debug-log ids and present them to user-space
via debugfs.

Should help with figuring out why the firmware crashed.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
---
 drivers/net/wireless/ath/ath10k/core.h  |   25 +++
 drivers/net/wireless/ath/ath10k/debug.c |  264 +++++++++++++++++++++++++++++++
 drivers/net/wireless/ath/ath10k/debug.h |   22 +++
 drivers/net/wireless/ath/ath10k/hw.h    |   30 ++++
 drivers/net/wireless/ath/ath10k/pci.c   |  133 +++++++++++++++-
 drivers/net/wireless/ath/ath10k/pci.h   |    3 
 6 files changed, 466 insertions(+), 11 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h
index d5c95d46e841..1243de3eae46 100644
--- a/drivers/net/wireless/ath/ath10k/core.h
+++ b/drivers/net/wireless/ath/ath10k/core.h
@@ -22,6 +22,7 @@
 #include <linux/if_ether.h>
 #include <linux/types.h>
 #include <linux/pci.h>
+#include <linux/uuid.h>
 
 #include "htt.h"
 #include "htc.h"
@@ -278,6 +279,28 @@ struct ath10k_vif_iter {
 	struct ath10k_vif *arvif;
 };
 
+/* This will store at least the last 128 entries.  Each dbglog message
+ * is a max of 7 32-bit integers in length, but the length can be less
+ * than that as well.
+ */
+#define ATH10K_DBGLOG_DATA_LEN (128 * 7 * sizeof(u32))
+
+struct ath10k_dbglog_entry_storage {
+	/* where to write next chunk of data */
+	u32 next_idx;
+
+	u8 data[ATH10K_DBGLOG_DATA_LEN];
+};
+
+/* used for crash-dump storage, protected by data-lock */
+struct ath10k_fw_crash_data {
+	bool crashed_since_read;
+
+	uuid_le uuid;
+	struct ath10k_dbglog_entry_storage dbglog_entry_data;
+	u32 reg_dump_values[REG_DUMP_COUNT_QCA988X];
+};
+
 struct ath10k_debug {
 	struct dentry *debugfs_phy;
 
@@ -295,6 +318,8 @@ struct ath10k_debug {
 
 	u8 htt_max_amsdu;
 	u8 htt_max_ampdu;
+
+	struct ath10k_fw_crash_data *fw_crash_data;
 };
 
 enum ath10k_state {
diff --git a/drivers/net/wireless/ath/ath10k/debug.c b/drivers/net/wireless/ath/ath10k/debug.c
index c9e35c87edfb..2a443e9557ed 100644
--- a/drivers/net/wireless/ath/ath10k/debug.c
+++ b/drivers/net/wireless/ath/ath10k/debug.c
@@ -17,6 +17,9 @@
 
 #include <linux/module.h>
 #include <linux/debugfs.h>
+#include <linux/version.h>
+#include <linux/vermagic.h>
+#include <linux/vmalloc.h>
 
 #include "core.h"
 #include "debug.h"
@@ -24,6 +27,89 @@
 /* ms */
 #define ATH10K_DEBUG_HTT_STATS_INTERVAL 1000
 
+/**
+ * enum ath10k_fw_error_dump_type - types of data in the dump file
+ * @ATH10K_FW_CRASH_DUMP_DBGLOG:  Recent firmware debug log entries
+ * @ATH10K_FW_CRASH_DUMP_REGDUMP: Register crash dump in binary format
+ */
+enum ath10k_fw_error_dump_type {
+	ATH10K_FW_CRASH_DUMP_DBGLOG = 0,
+	ATH10K_FW_CRASH_DUMP_REGDUMP = 1,
+
+	ATH10K_FW_CRASH_DUMP_MAX,
+};
+
+struct ath10k_tlv_dump_data {
+	/* see ath10k_fw_error_dump_type above */
+	u32 type;
+
+	/* in bytes */
+	u32 tlv_len;
+
+	/* pad to 32-bit boundaries as needed */
+	u8 tlv_data[];
+} __packed;
+
+struct ath10k_dump_file_data {
+	/* dump file information */
+
+	/* "ATH10K-FW-DUMP" */
+	char df_magic[16];
+
+	u32 len;
+
+	/* 0x1 if host is big-endian */
+	u32 big_endian;
+
+	/* file dump version, 1 for now. */
+	u32 version;
+
+	/* some info we can get from ath10k struct that might help */
+
+	u8 uuid[16];
+
+	u32 chip_id;
+
+	/* 0 for now, in place for later hardware */
+	u32 bus_type;
+
+	u32 target_version;
+	u32 fw_version_major;
+	u32 fw_version_minor;
+	u32 fw_version_release;
+	u32 fw_version_build;
+	u32 phy_capability;
+	u32 hw_min_tx_power;
+	u32 hw_max_tx_power;
+	u32 ht_cap_info;
+	u32 vht_cap_info;
+	u32 num_rf_chains;
+
+	/* firmware version string */
+	char fw_ver[ETHTOOL_FWVERS_LEN];
+
+	/* Kernel related information */
+
+	/* time-of-day stamp */
+	u64 tv_sec;
+
+	/* time-of-day stamp, nano-seconds */
+	u64 tv_nsec;
+
+
+	/* LINUX_VERSION_CODE */
+	u32 kernel_ver_code;
+
+	/* VERMAGIC_STRING */
+	char kernel_ver[64];
+
+	/* room for growth w/out changing binary format */
+	u8 unused[128];
+
+	/* struct ath10k_tlv_dump_data + more */
+	u8 data[0];
+} __packed;
+
 static int ath10k_printk(const char *level, const char *fmt, ...)
 {
 	struct va_format vaf;
@@ -580,6 +666,176 @@ static const struct file_operations fops_chip_id = {
 	.llseek = default_llseek,
 };
 
+struct ath10k_fw_crash_data *
+ath10k_debug_get_new_fw_crash_data(struct ath10k *ar)
+{
+	/* FIXME: locking is missing! */
+
+	uuid_le_gen(&ar->debug.fw_crash_data->uuid);
+
+	return ar->debug.fw_crash_data;
+}
+EXPORT_SYMBOL(ath10k_debug_get_new_fw_crash_data);
+
+void ath10k_debug_dbglog_add(struct ath10k *ar, u8 *buffer, int len)
+{
+	struct ath10k_fw_crash_data *crash_data = ar->debug.fw_crash_data;
+	int i, z;
+
+	spin_lock_bh(&ar->data_lock);
+
+	z = crash_data->dbglog_entry_data.next_idx;
+
+	for (i = 0; i < len; i++) {
+		crash_data->dbglog_entry_data.data[z] = buffer[i];
+		z++;
+		if (z >= ATH10K_DBGLOG_DATA_LEN)
+			z = 0;
+	}
+
+	crash_data->dbglog_entry_data.next_idx = z;
+
+	spin_unlock_bh(&ar->data_lock);
+}
+EXPORT_SYMBOL(ath10k_debug_dbglog_add);
+
+static struct ath10k_dump_file_data *ath10k_build_dump_file(struct ath10k *ar)
+{
+	struct ath10k_fw_crash_data *crash_data = ar->debug.fw_crash_data;
+	struct ath10k_dump_file_data *dump_data;
+	struct ath10k_tlv_dump_data *dump_tlv;
+	int hdr_len = sizeof(*dump_data);
+	struct timespec timestamp;
+	unsigned int len, sofar = 0;
+	unsigned char *buf;
+
+	len = hdr_len;
+	len += sizeof(*dump_tlv) + sizeof(crash_data->reg_dump_values);
+	len += sizeof(*dump_tlv) + sizeof(crash_data->dbglog_entry_data);
+
+	lockdep_assert_held(&ar->conf_mutex);
+
+	sofar += hdr_len;
+
+	/* This is going to get big when we start dumping FW RAM and such,
+	 * so go ahead and use vmalloc.
+	 */
+	buf = vmalloc(len);
+	if (!buf)
+		return NULL;
+
+	memset(buf, 0, len);
+	dump_data = (struct ath10k_dump_file_data *)(buf);
+	strlcpy(dump_data->df_magic, "ATH10K-FW-DUMP",
+		sizeof(dump_data->df_magic));
+	dump_data->len = len;
+
+#ifdef __BIG_ENDIAN
+	dump_data->big_endian = 1;
+#else
+	dump_data->big_endian = 0;
+#endif
+
+	dump_data->version = 1;
+	memcpy(dump_data->uuid, &crash_data->uuid, sizeof(dump_data->uuid));
+	dump_data->chip_id = ar->chip_id;
+	dump_data->bus_type = 0;
+	dump_data->target_version = ar->target_version;
+	dump_data->fw_version_major = ar->fw_version_major;
+	dump_data->fw_version_minor = ar->fw_version_minor;
+	dump_data->fw_version_release = ar->fw_version_release;
+	dump_data->fw_version_build = ar->fw_version_build;
+	dump_data->phy_capability = ar->phy_capability;
+	dump_data->hw_min_tx_power = ar->hw_min_tx_power;
+	dump_data->hw_max_tx_power = ar->hw_max_tx_power;
+	dump_data->ht_cap_info = ar->ht_cap_info;
+	dump_data->vht_cap_info = ar->vht_cap_info;
+	dump_data->num_rf_chains = ar->num_rf_chains;
+
+	strlcpy(dump_data->fw_ver, ar->hw->wiphy->fw_version,
+		sizeof(dump_data->fw_ver));
+
+	dump_data->kernel_ver_code = LINUX_VERSION_CODE;
+	strlcpy(dump_data->kernel_ver, VERMAGIC_STRING,
+		sizeof(dump_data->kernel_ver));
+
+	getnstimeofday(&timestamp);
+	dump_data->tv_sec = timestamp.tv_sec;
+	dump_data->tv_nsec = timestamp.tv_nsec;
+
+	spin_lock_bh(&ar->data_lock);
+
+	/* Gather dbg-log */
+	dump_tlv = (struct ath10k_tlv_dump_data *)(buf + sofar);
+	dump_tlv->type = ATH10K_FW_CRASH_DUMP_DBGLOG;
+	dump_tlv->tlv_len = sizeof(crash_data->dbglog_entry_data);
+	memcpy(dump_tlv->tlv_data, &crash_data->dbglog_entry_data,
+	       dump_tlv->tlv_len);
+	sofar += sizeof(*dump_tlv) + dump_tlv->tlv_len;
+
+	/* Gather crash-dump */
+	dump_tlv = (struct ath10k_tlv_dump_data *)(buf + sofar);
+	dump_tlv->type = ATH10K_FW_CRASH_DUMP_REGDUMP;
+	dump_tlv->tlv_len = sizeof(crash_data->reg_dump_values);
+	memcpy(dump_tlv->tlv_data, &crash_data->reg_dump_values,
+	       dump_tlv->tlv_len);
+	sofar += sizeof(*dump_tlv) + dump_tlv->tlv_len;
+
+	spin_unlock_bh(&ar->data_lock);
+
+	return dump_data;
+}
+
+static int ath10k_fw_error_dump_open(struct inode *inode, struct file *file)
+{
+	struct ath10k *ar = inode->i_private;
+	struct ath10k_dump_file_data *dump;
+	int ret;
+
+	mutex_lock(&ar->conf_mutex);
+
+	dump = ath10k_build_dump_file(ar);
+	if (!dump) {
+		ret = -ENODATA;
+		goto out;
+	}
+
+	file->private_data = dump;
+	ar->debug.fw_crash_data->crashed_since_read = false;
+	ret = 0;
+
+out:
+	mutex_unlock(&ar->conf_mutex);
+	return ret;
+}
+
+static ssize_t ath10k_fw_error_dump_read(struct file *file,
+					 char __user *user_buf,
+					 size_t count, loff_t *ppos)
+{
+	struct ath10k_dump_file_data *dump_file = file->private_data;
+
+	return simple_read_from_buffer(user_buf, count, ppos,
+				       dump_file,
+				       dump_file->len);
+}
+
+static int ath10k_fw_error_dump_release(struct inode *inode,
+					struct file *file)
+{
+	vfree(file->private_data);
+
+	return 0;
+}
+
+static const struct file_operations fops_fw_error_dump = {
+	.open = ath10k_fw_error_dump_open,
+	.read = ath10k_fw_error_dump_read,
+	.release = ath10k_fw_error_dump_release,
+	.owner = THIS_MODULE,
+	.llseek = default_llseek,
+};
+
 static int ath10k_debug_htt_stats_req(struct ath10k *ar)
 {
 	u64 cookie;
@@ -913,6 +1169,10 @@ static const struct file_operations fops_dfs_stats = {
 
 int ath10k_debug_create(struct ath10k *ar)
 {
+	ar->debug.fw_crash_data = vzalloc(sizeof(ar->debug.fw_crash_data));
+	if (!ar->debug.fw_crash_data)
+		return -ENOMEM;
+
 	ar->debug.debugfs_phy = debugfs_create_dir("ath10k",
 						   ar->hw->wiphy->debugfsdir);
 
@@ -933,6 +1193,9 @@ int ath10k_debug_create(struct ath10k *ar)
 	debugfs_create_file("simulate_fw_crash", S_IRUSR, ar->debug.debugfs_phy,
 			    ar, &fops_simulate_fw_crash);
 
+	debugfs_create_file("fw_error_dump", S_IRUSR, ar->debug.debugfs_phy,
+			    ar, &fops_fw_error_dump);
+
 	debugfs_create_file("chip_id", S_IRUSR, ar->debug.debugfs_phy,
 			    ar, &fops_chip_id);
 
@@ -965,6 +1228,7 @@ int ath10k_debug_create(struct ath10k *ar)
 
 void ath10k_debug_destroy(struct ath10k *ar)
 {
+	vfree(ar->debug.fw_crash_data);
 	cancel_delayed_work_sync(&ar->debug.htt_stats_dwork);
 }
 
diff --git a/drivers/net/wireless/ath/ath10k/debug.h b/drivers/net/wireless/ath/ath10k/debug.h
index a5824990bd2a..80ff14e4db9b 100644
--- a/drivers/net/wireless/ath/ath10k/debug.h
+++ b/drivers/net/wireless/ath/ath10k/debug.h
@@ -53,6 +53,10 @@ void ath10k_debug_read_service_map(struct ath10k *ar,
 				   size_t map_size);
 void ath10k_debug_read_target_stats(struct ath10k *ar,
 				    struct wmi_stats_event *ev);
+struct ath10k_fw_crash_data *
+ath10k_debug_get_new_fw_crash_data(struct ath10k *ar);
+
+void ath10k_debug_dbglog_add(struct ath10k *ar, u8 *buffer, int len);
 
 #define ATH10K_DFS_STAT_INC(ar, c) (ar->debug.dfs_stats.c++)
 
@@ -86,6 +90,17 @@ static inline void ath10k_debug_read_target_stats(struct ath10k *ar,
 {
 }
 
+static inline void ath10k_debug_dbglog_add(struct ath10k *ar, u8 *buffer,
+					   int len)
+{
+}
+
+static inline struct ath10k_fw_crash_data *
+ath10k_debug_get_new_fw_crash_data(struct ath10k *ar)
+{
+	return NULL;
+}
+
 #define ATH10K_DFS_STAT_INC(ar, c) do { } while (0)
 
 #endif /* CONFIG_ATH10K_DEBUGFS */
@@ -96,6 +111,7 @@ __printf(2, 3) void ath10k_dbg(enum ath10k_debug_mask mask,
 void ath10k_dbg_dump(enum ath10k_debug_mask mask,
 		     const char *msg, const char *prefix,
 		     const void *buf, size_t len);
+
 #else /* CONFIG_ATH10K_DEBUG */
 
 static inline int ath10k_dbg(enum ath10k_debug_mask dbg_mask,
@@ -109,5 +125,11 @@ static inline void ath10k_dbg_dump(enum ath10k_debug_mask mask,
 				   const void *buf, size_t len)
 {
 }
+
+static inline void ath10k_dbg_save_fw_dbg_buffer(struct ath10k *ar,
+						 u8 *buffer, int len)
+{
+}
 #endif /* CONFIG_ATH10K_DEBUG */
+
 #endif /* _DEBUG_H_ */
diff --git a/drivers/net/wireless/ath/ath10k/hw.h b/drivers/net/wireless/ath/ath10k/hw.h
index ffd04890407e..c391c88096ee 100644
--- a/drivers/net/wireless/ath/ath10k/hw.h
+++ b/drivers/net/wireless/ath/ath10k/hw.h
@@ -39,6 +39,8 @@
 /* includes also the null byte */
 #define ATH10K_FIRMWARE_MAGIC               "QCA-ATH10K"
 
+#define REG_DUMP_COUNT_QCA988X 60
+
 struct ath10k_fw_ie {
 	__le32 id;
 	__le32 len;
@@ -362,4 +364,32 @@ enum ath10k_mcast2ucast_mode {
 
 #define RTC_STATE_V_GET(x) (((x) & RTC_STATE_V_MASK) >> RTC_STATE_V_LSB)
 
+
+/* Target debug log related defines and structs */
+
+/* Target is 32-bit CPU, so we just use u32 for
+ * the pointers.  The memory space is relative to the
+ * target, not the host.
+ */
+struct ath10k_fw_dbglog_buf {
+	/* pointer to dblog_buf_s */
+	u32 next;
+
+	/* pointer to u8 buffer */
+	u32 buffer;
+
+	u32 bufsize;
+	u32 length;
+	u32 count;
+	u32 free;
+} __packed;
+
+struct ath10k_fw_dbglog_hdr {
+	/* pointer to dbglog_buf_s */
+	u32 dbuf;
+
+	u32 dropped;
+} __packed;
+
+
 #endif /* _HW_H_ */
diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
index 96ce359349cb..19a556facb1f 100644
--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -867,16 +867,99 @@ static u16 ath10k_pci_hif_get_free_queue_number(struct ath10k *ar, u8 pipe)
 	return ath10k_ce_num_free_src_entries(ar_pci->pipe_info[pipe].ce_hdl);
 }
 
-static void ath10k_pci_hif_dump_area(struct ath10k *ar)
+static void ath10k_pci_dump_dbglog(struct ath10k *ar,
+				   struct ath10k_fw_crash_data *crash_data)
 {
-	u32 reg_dump_values[REG_DUMP_COUNT_QCA988X] = {};
-	int ret;
-	u32 i;
+	struct ath10k_fw_dbglog_hdr dbg_hdr;
+	struct ath10k_fw_dbglog_buf dbuf;
+	u8 *buffer;
+	int ret, i;
+	u32 dbufp;
 
-	ath10k_err("firmware crashed!\n");
-	ath10k_err("hardware name %s version 0x%x\n",
-		   ar->hw_params.name, ar->target_version);
-	ath10k_err("firmware version: %s\n", ar->hw->wiphy->fw_version);
+	/* dump the debug logs on the target */
+	ret = ath10k_pci_diag_read_hi(ar, &dbg_hdr,
+				      hi_dbglog_hdr, sizeof(dbg_hdr));
+	if (ret != 0) {
+		ath10k_warn("failed to dump debug log area from hi_dbglog_hdr: %d\n",
+			    ret);
+		return;
+	}
+
+	ath10k_dbg(ATH10K_DBG_PCI,
+		   "pci dbglog header dbuf 0x%x dropped %i\n",
+		   dbg_hdr.dbuf, dbg_hdr.dropped);
+
+	/* pointer in target memory space */
+	dbufp = dbg_hdr.dbuf;
+
+	/* i is for logging purposes and sanity check in case firmware buffers
+	 * are corrupted and will not properly terminate the list.
+	 * In standard firmware, it appears there are no more than 2
+	 * buffers, so 10 should be safe upper limit even if firmware
+	 * changes quite a bit.
+	 */
+	i = 0;
+	while (dbufp && i < 10) {
+		ret = ath10k_pci_diag_read_mem(ar, dbufp,
+					       &dbuf, sizeof(dbuf));
+		if (ret != 0) {
+			ath10k_warn("failed to read debug log area from address 0x%x: %d\n",
+				    dbufp, ret);
+			return;
+		}
+
+		/* we have a buffer of data */
+		ath10k_dbg(ATH10K_DBG_PCI,
+			   "pci dbglog [%i] next 0x%x buf 0x%x size %i len %i count %i free %i\n",
+			   i, dbuf.next, dbuf.buffer, dbuf.bufsize, dbuf.length,
+			   dbuf.count, dbuf.free);
+		if (dbuf.buffer == 0 || dbuf.length == 0)
+			goto next;
+
+		/* Pick arbitrary upper bound in case firmware is corrupted for
+		 * whatever reason.
+		 */
+		if (dbuf.length > 16000) {
+			ath10k_warn("firmware debug log buffer length is out of bounds: %d\n",
+				    dbuf.length);
+			/* do not trust the next pointer either... */
+			return;
+		}
+
+		buffer = kmalloc(dbuf.length, GFP_ATOMIC);
+
+		if (!buffer)
+			goto next;
+
+		ret = ath10k_pci_diag_read_mem(ar, dbuf.buffer, buffer,
+					       dbuf.length);
+		if (ret != 0) {
+			ath10k_warn("failed to read debug log buffer from address 0x%x: %d\n",
+				    dbuf.buffer, ret);
+			kfree(buffer);
+			return;
+		}
+
+		ath10k_debug_dbglog_add(ar, buffer, dbuf.length);
+		kfree(buffer);
+
+next:
+		dbufp = dbuf.next;
+		if (dbufp == dbg_hdr.dbuf) {
+			/* It is a circular buffer it seems, bail if next
+			 * is head.
+			 */
+			break;
+		}
+		i++;
+	} /* while we have a debug buffer to read */
+}
+
+static void ath10k_pci_dump_registers(struct ath10k *ar,
+				      struct ath10k_fw_crash_data *crash_data)
+{
+	u32 i, reg_dump_values[REG_DUMP_COUNT_QCA988X] = {};
+	int ret;
 
 	ret = ath10k_pci_diag_read_hi(ar, &reg_dump_values[0],
 				      hi_failure_state,
@@ -897,6 +980,40 @@ static void ath10k_pci_hif_dump_area(struct ath10k *ar)
 			   reg_dump_values[i + 2],
 			   reg_dump_values[i + 3]);
 
+	spin_lock_bh(&ar->data_lock);
+
+	crash_data->crashed_since_read = true;
+	memcpy(crash_data->reg_dump_values, reg_dump_values,
+	       sizeof(crash_data->reg_dump_values));
+
+	spin_unlock_bh(&ar->data_lock);
+}
+
+
+static void ath10k_pci_hif_dump_area(struct ath10k *ar)
+{
+	struct ath10k_fw_crash_data *crash_data;
+	char uuid[20];
+
+	crash_data = ath10k_debug_get_new_fw_crash_data(ar);
+
+	if (crash_data)
+		scnprintf(uuid, sizeof(uuid), "%pU", &crash_data->uuid);
+	else
+		scnprintf(uuid, sizeof(uuid), "n/a");
+
+	ath10k_err("firmware crashed! (uuid %s)\n", uuid);
+	ath10k_err("hardware name %s version 0x%x\n",
+		   ar->hw_params.name, ar->target_version);
+	ath10k_err("firmware version: %s\n", ar->hw->wiphy->fw_version);
+
+	if (!crash_data)
+		goto exit;
+
+	ath10k_pci_dump_registers(ar, crash_data);
+	ath10k_pci_dump_dbglog(ar, crash_data);
+
+exit:
 	queue_work(ar->workqueue, &ar->restart_work);
 }
 
diff --git a/drivers/net/wireless/ath/ath10k/pci.h b/drivers/net/wireless/ath/ath10k/pci.h
index 940129209990..f72a7cdec4d4 100644
--- a/drivers/net/wireless/ath/ath10k/pci.h
+++ b/drivers/net/wireless/ath/ath10k/pci.h
@@ -23,9 +23,6 @@
 #include "hw.h"
 #include "ce.h"
 
-/* FW dump area */
-#define REG_DUMP_COUNT_QCA988X 60
-
 /*
  * maximum number of bytes that can be handled atomically by DiagRead/DiagWrite
  */


_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v5 3/7] ath10k: save firmware debug log messages
  2014-08-08 20:28 [PATCH v5 0/7] ath10k: firmware crash dump Kalle Valo
  2014-08-08 20:28 ` [PATCH v5 1/7] ath10k: add ath10k_pci_diag_* helpers Kalle Valo
  2014-08-08 20:28 ` [PATCH v5 2/7] ath10k: provide firmware crash info via debugfs Kalle Valo
@ 2014-08-08 20:28 ` Kalle Valo
  2014-08-08 20:29 ` [PATCH v5 4/7] ath10k: save firmware stack upon firmware crash Kalle Valo
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Kalle Valo @ 2014-08-08 20:28 UTC (permalink / raw)
  To: ath10k

From: Ben Greear <greearb@candelatech.com>

They may be dumped through the firmware dump debugfs
file.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
---
 drivers/net/wireless/ath/ath10k/wmi.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/wireless/ath/ath10k/wmi.c b/drivers/net/wireless/ath/ath10k/wmi.c
index fffb15b1b50b..cd79858016f3 100644
--- a/drivers/net/wireless/ath/ath10k/wmi.c
+++ b/drivers/net/wireless/ath/ath10k/wmi.c
@@ -1210,6 +1210,12 @@ static int ath10k_wmi_event_debug_mesg(struct ath10k *ar, struct sk_buff *skb)
 
 	trace_ath10k_wmi_dbglog(skb->data, skb->len);
 
+	/* First 4 bytes are a messages-dropped-due-to-overflow counter,
+	 * and should not be recorded in the dbglog buffer, so we skip
+	 * them.
+	 */
+	ath10k_debug_dbglog_add(ar, skb->data + 4, skb->len - 4);
+
 	return 0;
 }
 


_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v5 4/7] ath10k: save firmware stack upon firmware crash
  2014-08-08 20:28 [PATCH v5 0/7] ath10k: firmware crash dump Kalle Valo
                   ` (2 preceding siblings ...)
  2014-08-08 20:28 ` [PATCH v5 3/7] ath10k: save firmware debug log messages Kalle Valo
@ 2014-08-08 20:29 ` Kalle Valo
  2014-08-08 20:29 ` [PATCH v5 5/7] ath10k: dump exception stack contents on " Kalle Valo
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Kalle Valo @ 2014-08-08 20:29 UTC (permalink / raw)
  To: ath10k

From: Ben Greear <greearb@candelatech.com>

Should help debug firmware crashes, and give users a way
to provide some useful debug reports to firmware developers.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
---
 drivers/net/wireless/ath/ath10k/core.h  |    1 +
 drivers/net/wireless/ath/ath10k/debug.c |   11 +++++++++++
 drivers/net/wireless/ath/ath10k/hw.h    |    1 +
 drivers/net/wireless/ath/ath10k/pci.c   |   15 +++++++++++++++
 4 files changed, 28 insertions(+)

diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h
index 1243de3eae46..e5f43d6960a9 100644
--- a/drivers/net/wireless/ath/ath10k/core.h
+++ b/drivers/net/wireless/ath/ath10k/core.h
@@ -299,6 +299,7 @@ struct ath10k_fw_crash_data {
 	uuid_le uuid;
 	struct ath10k_dbglog_entry_storage dbglog_entry_data;
 	u32 reg_dump_values[REG_DUMP_COUNT_QCA988X];
+	u8 stack_buf[ATH10K_FW_STACK_SIZE];
 };
 
 struct ath10k_debug {
diff --git a/drivers/net/wireless/ath/ath10k/debug.c b/drivers/net/wireless/ath/ath10k/debug.c
index 2a443e9557ed..e019850dc969 100644
--- a/drivers/net/wireless/ath/ath10k/debug.c
+++ b/drivers/net/wireless/ath/ath10k/debug.c
@@ -31,10 +31,12 @@
  * enum ath10k_fw_error_dump_type - types of data in the dump file
  * @ATH10K_FW_CRASH_DUMP_DBGLOG:  Recent firmware debug log entries
  * @ATH10K_FW_CRASH_DUMP_REGDUMP: Register crash dump in binary format
+ * @ATH10K_FW_CRASH_DUMP_STACK:   Stack memory contents.
  */
 enum ath10k_fw_error_dump_type {
 	ATH10K_FW_CRASH_DUMP_DBGLOG = 0,
 	ATH10K_FW_CRASH_DUMP_REGDUMP = 1,
+	ATH10K_FW_CRASH_DUMP_STACK = 2,
 
 	ATH10K_FW_CRASH_DUMP_MAX,
 };
@@ -712,6 +714,7 @@ static struct ath10k_dump_file_data *ath10k_build_dump_file(struct ath10k *ar)
 	len = hdr_len;
 	len += sizeof(*dump_tlv) + sizeof(crash_data->reg_dump_values);
 	len += sizeof(*dump_tlv) + sizeof(crash_data->dbglog_entry_data);
+	len += sizeof(*dump_tlv) + sizeof(crash_data->stack_buf);
 
 	lockdep_assert_held(&ar->conf_mutex);
 
@@ -781,8 +784,16 @@ static struct ath10k_dump_file_data *ath10k_build_dump_file(struct ath10k *ar)
 	       dump_tlv->tlv_len);
 	sofar += sizeof(*dump_tlv) + dump_tlv->tlv_len;
 
+	/* Gather firmware stack dump */
+	dump_tlv = (struct ath10k_tlv_dump_data *)(buf + sofar);
+	dump_tlv->type = ATH10K_FW_CRASH_DUMP_STACK;
+	dump_tlv->tlv_len = sizeof(crash_data->stack_buf);
+	memcpy(dump_tlv->tlv_data, crash_data->stack_buf, dump_tlv->tlv_len);
+	sofar += sizeof(*dump_tlv) + dump_tlv->tlv_len;
+
 	spin_unlock_bh(&ar->data_lock);
 
+	WARN_ON(sofar != len);
 	return dump_data;
 }
 
diff --git a/drivers/net/wireless/ath/ath10k/hw.h b/drivers/net/wireless/ath/ath10k/hw.h
index c391c88096ee..9c602b0d7277 100644
--- a/drivers/net/wireless/ath/ath10k/hw.h
+++ b/drivers/net/wireless/ath/ath10k/hw.h
@@ -40,6 +40,7 @@
 #define ATH10K_FIRMWARE_MAGIC               "QCA-ATH10K"
 
 #define REG_DUMP_COUNT_QCA988X 60
+#define ATH10K_FW_STACK_SIZE 4096
 
 struct ath10k_fw_ie {
 	__le32 id;
diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
index 19a556facb1f..016f6641d1f7 100644
--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -867,6 +867,15 @@ static u16 ath10k_pci_hif_get_free_queue_number(struct ath10k *ar, u8 pipe)
 	return ath10k_ce_num_free_src_entries(ar_pci->pipe_info[pipe].ce_hdl);
 }
 
+/* Save the main firmware stack */
+static void ath10k_pci_dump_stack(struct ath10k *ar,
+				  struct ath10k_fw_crash_data *crash_data)
+{
+	BUILD_BUG_ON(ATH10K_FW_STACK_SIZE % 4);
+	ath10k_pci_diag_read_hi(ar, crash_data->stack_buf,
+				hi_stack, ATH10K_FW_STACK_SIZE);
+}
+
 static void ath10k_pci_dump_dbglog(struct ath10k *ar,
 				   struct ath10k_fw_crash_data *crash_data)
 {
@@ -1010,6 +1019,12 @@ static void ath10k_pci_hif_dump_area(struct ath10k *ar)
 	if (!crash_data)
 		goto exit;
 
+	spin_lock_bh(&ar->data_lock);
+
+	ath10k_pci_dump_stack(ar, crash_data);
+
+	spin_unlock_bh(&ar->data_lock);
+
 	ath10k_pci_dump_registers(ar, crash_data);
 	ath10k_pci_dump_dbglog(ar, crash_data);
 


_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v5 5/7] ath10k: dump exception stack contents on firmware crash
  2014-08-08 20:28 [PATCH v5 0/7] ath10k: firmware crash dump Kalle Valo
                   ` (3 preceding siblings ...)
  2014-08-08 20:29 ` [PATCH v5 4/7] ath10k: save firmware stack upon firmware crash Kalle Valo
@ 2014-08-08 20:29 ` Kalle Valo
  2014-08-08 20:29 ` [PATCH v5 6/7] ath10k: save firmware RAM and ROM BSS sections on crash Kalle Valo
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Kalle Valo @ 2014-08-08 20:29 UTC (permalink / raw)
  To: ath10k

From: Ben Greear <greearb@candelatech.com>

Firmware developers can decode this and maybe figure out
why the firmware crashed.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
---
 drivers/net/wireless/ath/ath10k/core.h  |    1 +
 drivers/net/wireless/ath/ath10k/debug.c |   11 +++++++++++
 drivers/net/wireless/ath/ath10k/pci.c   |   10 ++++++++++
 3 files changed, 22 insertions(+)

diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h
index e5f43d6960a9..dd1a939974c2 100644
--- a/drivers/net/wireless/ath/ath10k/core.h
+++ b/drivers/net/wireless/ath/ath10k/core.h
@@ -300,6 +300,7 @@ struct ath10k_fw_crash_data {
 	struct ath10k_dbglog_entry_storage dbglog_entry_data;
 	u32 reg_dump_values[REG_DUMP_COUNT_QCA988X];
 	u8 stack_buf[ATH10K_FW_STACK_SIZE];
+	u8 exc_stack_buf[ATH10K_FW_STACK_SIZE];
 };
 
 struct ath10k_debug {
diff --git a/drivers/net/wireless/ath/ath10k/debug.c b/drivers/net/wireless/ath/ath10k/debug.c
index e019850dc969..d605fe54522b 100644
--- a/drivers/net/wireless/ath/ath10k/debug.c
+++ b/drivers/net/wireless/ath/ath10k/debug.c
@@ -32,11 +32,13 @@
  * @ATH10K_FW_CRASH_DUMP_DBGLOG:  Recent firmware debug log entries
  * @ATH10K_FW_CRASH_DUMP_REGDUMP: Register crash dump in binary format
  * @ATH10K_FW_CRASH_DUMP_STACK:   Stack memory contents.
+ * @ATH10K_FW_CRASH_DUMP_EXC_STACK:  Exception stack contents
  */
 enum ath10k_fw_error_dump_type {
 	ATH10K_FW_CRASH_DUMP_DBGLOG = 0,
 	ATH10K_FW_CRASH_DUMP_REGDUMP = 1,
 	ATH10K_FW_CRASH_DUMP_STACK = 2,
+	ATH10K_FW_CRASH_DUMP_EXC_STACK = 3,
 
 	ATH10K_FW_CRASH_DUMP_MAX,
 };
@@ -715,6 +717,7 @@ static struct ath10k_dump_file_data *ath10k_build_dump_file(struct ath10k *ar)
 	len += sizeof(*dump_tlv) + sizeof(crash_data->reg_dump_values);
 	len += sizeof(*dump_tlv) + sizeof(crash_data->dbglog_entry_data);
 	len += sizeof(*dump_tlv) + sizeof(crash_data->stack_buf);
+	len += sizeof(*dump_tlv) + sizeof(crash_data->exc_stack_buf);
 
 	lockdep_assert_held(&ar->conf_mutex);
 
@@ -791,6 +794,14 @@ static struct ath10k_dump_file_data *ath10k_build_dump_file(struct ath10k *ar)
 	memcpy(dump_tlv->tlv_data, crash_data->stack_buf, dump_tlv->tlv_len);
 	sofar += sizeof(*dump_tlv) + dump_tlv->tlv_len;
 
+	/* Gather firmware exception (irq) stack dump */
+	dump_tlv = (struct ath10k_tlv_dump_data *)(buf + sofar);
+	dump_tlv->type = ATH10K_FW_CRASH_DUMP_EXC_STACK;
+	dump_tlv->tlv_len = sizeof(crash_data->exc_stack_buf);
+	memcpy(dump_tlv->tlv_data, &crash_data->exc_stack_buf,
+	       dump_tlv->tlv_len);
+	sofar += sizeof(*dump_tlv) + dump_tlv->tlv_len;
+
 	spin_unlock_bh(&ar->data_lock);
 
 	WARN_ON(sofar != len);
diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
index 016f6641d1f7..5e4de6cb6fcf 100644
--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -876,6 +876,15 @@ static void ath10k_pci_dump_stack(struct ath10k *ar,
 				hi_stack, ATH10K_FW_STACK_SIZE);
 }
 
+/* Save the firmware exception stack */
+static void ath10k_pci_dump_exc_stack(struct ath10k *ar,
+				      struct ath10k_fw_crash_data *crash_data)
+{
+	BUILD_BUG_ON(ATH10K_FW_STACK_SIZE % 4);
+	ath10k_pci_diag_read_hi(ar, crash_data->exc_stack_buf,
+				hi_err_stack, ATH10K_FW_STACK_SIZE);
+}
+
 static void ath10k_pci_dump_dbglog(struct ath10k *ar,
 				   struct ath10k_fw_crash_data *crash_data)
 {
@@ -1022,6 +1031,7 @@ static void ath10k_pci_hif_dump_area(struct ath10k *ar)
 	spin_lock_bh(&ar->data_lock);
 
 	ath10k_pci_dump_stack(ar, crash_data);
+	ath10k_pci_dump_exc_stack(ar, crash_data);
 
 	spin_unlock_bh(&ar->data_lock);
 


_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v5 6/7] ath10k: save firmware RAM and ROM BSS sections on crash
  2014-08-08 20:28 [PATCH v5 0/7] ath10k: firmware crash dump Kalle Valo
                   ` (4 preceding siblings ...)
  2014-08-08 20:29 ` [PATCH v5 5/7] ath10k: dump exception stack contents on " Kalle Valo
@ 2014-08-08 20:29 ` Kalle Valo
  2014-08-08 22:36   ` Ben Greear
  2014-08-08 20:29 ` [PATCH v5 7/7] ath10k: rename ath10k_pci_hif_dump_area() to ath10k_pci_firmware_crashed() Kalle Valo
  2014-08-08 21:32 ` [PATCH v5 0/7] ath10k: firmware crash dump Ben Greear
  7 siblings, 1 reply; 17+ messages in thread
From: Kalle Valo @ 2014-08-08 20:29 UTC (permalink / raw)
  To: ath10k

From: Ben Greear <greearb@candelatech.com>

This can be used to get a useful back trace out of a firmware
crash that involves an interrupt handler.  For instance, a
null-pointer-exception would be this kind of trace.  A user-space
tool can read the debugfs file and decode things as wished.

This requires a packaged firmware with a new IE to describe the
BSS section starts and length.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
---
 drivers/net/wireless/ath/ath10k/core.c  |   54 +++++++++++++++++++++++++++++++
 drivers/net/wireless/ath/ath10k/core.h  |   16 +++++++++
 drivers/net/wireless/ath/ath10k/debug.c |   28 ++++++++++++++++
 drivers/net/wireless/ath/ath10k/hw.h    |    5 +++
 drivers/net/wireless/ath/ath10k/pci.c   |   39 ++++++++++++++++++++++
 5 files changed, 142 insertions(+)

diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c
index 440c3ff03aec..39b7383b395c 100644
--- a/drivers/net/wireless/ath/ath10k/core.c
+++ b/drivers/net/wireless/ath/ath10k/core.c
@@ -479,6 +479,60 @@ static int ath10k_core_fetch_firmware_api_n(struct ath10k *ar, const char *name)
 			ar->otp_len = ie_len;
 
 			break;
+		case ATH10K_FW_IE_RAM_BSS_ADDR:
+			if (ie_len != sizeof(u32))
+				break;
+
+			ar->fw.ram_bss_addr = le32_to_cpup((__le32 *)data);
+
+			ath10k_dbg(ATH10K_DBG_BOOT,
+				   "found RAM BSS address 0x%x\n",
+				   ar->fw.ram_bss_addr);
+			break;
+		case ATH10K_FW_IE_RAM_BSS_LEN:
+			if (ie_len != sizeof(u32))
+				break;
+
+			ar->fw.ram_bss_len = le32_to_cpup((__le32 *)data);
+
+			ath10k_dbg(ATH10K_DBG_BOOT,
+				   "found RAM BSS length 0x%x\n",
+				   ar->fw.ram_bss_len);
+
+			if (ar->fw.ram_bss_len > ATH10K_RAM_BSS_BUF_LEN) {
+				ath10k_warn("too long firmware RAM BSS length: %d\n",
+					    ar->fw.ram_bss_len);
+				ar->fw.ram_bss_len = 0;
+			}
+
+			break;
+		case ATH10K_FW_IE_ROM_BSS_ADDR:
+			if (ie_len != sizeof(u32))
+				break;
+
+			ar->fw.rom_bss_addr = le32_to_cpup((__le32 *)data);
+
+			ath10k_dbg(ATH10K_DBG_BOOT,
+				   "found ROM BSS address 0x%x\n",
+				   ar->fw.rom_bss_addr);
+			break;
+		case ATH10K_FW_IE_ROM_BSS_LEN:
+			if (ie_len != sizeof(u32))
+				break;
+
+			ar->fw.rom_bss_len = le32_to_cpup((__le32 *)data);
+
+			ath10k_dbg(ATH10K_DBG_BOOT,
+				   "found ROM BSS length 0x%x\n",
+				   ar->fw.rom_bss_len);
+
+			if (ar->fw.ram_bss_len > ATH10K_RAM_BSS_BUF_LEN) {
+				ath10k_warn("too long firmware RAM BSS length: %d\n",
+					    ar->fw.ram_bss_len);
+				ar->fw.rom_bss_len = 0;
+			}
+
+			break;
 		default:
 			ath10k_warn("Unknown FW IE: %u\n",
 				    le32_to_cpu(hdr->id));
diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h
index dd1a939974c2..19832ac38685 100644
--- a/drivers/net/wireless/ath/ath10k/core.h
+++ b/drivers/net/wireless/ath/ath10k/core.h
@@ -292,6 +292,10 @@ struct ath10k_dbglog_entry_storage {
 	u8 data[ATH10K_DBGLOG_DATA_LEN];
 };
 
+/* estimated values, hopefully these are enough */
+#define ATH10K_ROM_BSS_BUF_LEN 10000
+#define ATH10K_RAM_BSS_BUF_LEN 30000
+
 /* used for crash-dump storage, protected by data-lock */
 struct ath10k_fw_crash_data {
 	bool crashed_since_read;
@@ -301,6 +305,9 @@ struct ath10k_fw_crash_data {
 	u32 reg_dump_values[REG_DUMP_COUNT_QCA988X];
 	u8 stack_buf[ATH10K_FW_STACK_SIZE];
 	u8 exc_stack_buf[ATH10K_FW_STACK_SIZE];
+
+	u8 rom_bss_buf[ATH10K_ROM_BSS_BUF_LEN];
+	u8 ram_bss_buf[ATH10K_RAM_BSS_BUF_LEN];
 };
 
 struct ath10k_debug {
@@ -426,6 +433,15 @@ struct ath10k {
 		} fw;
 	} hw_params;
 
+	/* These are written to only during first firmware boot so no need
+	 * for locking. */
+	struct {
+		unsigned int ram_bss_addr;
+		unsigned int ram_bss_len;
+		unsigned int rom_bss_addr;
+		unsigned int rom_bss_len;
+	} fw;
+
 	const struct firmware *board;
 	const void *board_data;
 	size_t board_len;
diff --git a/drivers/net/wireless/ath/ath10k/debug.c b/drivers/net/wireless/ath/ath10k/debug.c
index d605fe54522b..dbdf5f638568 100644
--- a/drivers/net/wireless/ath/ath10k/debug.c
+++ b/drivers/net/wireless/ath/ath10k/debug.c
@@ -33,12 +33,16 @@
  * @ATH10K_FW_CRASH_DUMP_REGDUMP: Register crash dump in binary format
  * @ATH10K_FW_CRASH_DUMP_STACK:   Stack memory contents.
  * @ATH10K_FW_CRASH_DUMP_EXC_STACK:  Exception stack contents
+ * @ATH10K_FW_CRASH_DUMP_RAM_BSS:  BSS area for RAM code
+ * @ATH10K_FW_CRASH_DUMP_ROM_BSS:  BSS area for ROM code
  */
 enum ath10k_fw_error_dump_type {
 	ATH10K_FW_CRASH_DUMP_DBGLOG = 0,
 	ATH10K_FW_CRASH_DUMP_REGDUMP = 1,
 	ATH10K_FW_CRASH_DUMP_STACK = 2,
 	ATH10K_FW_CRASH_DUMP_EXC_STACK = 3,
+	ATH10K_FW_CRASH_DUMP_RAM_BSS = 4,
+	ATH10K_FW_CRASH_DUMP_ROM_BSS = 5,
 
 	ATH10K_FW_CRASH_DUMP_MAX,
 };
@@ -719,6 +723,12 @@ static struct ath10k_dump_file_data *ath10k_build_dump_file(struct ath10k *ar)
 	len += sizeof(*dump_tlv) + sizeof(crash_data->stack_buf);
 	len += sizeof(*dump_tlv) + sizeof(crash_data->exc_stack_buf);
 
+	if (ar->fw.ram_bss_addr && ar->fw.ram_bss_len)
+		len += sizeof(*dump_tlv) + ar->fw.ram_bss_len;
+
+	if (ar->fw.rom_bss_addr && ar->fw.rom_bss_len)
+		len += sizeof(*dump_tlv) + ar->fw.rom_bss_len;
+
 	lockdep_assert_held(&ar->conf_mutex);
 
 	sofar += hdr_len;
@@ -802,6 +812,24 @@ static struct ath10k_dump_file_data *ath10k_build_dump_file(struct ath10k *ar)
 	       dump_tlv->tlv_len);
 	sofar += sizeof(*dump_tlv) + dump_tlv->tlv_len;
 
+	if (ar->fw.ram_bss_addr && ar->fw.ram_bss_len) {
+		dump_tlv = (struct ath10k_tlv_dump_data *)(buf + sofar);
+		dump_tlv->type = ATH10K_FW_CRASH_DUMP_RAM_BSS;
+		dump_tlv->tlv_len = ar->fw.ram_bss_len;
+		memcpy(dump_tlv->tlv_data, crash_data->ram_bss_buf,
+		       dump_tlv->tlv_len);
+		sofar += sizeof(*dump_tlv) + dump_tlv->tlv_len;
+	}
+
+	if (ar->fw.rom_bss_addr && ar->fw.rom_bss_len) {
+		dump_tlv = (struct ath10k_tlv_dump_data *)(buf + sofar);
+		dump_tlv->type = ATH10K_FW_CRASH_DUMP_ROM_BSS;
+		dump_tlv->tlv_len = ar->fw.rom_bss_len;
+		memcpy(dump_tlv->tlv_data, crash_data->rom_bss_buf,
+		       dump_tlv->tlv_len);
+		sofar += sizeof(*dump_tlv) + dump_tlv->tlv_len;
+	}
+
 	spin_unlock_bh(&ar->data_lock);
 
 	WARN_ON(sofar != len);
diff --git a/drivers/net/wireless/ath/ath10k/hw.h b/drivers/net/wireless/ath/ath10k/hw.h
index 9c602b0d7277..6c275638ddaa 100644
--- a/drivers/net/wireless/ath/ath10k/hw.h
+++ b/drivers/net/wireless/ath/ath10k/hw.h
@@ -54,6 +54,11 @@ enum ath10k_fw_ie_type {
 	ATH10K_FW_IE_FEATURES = 2,
 	ATH10K_FW_IE_FW_IMAGE = 3,
 	ATH10K_FW_IE_OTP_IMAGE = 4,
+	ATH10K_FW_IE_RAM_BSS_ADDR = 5,
+	ATH10K_FW_IE_RAM_BSS_LEN = 6,
+	ATH10K_FW_IE_ROM_BSS_ADDR = 7,
+	ATH10K_FW_IE_ROM_BSS_LEN = 8,
+
 };
 
 /* Known pecularities:
diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
index 5e4de6cb6fcf..a65d45c78542 100644
--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -885,6 +885,43 @@ static void ath10k_pci_dump_exc_stack(struct ath10k *ar,
 				hi_err_stack, ATH10K_FW_STACK_SIZE);
 }
 
+static void ath10k_pci_dump_bss_ram(struct ath10k *ar,
+				    struct ath10k_fw_crash_data *crash_data) {
+	int ret;
+
+	if (!ar->fw.ram_bss_addr)
+		return;
+
+	if (!ar->fw.ram_bss_len)
+		return;
+
+	ret = ath10k_pci_diag_read_mem(ar, ar->fw.ram_bss_addr,
+				       crash_data->ram_bss_buf,
+				       ar->fw.ram_bss_len);
+	if (ret)
+		ath10k_warn("failed to read firmware RAM BSS memory from %d (%d B): %d\n",
+			    ar->fw.ram_bss_addr, ar->fw.ram_bss_len, ret);
+}
+
+static void ath10k_pci_dump_bss_rom(struct ath10k *ar,
+				    struct ath10k_fw_crash_data *crash_data)
+{
+	int ret;
+
+	if (!ar->fw.rom_bss_addr)
+		return;
+
+	if (!ar->fw.rom_bss_len)
+		return;
+
+	ret = ath10k_pci_diag_read_mem(ar, ar->fw.rom_bss_addr,
+				       crash_data->rom_bss_buf,
+				       ar->fw.rom_bss_len);
+	if (ret)
+		ath10k_warn("failed to read firmware ROM BSS memory from %d (%d B): %d\n",
+			    ar->fw.rom_bss_addr, ar->fw.rom_bss_len, ret);
+}
+
 static void ath10k_pci_dump_dbglog(struct ath10k *ar,
 				   struct ath10k_fw_crash_data *crash_data)
 {
@@ -1032,6 +1069,8 @@ static void ath10k_pci_hif_dump_area(struct ath10k *ar)
 
 	ath10k_pci_dump_stack(ar, crash_data);
 	ath10k_pci_dump_exc_stack(ar, crash_data);
+	ath10k_pci_dump_bss_ram(ar, crash_data);
+	ath10k_pci_dump_bss_rom(ar, crash_data);
 
 	spin_unlock_bh(&ar->data_lock);
 


_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v5 7/7] ath10k: rename ath10k_pci_hif_dump_area() to ath10k_pci_firmware_crashed()
  2014-08-08 20:28 [PATCH v5 0/7] ath10k: firmware crash dump Kalle Valo
                   ` (5 preceding siblings ...)
  2014-08-08 20:29 ` [PATCH v5 6/7] ath10k: save firmware RAM and ROM BSS sections on crash Kalle Valo
@ 2014-08-08 20:29 ` Kalle Valo
  2014-08-08 21:32 ` [PATCH v5 0/7] ath10k: firmware crash dump Ben Greear
  7 siblings, 0 replies; 17+ messages in thread
From: Kalle Valo @ 2014-08-08 20:29 UTC (permalink / raw)
  To: ath10k

Better to have a clear name for the function.

Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
---
 drivers/net/wireless/ath/ath10k/pci.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
index a65d45c78542..fe7159c3c8b1 100644
--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -1045,7 +1045,7 @@ static void ath10k_pci_dump_registers(struct ath10k *ar,
 }
 
 
-static void ath10k_pci_hif_dump_area(struct ath10k *ar)
+static void ath10k_pci_firmware_crashed(struct ath10k *ar)
 {
 	struct ath10k_fw_crash_data *crash_data;
 	char uuid[20];
@@ -2000,7 +2000,7 @@ static void ath10k_pci_fw_interrupt_handler(struct ath10k *ar)
 				   fw_indicator & ~FW_IND_EVENT_PENDING);
 
 		if (ar_pci->started) {
-			ath10k_pci_hif_dump_area(ar);
+			ath10k_pci_firmware_crashed(ar);
 		} else {
 			/*
 			 * Probable Target failure before we're prepared
@@ -2451,7 +2451,7 @@ static void ath10k_pci_early_irq_tasklet(unsigned long data)
 	if (fw_ind & FW_IND_EVENT_PENDING) {
 		ath10k_pci_write32(ar, FW_INDICATOR_ADDRESS,
 				   fw_ind & ~FW_IND_EVENT_PENDING);
-		ath10k_pci_hif_dump_area(ar);
+		ath10k_pci_firmware_crashed(ar);
 	}
 
 	ath10k_pci_sleep(ar);
@@ -2730,7 +2730,7 @@ static int ath10k_pci_wait_for_target_init(struct ath10k *ar)
 		ath10k_warn("device has crashed during init\n");
 		ath10k_pci_write32(ar, FW_INDICATOR_ADDRESS,
 				   val & ~FW_IND_EVENT_PENDING);
-		ath10k_pci_hif_dump_area(ar);
+		ath10k_pci_firmware_crashed(ar);
 		ret = -ECOMM;
 		goto out;
 	}


_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 0/7] ath10k: firmware crash dump
  2014-08-08 20:28 [PATCH v5 0/7] ath10k: firmware crash dump Kalle Valo
                   ` (6 preceding siblings ...)
  2014-08-08 20:29 ` [PATCH v5 7/7] ath10k: rename ath10k_pci_hif_dump_area() to ath10k_pci_firmware_crashed() Kalle Valo
@ 2014-08-08 21:32 ` Ben Greear
  2014-08-09  5:50   ` Kalle Valo
  7 siblings, 1 reply; 17+ messages in thread
From: Ben Greear @ 2014-08-08 21:32 UTC (permalink / raw)
  To: Kalle Valo; +Cc: ath10k

On 08/08/2014 01:28 PM, Kalle Valo wrote:
> Hi,
> 
> here's my reworked Ben's patchset adding firmware crash dump support to ath10k.
> Unfortunately this crashes when reading the stack dump from the firmware but
> time run out for me to fix that and I wanted to send this for comments anyway.
> 
> I did quite a lot of changes, basically to simplify the code, remove ifdefs and
> so on. Here's some sort of list what I did:
> 
> * dump_data->tv_sec and tv_nsec to 64 bits (because long can be 32 bits
>   on some platforms)

I did u32 on purpose because I know how to do ntohl, htonl if I need
to flip the machine order in user-space.  Is there something similar
for 64-bit numbers?

> * fix long lines
> 
> * renamed ath10k_dbg_save_fw_dbg_buffer() to ath10k_debug_dbglog_add()
> 
> * add helpers for ath10k_pci_diag* functions
> 
> * refactor and rename ath10k_pci_hif_dump_area()
> 
> * latest crash dump is always stored (instead of the oldest unread)

At least with the kernel, the first crash is normally the most useful,
so I figured the same would be true of the firmware (a firmware crash
is excellent way to find hidden bugs in the driver, so upon crash/reload,
it is more likely that the driver will be screwed up and thus mis-configure
firmware...)

> 
> * add ath10k_debug_get_fw_crash_data()
> 
> * move fw_r?m_bss_* fields to ar->fw
> 
> * struct ath10k_fw_crash_data is allocated with vmalloc()
> 
> * atomic allocation in ath10k_pci_dump_bss() is bad, fix that by using vmalloc
>   in module initialisation
> 
> * separate FW IE entries for BSS regions
> 
> * don't use ath10k_err()
> 
> * simplify locking and memory allocation for FW IE handling
> 
> * add uuid
> 
> * move struct ath10k_dump_file_data and enum ath10k_fw_error_dump_type to debug.c
> 
> * function and variable naming, using ath10k_fw_crash_ prefix etc
> 
> * change warning and debug messages to follow ath10k style
> 
> * add ath10k_debug_get_new_fw_crash_data() to avoid ifdefs in pci.c
> 
> And I still have TODO:
> 
> * rename crashed_since_read to crashed?
> 
> * atomic allocation in ath10k_pci_dump_dbglog() is bad. Should we
>   allocate a big buffer with vmalloc and use that?

dbglog entries are probably never going to be large, they are currently
in the 2k range, so vmalloc is likely overkill.  If you want it allocated
at startup, just choose a 4k buffer size, can put the buffer right in
the debug struct or something like that so you don't even have more memory
management to deal with.  It will waste 4k of RAM for normal use, however.

> 
> * what should ath10k_fw_error_dump_open() do if firmware hasn't
>   crashed? check crashed_since_read and return zero len file? or an
>   error code? -ENOMSG?

Maybe a mostly empty crash log with just the header, should be easy enough
to figure out it's not really a crash.  If we use udev to gather these,
it is likely to be stupid shell scripts doing a lot of the work, so
being clever on return codes might not be helpful.

> * should the crash dump file actually be in little endian? would that
>   be easier/simpler?

I think it is about the same amount of work in user-space, so I'd
keep the kernel simple and dump in the CPU's endian-ness.  I don't
care that much either way, however.

> * should ath10k_pci_hif_dump_area() hold the lock all the time? That
>   way we would guarantee that changes to ath10k_fw_crash_data are
>   atomic.

I don't think it's worth trying to do that.

I'll go dig through the patches in a bit..have to figure out why systemd
is fcked up on my lab machine first. :P

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 2/7] ath10k: provide firmware crash info via debugfs
  2014-08-08 20:28 ` [PATCH v5 2/7] ath10k: provide firmware crash info via debugfs Kalle Valo
@ 2014-08-08 22:26   ` Ben Greear
  2014-08-09  5:52     ` Kalle Valo
  0 siblings, 1 reply; 17+ messages in thread
From: Ben Greear @ 2014-08-08 22:26 UTC (permalink / raw)
  To: Kalle Valo; +Cc: ath10k

On 08/08/2014 01:28 PM, Kalle Valo wrote:

> +
> +	getnstimeofday(&timestamp);
> +	dump_data->tv_sec = timestamp.tv_sec;
> +	dump_data->tv_nsec = timestamp.tv_nsec;

I wonder if we should take timestamp at crash time instead of crash-dump-gather time?


>  int ath10k_debug_create(struct ath10k *ar)
>  {
> +	ar->debug.fw_crash_data = vzalloc(sizeof(ar->debug.fw_crash_data));
> +	if (!ar->debug.fw_crash_data)
> +		return -ENOMEM;
> +

That sizeof looks quite wrong.


Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 6/7] ath10k: save firmware RAM and ROM BSS sections on crash
  2014-08-08 20:29 ` [PATCH v5 6/7] ath10k: save firmware RAM and ROM BSS sections on crash Kalle Valo
@ 2014-08-08 22:36   ` Ben Greear
  2014-08-09  6:05     ` Kalle Valo
  0 siblings, 1 reply; 17+ messages in thread
From: Ben Greear @ 2014-08-08 22:36 UTC (permalink / raw)
  To: Kalle Valo; +Cc: ath10k

On 08/08/2014 01:29 PM, Kalle Valo wrote:
>  
> +/* estimated values, hopefully these are enough */
> +#define ATH10K_ROM_BSS_BUF_LEN 10000
> +#define ATH10K_RAM_BSS_BUF_LEN 30000
> +
>  /* used for crash-dump storage, protected by data-lock */
>  struct ath10k_fw_crash_data {
>  	bool crashed_since_read;
> @@ -301,6 +305,9 @@ struct ath10k_fw_crash_data {
>  	u32 reg_dump_values[REG_DUMP_COUNT_QCA988X];
>  	u8 stack_buf[ATH10K_FW_STACK_SIZE];
>  	u8 exc_stack_buf[ATH10K_FW_STACK_SIZE];
> +
> +	u8 rom_bss_buf[ATH10K_ROM_BSS_BUF_LEN];
> +	u8 ram_bss_buf[ATH10K_RAM_BSS_BUF_LEN];
>  };

That (using estimates instead of allocating memory when we know
the true value and/or when we need it)
is wasting quite a bit of RAM.  Doesn't matter on my systems,
but AP manufacturers might be more ticklish about RAM usage...


> +	/* These are written to only during first firmware boot so no need
> +	 * for locking. */

It's 'firmware load' instead of boot maybe?


Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 0/7] ath10k: firmware crash dump
  2014-08-08 21:32 ` [PATCH v5 0/7] ath10k: firmware crash dump Ben Greear
@ 2014-08-09  5:50   ` Kalle Valo
  2014-08-09 16:51     ` Ben Greear
  0 siblings, 1 reply; 17+ messages in thread
From: Kalle Valo @ 2014-08-09  5:50 UTC (permalink / raw)
  To: Ben Greear; +Cc: ath10k

Ben Greear <greearb@candelatech.com> writes:

> On 08/08/2014 01:28 PM, Kalle Valo wrote:
>> Hi,
>> 
>> here's my reworked Ben's patchset adding firmware crash dump support to ath10k.
>> Unfortunately this crashes when reading the stack dump from the firmware but
>> time run out for me to fix that and I wanted to send this for comments anyway.
>> 
>> I did quite a lot of changes, basically to simplify the code, remove ifdefs and
>> so on. Here's some sort of list what I did:
>> 
>> * dump_data->tv_sec and tv_nsec to 64 bits (because long can be 32 bits
>>   on some platforms)
>
> I did u32 on purpose because I know how to do ntohl, htonl if I need
> to flip the machine order in user-space.

On my 32-bit box I got a compiler warning for this and that's why I
changed it to u64.

> Is there something similar for 64-bit numbers?

At least I haven't heard anything. But it should be easy to implement on
your own.

>> * latest crash dump is always stored (instead of the oldest unread)
>
> At least with the kernel, the first crash is normally the most useful,
> so I figured the same would be true of the firmware (a firmware crash
> is excellent way to find hidden bugs in the driver, so upon crash/reload,
> it is more likely that the driver will be screwed up and thus mis-configure
> firmware...)

Think of an AP with 3 month uptime where firmware crashes once in June
and second time in August and when you retrieve the crash dump you get
the one from June. I find that very unintuitive.

I think in your use case we should have a tool which stores all crash
dumps based on events from the kernel, that way you have access to all
of them. But for the simple case we should just provide the latest dump.

>> * atomic allocation in ath10k_pci_dump_dbglog() is bad. Should we
>>   allocate a big buffer with vmalloc and use that?
>
> dbglog entries are probably never going to be large, they are currently
> in the 2k range, so vmalloc is likely overkill.

But what would be the downside from using vmalloc? This is not in
hotpath or anything like that. With vmalloc we don't need to worry about
using precious unfragmented memory.

> If you want it allocated at startup, just choose a 4k buffer size, can
> put the buffer right in the debug struct or something like that so you
> don't even have more memory management to deal with. It will waste 4k
> of RAM for normal use, however.

Ok. But I think we can leave this as is and fix it later.

>> * what should ath10k_fw_error_dump_open() do if firmware hasn't
>>   crashed? check crashed_since_read and return zero len file? or an
>>   error code? -ENOMSG?
>
> Maybe a mostly empty crash log with just the header, should be easy enough
> to figure out it's not really a crash.  If we use udev to gather these,
> it is likely to be stupid shell scripts doing a lot of the work, so
> being clever on return codes might not be helpful.

But with cp in that case you also get an empty file, right? That way
it's easy to detect that there's no crash dump available.

What is the benefit from providing an empty header to user space? I
don't get that.

>> * should the crash dump file actually be in little endian? would that
>>   be easier/simpler?
>
> I think it is about the same amount of work in user-space, so I'd
> keep the kernel simple and dump in the CPU's endian-ness.  I don't
> care that much either way, however.

I don't care much either. But IIRC Michal suggested something like that.

>> * should ath10k_pci_hif_dump_area() hold the lock all the time? That
>>   way we would guarantee that changes to ath10k_fw_crash_data are
>>   atomic.
>
> I don't think it's worth trying to do that.

Thought more about this and I think we need it. Otherwise we have a
theoretical race that uuid and the dumpt content are from separate
crashes. I'll take a look if there's anything preventing do all the
dumps in atomic context.

> I'll go dig through the patches in a bit..have to figure out why systemd
> is fcked up on my lab machine first. :P

But wasn't systemd supposed to fix all of our problems? ;)

-- 
Kalle Valo

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 2/7] ath10k: provide firmware crash info via debugfs
  2014-08-08 22:26   ` Ben Greear
@ 2014-08-09  5:52     ` Kalle Valo
  0 siblings, 0 replies; 17+ messages in thread
From: Kalle Valo @ 2014-08-09  5:52 UTC (permalink / raw)
  To: Ben Greear; +Cc: ath10k

Ben Greear <greearb@candelatech.com> writes:

> On 08/08/2014 01:28 PM, Kalle Valo wrote:
>
>> +
>> +	getnstimeofday(&timestamp);
>> +	dump_data->tv_sec = timestamp.tv_sec;
>> +	dump_data->tv_nsec = timestamp.tv_nsec;
>
> I wonder if we should take timestamp at crash time instead of
> crash-dump-gather time?

Good point, we definitely should. I'll change.

>>  int ath10k_debug_create(struct ath10k *ar)
>>  {
>> +	ar->debug.fw_crash_data = vzalloc(sizeof(ar->debug.fw_crash_data));
>> +	if (!ar->debug.fw_crash_data)
>> +		return -ENOMEM;
>> +
>
> That sizeof looks quite wrong.

Ouch, good catch! Will fix.

-- 
Kalle Valo

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 6/7] ath10k: save firmware RAM and ROM BSS sections on crash
  2014-08-08 22:36   ` Ben Greear
@ 2014-08-09  6:05     ` Kalle Valo
  2014-08-09 16:54       ` Ben Greear
  0 siblings, 1 reply; 17+ messages in thread
From: Kalle Valo @ 2014-08-09  6:05 UTC (permalink / raw)
  To: Ben Greear; +Cc: ath10k

Ben Greear <greearb@candelatech.com> writes:

> On 08/08/2014 01:29 PM, Kalle Valo wrote:
>>  
>> +/* estimated values, hopefully these are enough */
>> +#define ATH10K_ROM_BSS_BUF_LEN 10000
>> +#define ATH10K_RAM_BSS_BUF_LEN 30000
>> +
>>  /* used for crash-dump storage, protected by data-lock */
>>  struct ath10k_fw_crash_data {
>>  	bool crashed_since_read;
>> @@ -301,6 +305,9 @@ struct ath10k_fw_crash_data {
>>  	u32 reg_dump_values[REG_DUMP_COUNT_QCA988X];
>>  	u8 stack_buf[ATH10K_FW_STACK_SIZE];
>>  	u8 exc_stack_buf[ATH10K_FW_STACK_SIZE];
>> +
>> +	u8 rom_bss_buf[ATH10K_ROM_BSS_BUF_LEN];
>> +	u8 ram_bss_buf[ATH10K_RAM_BSS_BUF_LEN];
>>  };
>
> That (using estimates instead of allocating memory when we know the
> true value and/or when we need it) is wasting quite a bit of RAM.
> Doesn't matter on my systems, but AP manufacturers might be more
> ticklish about RAM usage...

Yeah, that is a problem bu this can be avoided by disabling
CONFIG_ATH10K_DEBUGFS altogether. But on the other hand, there might be
people who would still prefer to enable debugfs but not waste RAM on
firmware crash dumps. For those we could add a module parameter or
similar to disable firmware crash dumps, but is that overengineering
already?

Another option is that we do firmware crash memory allocation on the
fly, when the crash happens. But as we are in atomic context it's pretty
expensive. We cannot do huge kmalloc() call due to memory fragmentation
and I doubt __vmalloc() is acceptable to call in atomic contexts (at
least I didn't find use of that in the kernel).

Third option is that we allocate crash dump buffers after the firmware
has been loaded and FW IEs are read. That way we don't allocate any
extra memory but the fundamental problem still persists.

>> +	/* These are written to only during first firmware boot so no need
>> +	 * for locking. */
>
> It's 'firmware load' instead of boot maybe?

Correct, I'll change.

-- 
Kalle Valo

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 0/7] ath10k: firmware crash dump
  2014-08-09  5:50   ` Kalle Valo
@ 2014-08-09 16:51     ` Ben Greear
  0 siblings, 0 replies; 17+ messages in thread
From: Ben Greear @ 2014-08-09 16:51 UTC (permalink / raw)
  To: Kalle Valo; +Cc: ath10k



On 08/08/2014 10:50 PM, Kalle Valo wrote:
> Ben Greear <greearb@candelatech.com> writes:
>
>> On 08/08/2014 01:28 PM, Kalle Valo wrote:
>>> Hi,
>>>
>>> here's my reworked Ben's patchset adding firmware crash dump support to ath10k.
>>> Unfortunately this crashes when reading the stack dump from the firmware but
>>> time run out for me to fix that and I wanted to send this for comments anyway.
>>>
>>> I did quite a lot of changes, basically to simplify the code, remove ifdefs and
>>> so on. Here's some sort of list what I did:
>>>
>>> * dump_data->tv_sec and tv_nsec to 64 bits (because long can be 32 bits
>>>    on some platforms)
>>
>> I did u32 on purpose because I know how to do ntohl, htonl if I need
>> to flip the machine order in user-space.
>
> On my 32-bit box I got a compiler warning for this and that's why I
> changed it to u64.

Casting could have fixed that I imagine.

>> Is there something similar for 64-bit numbers?
>
> At least I haven't heard anything. But it should be easy to implement on
> your own.

Sure...it's not a big deal, likely there will be at most two different apps
that ever decode this anyway (mine and qca's).


>>> * latest crash dump is always stored (instead of the oldest unread)
>>
>> At least with the kernel, the first crash is normally the most useful,
>> so I figured the same would be true of the firmware (a firmware crash
>> is excellent way to find hidden bugs in the driver, so upon crash/reload,
>> it is more likely that the driver will be screwed up and thus mis-configure
>> firmware...)
>
> Think of an AP with 3 month uptime where firmware crashes once in June
> and second time in August and when you retrieve the crash dump you get
> the one from June. I find that very unintuitive.

AP should have gathered the crash the first time, but again, not a big deal,
as I would gather the crash logs as soon as they happen anyway.

>>> * what should ath10k_fw_error_dump_open() do if firmware hasn't
>>>    crashed? check crashed_since_read and return zero len file? or an
>>>    error code? -ENOMSG?
>>
>> Maybe a mostly empty crash log with just the header, should be easy enough
>> to figure out it's not really a crash.  If we use udev to gather these,
>> it is likely to be stupid shell scripts doing a lot of the work, so
>> being clever on return codes might not be helpful.
>
> But with cp in that case you also get an empty file, right? That way
> it's easy to detect that there's no crash dump available.
>
> What is the benefit from providing an empty header to user space? I
> don't get that.

An empty file could be any sort of error or mis-configuration
...but an empty header means the infrastructure is working properly.

I think with all of this, it mostly a matter of opinion, and whatever
you choose should be fine with me.  If we find any serious issues, we
can of course fix it later.

>> I'll go dig through the patches in a bit..have to figure out why systemd
>> is fcked up on my lab machine first. :P
>
> But wasn't systemd supposed to fix all of our problems? ;)

On Fedora 19, the latest update forgot how to enable/disable programs.  It must
have a zero-regression-test policy as well as it's other issues :P

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 6/7] ath10k: save firmware RAM and ROM BSS sections on crash
  2014-08-09  6:05     ` Kalle Valo
@ 2014-08-09 16:54       ` Ben Greear
  2014-08-09 17:31         ` Kalle Valo
  0 siblings, 1 reply; 17+ messages in thread
From: Ben Greear @ 2014-08-09 16:54 UTC (permalink / raw)
  To: Kalle Valo; +Cc: ath10k



On 08/08/2014 11:05 PM, Kalle Valo wrote:
> Ben Greear <greearb@candelatech.com> writes:
>
>> On 08/08/2014 01:29 PM, Kalle Valo wrote:
>>>
>>> +/* estimated values, hopefully these are enough */
>>> +#define ATH10K_ROM_BSS_BUF_LEN 10000
>>> +#define ATH10K_RAM_BSS_BUF_LEN 30000
>>> +
>>>   /* used for crash-dump storage, protected by data-lock */
>>>   struct ath10k_fw_crash_data {
>>>   	bool crashed_since_read;
>>> @@ -301,6 +305,9 @@ struct ath10k_fw_crash_data {
>>>   	u32 reg_dump_values[REG_DUMP_COUNT_QCA988X];
>>>   	u8 stack_buf[ATH10K_FW_STACK_SIZE];
>>>   	u8 exc_stack_buf[ATH10K_FW_STACK_SIZE];
>>> +
>>> +	u8 rom_bss_buf[ATH10K_ROM_BSS_BUF_LEN];
>>> +	u8 ram_bss_buf[ATH10K_RAM_BSS_BUF_LEN];
>>>   };
>>
>> That (using estimates instead of allocating memory when we know the
>> true value and/or when we need it) is wasting quite a bit of RAM.
>> Doesn't matter on my systems, but AP manufacturers might be more
>> ticklish about RAM usage...
>
> Yeah, that is a problem bu this can be avoided by disabling
> CONFIG_ATH10K_DEBUGFS altogether. But on the other hand, there might be
> people who would still prefer to enable debugfs but not waste RAM on
> firmware crash dumps. For those we could add a module parameter or
> similar to disable firmware crash dumps, but is that overengineering
> already?
>
> Another option is that we do firmware crash memory allocation on the
> fly, when the crash happens. But as we are in atomic context it's pretty
> expensive. We cannot do huge kmalloc() call due to memory fragmentation
> and I doubt __vmalloc() is acceptable to call in atomic contexts (at
> least I didn't find use of that in the kernel).
>
> Third option is that we allocate crash dump buffers after the firmware
> has been loaded and FW IEs are read. That way we don't allocate any
> extra memory but the fundamental problem still persists.

I'd prefer this third option..seems more elegant all around.

I don't think you can do vmalloc in atomic context, btw.  But, can do it
on firmware load.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 6/7] ath10k: save firmware RAM and ROM BSS sections on crash
  2014-08-09 16:54       ` Ben Greear
@ 2014-08-09 17:31         ` Kalle Valo
  0 siblings, 0 replies; 17+ messages in thread
From: Kalle Valo @ 2014-08-09 17:31 UTC (permalink / raw)
  To: Ben Greear; +Cc: ath10k

Ben Greear <greearb@candelatech.com> writes:

>> Third option is that we allocate crash dump buffers after the firmware
>> has been loaded and FW IEs are read. That way we don't allocate any
>> extra memory but the fundamental problem still persists.
>
> I'd prefer this third option..seems more elegant all around.
>
> I don't think you can do vmalloc in atomic context, btw.  But, can do it
> on firmware load.

Ok, I'll look how to implement this.

-- 
Kalle Valo

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2014-08-09 17:31 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-08 20:28 [PATCH v5 0/7] ath10k: firmware crash dump Kalle Valo
2014-08-08 20:28 ` [PATCH v5 1/7] ath10k: add ath10k_pci_diag_* helpers Kalle Valo
2014-08-08 20:28 ` [PATCH v5 2/7] ath10k: provide firmware crash info via debugfs Kalle Valo
2014-08-08 22:26   ` Ben Greear
2014-08-09  5:52     ` Kalle Valo
2014-08-08 20:28 ` [PATCH v5 3/7] ath10k: save firmware debug log messages Kalle Valo
2014-08-08 20:29 ` [PATCH v5 4/7] ath10k: save firmware stack upon firmware crash Kalle Valo
2014-08-08 20:29 ` [PATCH v5 5/7] ath10k: dump exception stack contents on " Kalle Valo
2014-08-08 20:29 ` [PATCH v5 6/7] ath10k: save firmware RAM and ROM BSS sections on crash Kalle Valo
2014-08-08 22:36   ` Ben Greear
2014-08-09  6:05     ` Kalle Valo
2014-08-09 16:54       ` Ben Greear
2014-08-09 17:31         ` Kalle Valo
2014-08-08 20:29 ` [PATCH v5 7/7] ath10k: rename ath10k_pci_hif_dump_area() to ath10k_pci_firmware_crashed() Kalle Valo
2014-08-08 21:32 ` [PATCH v5 0/7] ath10k: firmware crash dump Ben Greear
2014-08-09  5:50   ` Kalle Valo
2014-08-09 16:51     ` Ben Greear

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.