ATH10K Archive on lore.kernel.org
 help / color / Atom feed
From: Wen Gong <wgong@codeaurora.org>
To: ath10k@lists.infradead.org
Cc: linux-wireless@vger.kernel.org, wgong@codeaurora.org
Subject: [PATCH] ath10k: change len of trace_ath10k_log_dbg_dump for large buffer size
Date: Tue,  9 Feb 2021 10:59:53 +0800
Message-ID: <1612839593-2308-1-git-send-email-wgong@codeaurora.org> (raw)

Kernel panic every time in kernel when running below case:
steps:
1. connect to an AP with good signal strength
2. echo 0x7f > /sys/kernel/debug/ieee80211/phy0/ath10k/pktlog_filter
3. echo 0xffff 0 > /sys/kernel/debug/ieee80211/phy0/ath10k/fw_dbglog
4. echo 0 > /sys/module/ath10k_core/parameters/debug_mask
5. sudo trace-cmd record  -e ath10k
6. run "iperf -c 192.168.1.1 -u -b 100m -i 1 -t 30"
7. kernel panic immeditely

It always crash at trace_event_raw_event_ath10k_xxx, below is 2 sample:

dmesg 1:
[   58.038310] Unable to handle kernel paging request at virtual address 003130393837363d
[   58.038325] Mem abort info:
[   58.038330]   ESR = 0x96000004
[   58.038336]   Exception class = DABT (current EL), IL = 32 bits
[   58.038340]   SET = 0, FnV = 0
[   58.038345]   EA = 0, S1PTW = 0
[   58.038349] Data abort info:
[   58.038353]   ISV = 0, ISS = 0x00000004
[   58.038358]   CM = 0, WnR = 0
[   58.038363] [003130393837363d] address between user and kernel address ranges
[   58.038370] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[   58.038377] Modules linked in: rfcomm algif_hash algif_skcipher
[   58.038385] Unable to handle kernel paging request at virtual address 003938373635343b
[   58.038386]  af_alg uinput cros_ec_rpmsg mtk_dip mtk_mdp3
[   58.038395] Mem abort info:
[   58.038403]   ESR = 0x96000004
[   58.038412]   Exception class = DABT (current EL), IL = 32 bits
[   58.038419]   SET = 0, FnV = 0
[   58.038427]   EA = 0, S1PTW = 0
[   58.038434] Data abort info:
[   58.038442]   ISV = 0, ISS = 0x00000004
[   58.038449]   CM = 0, WnR = 0
[   58.038457] [003938373635343b] address between user and kernel address ranges
[   58.038486] Process kworker/u16:1 (pid: 141, stack limit = 0x000000009bd6ba2e)
[   58.038496] CPU: 1 PID: 141 Comm: kworker/u16:1 Not tainted 4.19.139 #162
[   58.038501] Hardware name: MediaTek krane sku176 board (DT)
[   58.038515] Workqueue: ath10k_sdio_wq ath10k_sdio_write_async_work [ath10k_sdio]
[   58.038523] pstate: 80000005 (Nzcv daif -PAN -UAO)
[   58.038536] pc : filter_match_preds+0x20/0x94
[   58.038546] lr : trace_event_buffer_commit+0x150/0x22c
[   58.038550] sp : ffffff80088a39f0
[   58.038555] x29: ffffff80088a3a20 x28: 000000000000000c
[   58.038561] x27: 000000000000000c x26: 0000000000000001
[   58.038567] x25: ffffffdfd96ef008 x24: 0000000000000000
[   58.038573] x23: 0000000000000000 x22: ffffffe03b006c00
[   58.038580] x21: ffffffdfd96ef000 x20: 3231303938373635
[   58.038586] x19: ffffffe02de9b160 x18: 0000000000000020
[   58.038592] x17: 0000000000100000 x16: ffffffa4e67937f0
[   58.038598] x15: 0000000000100000 x14: 0000393837363534
[   58.038604] x13: 3332313039383736 x12: 3534333231303938
[   58.038611] x11: 3736353433323130 x10: 3938373635343332
[   58.038617] x9 : 3130393837363534 x8 : 000000000000040b
[   58.038623] x7 : 3534333231303938 x6 : ffffffdfd96fb2d1
[   58.038629] x5 : 0000000000000000 x4 : 0000000000000000
[   58.038635] x3 : 000000000000c2cd x2 : ffffffffffffffc0
[   58.038641] x1 : ffffffdfd96ef008 x0 : ffffffdfd96f7080
[   58.038648] Call trace:
[   58.038655]  filter_match_preds+0x20/0x94
[   58.038661]  trace_event_buffer_commit+0x150/0x22c
[   58.038698]  trace_event_raw_event_ath10k_log_dbg_dump+0x208/0x240 [ath10k_core]
[   58.038716]  ath10k_dbg_dump+0x1bc/0x21c [ath10k_core]
[   58.038724]  ath10k_sdio_write+0xc8/0xe8 [ath10k_sdio]
[   58.038730]  ath10k_sdio_write_async_work+0x11c/0x1f4 [ath10k_sdio]
[   58.038738]  process_one_work+0x210/0x418
[   58.038744]  worker_thread+0x234/0x3dc
[   58.038751]  kthread+0x120/0x140
[   58.038758]  ret_from_fork+0x10/0x18
[   58.038766] Code: 9100c3fd b40002e0 f9400014 b40002b4 (f9400680)
[   58.038773] ---[ end trace 8b9d2463f441eb7f ]---
[   58.051846] Kernel panic - not syncing: Fatal exception
[   58.051864] SMP: stopping secondary CPUs

dmesg 2:
[  228.692905] Unable to handle kernel paging request at virtual address 0033323130393837
[  228.692913] Mem abort info:
[  228.692920]   ESR = 0x96000004
[  228.692927]   Exception class = DABT (current EL), IL = 32 bits
[  228.692933]   SET = 0, FnV = 0
[  228.692938]   EA = 0, S1PTW = 0
[  228.692943] Data abort info:
[  228.692953]   ISV = 0, ISS = 0x00000004
[  228.692965]   CM = 0, WnR = 0
[  228.692975] [0033323130393837] address between user and kernel address ranges
[  228.692983] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[  228.693078] Process kworker/u16:2 (pid: 144, stack limit = 0x000000009f882493)
[  228.693088] CPU: 6 PID: 144 Comm: kworker/u16:2 Tainted: G        W         4.19.139 #162
[  228.693095] Hardware name: MediaTek krane sku176 board (DT)
[  228.693125] Workqueue: ath10k_tx_complete_wq ath10k_htc_tx_complete_work [ath10k_core]
[  228.693137] pstate: 80000005 (Nzcv daif -PAN -UAO)
[  228.693152] pc : filter_match_preds+0x38/0x94
[  228.693164] lr : trace_event_buffer_commit+0x150/0x22c
[  228.693170] sp : ffffff80088b3b20
[  228.693175] x29: ffffff80088b3b50 x28: 0000000000000402
[  228.693183] x27: fffffff040001f60 x26: ffffff9d4ba34000
[  228.693191] x25: fffffff040af5004 x24: 0000000000000000
[  228.693198] x23: 0000000000000000 x22: fffffff040af7100
[  228.693205] x21: 0000000000000000 x20: fffffff040af7100
[  228.693212] x19: fffffff040af5004 x18: 0000000000000000
[  228.693219] x17: 000000000000003c x16: ffffff9d4da649d8
[  228.693226] x15: 0000000000000006 x14: 3837363534333231
[  228.693233] x13: fffffff01514fc00 x12: 0000000000000000
[  228.693240] x11: 0000000000000000 x10: fffffff040af5004
[  228.693247] x9 : 0000000000000000 x8 : 0000000000000000
[  228.693254] x7 : 646b686064ff6e68 x6 : 0000800000000000
[  228.693261] x5 : 0000000000000000 x4 : 0000000000000000
[  228.693268] x3 : 000000000000002c x2 : 0000000000000000
[  228.693275] x1 : fffffff040af5004 x0 : 3433323130393837
[  228.693282] Call trace:
[  228.693291]  filter_match_preds+0x38/0x94
[  228.693299]  trace_event_buffer_commit+0x150/0x22c
[  228.693321]  trace_event_raw_event_ath10k_txrx_tx_unref+0x164/0x198 [ath10k_core]
[  228.693344]  ath10k_txrx_tx_unref+0x1c0/0x36c [ath10k_core]
[  228.693365]  ath10k_htt_htc_tx_complete+0xe4/0x114 [ath10k_core]
[  228.693386]  ath10k_htc_notify_tx_completion+0xe4/0x124 [ath10k_core]
[  228.693407]  ath10k_htc_tx_complete_work+0xb8/0xf0 [ath10k_core]
[  228.693418]  process_one_work+0x210/0x418
[  228.693425]  worker_thread+0x234/0x3dc
[  228.693433]  kthread+0x120/0x140
[  228.693441]  ret_from_fork+0x10/0x18
[  228.693451] Code: aa0103f3 aa1f03e8 2a1f03f5 aa1403f6 (f9400009)
[  228.693459] ---[ end trace 187b0ae372e7515c ]---
[  228.711074] Kernel panic - not syncing: Fatal exception
[  228.711086] SMP: stopping secondary CPUs
[  228.711254] Kernel Offset: 0x1d45600000 from 0xffffff8008000000
[  228.711259] CPU features: 0x0,2188200c
[  228.711262] Memory Limit: none

The value of prog in filter_match_preds of kernel/trace/trace_events_filter.c
is overwrite to the content of the UDP packets's content like this
0x0039383736353433, it is a invalid address, so crash.

Content of udp tx packets with iperf:
0000   00 03 7f 46 c4 49 40 49 0f 2f f9 09 08 00 45 00   ...F.I@I./....E.
0010   05 da 18 00 40 00 40 11 99 4a c0 a8 01 77 c0 a8   ....@.@..J...w..
0020   01 01 9c 7d 13 89 05 c6 89 a0 00 00 01 8b 5d 69   ...}..........]i
0030   98 9a 00 0a 55 b8 00 00 00 00 00 00 00 01 00 00   ....U...........
0040   13 89 00 00 00 00 00 00 00 00 ff ff f4 48 05 f5   .............H..
0050   e1 00 30 31 32 33 34 35 36 37 38 39 30 31 32 33   ..01234567890123
0060   34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39   4567890123456789
0070   30 31 32 33 34 35 36 37 38 39 30 31 32 33 34 35   0123456789012345
0080   36 37 38 39 30 31 32 33 34 35 36 37 38 39 30 31   6789012345678901
0090   32 33 34 35 36 37 38 39 30 31 32 33 34 35 36 37   2345678901234567
00a0   38 39 30 31 32 33 34 35 36 37 38 39 30 31 32 33   8901234567890123
00b0   34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39   4567890123456789
...
05b0   34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39   4567890123456789
05c0   30 31 32 33 34 35 36 37 38 39 30 31 32 33 34 35   0123456789012345
05d0   36 37 38 39 30 31 32 33 34 35 36 37 38 39 30 31   6789012345678901
05e0   32 33 34 35 36 37 38 39                           23456789

ath10k_htc_send_bundle_skbs allocate skb with size 49792(1556*32), it
is bigger than PAGE_SIZE which is normally 4096, then ath10k_sdio_write
call ath10k_dbg_dump to trace the large size skb and corrupt the trace
data of tracing and lead crash. When disable the TX bundle of SDIO, it
does not crash, but TX bundle is for improve throughput, so it is enabled
by default. It is useless to call ath10k_dbg_dump to trace the large
bundled skb, so this patch trace the top part of large bundled skb.

I also did another try with below change, it is move the ath10k_dbg_dump
from ath10k_sdio_write to ath10k_sdio_mbox_rx_fetch_bundle, it also
crash immediately when run iperf UDP RX traffic, RX bundle is enabled
by default for SDIO, so the virt_pkt_len of ath10k_sdio_mbox_rx_fetch_bundle
is large, max value is 57344(1792*32), so it crashed same with upper test.

Change of try:
diff --git a/drivers/net/wireless/ath/ath10k/sdio.c b/drivers/net/wireless/ath/ath10k/sdio.c
--- a/drivers/net/wireless/ath/ath10k/sdio.c
+++ b/drivers/net/wireless/ath/ath10k/sdio.c
@@ -336,7 +336,6 @@ static int ath10k_sdio_write(struct ath10k *ar, u32 addr, const void *buf, size_

        ath10k_dbg(ar, ATH10K_DBG_SDIO, "sdio write addr 0x%x buf 0x%p len %zu\n",
                   addr, buf, len);
-       ath10k_dbg_dump(ar, ATH10K_DBG_SDIO_DUMP, NULL, "sdio write ", buf, len);

 out:
        sdio_release_host(func);
@@ -692,6 +691,8 @@ static int ath10k_sdio_mbox_rx_fetch_bundle(struct ath10k *ar)
                goto err;
        }

+       ath10k_dbg_dump(ar, ATH10K_DBG_SDIO_DUMP, NULL, "sdio read vsg", ar_sdio->vsg_buffer, virt_pkt_len);
+
        pkt_offset = 0;
        for (i = 0; i < ar_sdio->n_rx_pkts; i++) {
                pkt = &ar_sdio->rx_pkts[i];

dmesg of try:
[  170.621318] Unable to handle kernel paging request at virtual address 0036353433323138
[  170.621336] Mem abort info:
[  170.621341]   ESR = 0x96000004
[  170.621347]   Exception class = DABT (current EL), IL = 32 bits
[  170.621352]   SET = 0, FnV = 0
[  170.621357]   EA = 0, S1PTW = 0
[  170.621362] Data abort info:
[  170.621367]   ISV = 0, ISS = 0x00000004
[  170.621371]   CM = 0, WnR = 0
[  170.621377] [0036353433323138] address between user and kernel address ranges
[  170.621385] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[  170.621479] Process kworker/0:1 (pid: 14, stack limit = 0x00000000c6fcfb2a)
[  170.621489] CPU: 0 PID: 14 Comm: kworker/0:1 Not tainted 4.19.139 #162
[  170.621494] Hardware name: MediaTek krane sku176 board (DT)
[  170.621511] Workqueue: events sdio_irq_work
[  170.621519] pstate: 80000005 (Nzcv daif -PAN -UAO)
[  170.621529] pc : filter_match_preds+0x20/0x94
[  170.621540] lr : trace_event_buffer_commit+0x150/0x22c
[  170.621545] sp : ffffff80080f37d0
[  170.621550] x29: ffffff80080f3800 x28: 000000000000000c
[  170.621556] x27: 000000000000000c x26: 0000000000000001
[  170.621563] x25: fffffff118eb8008 x24: 0000000000000000
[  170.621569] x23: 0000000000000000 x22: fffffff17b007900
[  170.621576] x21: fffffff118eb8000 x20: 3736353433323130
[  170.621583] x19: fffffff178645790 x18: 0000000000000020
[  170.621589] x17: 000000000000003c x16: ffffff91fe3937f0
[  170.621596] x15: 0000000000000006 x14: 00000636000ad8b0
[  170.621602] x13: 0000000000000000 x12: 0000000000000000
[  170.621608] x11: 0000000000000000 x10: 0000000000000000
[  170.621615] x9 : 0000000000000000 x8 : 000000000000040b
[  170.621621] x7 : 0000000000000000 x6 : fffffff118ec6052
[  170.621628] x5 : 0000000000000000 x4 : 0000000000000000
[  170.621634] x3 : 000000000000e04e x2 : ffffffffffffffc0
[  170.621640] x1 : fffffff118eb8008 x0 : fffffff118ec1f00
[  170.621648] Call trace:
[  170.621655]  filter_match_preds+0x20/0x94
[  170.621662]  trace_event_buffer_commit+0x150/0x22c
[  170.621704]  trace_event_raw_event_ath10k_log_dbg_dump+0x208/0x240 [ath10k_core]
[  170.621724]  ath10k_dbg_dump+0x1bc/0x21c [ath10k_core]
[  170.621734]  ath10k_sdio_readsb+0xdc/0xfc [ath10k_sdio]
[  170.621742]  ath10k_sdio_irq_handler+0x6f8/0xca8 [ath10k_sdio]
[  170.621749]  process_sdio_pending_irqs+0x60/0x1b4
[  170.621756]  sdio_run_irqs+0x34/0x60
[  170.621762]  sdio_irq_work+0x1c/0x28
[  170.621769]  process_one_work+0x210/0x418
[  170.621775]  worker_thread+0x234/0x3dc
[  170.621783]  kthread+0x120/0x140
[  170.621791]  ret_from_fork+0x10/0x18
[  170.621799] Code: 9100c3fd b40002e0 f9400014 b40002b4 (f9400680)
[  170.621807] ---[ end trace b52c34b9c8deb8e3 ]---
[  170.624123] Unable to handle kernel paging request at virtual address dffd74685cec003c
[  170.624134] Mem abort info:
[  170.624138]   ESR = 0x96000004
[  170.624143]   Exception class = DABT (current EL), IL = 32 bits
[  170.624146]   SET = 0, FnV = 0
[  170.624150]   EA = 0, S1PTW = 0
[  170.624153] Data abort info:
[  170.624157]   ISV = 0, ISS = 0x00000004
[  170.624160]   CM = 0, WnR = 0
[  170.624165] [dffd74685cec003c] address between user and kernel address ranges
[  170.634352] Kernel panic - not syncing: Fatal exception
[  170.634370] SMP: stopping secondary CPUs
[  171.711302] SMP: failed to stop secondary CPUs 0,4
[  171.711310] Kernel Offset: 0x11f6200000 from 0xffffff8008000000
[  171.711316] CPU features: 0x0,2188200c
[  171.711321] Memory Limit: none
[  171.723885] SMP: stopping secondary CPUs
[  172.800814] SMP: failed to stop secondary CPUs 0,4

trace_event_raw_event_ath10k_log_dbg_dump is generated by compiler, it
call trace_event_buffer_reserve got get a struct pointer *entry, its
type is trace_event_raw_ath10k_log_dbg_dump which is also generated by
compiler, trace_event_buffer_reserve of kernel/trace/trace_events.c
call trace_event_buffer_lock_reserve to get ring_buffer_event.

In function trace_event_buffer_lock_reserve of kernel/trace/trace.c,
the ring_buffer_time_stamp_abs is false and trace_file->flags is 0x40b
which is set bit of EVENT_FILE_FL_FILTERED by debugging, so it use the
temp buffer this_cpu_read(trace_buffered_event), and the buffer size
is 1 page size which allocatee in trace_buffered_event_enable by
alloc_pages_node, and then ath10k pass the buffer size > 1 page trigger
overflow and crash.

Based on upper test, try and debugging, pass large buff size to function
trace_ath10k_log_dbg_dump cause crash, and it has ath10k_dbg in
ath10k_sdio_write to print the length of skb/buffer, it is not necessary
to trace all content of the large skb.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049

Signed-off-by: Wen Gong <wgong@codeaurora.org>
---
 drivers/net/wireless/ath/ath10k/debug.c | 2 +-
 drivers/net/wireless/ath/ath10k/debug.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/ath10k/debug.c b/drivers/net/wireless/ath/ath10k/debug.c
index e8250a665433..5433dbdde0e5 100644
--- a/drivers/net/wireless/ath/ath10k/debug.c
+++ b/drivers/net/wireless/ath/ath10k/debug.c
@@ -2718,7 +2718,7 @@ void ath10k_dbg_dump(struct ath10k *ar,
 
 	/* tracing code doesn't like null strings :/ */
 	trace_ath10k_log_dbg_dump(ar, msg ? msg : "", prefix ? prefix : "",
-				  buf, len);
+				  buf, min_t(size_t, len, ATH10K_LOG_DUMP_MAX));
 }
 EXPORT_SYMBOL(ath10k_dbg_dump);
 
diff --git a/drivers/net/wireless/ath/ath10k/debug.h b/drivers/net/wireless/ath/ath10k/debug.h
index 997c1c80aba7..cec9ba92f28f 100644
--- a/drivers/net/wireless/ath/ath10k/debug.h
+++ b/drivers/net/wireless/ath/ath10k/debug.h
@@ -11,6 +11,8 @@
 #include <linux/types.h>
 #include "trace.h"
 
+#define ATH10K_LOG_DUMP_MAX 1024
+
 enum ath10k_debug_mask {
 	ATH10K_DBG_PCI		= 0x00000001,
 	ATH10K_DBG_WMI		= 0x00000002,
-- 
2.23.0


_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

             reply index

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-09  2:59 Wen Gong [this message]
2021-02-09  7:34 ` Kalle Valo
2021-02-09 19:35 ` Brian Norris
2021-02-09 19:55   ` Steven Rostedt
2021-02-09 21:34     ` Steven Rostedt
2021-02-10  2:01       ` Wen Gong
2021-02-10 16:30         ` Steven Rostedt
2021-02-10 16:48           ` Steven Rostedt
2021-02-10 16:53             ` Steven Rostedt
2021-02-10  2:11   ` Wen Gong
2021-02-10  2:14     ` Brian Norris
2021-02-10  2:24       ` Wen Gong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1612839593-2308-1-git-send-email-wgong@codeaurora.org \
    --to=wgong@codeaurora.org \
    --cc=ath10k@lists.infradead.org \
    --cc=linux-wireless@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

ATH10K Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/ath10k/0 ath10k/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 ath10k ath10k/ https://lore.kernel.org/ath10k \
		ath10k@lists.infradead.org
	public-inbox-index ath10k

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.infradead.lists.ath10k


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git