netdev.vger.kernel.org archive mirror
* Possible bug in TCP retry logic/Kernel crash
@ 2019-11-10  5:59 Avinash Patil
  2019-11-10 23:47 ` Eric Dumazet
From: Avinash Patil @ 2019-11-10  5:59 UTC
  To: netdev

Hi everyone,

Kernel: Linux 4.19.35, built from linux-stable

I am seeing this issue on our platform and suspect it is a TCP issue:

[ 3148.796319] Oops
[ 3148.799789] Path: /usr/bin/qtn_dut
[ 3148.803306] CPU: 0 PID: 1341 Comm: qtn_dut Tainted: P           O    4.19.35 #4
[ 3148.810876]
[ 3148.810876] [ECR   ]: 0x00220100 => Invalid Read @ 0x00000008 by insn @ 0x8b1bc7e8
[ 3148.820064] [EFA   ]: 0x00000008
[ 3148.820064] [BLINK ]: tcp_try_coalesce+0x3c/0xf0
[ 3148.820064] [ERET  ]: skb_try_coalesce+0x94/0x3a0
[ 3148.832704] [STAT32]: 0x00000206 : K         E2 E1
[ 3148.837677] BTA: 0x8b309ca3   SP: 0x8c92db44  FP: 0x00000000
[ 3148.843338] LPS: 0x8b304b94  LPE: 0x8b304b9c LPC: 0x00000000
[ 3148.849023] r00: 0x8c8743c0  r01: 0x8c92a0e0 r02: 0x8c92dbaa
[ 3148.849023] r03: 0x00000000  r04: 0x40000214 r05: 0x8b221ab8
[ 3148.849023] r06: 0x8bab8f3d  r07: 0x00000000 r08: 0x00000000
[ 3148.849023] r09: 0x00000000  r10: 0x1f4f9e47 r11: 0x00000000
[ 3148.849023] r12: 0x00000000  r13: 0x8afcecfc r14: 0x000d2bb8
[ 3148.849023] r15: 0x5682fbc0  r16: 0xffffffff r17: 0x00000000
[ 3148.849023] r18: 0x00000001  r19: 0x5682faa4 r20: 0x5682fa84
[ 3148.849023] r21: 0x5682fa64  r22: 0x5682fb30 r23: 0x00000020
[ 3148.849023] r24: 0x000d2bb8  r25: 0x5682fbc0
[ 3148.849023]
[ 3148.849023]
[ 3148.901689]
[ 3148.901689] Stack Trace:
[ 3148.905781] Firmware build version: AAA
[ 3148.905781] Firmware configuration: BBB
[ 3148.905781] Hardware ID           : CCC
[ 3148.920879]   skb_try_coalesce+0x94/0x3a0
[ 3148.925026]   tcp_try_coalesce+0x3c/0xf0
[ 3148.929079]   tcp_queue_rcv+0x44/0x164
[ 3148.932953]   tcp_data_queue+0x32a/0x75c
[ 3148.936946]   tcp_rcv_established+0x37e/0x7d4
[ 3148.941438]   tcp_v4_do_rcv+0xda/0x120
[ 3148.945320]   tcp_v4_rcv+0x8f2/0xa04
[ 3148.949034]   ip_local_deliver+0x72/0x208
[ 3148.953179]   process_backlog+0xbe/0x1b0
[ 3148.957169]   net_rx_action+0xfe/0x27c
[ 3148.961057]   __do_softirq+0xf0/0x228
[ 3148.964863]   __local_bh_enable_ip+0xae/0xb4
[ 3148.969277]   ip_finish_output2.constprop.6+0x116/0x368
[ 3148.974641]   __tcp_transmit_skb+0x56e/0xb3c
[ 3148.979039]   tcp_write_xmit+0x34a/0x126c
[ 3148.983174]   __tcp_push_pending_frames+0x28/0x94
[ 3148.987992]   tcp_sendmsg_locked+0xa7a/0xc14
[ 3148.992386]   tcp_sendmsg+0x1e/0x34
[ 3148.995935]   __sys_sendto+0xc8/0xf4
[ 3148.999642]   EV_Trap+0x11c/0x120
[ 3149.003057]
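
For anyone who wants to resolve these frames to source lines, here is a minimal sketch that pulls the `symbol+0xoffset/0xsize` entries out of a pasted oops. The function name `parse_frames` and the regular expression are illustrative assumptions, not part of any kernel tooling. It only matches the frame lines that directly follow the timestamp, so the `[BLINK ]`/`[ERET  ]` register lines and the firmware banner lines are skipped.

```python
import re

# One stack frame per line: "[ 3148.920879]   skb_try_coalesce+0x94/0x3a0"
# optionally followed by a module name such as "[switch_tqe]".
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+"            # dmesg timestamp
    r"(?P<sym>[A-Za-z_][\w.]*)"       # symbol, may contain ".constprop.6"
    r"\+0x(?P<off>[0-9a-f]+)"         # offset into the function
    r"/0x(?P<size>[0-9a-f]+)"         # size of the function
    r"(?:\s+\[(?P<mod>\w+)\])?\s*$"   # optional module name
)

def parse_frames(log_text):
    """Return (symbol, offset, size, module) tuples for each frame line."""
    frames = []
    for line in log_text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            frames.append(
                (m["sym"], int(m["off"], 16), int(m["size"], 16), m["mod"])
            )
    return frames
```

Each tuple can be turned back into `symbol+0xoffset/0xsize` and passed to the kernel tree's `scripts/faddr2line` together with a `vmlinux` built with debug info, assuming one is available for this exact build.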

Conditions under which this happens:

Two processes on the platform, P1 and P2, communicate over TCP sockets.
1. P1 has two TCP sockets: a client socket for talking to P2, and a
server socket that listens for a client running on another machine.
2. P1 has issued a command to P2, and P2 is preparing its response.
3. While P2 is preparing the response, P1 receives a zero-sized packet
from the remote server and closes its server socket, treating this as
an error. Note: the client socket stays open/active. P2 finishes
preparing its response, but the response is buffered.
4. P1 respawns the server socket, issues another command to P2, and
waits for the response.
5. P2 now sends two sets of data: one for the old session and one
response for the current command. At this point I see a kernel panic
with the backtrace above.
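
The sequence above can be approximated in userspace with the rough sketch below, run over loopback. Everything in it is an assumption made for illustration: the names `p2`, `run_sequence`, `listen_local` and `recv_exact`, the 4-byte commands, and the reading of "zero-sized packet" as `recv()` returning zero bytes (end of stream). It only reproduces the traffic pattern, two responses written back-to-back on the still-open P1/P2 connection. It is not the actual qtn_dut code and is not expected to crash a healthy kernel.

```python
import socket
import threading

def recv_exact(sock, n):
    """Read exactly n bytes or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed")
        buf += chunk
    return buf

def listen_local():
    """Open a listening TCP socket on an ephemeral loopback port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("127.0.0.1", 0))
    s.listen(1)
    return s

def p2(listener):
    """Hypothetical P2: holds back the first response, then sends both."""
    conn, _ = listener.accept()
    with conn:
        first = b"R1:" + recv_exact(conn, 4)    # step 2: response prepared, buffered
        second = b"R2:" + recv_exact(conn, 4)   # step 4: next command arrives
        conn.sendall(first)                     # step 5: stale response
        conn.sendall(second)                    # followed by the current one

def run_sequence():
    p2_listener = listen_local()
    t = threading.Thread(target=p2, args=(p2_listener,))
    t.start()

    # Step 1: P1 has a client socket to P2 and a server socket for a remote peer.
    to_p2 = socket.create_connection(p2_listener.getsockname())
    server = listen_local()

    to_p2.sendall(b"CMD1")                      # step 2

    # Step 3: the remote peer connects and closes; recv() returns b"" (EOF).
    remote = socket.create_connection(server.getsockname())
    accepted, _ = server.accept()
    remote.close()
    saw_eof = accepted.recv(16) == b""
    accepted.close()
    server.close()                              # P1 treats this as an error

    server = listen_local()                     # step 4: respawn server socket
    to_p2.sendall(b"CMD2")

    data = recv_exact(to_p2, 14)                # step 5: both responses arrive
    t.join()
    for s in (to_p2, server, p2_listener):
        s.close()
    return saw_eof, data
```

Calling `run_sequence()` returns whether P1 saw the end-of-stream on its server side and the bytes it read back from P2, which makes it easy to check that the old and the new response arrive on the same connection.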


There is another symptom of this issue:

# [  194.416963] Alignment trap: fault in fix-up 0000a260 at [<00000001>]
[  194.423419]
[  194.423419] Misaligned Access
[  194.427950] Path: (null)
[  194.430517] CPU: 0 PID: 0 Comm: swapper Tainted: P           O    4.19.35 #3
[  194.437816]
[  194.437816] [ECR   ]: 0x00230400 => Misaligned r/w from 0x00000001
[  194.445597] [EFA   ]: 0x00000001
[  194.445597] [BLINK ]: tcp_ack+0x5e6/0x1598
[  194.445597] [ERET  ]: tcp_ack+0x606/0x1598
[  194.457087] [STAT32]: 0x0000020e : K       A1 E2 E1
[  194.462137] BTA: 0x8b3bf3b7   SP: 0x8b4c9c04  FP: 0x00000000
[  194.467777] LPS: 0x8b3ba37c  LPE: 0x8b3ba384 LPC: 0x00000000
[  194.473454] r00: 0x8ce503c0  r01: 0x8f3b34e4 r02: 0x00000001
[  194.473454] r03: 0xa0076e71  r04: 0x00000000 r05: 0x06e0ed35
[  194.473454] r06: 0x8f3b3800  r07: 0x00000000 r08: 0x0b9362f8
[  194.473454] r09: 0x00000000  r10: 0x00032e0c r11: 0x00000000
[  194.473454] r12: 0xefec0000  r13: 0x8f3b3400 r14: 0x8f3b3c00
[  194.473454] r15: 0x8ce503c0  r16: 0x00000001 r17: 0x8f3b3800
[  194.473454] r18: 0x00000000  r19: 0x00000000 r20: 0x66c2443f
[  194.473454] r21: 0x8b4c9c80  r22: 0x00000004 r23: 0x00000001
[  194.473454] r24: 0x00000000  r25: 0x8b4cb2e0
[  194.473454]
[  194.473454]
[  194.526128]
[  194.526128] Stack Trace:
[  194.530209]
[  194.530209] Firmware build version: pyang_sh-swbuild04_main2ac-cl101263
[  194.530216]
[  194.530216] Firmware configuration: pearl_10gax_config
[  194.538389]
[  194.538389] Hardware ID           : 65535
[  194.550632]   tcp_ack+0x606/0x1598
[  194.554160]   tcp_rcv_established+0x458/0x7d4
[  194.558646]   tcp_v4_do_rcv+0xda/0x120
[  194.562521]   tcp_v4_rcv+0x8f2/0xa04
[  194.566162]   ip_local_deliver+0x72/0x208
[  194.570287]   netif_receive_skb+0x62/0x104
[  194.574510]   br_handle_frame_finish.constprop.2+0x1a6/0x270
[  194.580299]   br_handle_frame+0x170/0x2a0
[  194.584427]   __netif_receive_skb_core+0x156/0x650
[  194.589346]   netif_receive_skb+0x50/0x104
[  194.593582]   wowlan_magic_packet_check+0xc68/0x16b8 [switch_tqe]
[  194.599825]   net_rx_action+0xfe/0x27c
[  194.603695]   __do_softirq+0xf0/0x228
[  194.607477]   __handle_domain_irq+0x5c/0x98
[  194.611732]   handle_interrupt_level1+0xcc/0xd8


Do you happen to know if this is already reported/fixed?
I can run more experiments/gather more debug data/stats if required.

Thanks in advance.

-Avinash
