linux-rdma.vger.kernel.org archive mirror
* rxe panic
@ 2019-12-25  4:55 Frank Huang
  2019-12-25  5:27 ` Zhu Yanjun
  2019-12-25  6:32 ` Leon Romanovsky
  0 siblings, 2 replies; 12+ messages in thread
From: Frank Huang @ 2019-12-25  4:55 UTC (permalink / raw)
  To: linux-rdma

Hi, there is a panic in the rdma_rxe module when I restart
network.service or shut down the switch.

It looks like a use-after-free error.

Every time it happens, the log shows "rdma_rxe: Unknown layer 3 protocol: 0".

Is it a known error?

My kernel version is 4.14.97.

[448840.314544] rdma_rxe: Unknown layer 3 protocol: 0
[448840.314626] general protection fault: 0000 [#1] SMP PTI
[448840.314627] Modules linked in: binfmt_misc ib_isert
iscsi_target_mod ib_srpt target_core_mod rpcrdma ib_iser ib_srp
scsi_transport_srp rdma_rxe(OE) ib_ipoib ib_umad ip6_udp_tunnel
udp_tunnel rdma_ucm rdma_cm iw_cm ib_cm ib_uverbs ib_core
ebtable_filter ebtables devlink ip6table_filter ip6_tables
ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink iptable_nat
xt_addrtype xt_conntrack br_netfilter bridge stp llc overlay
ip_set_hash_ip ip_set nfnetlink iscsi_tcp libiscsi_tcp libiscsi
scsi_transport_iscsi sch_ingress openvswitch nf_conntrack_ipv6
nf_nat_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
nf_defrag_ipv6 nf_nat nf_conntrack libcrc32c sunrpc intel_rapl
x86_pkg_temp_thermal intel_powerclamp coretemp vfat fat kvm_intel kvm
irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
intel_cstate
[448840.314677]  intel_uncore intel_rapl_perf mxm_wmi iTCO_wdt
iTCO_vendor_support ipmi_ssif pcspkr i2c_i801 lpc_ich ipmi_si
ipmi_devintf ipmi_msghandler pcc_cpufreq shpchp wmi ast drm_kms_helper
ttm crc32c_intel drm ixgbe igb mdio ptp pps_core dca i2c_algo_bit
[448840.314700] CPU: 1 PID: 17 Comm: ksoftirqd/1 Tainted: G OE   4.14.97-el7.centos.x86_64 #1
[448840.314701] Hardware name:  /80010211        , BIOS 3.12 11/27/2018
[448840.314703] task: ffff9ce768af8000 task.stack: ffffbd7c4c6c4000
[448840.314710] RIP: 0010:rxe_elem_release+0xf/0x60 [rdma_rxe]
[448840.314711] RSP: 0018:ffffbd7c4c6c7d28 EFLAGS: 00010246
[448840.314713] RAX: 0000000000000000 RBX: 2917351aae258b92 RCX: 0000000000000000
[448840.314714] RDX: ffff9cfb3f64ba40 RSI: 000000000000026c RDI: ffff9cfb3f678008
[448840.314715] RBP: ffff9cfb3f678000 R08: 0000000000000201 R09: ffffbd7c4df35000
[448840.314716] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[448840.314717] R13: 000000000000001d R14: 0000000000000006 R15: ffff9cfb3f678000
[448840.314719] FS:  0000000000000000(0000) GS:ffff9ce76f840000(0000) knlGS:0000000000000000
[448840.314720] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[448840.314721] CR2: 00007f4fc400f000 CR3: 000000260420a005 CR4: 00000000001626e0
[448840.314723] Call Trace:
[448840.314730]  rxe_responder+0xcf0/0x1fe0 [rdma_rxe]
[448840.314738]  ? check_preempt_wakeup+0x125/0x240
[448840.314742]  ? check_preempt_curr+0x84/0x90
[448840.314745]  ? ttwu_do_wakeup+0x19/0x140
[448840.314747]  ? try_to_wake_up+0x54/0x450
[448840.314751]  rxe_do_task+0x8b/0x100 [rdma_rxe]
[448840.314754]  tasklet_action+0xfe/0x110
[448840.314758]  __do_softirq+0xd9/0x2a2
[448840.314761]  run_ksoftirqd+0x1e/0x70
[448840.314763]  smpboot_thread_fn+0x10e/0x160
[448840.314766]  kthread+0xff/0x140
[448840.314768]  ? sort_range+0x20/0x20
[448840.314770]  ? __kthread_parkme+0x90/0x90
[448840.314771]  ret_from_fork+0x35/0x40
[448840.314773] Code: 7a 00 00 74 04 31 c0 eb c3 4c 89 e7 e8 bb f9 ff ff 31 c0 eb b7 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 8d 6f f8 53 48 8b 5f f8 <48> 8b 43 20 48 85 c0 74 08 48 89 ef e8 60 1c 53 fb 8b 43 30 48
[448840.314817] RIP: rxe_elem_release+0xf/0x60 [rdma_rxe] RSP: ffffbd7c4c6c7d28
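
A note on reading the dump above, offered as a sketch rather than a diagnosis: the bytes around the marked position in the Code: line decode to lea -0x8(%rdi),%rbp; mov -0x8(%rdi),%rbx; then the faulting mov 0x20(%rbx),%rax. The faulting load therefore dereferences RBX, which holds 2917351aae258b92. Assuming 48-bit virtual addresses (4-level paging, which is an assumption about this machine), that value is not a canonical x86-64 address, which is why the access raises a general protection fault rather than a page fault. The helper name is_canonical_48 below is hypothetical and only illustrates the check.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if addr is canonical under the
 * assumption of 48-bit virtual addresses (4-level paging), i.e. bits
 * 63..47 are either all zero or all one.
 */
static bool is_canonical_48(uint64_t addr)
{
    uint64_t top = addr >> 47;      /* the upper 17 bits */

    return top == 0 || top == 0x1ffff;
}
```

With the values from the oops, is_canonical_48(0x2917351aae258b92) is false (RBX), while is_canonical_48(0xffff9cfb3f678008) is true (RDI).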


* Re: rxe panic
  2019-12-25  4:55 rxe panic Frank Huang
@ 2019-12-25  5:27 ` Zhu Yanjun
  2019-12-25  6:01   ` Frank Huang
  2019-12-25  6:32 ` Leon Romanovsky
  1 sibling, 1 reply; 12+ messages in thread
From: Zhu Yanjun @ 2019-12-25  5:27 UTC (permalink / raw)
  To: Frank Huang; +Cc: linux-rdma

Is there any vmcore about this problem?


* Re: rxe panic
  2019-12-25  5:27 ` Zhu Yanjun
@ 2019-12-25  6:01   ` Frank Huang
  2019-12-25  6:34     ` Zhu Yanjun
  0 siblings, 1 reply; 12+ messages in thread
From: Frank Huang @ 2019-12-25  6:01 UTC (permalink / raw)
  To: Zhu Yanjun; +Cc: linux-rdma

Yes.

What information should I post?

crash> bt
PID: 108    TASK: ffff978e28548000  CPU: 16  COMMAND: "ksoftirqd/16"
 #0 [ffffa2f14c9a7b18] machine_kexec at ffffffff8f059992
 #1 [ffffa2f14c9a7b70] __crash_kexec at ffffffff8f13cf7d
 #2 [ffffa2f14c9a7c38] crash_kexec at ffffffff8f13e089
 #3 [ffffa2f14c9a7c50] oops_end at ffffffff8f027a77
 #4 [ffffa2f14c9a7c70] general_protection at ffffffff8fa01635
    [exception RIP: rxe_elem_release+15]
    RIP: ffffffffc08da38f  RSP: ffffa2f14c9a7d28  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: 860e42124013b0aa  RCX: 0000000000000000
    RDX: ffff978e03ba8900  RSI: 0000000000000281  RDI: ffff978e02e746e8
    RBP: ffff978e02e746e0   R8: 0000000000000201   R9: ffffa2f14dcb9000
    R10: 0000000000000000  R11: 0000000000000001  R12: 0000000000000000
    R13: 000000000000001d  R14: 0000000000000006  R15: ffff978e02e746e0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #5 [ffffa2f14c9a7d38] rxe_responder at ffffffffc08d7d10 [rdma_rxe]
 #6 [ffffa2f14c9a7e48] rxe_do_task at ffffffffc08e060b [rdma_rxe]
 #7 [ffffa2f14c9a7e70] tasklet_action at ffffffff8f0afa1e
 #8 [ffffa2f14c9a7e88] __softirqentry_text_start at ffffffff8fc000d9
 #9 [ffffa2f14c9a7ee0] run_ksoftirqd at ffffffff8f0afa4e
#10 [ffffa2f14c9a7ee8] smpboot_thread_fn at ffffffff8f0cca5e
#11 [ffffa2f14c9a7f10] kthread at ffffffff8f0c8c9f
#12 [ffffa2f14c9a7f50] ret_from_fork at ffffffff8fa00205
crash> dis -l ffffffffc08d7d10
0xffffffffc08d7d10 <rxe_responder+3312>:        jmpq   0xffffffffc08d7c6c <rxe_responder+3148>
crash>

0xffffffffc08d7c97 <rxe_responder+3191>:        mov    0xec(%r15),%eax
0xffffffffc08d7c9e <rxe_responder+3198>:        cmp    $0x2,%eax
0xffffffffc08d7ca1 <rxe_responder+3201>:        je     0xffffffffc08d8213 <rxe_responder+4595>
0xffffffffc08d7ca7 <rxe_responder+3207>:        cmp    $0x3,%eax
0xffffffffc08d7caa <rxe_responder+3210>:        jne    0xffffffffc08d7ecc <rxe_responder+3756>
0xffffffffc08d7cb0 <rxe_responder+3216>:        mov    0x450(%r15),%eax
0xffffffffc08d7cb7 <rxe_responder+3223>:        cmp    $0x20,%eax
0xffffffffc08d7cba <rxe_responder+3226>:        jl     0xffffffffc08d873e <rxe_responder+5918>
0xffffffffc08d7cc0 <rxe_responder+3232>:        cmp    $0x21,%eax
0xffffffffc08d7cc3 <rxe_responder+3235>:        jle    0xffffffffc08d8725 <rxe_responder+5893>
0xffffffffc08d7cc9 <rxe_responder+3241>:        sub    $0x26,%eax
0xffffffffc08d7ccc <rxe_responder+3244>:        cmp    $0x1,%eax
0xffffffffc08d7ccf <rxe_responder+3247>:        ja     0xffffffffc08d873e <rxe_responder+5918>
0xffffffffc08d7cd5 <rxe_responder+3253>:        movzbl 0x2d(%rbx),%eax
0xffffffffc08d7cd9 <rxe_responder+3257>:        sub    $0x27,%eax
0xffffffffc08d7cdc <rxe_responder+3260>:        cmp    $0x3,%al
0xffffffffc08d7cde <rxe_responder+3262>:        sbb    %r13d,%r13d
0xffffffffc08d7ce1 <rxe_responder+3265>:        and    $0xfffffff0,%r13d
0xffffffffc08d7ce5 <rxe_responder+3269>:        add    $0x14,%r13d
0xffffffffc08d7ce9 <rxe_responder+3273>:        jmpq   0xffffffffc08d70a2 <rxe_responder+130>
0xffffffffc08d7cee <rxe_responder+3278>:        mov    %rbp,%rdi
0xffffffffc08d7cf1 <rxe_responder+3281>:        callq  0xffffffffc08da380 <rxe_elem_release>
0xffffffffc08d7cf6 <rxe_responder+3286>:        jmpq   0xffffffffc08d7b66 <rxe_responder+2886>
0xffffffffc08d7cfb <rxe_responder+3291>:        mov    %rbp,%rdi
0xffffffffc08d7cfe <rxe_responder+3294>:        callq  0xffffffffc08da380 <rxe_elem_release>
0xffffffffc08d7d03 <rxe_responder+3299>:        jmpq   0xffffffffc08d7b14 <rxe_responder+2804>
0xffffffffc08d7d08 <rxe_responder+3304>:        mov    %rbp,%rdi
0xffffffffc08d7d0b <rxe_responder+3307>:        callq  0xffffffffc08da380 <rxe_elem_release>
0xffffffffc08d7d10 <rxe_responder+3312>:        jmpq   0xffffffffc08d7c6c <rxe_responder+3148>
0xffffffffc08d7d15 <rxe_responder+3317>:        test   $0x10000,%eax
0xffffffffc08d7d1a <rxe_responder+3322>:        je     0xffffffffc08d804f <rxe_responder+4143>
0xffffffffc08d7d20 <rxe_responder+3328>:        mov    0x24(%rbx),%r12d
0xffffffffc08d7d24 <rxe_responder+3332>:        movzbl 0x19f(%r15),%edi
0xffffffffc08d7d2c <rxe_responder+3340>:        lea    0x6c0(%r15),%rsi
0xffffffffc08d7d33 <rxe_responder+3347>:        mov    %r12d,%edx
0xffffffffc08d7d36 <rxe_responder+3350>:        callq  0xffffffffc08d6af0 <find_resource>
0xffffffffc08d7d3b <rxe_responder+3355>:        test   %rax,%rax
0xffffffffc08d7d3e <rxe_responder+3358>:        je     0xffffffffc08d8c40 <rxe_responder+7200>
0xffffffffc08d7d44 <rxe_responder+3364>:        movzbl 0x2d(%rbx),%edx
0xffffffffc08d7d48 <rxe_responder+3368>:        movzbl 0x2e(%rbx),%ecx
0xffffffffc08d7d4c <rxe_responder+3372>:        mov    $0xc,%r13d
0xffffffffc08d7d52 <rxe_responder+3378>:        mov    0x20(%rax),%rdi
0xffffffffc08d7d56 <rxe_responder+3382>:        shl    $0x6,%rdx
0xffffffffc08d7d5a <rxe_responder+3386>:        movslq -0x3f715564(%rdx),%rdx
0xffffffffc08d7d61 <rxe_responder+3393>:        add    %rdx,%rcx
0xffffffffc08d7d64 <rxe_responder+3396>:        add    0x18(%rbx),%rcx
0xffffffffc08d7d68 <rxe_responder+3400>:        mov    (%rcx),%rdx
0xffffffffc08d7d6b <rxe_responder+3403>:        mov    0xc(%rcx),%esi
0xffffffffc08d7d6e <rxe_responder+3406>:        bswap  %rdx
0xffffffffc08d7d71 <rxe_responder+3409>:        bswap  %esi
0xffffffffc08d7d73 <rxe_responder+3411>:        cmp    %rdi,%rdx
0xffffffffc08d7d76 <rxe_responder+3414>:        jb     0xffffffffc08d70a2 <rxe_responder+130>
0xffffffffc08d7d7c <rxe_responder+3420>:        mov    0x2c(%rax),%r8d
0xffffffffc08d7d80 <rxe_responder+3424>:        cmp    %r8d,%esi
0xffffffffc08d7d83 <rxe_responder+3427>:        ja     0xffffffffc08d70a2 <rxe_responder+130>
0xffffffffc08d7d89 <rxe_responder+3433>:        mov    %esi,%r9d
0xffffffffc08d7d8c <rxe_responder+3436>:        add    %r8,%rdi
0xffffffffc08d7d8f <rxe_responder+3439>:        add    %rdx,%r9
0xffffffffc08d7d92 <rxe_responder+3442>:        cmp    %rdi,%r9
0xffffffffc08d7d95 <rxe_responder+3445>:        ja     0xffffffffc08d70a2 <rxe_responder+130>
0xffffffffc08d7d9b <rxe_responder+3451>:        mov    0x8(%rcx),%ecx
0xffffffffc08d7d9e <rxe_responder+3454>:        bswap  %ecx
0xffffffffc08d7da0 <rxe_responder+3456>:        cmp    0x28(%rax),%ecx
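
The call sites in this listing load %rbp into %rdi before calling rxe_elem_release, and in the backtrace RDI is exactly RBP + 8 at the fault. That is consistent with a kref-style release callback that receives a pointer to an embedded counter and steps back to the enclosing object. The sketch below shows that pattern in plain C. The struct names, the field layout, and the CONTAINER_OF macro are illustrative assumptions, not the rdma_rxe definitions.

```c
#include <stddef.h>

/* Hypothetical stand-ins, for illustration only. */
struct pool {
    void (*cleanup)(void *entry);
};

struct ref {
    int count;
};

struct entry {
    struct pool *pool;  /* first member of the enclosing object */
    struct ref ref;     /* embedded counter handed to the release callback */
};

/* Step back from a pointer to a member to the enclosing structure. */
#define CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Recover the enclosing entry from a pointer to its embedded counter. */
static struct entry *entry_from_ref(struct ref *r)
{
    return CONTAINER_OF(r, struct entry, ref);
}
```

If the enclosing object has already been freed and its memory reused, the first load through the recovered pointer returns whatever now occupies that memory, which would match a garbage value such as the one seen in RBX.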



* Re: rxe panic
  2019-12-25  4:55 rxe panic Frank Huang
  2019-12-25  5:27 ` Zhu Yanjun
@ 2019-12-25  6:32 ` Leon Romanovsky
  2019-12-25  7:23   ` Frank Huang
  1 sibling, 1 reply; 12+ messages in thread
From: Leon Romanovsky @ 2019-12-25  6:32 UTC (permalink / raw)
  To: Frank Huang; +Cc: linux-rdma

On Wed, Dec 25, 2019 at 12:55:35PM +0800, Frank Huang wrote:
> Hi, there is a panic in the rdma_rxe module when I restart
> network.service or shut down the switch.
>
> It looks like a use-after-free error.
>
> Every time it happens, the log shows "rdma_rxe: Unknown layer 3 protocol: 0".

The error print itself is harmless.
>
> Is it a known error?
>
> My kernel version is 4.14.97.

Your kernel is fairly old and doesn't include refcount,
so I can't say for sure that this is the case, but the
following code is not correct, and with refcount debugging
the problem would be seen immediately.

int rxe_responder(void *arg)
{
        struct rxe_qp *qp = (struct rxe_qp *)arg;
        struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
        enum resp_states state;
        struct rxe_pkt_info *pkt = NULL;
        int ret = 0;

        rxe_add_ref(qp); /* <------ USE-AFTER-FREE */

        qp->resp.aeth_syndrome = AETH_ACK_UNLIMITED;

        if (!qp->valid) {
                ret = -EINVAL;
                goto done;
        }
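
To illustrate why an unconditional reference grab is unsafe once the count may already have dropped to zero, here is a minimal user-space sketch using C11 atomics. It is not the rdma_rxe code; struct obj_ref and the ref_* helpers are hypothetical names chosen for this example. The kernel provides the same idea as kref_get_unless_zero().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference counter, for illustration only. */
struct obj_ref {
    atomic_int count;
};

/* A new object starts with one reference held by its creator. */
static void ref_init(struct obj_ref *r)
{
    atomic_init(&r->count, 1);
}

/* Take a reference only if the object is still alive (count > 0). */
static bool ref_get_unless_zero(struct obj_ref *r)
{
    int old = atomic_load(&r->count);

    while (old != 0) {
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&r->count, &old, old + 1))
            return true;
    }
    return false;   /* already released; the caller must not touch it */
}

/* Drop a reference; returns true if this was the last one. */
static bool ref_put(struct obj_ref *r)
{
    return atomic_fetch_sub(&r->count, 1) == 1;
}
```

Even this pattern only helps if the memory holding the counter is still valid when it is read, for example because the lookup happens under a lock or RCU; it does not by itself protect against an object that has already been freed.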

Thanks


* Re: rxe panic
  2019-12-25  6:01   ` Frank Huang
@ 2019-12-25  6:34     ` Zhu Yanjun
  2019-12-25  7:10       ` Frank Huang
  0 siblings, 1 reply; 12+ messages in thread
From: Zhu Yanjun @ 2019-12-25  6:34 UTC (permalink / raw)
  To: Frank Huang; +Cc: linux-rdma

Please install the kernel-dbg package, then run "mod -S
directory-of-kernel-ko" followed by "dis -lr rxe_elem_release+15",
and show us the result.


* Re: rxe panic
  2019-12-25  6:34     ` Zhu Yanjun
@ 2019-12-25  7:10       ` Frank Huang
  0 siblings, 0 replies; 12+ messages in thread
From: Frank Huang @ 2019-12-25  7:10 UTC (permalink / raw)
  To: Zhu Yanjun; +Cc: linux-rdma

Hi Zhu,

Here are the details. I hope this is what you wanted.


[root@test 127.0.0.1-2019-11-11-18:59:57]# crash /usr/lib/debug/lib/modules/$(uname -r)/vmlinux vmcore

crash 7.2.3-10.el7
Copyright (C) 2002-2017  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.

GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

WARNING: kernel relocated [224MB]: patching 92417 gdb minimal_symbol values

      KERNEL: /usr/lib/debug/lib/modules/4.14.97-.el7.centos.x86_64/vmlinux
    DUMPFILE: vmcore  [PARTIAL DUMP]
        CPUS: 48
        DATE: Mon Nov 11 18:59:49 2019
      UPTIME: 11 days, 04:57:53
LOAD AVERAGE: 3.14, 3.11, 3.09
       TASKS: 1103
    NODENAME: test
     RELEASE: 4.14.97-.el7.centos.x86_64
     VERSION: #1 SMP Mon Apr 29 14:32:59 CST 2019
     MACHINE: x86_64  (2494 Mhz)
      MEMORY: 159.9 GB
       PANIC: "general protection fault: 0000 [#1] SMP PTI"
         PID: 108
     COMMAND: "ksoftirqd/16"
        TASK: ffff978e28548000  [THREAD_INFO: ffff978e28548000]
         CPU: 16
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 108    TASK: ffff978e28548000  CPU: 16  COMMAND: "ksoftirqd/16"
 #0 [ffffa2f14c9a7b18] machine_kexec at ffffffff8f059992
 #1 [ffffa2f14c9a7b70] __crash_kexec at ffffffff8f13cf7d
 #2 [ffffa2f14c9a7c38] crash_kexec at ffffffff8f13e089
 #3 [ffffa2f14c9a7c50] oops_end at ffffffff8f027a77
 #4 [ffffa2f14c9a7c70] general_protection at ffffffff8fa01635
    [exception RIP: rxe_elem_release+15]
    RIP: ffffffffc08da38f  RSP: ffffa2f14c9a7d28  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: 860e42124013b0aa  RCX: 0000000000000000
    RDX: ffff978e03ba8900  RSI: 0000000000000281  RDI: ffff978e02e746e8
    RBP: ffff978e02e746e0   R8: 0000000000000201   R9: ffffa2f14dcb9000
    R10: 0000000000000000  R11: 0000000000000001  R12: 0000000000000000
    R13: 000000000000001d  R14: 0000000000000006  R15: ffff978e02e746e0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #5 [ffffa2f14c9a7d38] rxe_responder at ffffffffc08d7d10 [rdma_rxe]
 #6 [ffffa2f14c9a7e48] rxe_do_task at ffffffffc08e060b [rdma_rxe]
 #7 [ffffa2f14c9a7e70] tasklet_action at ffffffff8f0afa1e
 #8 [ffffa2f14c9a7e88] __softirqentry_text_start at ffffffff8fc000d9
 #9 [ffffa2f14c9a7ee0] run_ksoftirqd at ffffffff8f0afa4e
#10 [ffffa2f14c9a7ee8] smpboot_thread_fn at ffffffff8f0cca5e
#11 [ffffa2f14c9a7f10] kthread at ffffffff8f0c8c9f
#12 [ffffa2f14c9a7f50] ret_from_fork at ffffffff8fa00205
crash> mod -s rdma_rxe
     MODULE       NAME                    SIZE  OBJECT FILE
ffffffffc08ef240  rdma_rxe              126976
/usr/lib/debug/usr/lib/modules/4.14.97-.el7.centos.x86_64/kernel/drivers/infiniband/sw/rxe/rdma_rxe.ko.debug
crash> dis -lr rxe_elem_release+15
/usr/src/debug/kernel--4.14.97-1.el7/linux-4.14.97-.el7.centos.x86_64/drivers/infiniband/sw/rxe/rxe_pool.c: 452
0xffffffffc08da380 <rxe_elem_release>:  nopl   0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffffc08da385 <rxe_elem_release+5>:        push   %rbp
/usr/src/debug/kernel--4.14.97-1.el7/linux-4.14.97-.el7.centos.x86_64/drivers/infiniband/sw/rxe/rxe_pool.c: 447
0xffffffffc08da386 <rxe_elem_release+6>:        lea    -0x8(%rdi),%rbp
/usr/src/debug/kernel--4.14.97-1.el7/linux-4.14.97-.el7.centos.x86_64/arch/x86/include/asm/refcount.h: 52
0xffffffffc08da38a <rxe_elem_release+10>:       push   %rbx
0xffffffffc08da38b <rxe_elem_release+11>:       mov    -0x8(%rdi),%rbx
0xffffffffc08da38f <rxe_elem_release+15>:       mov    0x20(%rbx),%rax

crash> quit
[root@test 127.0.0.1-2019-11-11-18:59:57]#

   433 void *rxe_pool_get_index(struct rxe_pool *pool, u32 index)
   434 {
   435         struct rb_node *node = NULL;
   436         struct rxe_pool_entry *elem = NULL;
   437         unsigned long flags;
   438
   439         spin_lock_irqsave(&pool->pool_lock, flags);
   440
   441         if (pool->state != rxe_pool_valid)
   442                 goto out;
   443
   444         node = pool->tree.rb_node;
   445
   446         while (node) {
   447                 elem = rb_entry(node, struct rxe_pool_entry, node);
   448
   449                 if (elem->index > index)
   450                         node = node->rb_left;
   451                 else if (elem->index < index)
   452                         node = node->rb_right;
   453                 else
   454                         break;
   455         }
   456
   457         if (node)
   458                 kref_get(&elem->ref_cnt);
   459
   460 out:
   461         spin_unlock_irqrestore(&pool->pool_lock, flags);
   462         return node ? elem : NULL;
   463 }
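
One possible reading of the register dump above (my interpretation, not
something established in this thread): the faulting instruction is
"mov 0x20(%rbx),%rax" and RBX holds 860e42124013b0aa. Assuming 48-bit
virtual addresses (4-level paging), that value is not a canonical x86-64
address, and dereferencing a non-canonical address raises a general
protection fault instead of a page fault. That would fit a pointer read
from freed or overwritten memory. The helper below is only an
illustrative sketch; is_canonical() and its va_bits parameter are
made-up names, not kernel or crash APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: va_bits is the assumed number of implemented
 * virtual-address bits (48 for 4-level paging, 57 for 5-level paging).
 * Expects 2 <= va_bits <= 64.
 * An address is canonical when bits 63..(va_bits - 1) are all equal.
 */
static bool is_canonical(uint64_t addr, unsigned int va_bits)
{
	/* Keep only bits 63..(va_bits - 1), shifted down to bit 0. */
	uint64_t top = addr >> (va_bits - 1);
	/* Value of 'top' when every one of those bits is set. */
	uint64_t all_ones = (UINT64_C(1) << (65 - va_bits)) - 1;

	return top == 0 || top == all_ones;
}
```

With the values from the dump, is_canonical(0x860e42124013b0aa, 48) is
false, while RDI (ffff978e02e746e8) and RBP (ffff978e02e746e0) pass the
same check.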

* Re: rxe panic
  2019-12-25  6:32 ` Leon Romanovsky
@ 2019-12-25  7:23   ` Frank Huang
  2019-12-25  7:43     ` Frank Huang
  2019-12-25  9:23     ` Leon Romanovsky
  0 siblings, 2 replies; 12+ messages in thread
From: Frank Huang @ 2019-12-25  7:23 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: linux-rdma

Hi Leon,

I don't understand what you mean. Are you saying that rxe_add_ref(qp) is
not needed?
My kernel is old, and I found several rxe bugs on 4.14.97, especially
the RNR errors.
I cannot upgrade the whole kernel because there are many dependencies,
so in the end I synced the fixes from a newer kernel version to 4.14.97.

When I compared my rxe_resp.c with kernel 5.2.9, I found that the
duplicate_request snippet has changed.
Also, rxe_xmit_packet calls rxe_send, which prints the log "rdma_rxe:
Unknown layer 3 protocol: 0".

        } else {
                struct resp_res *res;

                /* Find the operation in our list of responder resources. */
                res = find_resource(qp, pkt->psn);
                if (res) {
                        struct sk_buff *skb_copy;

                        skb_copy = skb_clone(res->atomic.skb, GFP_ATOMIC);
                        if (skb_copy) {
                                rxe_add_ref(qp); /* for the new SKB */
                        } else {
                                pr_warn("Couldn't clone atomic resp\n");
                                rc = RESPST_CLEANUP;
                                goto out;
                        }

                        /* Resend the result. */
                        rc = rxe_xmit_packet(to_rdev(qp->ibqp.device), qp,
                                             pkt, skb_copy);
                        if (rc) {
                                pr_err("Failed resending result. This flow is not handled - skb ignored\n");
                                rxe_drop_ref(qp);
                                rc = RESPST_CLEANUP;
                                goto out;
                        }
                }

                /* Resource not found. Class D error. Drop the request. */
                rc = RESPST_CLEANUP;
                goto out;
        }
out:
        return rc;
}

On Wed, Dec 25, 2019 at 2:33 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Wed, Dec 25, 2019 at 12:55:35PM +0800, Frank Huang wrote:
> > hi, there is a panic on rdma_rxe module when the restart
> > network.service or shutdown the switch.
> >
> > it looks like a use-after-free error.
> >
> > everytime it happens, there is the log "rdma_rxe: Unknown layer 3 protocol: 0"
>
> The error print itself is harmless.
> >
> > is it a known error?
> >
> > my kernel version is 4.14.97
>
> Your kernel is old enough and doesn't include refcount,
> so I can't say for sure that it is the case, but the
> following code is not correct and with refcount debug
> it will be seen immediately.
>
> 1213 int rxe_responder(void *arg)
> 1214 {
> 1215         struct rxe_qp *qp = (struct rxe_qp *)arg;
> 1216         struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
> 1217         enum resp_states state;
> 1218         struct rxe_pkt_info *pkt = NULL;
> 1219         int ret = 0;
> 1220
> 1221         rxe_add_ref(qp); <------ USE-AFTER-FREE
> 1222
> 1223         qp->resp.aeth_syndrome = AETH_ACK_UNLIMITED;
> 1224
> 1225         if (!qp->valid) {
> 1226                 ret = -EINVAL;
> 1227                 goto done;
> 1228         }
>
> Thanks

* Re: rxe panic
  2019-12-25  7:23   ` Frank Huang
@ 2019-12-25  7:43     ` Frank Huang
  2019-12-25  9:23     ` Leon Romanovsky
  1 sibling, 0 replies; 12+ messages in thread
From: Frank Huang @ 2019-12-25  7:43 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: linux-rdma

Here is the patch that I used. :)


rdma_rxe (4.14.97) has problems dealing with out-of-order messages.
This patch backports the rdma_rxe changes from linux-5.2.9 to fix these
problems. The changes are confined to
linux-4.14.97/drivers/infiniband/sw/rxe. So far, no impact on other
modules has been found.

diff -ur linux-4.14.97/drivers/infiniband/sw/rxe/rxe_comp.c
linux-4.14.97-rxe/drivers/infiniband/sw/rxe/rxe_comp.c
--- linux-4.14.97/drivers/infiniband/sw/rxe/rxe_comp.c 2019-01-31
15:13:48.000000000 +0800
+++ linux-4.14.97-rxe/drivers/infiniband/sw/rxe/rxe_comp.c 2019-09-17
16:00:39.168896560 +0800
@@ -191,6 +191,7 @@
 {
  qp->comp.retry_cnt = qp->attr.retry_cnt;
  qp->comp.rnr_retry = qp->attr.rnr_retry;
+ qp->comp.started_retry = 0;
 }

 static inline enum comp_state check_psn(struct rxe_qp *qp,
@@ -253,6 +254,17 @@
  case IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE:
  if (pkt->opcode != IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE &&
      pkt->opcode != IB_OPCODE_RC_RDMA_READ_RESPONSE_LAST) {
+ /* read retries of partial data may restart from
+ * read response first or response only.
+ */
+ if ((pkt->psn == wqe->first_psn &&
+      pkt->opcode ==
+      IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST) ||
+     (wqe->first_psn == wqe->last_psn &&
+      pkt->opcode ==
+      IB_OPCODE_RC_RDMA_READ_RESPONSE_ONLY))
+ break;
+
  return COMPST_ERROR;
  }
  break;
@@ -270,8 +282,8 @@
  if ((syn & AETH_TYPE_MASK) != AETH_ACK)
  return COMPST_ERROR;

- /* Fall through (IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE
- * doesn't have an AETH)
+ /* fall through */
+ /* (IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE doesn't have an AETH)
  */
  case IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE:
  if (wqe->wr.opcode != IB_WR_RDMA_READ &&
@@ -501,11 +513,11 @@
     struct rxe_pkt_info *pkt,
     struct rxe_send_wqe *wqe)
 {
- qp->comp.opcode = -1;
-
- if (pkt) {
- if (psn_compare(pkt->psn, qp->comp.psn) >= 0)
- qp->comp.psn = (pkt->psn + 1) & BTH_PSN_MASK;
+ if (pkt && wqe->state == wqe_state_pending) {
+ if (psn_compare(wqe->last_psn, qp->comp.psn) >= 0) {
+ qp->comp.psn = (wqe->last_psn + 1) & BTH_PSN_MASK;
+ qp->comp.opcode = -1;
+ }

  if (qp->req.wait_psn) {
  qp->req.wait_psn = 0;
@@ -662,7 +674,6 @@
      qp->qp_timeout_jiffies)
  mod_timer(&qp->retrans_timer,
    jiffies + qp->qp_timeout_jiffies);
- WARN_ON_ONCE(skb);
  goto exit;

  case COMPST_ERROR_RETRY:
@@ -676,10 +687,23 @@

  /* there is nothing to retry in this case */
  if (!wqe || (wqe->state == wqe_state_posted)) {
- WARN_ON_ONCE(skb);
  goto exit;
  }

+ /* if we've started a retry, don't start another
+ * retry sequence, unless this is a timeout.
+ */
+ if (qp->comp.started_retry &&
+     !qp->comp.timeout_retry) {
+ if (pkt) {
+ rxe_drop_ref(pkt->qp);
+ kfree_skb(skb);
+ skb = NULL;
+ }
+
+ goto done;
+ }
+
  if (qp->comp.retry_cnt > 0) {
  if (qp->comp.retry_cnt != 7)
  qp->comp.retry_cnt--;
@@ -696,6 +720,7 @@
  rxe_counter_inc(rxe,
  RXE_CNT_COMP_RETRY);
  qp->req.need_retry = 1;
+ qp->comp.started_retry = 1;
  rxe_run_task(&qp->req.task, 1);
  }

@@ -705,8 +730,7 @@
  skb = NULL;
  }

- WARN_ON_ONCE(skb);
- goto exit;
+ goto done;

  } else {
  rxe_counter_inc(rxe, RXE_CNT_RETRY_EXCEEDED);
@@ -749,7 +773,6 @@
  skb = NULL;
  }

- WARN_ON_ONCE(skb);
  goto exit;
  }
  }
diff -ur linux-4.14.97/drivers/infiniband/sw/rxe/rxe.h
linux-4.14.97-rxe/drivers/infiniband/sw/rxe/rxe.h
--- linux-4.14.97/drivers/infiniband/sw/rxe/rxe.h 2019-01-31
15:13:48.000000000 +0800
+++ linux-4.14.97-rxe/drivers/infiniband/sw/rxe/rxe.h 2019-09-17
16:00:39.169896565 +0800
@@ -74,7 +74,6 @@
  SHASH_DESC_ON_STACK(shash, rxe->tfm);

  shash->tfm = rxe->tfm;
- shash->flags = 0;
  *(u32 *)shash_desc_ctx(shash) = crc;
  err = crypto_shash_update(shash, next, len);
  if (unlikely(err)) {
diff -ur linux-4.14.97/drivers/infiniband/sw/rxe/rxe_hdr.h
linux-4.14.97-rxe/drivers/infiniband/sw/rxe/rxe_hdr.h
--- linux-4.14.97/drivers/infiniband/sw/rxe/rxe_hdr.h 2019-01-31
15:13:48.000000000 +0800
+++ linux-4.14.97-rxe/drivers/infiniband/sw/rxe/rxe_hdr.h 2019-09-17
16:00:39.169896565 +0800
@@ -643,7 +643,7 @@
  __be32 rkey;
  __be64 swap_add;
  __be64 comp;
-} __attribute__((__packed__));
+} __packed;

 static inline u64 __atmeth_va(void *arg)
 {
diff -ur linux-4.14.97/drivers/infiniband/sw/rxe/rxe_hw_counters.c
linux-4.14.97-rxe/drivers/infiniband/sw/rxe/rxe_hw_counters.c
--- linux-4.14.97/drivers/infiniband/sw/rxe/rxe_hw_counters.c
2019-01-31 15:13:48.000000000 +0800
+++ linux-4.14.97-rxe/drivers/infiniband/sw/rxe/rxe_hw_counters.c
2019-09-17 16:00:39.169896565 +0800
@@ -37,11 +37,11 @@
  [RXE_CNT_SENT_PKTS]           =  "sent_pkts",
  [RXE_CNT_RCVD_PKTS]           =  "rcvd_pkts",
  [RXE_CNT_DUP_REQ]             =  "duplicate_request",
- [RXE_CNT_OUT_OF_SEQ_REQ]      =  "out_of_sequence",
+ [RXE_CNT_OUT_OF_SEQ_REQ]      =  "out_of_seq_request",
  [RXE_CNT_RCV_RNR]             =  "rcvd_rnr_err",
  [RXE_CNT_SND_RNR]             =  "send_rnr_err",
  [RXE_CNT_RCV_SEQ_ERR]         =  "rcvd_seq_err",
- [RXE_CNT_COMPLETER_SCHED]     =  "ack_deffered",
+ [RXE_CNT_COMPLETER_SCHED]     =  "ack_deferred",
  [RXE_CNT_RETRY_EXCEEDED]      =  "retry_exceeded_err",
  [RXE_CNT_RNR_RETRY_EXCEEDED]  =  "retry_rnr_exceeded_err",
  [RXE_CNT_COMP_RETRY]          =  "completer_retry_err",
diff -ur linux-4.14.97/drivers/infiniband/sw/rxe/rxe_loc.h
linux-4.14.97-rxe/drivers/infiniband/sw/rxe/rxe_loc.h
--- linux-4.14.97/drivers/infiniband/sw/rxe/rxe_loc.h 2019-01-31
15:13:48.000000000 +0800
+++ linux-4.14.97-rxe/drivers/infiniband/sw/rxe/rxe_loc.h 2019-09-17
16:00:39.170896570 +0800
@@ -268,7 +268,8 @@

  if (pkt->mask & RXE_LOOPBACK_MASK) {
  memcpy(SKB_TO_PKT(skb), pkt, sizeof(*pkt));
- err = rxe_loopback(skb);
+ rxe_loopback(skb);
+ err = 0;
  } else {
  err = rxe_send(rxe, pkt, skb);
  }
diff -ur linux-4.14.97/drivers/infiniband/sw/rxe/rxe_mmap.c
linux-4.14.97-rxe/drivers/infiniband/sw/rxe/rxe_mmap.c
--- linux-4.14.97/drivers/infiniband/sw/rxe/rxe_mmap.c 2019-01-31
15:13:48.000000000 +0800
+++ linux-4.14.97-rxe/drivers/infiniband/sw/rxe/rxe_mmap.c 2019-09-17
16:00:39.170896570 +0800
@@ -146,6 +146,8 @@
     void *obj)
 {
  struct rxe_mmap_info *ip;
+    if (!context)
+     return ERR_PTR(-EINVAL);

  ip = kmalloc(sizeof(*ip), GFP_KERNEL);
  if (!ip)
diff -ur linux-4.14.97/drivers/infiniband/sw/rxe/rxe_pool.c
linux-4.14.97-rxe/drivers/infiniband/sw/rxe/rxe_pool.c
--- linux-4.14.97/drivers/infiniband/sw/rxe/rxe_pool.c 2019-01-31
15:13:48.000000000 +0800
+++ linux-4.14.97-rxe/drivers/infiniband/sw/rxe/rxe_pool.c 2019-09-17
16:00:39.171896575 +0800
@@ -112,6 +112,20 @@
  return rxe_type_info[pool->type].cache;
 }

+static void rxe_cache_clean(size_t cnt)
+{
+ int i;
+ struct rxe_type_info *type;
+
+ for (i = 0; i < cnt; i++) {
+ type = &rxe_type_info[i];
+ if (!(type->flags & RXE_POOL_NO_ALLOC)) {
+ kmem_cache_destroy(type->cache);
+ type->cache = NULL;
+ }
+ }
+}
+
 int rxe_cache_init(void)
 {
  int err;
@@ -136,24 +150,14 @@
  return 0;

 err1:
- while (--i >= 0) {
- kmem_cache_destroy(type->cache);
- type->cache = NULL;
- }
+ rxe_cache_clean(i);

  return err;
 }

 void rxe_cache_exit(void)
 {
- int i;
- struct rxe_type_info *type;
-
- for (i = 0; i < RXE_NUM_TYPES; i++) {
- type = &rxe_type_info[i];
- kmem_cache_destroy(type->cache);
- type->cache = NULL;
- }
+ rxe_cache_clean(RXE_NUM_TYPES);
 }

 static int rxe_pool_init_index(struct rxe_pool *pool, u32 max, u32 min)
@@ -207,7 +211,7 @@

  kref_init(&pool->ref_cnt);

- spin_lock_init(&pool->pool_lock);
+ rwlock_init(&pool->pool_lock);

  if (rxe_type_info[type].flags & RXE_POOL_INDEX) {
  err = rxe_pool_init_index(pool,
@@ -222,7 +226,7 @@
  pool->key_size = rxe_type_info[type].key_size;
  }

- pool->state = rxe_pool_valid;
+ pool->state = RXE_POOL_STATE_VALID;

 out:
  return err;
@@ -232,7 +236,7 @@
 {
  struct rxe_pool *pool = container_of(kref, struct rxe_pool, ref_cnt);

- pool->state = rxe_pool_invalid;
+ pool->state = RXE_POOL_STATE_INVALID;
  kfree(pool->table);
 }

@@ -245,12 +249,12 @@
 {
  unsigned long flags;

- spin_lock_irqsave(&pool->pool_lock, flags);
- pool->state = rxe_pool_invalid;
+ write_lock_irqsave(&pool->pool_lock, flags);
+ pool->state = RXE_POOL_STATE_INVALID;
  if (atomic_read(&pool->num_elem) > 0)
  pr_warn("%s pool destroyed with unfree'd elem\n",
  pool_name(pool));
- spin_unlock_irqrestore(&pool->pool_lock, flags);
+ write_unlock_irqrestore(&pool->pool_lock, flags);

  rxe_pool_put(pool);

@@ -336,10 +340,10 @@
  struct rxe_pool *pool = elem->pool;
  unsigned long flags;

- spin_lock_irqsave(&pool->pool_lock, flags);
+ write_lock_irqsave(&pool->pool_lock, flags);
  memcpy((u8 *)elem + pool->key_offset, key, pool->key_size);
  insert_key(pool, elem);
- spin_unlock_irqrestore(&pool->pool_lock, flags);
+ write_unlock_irqrestore(&pool->pool_lock, flags);
 }

 void rxe_drop_key(void *arg)
@@ -348,9 +352,9 @@
  struct rxe_pool *pool = elem->pool;
  unsigned long flags;

- spin_lock_irqsave(&pool->pool_lock, flags);
+ write_lock_irqsave(&pool->pool_lock, flags);
  rb_erase(&elem->node, &pool->tree);
- spin_unlock_irqrestore(&pool->pool_lock, flags);
+ write_unlock_irqrestore(&pool->pool_lock, flags);
 }

 void rxe_add_index(void *arg)
@@ -359,10 +363,10 @@
  struct rxe_pool *pool = elem->pool;
  unsigned long flags;

- spin_lock_irqsave(&pool->pool_lock, flags);
+ write_lock_irqsave(&pool->pool_lock, flags);
  elem->index = alloc_index(pool);
  insert_index(pool, elem);
- spin_unlock_irqrestore(&pool->pool_lock, flags);
+ write_unlock_irqrestore(&pool->pool_lock, flags);
 }

 void rxe_drop_index(void *arg)
@@ -371,10 +375,10 @@
  struct rxe_pool *pool = elem->pool;
  unsigned long flags;

- spin_lock_irqsave(&pool->pool_lock, flags);
+ write_lock_irqsave(&pool->pool_lock, flags);
  clear_bit(elem->index - pool->min_index, pool->table);
  rb_erase(&elem->node, &pool->tree);
- spin_unlock_irqrestore(&pool->pool_lock, flags);
+ write_unlock_irqrestore(&pool->pool_lock, flags);
 }

 void *rxe_alloc(struct rxe_pool *pool)
@@ -384,13 +388,13 @@

  might_sleep_if(!(pool->flags & RXE_POOL_ATOMIC));

- spin_lock_irqsave(&pool->pool_lock, flags);
- if (pool->state != rxe_pool_valid) {
- spin_unlock_irqrestore(&pool->pool_lock, flags);
+ read_lock_irqsave(&pool->pool_lock, flags);
+ if (pool->state != RXE_POOL_STATE_VALID) {
+ read_unlock_irqrestore(&pool->pool_lock, flags);
  return NULL;
  }
  kref_get(&pool->ref_cnt);
- spin_unlock_irqrestore(&pool->pool_lock, flags);
+ read_unlock_irqrestore(&pool->pool_lock, flags);

  kref_get(&pool->rxe->ref_cnt);

@@ -436,9 +440,9 @@
  struct rxe_pool_entry *elem = NULL;
  unsigned long flags;

- spin_lock_irqsave(&pool->pool_lock, flags);
+ read_lock_irqsave(&pool->pool_lock, flags);

- if (pool->state != rxe_pool_valid)
+ if (pool->state != RXE_POOL_STATE_VALID)
  goto out;

  node = pool->tree.rb_node;
@@ -450,15 +454,14 @@
  node = node->rb_left;
  else if (elem->index < index)
  node = node->rb_right;
- else
+ else {
+ kref_get(&elem->ref_cnt);
  break;
+ }
  }

- if (node)
- kref_get(&elem->ref_cnt);
-
 out:
- spin_unlock_irqrestore(&pool->pool_lock, flags);
+ read_unlock_irqrestore(&pool->pool_lock, flags);
  return node ? elem : NULL;
 }

@@ -469,9 +472,9 @@
  int cmp;
  unsigned long flags;

- spin_lock_irqsave(&pool->pool_lock, flags);
+ read_lock_irqsave(&pool->pool_lock, flags);

- if (pool->state != rxe_pool_valid)
+ if (pool->state != RXE_POOL_STATE_VALID)
  goto out;

  node = pool->tree.rb_node;
@@ -494,6 +497,6 @@
  kref_get(&elem->ref_cnt);

 out:
- spin_unlock_irqrestore(&pool->pool_lock, flags);
+ read_unlock_irqrestore(&pool->pool_lock, flags);
  return node ? elem : NULL;
 }
diff -ur linux-4.14.97/drivers/infiniband/sw/rxe/rxe_pool.h
linux-4.14.97-rxe/drivers/infiniband/sw/rxe/rxe_pool.h
--- linux-4.14.97/drivers/infiniband/sw/rxe/rxe_pool.h 2019-01-31
15:13:48.000000000 +0800
+++ linux-4.14.97-rxe/drivers/infiniband/sw/rxe/rxe_pool.h 2019-09-17
16:00:39.171896575 +0800
@@ -41,6 +41,7 @@
  RXE_POOL_ATOMIC = BIT(0),
  RXE_POOL_INDEX = BIT(1),
  RXE_POOL_KEY = BIT(2),
+ RXE_POOL_NO_ALLOC = BIT(4),
 };

 enum rxe_elem_type {
@@ -74,8 +75,8 @@
 extern struct rxe_type_info rxe_type_info[];

 enum rxe_pool_state {
- rxe_pool_invalid,
- rxe_pool_valid,
+ RXE_POOL_STATE_INVALID,
+ RXE_POOL_STATE_VALID,
 };

 struct rxe_pool_entry {
@@ -90,7 +91,7 @@

 struct rxe_pool {
  struct rxe_dev *rxe;
- spinlock_t              pool_lock; /* pool spinlock */
+ rwlock_t pool_lock; /* protects pool add/del/search */
  size_t elem_size;
  struct kref ref_cnt;
  void (*cleanup)(struct rxe_pool_entry *obj);
diff -ur linux-4.14.97/drivers/infiniband/sw/rxe/rxe_qp.c
linux-4.14.97-rxe/drivers/infiniband/sw/rxe/rxe_qp.c
--- linux-4.14.97/drivers/infiniband/sw/rxe/rxe_qp.c 2019-01-31
15:13:48.000000000 +0800
+++ linux-4.14.97-rxe/drivers/infiniband/sw/rxe/rxe_qp.c 2019-09-17
16:00:39.172896580 +0800
@@ -235,6 +235,16 @@
  return err;
  qp->sk->sk->sk_user_data = qp;

+ /* pick a source UDP port number for this QP based on
+ * the source QPN. this spreads traffic for different QPs
+ * across different NIC RX queues (while using a single
+ * flow for a given QP to maintain packet order).
+ * the port number must be in the Dynamic Ports range
+ * (0xc000 - 0xffff).
+ */
+ qp->src_port = RXE_ROCE_V2_SPORT +
+ (hash_32_generic(qp_num(qp), 14) & 0x3fff);
+
  qp->sq.max_wr = init->cap.max_send_wr;
  qp->sq.max_sge = init->cap.max_send_sge;
  qp->sq.max_inline = init->cap.max_inline_data;
diff -ur linux-4.14.97/drivers/infiniband/sw/rxe/rxe_req.c
linux-4.14.97-rxe/drivers/infiniband/sw/rxe/rxe_req.c
--- linux-4.14.97/drivers/infiniband/sw/rxe/rxe_req.c 2019-01-31
15:13:48.000000000 +0800
+++ linux-4.14.97-rxe/drivers/infiniband/sw/rxe/rxe_req.c 2019-09-17
16:00:39.172896580 +0800
@@ -73,9 +73,6 @@
  int npsn;
  int first = 1;

- wqe = queue_head(qp->sq.queue);
- npsn = (qp->comp.psn - wqe->first_psn) & BTH_PSN_MASK;
-
  qp->req.wqe_index = consumer_index(qp->sq.queue);
  qp->req.psn = qp->comp.psn;
  qp->req.opcode = -1;
@@ -107,11 +104,17 @@
  if (first) {
  first = 0;

- if (mask & WR_WRITE_OR_SEND_MASK)
+ if (mask & WR_WRITE_OR_SEND_MASK) {
+ npsn = (qp->comp.psn - wqe->first_psn) &
+ BTH_PSN_MASK;
  retry_first_write_send(qp, wqe, mask, npsn);
+ }

- if (mask & WR_READ_MASK)
+ if (mask & WR_READ_MASK) {
+ npsn = (wqe->dma.length - wqe->dma.resid) /
+ qp->mtu;
  wqe->iova += npsn * qp->mtu;
+ }
  }

  wqe->state = wqe_state_posted;
@@ -435,7 +438,7 @@
  if (pkt->mask & RXE_RETH_MASK) {
  reth_set_rkey(pkt, ibwr->wr.rdma.rkey);
  reth_set_va(pkt, wqe->iova);
- reth_set_len(pkt, wqe->dma.length);
+ reth_set_len(pkt, wqe->dma.resid);
  }

  if (pkt->mask & RXE_IMMDT_MASK)
@@ -713,6 +716,7 @@

  if (fill_packet(qp, wqe, &pkt, skb, payload)) {
  pr_debug("qp#%d Error during fill packet\n", qp_num(qp));
+ kfree_skb(skb);
  goto err;
  }

@@ -744,7 +748,6 @@
  goto next_wqe;

 err:
- kfree_skb(skb);
  wqe->status = IB_WC_LOC_PROT_ERR;
  wqe->state = wqe_state_error;
  __rxe_do_task(&qp->comp.task);
diff -ur linux-4.14.97/drivers/infiniband/sw/rxe/rxe_resp.c
linux-4.14.97-rxe/drivers/infiniband/sw/rxe/rxe_resp.c
--- linux-4.14.97/drivers/infiniband/sw/rxe/rxe_resp.c 2019-01-31
15:13:48.000000000 +0800
+++ linux-4.14.97-rxe/drivers/infiniband/sw/rxe/rxe_resp.c 2019-09-17
16:00:39.173896585 +0800
@@ -124,12 +124,9 @@
  struct sk_buff *skb;

  if (qp->resp.state == QP_STATE_ERROR) {
- skb = skb_dequeue(&qp->req_pkts);
- if (skb) {
- /* drain request packet queue */
+ while ((skb = skb_dequeue(&qp->req_pkts))) {
  rxe_drop_ref(qp);
  kfree_skb(skb);
- return RESPST_GET_REQ;
  }

  /* go drain recv wr queue */
@@ -435,6 +432,7 @@
  qp->resp.va = reth_va(pkt);
  qp->resp.rkey = reth_rkey(pkt);
  qp->resp.resid = reth_len(pkt);
+ qp->resp.length = reth_len(pkt);
  }
  access = (pkt->mask & RXE_READ_MASK) ? IB_ACCESS_REMOTE_READ
       : IB_ACCESS_REMOTE_WRITE;
@@ -860,7 +858,9 @@
  pkt->mask & RXE_WRITE_MASK) ?
  IB_WC_RECV_RDMA_WITH_IMM : IB_WC_RECV;
  wc->vendor_err = 0;
- wc->byte_len = wqe->dma.length - wqe->dma.resid;
+ wc->byte_len = (pkt->mask & RXE_IMMDT_MASK &&
+ pkt->mask & RXE_WRITE_MASK) ?
+ qp->resp.length : wqe->dma.length - wqe->dma.resid;

  /* fields after byte_len are different between kernel and user
  * space
diff -ur linux-4.14.97/drivers/infiniband/sw/rxe/rxe_verbs.c
linux-4.14.97-rxe/drivers/infiniband/sw/rxe/rxe_verbs.c
--- linux-4.14.97/drivers/infiniband/sw/rxe/rxe_verbs.c 2019-01-31
15:13:48.000000000 +0800
+++ linux-4.14.97-rxe/drivers/infiniband/sw/rxe/rxe_verbs.c 2019-09-17
16:00:39.174896590 +0800
@@ -644,6 +644,7 @@
  switch (wr->opcode) {
  case IB_WR_RDMA_WRITE_WITH_IMM:
  wr->ex.imm_data = ibwr->ex.imm_data;
+ /* fall through */
  case IB_WR_RDMA_READ:
  case IB_WR_RDMA_WRITE:
  wr->wr.rdma.remote_addr = rdma_wr(ibwr)->remote_addr;
@@ -774,7 +775,6 @@
  unsigned int mask;
  unsigned int length = 0;
  int i;
- int must_sched;

  while (wr) {
  mask = wr_opcode_mask(wr->opcode, qp);
@@ -804,14 +804,7 @@
  wr = wr->next;
  }

- /*
- * Must sched in case of GSI QP because ib_send_mad() hold irq lock,
- * and the requester call ip_local_out_sk() that takes spin_lock_bh.
- */
- must_sched = (qp_type(qp) == IB_QPT_GSI) ||
- (queue_count(qp->sq.queue) > 1);
-
- rxe_run_task(&qp->req.task, must_sched);
+ rxe_run_task(&qp->req.task, 1);
  if (unlikely(qp->req.state == QP_STATE_ERROR))
  rxe_run_task(&qp->comp.task, 1);

diff -ur linux-4.14.97/drivers/infiniband/sw/rxe/rxe_verbs.h
linux-4.14.97-rxe/drivers/infiniband/sw/rxe/rxe_verbs.h
--- linux-4.14.97/drivers/infiniband/sw/rxe/rxe_verbs.h 2019-01-31
15:13:48.000000000 +0800
+++ linux-4.14.97-rxe/drivers/infiniband/sw/rxe/rxe_verbs.h 2019-09-17
16:00:39.174896590 +0800
@@ -160,6 +160,7 @@
  int opcode;
  int timeout;
  int timeout_retry;
+ int started_retry;
  u32 retry_cnt;
  u32 rnr_retry;
  struct rxe_task task;
@@ -214,6 +215,7 @@
  struct rxe_mem *mr;
  u32 resid;
  u32 rkey;
+ u32 length;
  u64 atomic_orig;

  /* SRQ only */
@@ -252,6 +254,7 @@

  struct socket *sk;
  u32 dst_cookie;
+ u16 src_port;

  struct rxe_av pri_av;
  struct rxe_av alt_av;
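
To make the rxe_qp.c hunk in the patch above easier to follow, here is
a small userspace sketch of the source-port selection it adds. It is
only an illustration under assumptions: I take RXE_ROCE_V2_SPORT to be
0xc000 (the start of the Dynamic Ports range named in the comment) and
hash_32_generic() to be a multiply by the 32-bit golden-ratio constant
followed by a right shift. hash_32_generic_sketch() and qp_src_port()
are made-up names.

```c
#include <stdint.h>

/* Assumed values, see the note above; not copied from this patch. */
#define GOLDEN_RATIO_32   0x61C88647u
#define RXE_ROCE_V2_SPORT 0xc000u

/* Sketch of a multiplicative hash that keeps the top 'bits' bits. */
static uint32_t hash_32_generic_sketch(uint32_t val, unsigned int bits)
{
	return (uint32_t)(val * GOLDEN_RATIO_32) >> (32 - bits);
}

/*
 * Map a QP number to a UDP source port in 0xc000..0xffff.
 * The same QPN always yields the same port, so one QP stays on one flow,
 * while different QPNs tend to land on different ports.
 */
static uint16_t qp_src_port(uint32_t qpn)
{
	uint32_t offset = hash_32_generic_sketch(qpn, 14) & 0x3fff;

	return (uint16_t)(RXE_ROCE_V2_SPORT + offset);
}
```

Because the offset is masked to 14 bits, the result can never leave the
0xc000 - 0xffff range, whatever the QPN is.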

On Wed, Dec 25, 2019 at 3:23 PM Frank Huang <tigerinxm@gmail.com> wrote:
>
> hi leon
>
> I can not get what you means, do you say the rxe_add_ref(qp) is not needed?
> My kernel is old, and I found some bugs of rxe on 4.14.97, especially
> the rnr errors.
> I can not upgrade whole kernel because there are many dependencies.
> Finally , I sync the fixed from newest kernel version to the 4.14.97.
>
> When I compare my rxe_resp.c with kernel 5.2.9 , I found the snippet
> of duplicate_request is changed.
> and rxe_xmit_packet will call rxe_send,enter the log "rdma_rxe:
> Unknown layer 3 protocol: 0"
>
>   1137 } else {
>   1138 struct resp_res *res;
>   1139
>   1140 /* Find the operation in our list of responder resources. */
>   1141 res = find_resource(qp, pkt->psn);
>   1142 if (res) {
>   1143 struct sk_buff *skb_copy;
>   1144
>   1145 skb_copy = skb_clone(res->atomic.skb, GFP_ATOMIC);
>   1146 if (skb_copy) {
>   1147 rxe_add_ref(qp); /* for the new SKB */
>   1148 } else {
>   1149 pr_warn("Couldn't clone atomic resp\n");
>   1150 rc = RESPST_CLEANUP;
>   1151 goto out;
>   1152 }
>   1153
>   1154 /* Resend the result. */
>   1155 rc = rxe_xmit_packet(to_rdev(qp->ibqp.device), qp,
>   1156      pkt, skb_copy);
>   1157 if (rc) {
>   1158 pr_err("Failed resending result. This flow is not handled - skb
> ignored\n");
>   1159 rxe_drop_ref(qp);
>   1160 rc = RESPST_CLEANUP;
>   1161 goto out;
>   1162 }
>   1163 }
>   1164
>   1165 /* Resource not found. Class D error. Drop the request. */
>   1166 rc = RESPST_CLEANUP;
>   1167 goto out;
>   1168 }
>   1169 out:
>   1170 return rc;
>   1171 }
>
> On Wed, Dec 25, 2019 at 2:33 PM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Wed, Dec 25, 2019 at 12:55:35PM +0800, Frank Huang wrote:
> > > hi, there is a panic on rdma_rxe module when the restart
> > > network.service or shutdown the switch.
> > >
> > > it looks like a use-after-free error.
> > >
> > > everytime it happens, there is the log "rdma_rxe: Unknown layer 3 protocol: 0"
> >
> > The error print itself is harmless.
> > >
> > > is it a known error?
> > >
> > > my kernel version is 4.14.97
> >
> > Your kernel is old enough and doesn't include refcount,
> > so I can't say for sure that it is the case, but the
> > following code is not correct and with refcount debug
> > it will be seen immediately.
> >
> > 1213 int rxe_responder(void *arg)
> > 1214 {
> > 1215         struct rxe_qp *qp = (struct rxe_qp *)arg;
> > 1216         struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
> > 1217         enum resp_states state;
> > 1218         struct rxe_pkt_info *pkt = NULL;
> > 1219         int ret = 0;
> > 1220
> > 1221         rxe_add_ref(qp); <------ USE-AFTER-FREE
> > 1222
> > 1223         qp->resp.aeth_syndrome = AETH_ACK_UNLIMITED;
> > 1224
> > 1225         if (!qp->valid) {
> > 1226                 ret = -EINVAL;
> > 1227                 goto done;
> > 1228         }
> >
> > Thanks

* Re: rxe panic
  2019-12-25  7:23   ` Frank Huang
  2019-12-25  7:43     ` Frank Huang
@ 2019-12-25  9:23     ` Leon Romanovsky
  2019-12-26  1:08       ` Zhu Yanjun
  1 sibling, 1 reply; 12+ messages in thread
From: Leon Romanovsky @ 2019-12-25  9:23 UTC (permalink / raw)
  To: Frank Huang; +Cc: linux-rdma

On Wed, Dec 25, 2019 at 03:23:53PM +0800, Frank Huang wrote:
> hi leon
>
> I can not get what you means, do you say the rxe_add_ref(qp) is not needed?

It is not what I'm saying.

The use of rxe_add_ref(qp) assumes that QP can't disappear while it is
called. From what I see in the code, rxe_responder() doesn't guarantee
that.
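
To illustrate the point, here is a userspace sketch (not rxe code;
struct obj and the obj_*() helpers are made-up names). An unconditional
"get" is only valid when the caller already holds a reference. If the
object may already be on its way to being freed, the caller needs a
conditional get that fails once the count has reached zero, in the
spirit of the kernel's kref_get_unless_zero(), and the memory must
still be valid while that check runs. Whether this is the right fix for
rxe_responder() is a separate question.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted object such as a QP. */
struct obj {
	atomic_int refcnt;	/* 0 means teardown has started */
};

static void obj_init(struct obj *o)
{
	atomic_init(&o->refcnt, 1);	/* creator holds the first reference */
}

/* Unconditional get: only valid if the caller already holds a reference. */
static void obj_get(struct obj *o)
{
	int old = atomic_fetch_add(&o->refcnt, 1);

	assert(old > 0);	/* old == 0 would be a use-after-free */
}

/* Conditional get: refuses to revive an object whose count reached 0. */
static bool obj_get_unless_zero(struct obj *o)
{
	int old = atomic_load(&o->refcnt);

	while (old != 0) {
		/* On failure the CAS reloads 'old' with the current count. */
		if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if this was the last one. */
static bool obj_put(struct obj *o)
{
	int old = atomic_fetch_sub(&o->refcnt, 1);

	assert(old > 0);
	return old == 1;
}
```

A lookup path would call obj_get_unless_zero() and bail out on false,
while code that already owns a reference can keep using obj_get().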

Thanks

* Re: rxe panic
  2019-12-25  9:23     ` Leon Romanovsky
@ 2019-12-26  1:08       ` Zhu Yanjun
  2019-12-26  1:39         ` Frank Huang
  0 siblings, 1 reply; 12+ messages in thread
From: Zhu Yanjun @ 2019-12-26  1:08 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: Frank Huang, linux-rdma

I agree with you, Leon. I have fixed several similar problems in
upstream Linux. I am not sure whether this particular problem has been
fixed upstream.

Zhu Yanjun

On Wed, Dec 25, 2019 at 5:29 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Wed, Dec 25, 2019 at 03:23:53PM +0800, Frank Huang wrote:
> > hi leon
> >
> > I can not get what you means, do you say the rxe_add_ref(qp) is not needed?
>
> It is not what I'm saying.
>
> The use of rxe_add_ref(qp) assumes that QP can't disappear while it is
> called. From what I see in the code, rxe_responder() doesn't guarantee
> that.
>
> Thanks

* Re: rxe panic
  2019-12-26  1:08       ` Zhu Yanjun
@ 2019-12-26  1:39         ` Frank Huang
  2019-12-26  2:35           ` Zhu Yanjun
  0 siblings, 1 reply; 12+ messages in thread
From: Frank Huang @ 2019-12-26  1:39 UTC (permalink / raw)
  To: Zhu Yanjun; +Cc: Leon Romanovsky, linux-rdma

Hi Zhu,

Can you point me to some of the patches for these problems?

On Thu, Dec 26, 2019 at 9:08 AM Zhu Yanjun <zyjzyj2000@gmail.com> wrote:
>
> I agree with you, Leon. I have fixed several problems similar to this
> in the Linux upstream. Not sure whether this problem is fixed or not
> in Linux upstream.
>
> Zhu Yanjun
>
> On Wed, Dec 25, 2019 at 5:29 PM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Wed, Dec 25, 2019 at 03:23:53PM +0800, Frank Huang wrote:
> > > hi leon
> > >
> > > I can not get what you means, do you say the rxe_add_ref(qp) is not needed?
> >
> > It is not what I'm saying.
> >
> > The use of rxe_add_ref(qp) assumes that QP can't disappear while it is
> > called. From what I see in the code, rxe_responder() doesn't guarantee
> > that.
> >
> > Thanks

* Re: rxe panic
  2019-12-26  1:39         ` Frank Huang
@ 2019-12-26  2:35           ` Zhu Yanjun
  0 siblings, 0 replies; 12+ messages in thread
From: Zhu Yanjun @ 2019-12-26  2:35 UTC (permalink / raw)
  To: Frank Huang; +Cc: Leon Romanovsky, linux-rdma

Please test with upstream Linux.
Thanks.

Zhu Yanjun

On Thu, Dec 26, 2019 at 9:39 AM Frank Huang <tigerinxm@gmail.com> wrote:
>
> Hi, zhu
>
> Can you show some patches about these problems?
>
> On Thu, Dec 26, 2019 at 9:08 AM Zhu Yanjun <zyjzyj2000@gmail.com> wrote:
> >
> > I agree with you, Leon. I have fixed several problems similar to this
> > in the Linux upstream. Not sure whether this problem is fixed or not
> > in Linux upstream.
> >
> > Zhu Yanjun
> >
> > On Wed, Dec 25, 2019 at 5:29 PM Leon Romanovsky <leon@kernel.org> wrote:
> > >
> > > On Wed, Dec 25, 2019 at 03:23:53PM +0800, Frank Huang wrote:
> > > > hi leon
> > > >
> > > > I can not get what you means, do you say the rxe_add_ref(qp) is not needed?
> > >
> > > It is not what I'm saying.
> > >
> > > The use of rxe_add_ref(qp) assumes that QP can't disappear while it is
> > > called. From what I see in the code, rxe_responder() doesn't guarantee
> > > that.
> > >
> > > Thanks

Thread overview: 12 messages
2019-12-25  4:55 rxe panic Frank Huang
2019-12-25  5:27 ` Zhu Yanjun
2019-12-25  6:01   ` Frank Huang
2019-12-25  6:34     ` Zhu Yanjun
2019-12-25  7:10       ` Frank Huang
2019-12-25  6:32 ` Leon Romanovsky
2019-12-25  7:23   ` Frank Huang
2019-12-25  7:43     ` Frank Huang
2019-12-25  9:23     ` Leon Romanovsky
2019-12-26  1:08       ` Zhu Yanjun
2019-12-26  1:39         ` Frank Huang
2019-12-26  2:35           ` Zhu Yanjun
