* nvme_tcp BUG: unable to handle kernel NULL pointer dereference at 0000000000000230
@ 2021-06-01 17:51 Engel, Amit
  2021-06-02 12:28 ` Engel, Amit
  0 siblings, 1 reply; 11+ messages in thread
From: Engel, Amit @ 2021-06-01 17:51 UTC (permalink / raw)
  To: linux-nvme, sagi; +Cc: Engel, Amit

Hello,

We hit the kernel panic “BUG: unable to handle kernel NULL pointer dereference at 0000000000000230”
when running with a RHEL 8.3 host. This happens after we reboot the target-side application (when multiple ctrls/connections exist).
Based on vmcore analysis, it seems that queue->sock is NULL when nvme_tcp_restore_sock_calls is called
(from __nvme_tcp_stop_queue).

Are you familiar with such an issue?

crash> bt
PID: 193053  TASK: ffff9491bdad17c0  CPU: 7   COMMAND: "kworker/u193:9"
 #0 [ffffb2e9cfdbbb70] machine_kexec at ffffffffb245bf3e
 #1 [ffffb2e9cfdbbbc8] __crash_kexec at ffffffffb256072d
 #2 [ffffb2e9cfdbbc90] crash_kexec at ffffffffb256160d
 #3 [ffffb2e9cfdbbca8] oops_end at ffffffffb2422d4d
 #4 [ffffb2e9cfdbbcc8] no_context at ffffffffb246ba9e
 #5 [ffffb2e9cfdbbd20] do_page_fault at ffffffffb246c5c2
 #6 [ffffb2e9cfdbbd50] page_fault at ffffffffb2e0122e
    [exception RIP: _raw_write_lock_bh+23]
    RIP: ffffffffb2cd6cc7  RSP: ffffb2e9cfdbbe00  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: ffff94b2aefb4000  RCX: 0000000000000003
    RDX: 00000000000000ff  RSI: 00000000fffffe01  RDI: 0000000000000230
    RBP: ffff94923f793f40   R8: ffff9492ff1ea7f8   R9: 0000000000000000
    R10: 0000000000000000  R11: ffff9492ff1e8c64  R12: ffff94b2b7210338
    R13: 0000000000000000  R14: ffff94b27f7a4100  R15: ffff94b2b72110a0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffffb2e9cfdbbe00] __nvme_tcp_stop_queue at ffffffffc02dc0aa [nvme_tcp]
 #8 [ffffb2e9cfdbbe18] nvme_tcp_start_queue at ffffffffc02dcd18 [nvme_tcp]
 #9 [ffffb2e9cfdbbe38] nvme_tcp_setup_ctrl at ffffffffc02df258 [nvme_tcp]
#10 [ffffb2e9cfdbbe80] nvme_tcp_reconnect_ctrl_work at ffffffffc02df4bf [nvme_tcp]
#11 [ffffb2e9cfdbbe98] process_one_work at ffffffffb24d3477
#12 [ffffb2e9cfdbbed8] worker_thread at ffffffffb24d3b40
#13 [ffffb2e9cfdbbf10] kthread at ffffffffb24d9502
#14 [ffffb2e9cfdbbf50] ret_from_fork at ffffffffb2e00255


crash> bt -l
PID: 193053  TASK: ffff9491bdad17c0  CPU: 7   COMMAND: "kworker/u193:9"
 #0 [ffffb2e9cfdbbb70] machine_kexec at ffffffffb245bf3e
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/arch/x86/kernel/machine_kexec_64.c: 389
 #1 [ffffb2e9cfdbbbc8] __crash_kexec at ffffffffb256072d
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/kernel/kexec_core.c: 956
 #2 [ffffb2e9cfdbbc90] crash_kexec at ffffffffb256160d
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/./include/linux/compiler.h: 219
 #3 [ffffb2e9cfdbbca8] oops_end at ffffffffb2422d4d
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/arch/x86/kernel/dumpstack.c: 334
 #4 [ffffb2e9cfdbbcc8] no_context at ffffffffb246ba9e
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/arch/x86/mm/fault.c: 773
 #5 [ffffb2e9cfdbbd20] do_page_fault at ffffffffb246c5c2
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/./arch/x86/include/asm/jump_label.h: 38
 #6 [ffffb2e9cfdbbd50] page_fault at ffffffffb2e0122e
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/arch/x86/entry/entry_64.S: 1183
    [exception RIP: _raw_write_lock_bh+23]
    RIP: ffffffffb2cd6cc7  RSP: ffffb2e9cfdbbe00  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: ffff94b2aefb4000  RCX: 0000000000000003
    RDX: 00000000000000ff  RSI: 00000000fffffe01  RDI: 0000000000000230
    RBP: ffff94923f793f40   R8: ffff9492ff1ea7f8   R9: 0000000000000000
    R10: 0000000000000000  R11: ffff9492ff1e8c64  R12: ffff94b2b7210338
    R13: 0000000000000000  R14: ffff94b27f7a4100  R15: ffff94b2b72110a0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/./arch/x86/include/asm/atomic.h: 194
 #7 [ffffb2e9cfdbbe00] __nvme_tcp_stop_queue at ffffffffc02dc0aa [nvme_tcp]
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/drivers/nvme/host/tcp.c: 1486
 #8 [ffffb2e9cfdbbe18] nvme_tcp_start_queue at ffffffffc02dcd18 [nvme_tcp]
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/drivers/nvme/host/tcp.c: 1525
 #9 [ffffb2e9cfdbbe38] nvme_tcp_setup_ctrl at ffffffffc02df258 [nvme_tcp]
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/drivers/nvme/host/tcp.c: 1814
#10 [ffffb2e9cfdbbe80] nvme_tcp_reconnect_ctrl_work at ffffffffc02df4bf [nvme_tcp]
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/drivers/nvme/host/tcp.c: 1962
#11 [ffffb2e9cfdbbe98] process_one_work at ffffffffb24d3477
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/./arch/x86/include/asm/jump_label.h: 38
#12 [ffffb2e9cfdbbed8] worker_thread at ffffffffb24d3b40
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/./include/linux/compiler.h: 193
#13 [ffffb2e9cfdbbf10] kthread at ffffffffb24d9502
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/kernel/kthread.c: 280
#14 [ffffb2e9cfdbbf50] ret_from_fork at ffffffffb2e00255
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/arch/x86/entry/entry_64.S: 360

nvme/host/tcp.c
..snip
1481 static void nvme_tcp_restore_sock_calls(struct nvme_tcp_queue *queue)
1482 {
1483         struct socket *sock = queue->sock;
1484
1485         write_lock_bh(&sock->sk->sk_callback_lock);
1486         sock->sk->sk_user_data  = NULL;
1487         sock->sk->sk_data_ready = queue->data_ready;
1488         sock->sk->sk_state_change = queue->state_change;
1489         sock->sk->sk_write_space  = queue->write_space;
1490         write_unlock_bh(&sock->sk->sk_callback_lock);
1491 }
..snip


NULL pointer dereference at 0x230 → 560 decimal
crash> struct sock -o
struct sock {
   [0] struct sock_common __sk_common;
   …
   ...
   …
   [560] rwlock_t sk_callback_lock;

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


* RE: nvme_tcp BUG: unable to handle kernel NULL pointer dereference at 0000000000000230
  2021-06-01 17:51 nvme_tcp BUG: unable to handle kernel NULL pointer dereference at 0000000000000230 Engel, Amit
@ 2021-06-02 12:28 ` Engel, Amit
  2021-06-08 23:39   ` Sagi Grimberg
  0 siblings, 1 reply; 11+ messages in thread
From: Engel, Amit @ 2021-06-02 12:28 UTC (permalink / raw)
  To: linux-nvme, sagi; +Cc: Anner, Ran, Grupi, Elad

Hi Sagi,

A correction to the below analysis:
It seems that sock->sk is NULL, not queue->sock.

As part of __nvme_tcp_stop_queue,
kernel_sock_shutdown and nvme_tcp_restore_sock_calls are called:
kernel_sock_shutdown leads to nvme_tcp_state_change, which triggers err_work (nvme_tcp_error_recovery_work)

As part of nvme_tcp_error_recovery_work, nvme_tcp_free_queue is called, which releases the socket (sock_release)

In our case, based on the below bt:
nvme_tcp_error_recovery_work is triggered (and so sock_release runs) before nvme_tcp_restore_sock_calls, which ends up with a NULL pointer dereference at 'rwlock_t sk_callback_lock'

Can you please review and provide your input on this potential race?

Thanks
Amit Engel

-----Original Message-----
From: Engel, Amit 
Sent: Tuesday, June 1, 2021 8:52 PM
To: linux-nvme@lists.infradead.org; sagi@grimberg.me
Cc: Engel, Amit
Subject: nvme_tcp BUG: unable to handle kernel NULL pointer dereference at 0000000000000230

Hello,

We hit the kernel panic “BUG: unable to handle kernel NULL pointer dereference at 0000000000000230”
when running with a RHEL 8.3 host. This happens after we reboot the target-side application (when multiple ctrls/connections exist). Based on vmcore analysis, it seems that when nvme_tcp_restore_sock_calls is called (from __nvme_tcp_stop_queue),
queue->sock is NULL

Are you familiar with such an issue?

crash> bt
PID: 193053  TASK: ffff9491bdad17c0  CPU: 7   COMMAND: "kworker/u193:9"
 #0 [ffffb2e9cfdbbb70] machine_kexec at ffffffffb245bf3e
 #1 [ffffb2e9cfdbbbc8] __crash_kexec at ffffffffb256072d
 #2 [ffffb2e9cfdbbc90] crash_kexec at ffffffffb256160d
 #3 [ffffb2e9cfdbbca8] oops_end at ffffffffb2422d4d
 #4 [ffffb2e9cfdbbcc8] no_context at ffffffffb246ba9e
 #5 [ffffb2e9cfdbbd20] do_page_fault at ffffffffb246c5c2
 #6 [ffffb2e9cfdbbd50] page_fault at ffffffffb2e0122e
    [exception RIP: _raw_write_lock_bh+23]
    RIP: ffffffffb2cd6cc7  RSP: ffffb2e9cfdbbe00  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: ffff94b2aefb4000  RCX: 0000000000000003
    RDX: 00000000000000ff  RSI: 00000000fffffe01  RDI: 0000000000000230
    RBP: ffff94923f793f40   R8: ffff9492ff1ea7f8   R9: 0000000000000000
    R10: 0000000000000000  R11: ffff9492ff1e8c64  R12: ffff94b2b7210338
    R13: 0000000000000000  R14: ffff94b27f7a4100  R15: ffff94b2b72110a0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffffb2e9cfdbbe00] __nvme_tcp_stop_queue at ffffffffc02dc0aa [nvme_tcp]
 #8 [ffffb2e9cfdbbe18] nvme_tcp_start_queue at ffffffffc02dcd18 [nvme_tcp]
 #9 [ffffb2e9cfdbbe38] nvme_tcp_setup_ctrl at ffffffffc02df258 [nvme_tcp]
#10 [ffffb2e9cfdbbe80] nvme_tcp_reconnect_ctrl_work at ffffffffc02df4bf [nvme_tcp]
#11 [ffffb2e9cfdbbe98] process_one_work at ffffffffb24d3477
#12 [ffffb2e9cfdbbed8] worker_thread at ffffffffb24d3b40
#13 [ffffb2e9cfdbbf10] kthread at ffffffffb24d9502
#14 [ffffb2e9cfdbbf50] ret_from_fork at ffffffffb2e00255


crash> bt -l
PID: 193053  TASK: ffff9491bdad17c0  CPU: 7   COMMAND: "kworker/u193:9"
 #0 [ffffb2e9cfdbbb70] machine_kexec at ffffffffb245bf3e
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/arch/x86/kernel/machine_kexec_64.c: 389
 #1 [ffffb2e9cfdbbbc8] __crash_kexec at ffffffffb256072d
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/kernel/kexec_core.c: 956
 #2 [ffffb2e9cfdbbc90] crash_kexec at ffffffffb256160d
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/./include/linux/compiler.h: 219
 #3 [ffffb2e9cfdbbca8] oops_end at ffffffffb2422d4d
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/arch/x86/kernel/dumpstack.c: 334
 #4 [ffffb2e9cfdbbcc8] no_context at ffffffffb246ba9e
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/arch/x86/mm/fault.c: 773
 #5 [ffffb2e9cfdbbd20] do_page_fault at ffffffffb246c5c2
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/./arch/x86/include/asm/jump_label.h: 38
 #6 [ffffb2e9cfdbbd50] page_fault at ffffffffb2e0122e
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/arch/x86/entry/entry_64.S: 1183
    [exception RIP: _raw_write_lock_bh+23]
    RIP: ffffffffb2cd6cc7  RSP: ffffb2e9cfdbbe00  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: ffff94b2aefb4000  RCX: 0000000000000003
    RDX: 00000000000000ff  RSI: 00000000fffffe01  RDI: 0000000000000230
    RBP: ffff94923f793f40   R8: ffff9492ff1ea7f8   R9: 0000000000000000
    R10: 0000000000000000  R11: ffff9492ff1e8c64  R12: ffff94b2b7210338
    R13: 0000000000000000  R14: ffff94b27f7a4100  R15: ffff94b2b72110a0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/./arch/x86/include/asm/atomic.h: 194
 #7 [ffffb2e9cfdbbe00] __nvme_tcp_stop_queue at ffffffffc02dc0aa [nvme_tcp]
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/drivers/nvme/host/tcp.c: 1486
 #8 [ffffb2e9cfdbbe18] nvme_tcp_start_queue at ffffffffc02dcd18 [nvme_tcp]
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/drivers/nvme/host/tcp.c: 1525
 #9 [ffffb2e9cfdbbe38] nvme_tcp_setup_ctrl at ffffffffc02df258 [nvme_tcp]
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/drivers/nvme/host/tcp.c: 1814
#10 [ffffb2e9cfdbbe80] nvme_tcp_reconnect_ctrl_work at ffffffffc02df4bf [nvme_tcp]
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/drivers/nvme/host/tcp.c: 1962
#11 [ffffb2e9cfdbbe98] process_one_work at ffffffffb24d3477
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/./arch/x86/include/asm/jump_label.h: 38
#12 [ffffb2e9cfdbbed8] worker_thread at ffffffffb24d3b40
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/./include/linux/compiler.h: 193
#13 [ffffb2e9cfdbbf10] kthread at ffffffffb24d9502
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/kernel/kthread.c: 280
#14 [ffffb2e9cfdbbf50] ret_from_fork at ffffffffb2e00255
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/arch/x86/entry/entry_64.S: 360

nvme/host/tcp.c
..snip
1481 static void nvme_tcp_restore_sock_calls(struct nvme_tcp_queue *queue)
1482 {
1483         struct socket *sock = queue->sock;
1484
1485         write_lock_bh(&sock->sk->sk_callback_lock);
1486         sock->sk->sk_user_data  = NULL;
1487         sock->sk->sk_data_ready = queue->data_ready;
1488         sock->sk->sk_state_change = queue->state_change;
1489         sock->sk->sk_write_space  = queue->write_space;
1490         write_unlock_bh(&sock->sk->sk_callback_lock);
1491 }
..snip


NULL pointer dereference at 0x230 → 560 decimal
crash> struct sock -o
struct sock {
   [0] struct sock_common __sk_common;
   …
   ...
   …
   [560] rwlock_t sk_callback_lock;



* Re: nvme_tcp BUG: unable to handle kernel NULL pointer dereference at 0000000000000230
  2021-06-02 12:28 ` Engel, Amit
@ 2021-06-08 23:39   ` Sagi Grimberg
  2021-06-09  7:48     ` Engel, Amit
  0 siblings, 1 reply; 11+ messages in thread
From: Sagi Grimberg @ 2021-06-08 23:39 UTC (permalink / raw)
  To: Engel, Amit, linux-nvme; +Cc: Anner, Ran, Grupi, Elad


> Hi Sagi,
> 
> A correction to the below analysis:
> It seems like sock->sk is NULL and not queue->sock
> 
> As part of __nvme_tcp_stop_queue
> kernel_sock_shutdown and nvme_tcp_restore_sock_calls are being called:
> kernel_sock_shutdown leads to nvme_tcp_state_change which will trigger err_work (nvme_tcp_error_recovery_work)
> 
> As part of nvme_tcp_error_recovery_work, nvme_tcp_free_queue is being called which releases the socket (sock_release)
> 
> In our case, based on the below bt:
> nvme_tcp_error_recovery_work is being triggered (and so sock_release) before nvme_tcp_restore_sock_calls , which end up with NULL dereference pointer at 'rwlock_t sk_callback_lock' ?
> 
> Can you please review and provide your inputs for this potential race ?

Seems that RHEL 8.3 is missing the mutex protection in nvme_tcp_stop_queue.
I'm assuming it doesn't happen upstream?



* RE: nvme_tcp BUG: unable to handle kernel NULL pointer dereference at 0000000000000230
  2021-06-08 23:39   ` Sagi Grimberg
@ 2021-06-09  7:48     ` Engel, Amit
  2021-06-09  8:04       ` Sagi Grimberg
  0 siblings, 1 reply; 11+ messages in thread
From: Engel, Amit @ 2021-06-09  7:48 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme; +Cc: Anner, Ran, Grupi, Elad

Hi Sagi,

Indeed, RHEL 8.3 does not have the mutex protection in nvme_tcp_stop_queue.
However, in our case, based on the below backtrace,
we don't get to __nvme_tcp_stop_queue from nvme_tcp_stop_queue.
We get to it from:
nvme_tcp_reconnect_ctrl_work --> nvme_tcp_setup_ctrl --> nvme_tcp_start_queue --> __nvme_tcp_stop_queue

so I'm not sure how this mutex protection will help in this case

crash> bt -l
PID: 193053  TASK: ffff9491bdad17c0  CPU: 7   COMMAND: "kworker/u193:9"
 #0 [ffffb2e9cfdbbb70] machine_kexec at ffffffffb245bf3e
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/arch/x86/kernel/machine_kexec_64.c: 389
 #1 [ffffb2e9cfdbbbc8] __crash_kexec at ffffffffb256072d
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/kernel/kexec_core.c: 956
 #2 [ffffb2e9cfdbbc90] crash_kexec at ffffffffb256160d
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/./include/linux/compiler.h: 219
 #3 [ffffb2e9cfdbbca8] oops_end at ffffffffb2422d4d
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/arch/x86/kernel/dumpstack.c: 334
 #4 [ffffb2e9cfdbbcc8] no_context at ffffffffb246ba9e
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/arch/x86/mm/fault.c: 773
 #5 [ffffb2e9cfdbbd20] do_page_fault at ffffffffb246c5c2
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/./arch/x86/include/asm/jump_label.h: 38
 #6 [ffffb2e9cfdbbd50] page_fault at ffffffffb2e0122e
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/arch/x86/entry/entry_64.S: 1183
    [exception RIP: _raw_write_lock_bh+23]
    RIP: ffffffffb2cd6cc7  RSP: ffffb2e9cfdbbe00  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: ffff94b2aefb4000  RCX: 0000000000000003
    RDX: 00000000000000ff  RSI: 00000000fffffe01  RDI: 0000000000000230
    RBP: ffff94923f793f40   R8: ffff9492ff1ea7f8   R9: 0000000000000000
    R10: 0000000000000000  R11: ffff9492ff1e8c64  R12: ffff94b2b7210338
    R13: 0000000000000000  R14: ffff94b27f7a4100  R15: ffff94b2b72110a0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/./arch/x86/include/asm/atomic.h: 194
 #7 [ffffb2e9cfdbbe00] __nvme_tcp_stop_queue at ffffffffc02dc0aa [nvme_tcp]
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/drivers/nvme/host/tcp.c: 1486
 #8 [ffffb2e9cfdbbe18] nvme_tcp_start_queue at ffffffffc02dcd18 [nvme_tcp]
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/drivers/nvme/host/tcp.c: 1525
 #9 [ffffb2e9cfdbbe38] nvme_tcp_setup_ctrl at ffffffffc02df258 [nvme_tcp]
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/drivers/nvme/host/tcp.c: 1814
#10 [ffffb2e9cfdbbe80] nvme_tcp_reconnect_ctrl_work at ffffffffc02df4bf [nvme_tcp]
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/drivers/nvme/host/tcp.c: 1962
#11 [ffffb2e9cfdbbe98] process_one_work at ffffffffb24d3477
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/./arch/x86/include/asm/jump_label.h: 38
#12 [ffffb2e9cfdbbed8] worker_thread at ffffffffb24d3b40
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/./include/linux/compiler.h: 193
#13 [ffffb2e9cfdbbf10] kthread at ffffffffb24d9502
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/kernel/kthread.c: 280
#14 [ffffb2e9cfdbbf50] ret_from_fork at ffffffffb2e00255
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/arch/x86/entry/entry_64.S: 360

-----Original Message-----
From: Sagi Grimberg <sagi@grimberg.me> 
Sent: Wednesday, June 9, 2021 2:39 AM
To: Engel, Amit; linux-nvme@lists.infradead.org
Cc: Anner, Ran; Grupi, Elad
Subject: Re: nvme_tcp BUG: unable to handle kernel NULL pointer dereference at 0000000000000230




> Hi Sagi,
> 
> A correction to the below analysis:
> It seems like sock->sk is NULL and not queue->sock
> 
> As part of __nvme_tcp_stop_queue
> kernel_sock_shutdown and nvme_tcp_restore_sock_calls are being called:
> kernel_sock_shutdown leads to nvme_tcp_state_change which will trigger err_work (nvme_tcp_error_recovery_work)
> 
> As part of nvme_tcp_error_recovery_work, nvme_tcp_free_queue is being called which releases the socket (sock_release)
> 
> In our case, based on the below bt:
> nvme_tcp_error_recovery_work is being triggered (and so sock_release) before nvme_tcp_restore_sock_calls , which end up with NULL dereference pointer at 'rwlock_t sk_callback_lock' ?
> 
> Can you please review and provide your inputs for this potential race ?

Seems that RH8.3 is missing the mutex protection on nvme_tcp_stop_queue.
I'm assuming it doesn't happen upstream?



* Re: nvme_tcp BUG: unable to handle kernel NULL pointer dereference at 0000000000000230
  2021-06-09  7:48     ` Engel, Amit
@ 2021-06-09  8:04       ` Sagi Grimberg
  2021-06-09  8:39         ` Engel, Amit
  0 siblings, 1 reply; 11+ messages in thread
From: Sagi Grimberg @ 2021-06-09  8:04 UTC (permalink / raw)
  To: Engel, Amit, linux-nvme; +Cc: Anner, Ran, Grupi, Elad


> Hi Sagi,
> 
> Indeed RHEL8.3 does not have the mutex protection on nvme_tcp_stop_queue
> However, in our case, based on the below back trace
> We don't get to __nvme_tcp_stop_queue from nvme_tcp_stop_queue
> We get to it from:
> nvme_tcp_reconnect_ctrl_work --> nvme_tcp_setup_ctrl --> nvme_tcp_start_queue  --> __nvme_tcp_stop_queue
> 
> so I'm not sure how this mutex protection will help in this case


Oh, well, IIRC we probably need the same mutex protection in the
start-failure case then?
--
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 216d21a6a165..00dff3654e6f 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1548,6 +1548,7 @@ static void nvme_tcp_stop_queue(struct nvme_ctrl 
*nctrl, int qid)
  static int nvme_tcp_start_queue(struct nvme_ctrl *nctrl, int idx)
  {
         struct nvme_tcp_ctrl *ctrl = to_tcp_ctrl(nctrl);
+       struct nvme_tcp_queue *queue = &ctrl->queues[idx];
         int ret;

         if (idx)
@@ -1556,10 +1557,12 @@ static int nvme_tcp_start_queue(struct nvme_ctrl 
*nctrl, int idx)
                 ret = nvmf_connect_admin_queue(nctrl);

         if (!ret) {
-               set_bit(NVME_TCP_Q_LIVE, &ctrl->queues[idx].flags);
+               set_bit(NVME_TCP_Q_LIVE, &queue->flags);
         } else {
-               if (test_bit(NVME_TCP_Q_ALLOCATED, 
&ctrl->queues[idx].flags))
-                       __nvme_tcp_stop_queue(&ctrl->queues[idx]);
+               mutex_lock(&queue->queue_lock);
+               if (test_bit(NVME_TCP_Q_ALLOCATED, &queue->flags))
+                       __nvme_tcp_stop_queue(queue);
+               mutex_unlock(&queue->queue_lock);
                 dev_err(nctrl->device,
                         "failed to connect queue: %d ret=%d\n", idx, ret);
         }
--



* RE: nvme_tcp BUG: unable to handle kernel NULL pointer dereference at 0000000000000230
  2021-06-09  8:04       ` Sagi Grimberg
@ 2021-06-09  8:39         ` Engel, Amit
  2021-06-09  9:11           ` Sagi Grimberg
  0 siblings, 1 reply; 11+ messages in thread
From: Engel, Amit @ 2021-06-09  8:39 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme; +Cc: Anner, Ran, Grupi, Elad


I'm not sure that using the queue_lock mutex will help.
The race in this case is between sock_release and nvme_tcp_restore_sock_calls;
sock_release is called as part of nvme_tcp_free_queue, which also destroys the mutex.

-----Original Message-----
From: Sagi Grimberg <sagi@grimberg.me> 
Sent: Wednesday, June 9, 2021 11:05 AM
To: Engel, Amit; linux-nvme@lists.infradead.org
Cc: Anner, Ran; Grupi, Elad
Subject: Re: nvme_tcp BUG: unable to handle kernel NULL pointer dereference at 0000000000000230




> Hi Sagi,
> 
> Indeed RHEL8.3 does not have the mutex protection on 
> nvme_tcp_stop_queue However, in our case, based on the below back 
> trace We don't get to __nvme_tcp_stop_queue from nvme_tcp_stop_queue 
> We get to it from:
> nvme_tcp_reconnect_ctrl_work --> nvme_tcp_setup_ctrl --> 
> nvme_tcp_start_queue  --> __nvme_tcp_stop_queue
> 
> so I'm not sure how this mutex protection will help in this case


Oh, well iirc we probably need the same mutex protection in start failure case then?
--
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c index 216d21a6a165..00dff3654e6f 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1548,6 +1548,7 @@ static void nvme_tcp_stop_queue(struct nvme_ctrl *nctrl, int qid)
  static int nvme_tcp_start_queue(struct nvme_ctrl *nctrl, int idx)
  {
         struct nvme_tcp_ctrl *ctrl = to_tcp_ctrl(nctrl);
+       struct nvme_tcp_queue *queue = &ctrl->queues[idx];
         int ret;

         if (idx)
@@ -1556,10 +1557,12 @@ static int nvme_tcp_start_queue(struct nvme_ctrl *nctrl, int idx)
                 ret = nvmf_connect_admin_queue(nctrl);

         if (!ret) {
-               set_bit(NVME_TCP_Q_LIVE, &ctrl->queues[idx].flags);
+               set_bit(NVME_TCP_Q_LIVE, &queue->flags);
         } else {
-               if (test_bit(NVME_TCP_Q_ALLOCATED, 
&ctrl->queues[idx].flags))
-                       __nvme_tcp_stop_queue(&ctrl->queues[idx]);
+               mutex_lock(&queue->queue_lock);
+               if (test_bit(NVME_TCP_Q_ALLOCATED, &queue->flags))
+                       __nvme_tcp_stop_queue(queue);
+               mutex_unlock(&queue->queue_lock);
                 dev_err(nctrl->device,
                         "failed to connect queue: %d ret=%d\n", idx, ret);
         }
--


* Re: nvme_tcp BUG: unable to handle kernel NULL pointer dereference at 0000000000000230
  2021-06-09  8:39         ` Engel, Amit
@ 2021-06-09  9:11           ` Sagi Grimberg
  2021-06-09 11:14             ` Engel, Amit
  0 siblings, 1 reply; 11+ messages in thread
From: Sagi Grimberg @ 2021-06-09  9:11 UTC (permalink / raw)
  To: Engel, Amit, linux-nvme; +Cc: Anner, Ran, Grupi, Elad


> I'm not sure that using the queue_lock mutex will help
> The race in this case is between sock_release and nvme_tcp_restore_sock_calls
> sock_release is being called as part of nvme_tcp_free_queue, which is destroying the mutex

Maybe I'm not understanding the issue here. What is the scenario again?
stop_queue is called (ctx1), that triggers error_recovery (ctx2) which
then calls free_queue before ctx1 gets to restore sock callbacks?

err_work will first stop the queues before freeing them, so it will
serialize behind ctx1. What am I missing?



* RE: nvme_tcp BUG: unable to handle kernel NULL pointer dereference at 0000000000000230
  2021-06-09  9:11           ` Sagi Grimberg
@ 2021-06-09 11:14             ` Engel, Amit
  2021-06-10  8:44               ` Engel, Amit
  2021-06-10 20:03               ` Sagi Grimberg
  0 siblings, 2 replies; 11+ messages in thread
From: Engel, Amit @ 2021-06-09 11:14 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme; +Cc: Anner, Ran, Grupi, Elad

Correct, free_queue is called (sock->sk becomes NULL) before restore_sock_calls.

When restore_sock_calls runs, we fail on 'write_lock_bh(&sock->sk->sk_callback_lock)'.

NULL pointer dereference at 0x230 → 560 decimal
crash> struct sock -o
struct sock {
   [0] struct sock_common __sk_common;
   …
   ...
   …
   [560] rwlock_t sk_callback_lock;

Stopping the queue in ctx2 does not really do anything, since the 'NVME_TCP_Q_LIVE' bit was already cleared (by ctx1).
Can you please explain how stopping the queue before freeing it helps serialize with ctx1?

The race we are describing is based on the panic backtrace that I shared.

Maybe our analysis is not accurate?

Thanks,
Amit

-----Original Message-----
From: Sagi Grimberg <sagi@grimberg.me> 
Sent: Wednesday, June 9, 2021 12:11 PM
To: Engel, Amit; linux-nvme@lists.infradead.org
Cc: Anner, Ran; Grupi, Elad
Subject: Re: nvme_tcp BUG: unable to handle kernel NULL pointer dereference at 0000000000000230




> I'm not sure that using the queue_lock mutex will help. The race in this
> case is between sock_release and nvme_tcp_restore_sock_calls;
> sock_release is being called as part of nvme_tcp_free_queue, which is
> destroying the mutex

Maybe I'm not understanding the issue here. What is the scenario again?
stop_queue is called (ctx1), that triggers error_recovery (ctx2) which then calls free_queue before ctx1 gets to restore sock callbacks?

err_work will first stop the queues before freeing them, so it will serialize behind ctx1. What am I missing?


* RE: nvme_tcp BUG: unable to handle kernel NULL pointer dereference at 0000000000000230
  2021-06-09 11:14             ` Engel, Amit
@ 2021-06-10  8:44               ` Engel, Amit
  2021-06-10 20:03               ` Sagi Grimberg
  1 sibling, 0 replies; 11+ messages in thread
From: Engel, Amit @ 2021-06-10  8:44 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme

We are trying to reproduce with the upstream code now; will update.

In addition, I just posted a patch that adds the mutex lock to the start-queue failure case as well, as suggested by Sagi.

Thanks
Amit 

-----Original Message-----
From: Engel, Amit 
Sent: Wednesday, June 9, 2021 2:14 PM
To: 'Sagi Grimberg'; linux-nvme@lists.infradead.org
Cc: Anner, Ran; Grupi, Elad
Subject: RE: nvme_tcp BUG: unable to handle kernel NULL pointer dereference at 0000000000000230

Correct, free_queue is called (sock->sk becomes NULL) before restore_sock_calls.

When restore_sock_calls runs, we fail on 'write_lock_bh(&sock->sk->sk_callback_lock)'.

NULL pointer dereference at 0x230 → 560 decimal
crash> struct sock -o
struct sock {
   [0] struct sock_common __sk_common;
   …
   ...
   …
   [560] rwlock_t sk_callback_lock;

Stopping the queue in ctx2 does not really do anything, since the 'NVME_TCP_Q_LIVE' bit was already cleared (by ctx1).
Can you please explain how stopping the queue before freeing it helps serialize with ctx1?

The race we are describing is based on the panic backtrace that I shared.

Maybe our analysis is not accurate?

Thanks,
Amit

-----Original Message-----
From: Sagi Grimberg <sagi@grimberg.me> 
Sent: Wednesday, June 9, 2021 12:11 PM
To: Engel, Amit; linux-nvme@lists.infradead.org
Cc: Anner, Ran; Grupi, Elad
Subject: Re: nvme_tcp BUG: unable to handle kernel NULL pointer dereference at 0000000000000230




> I'm not sure that using the queue_lock mutex will help. The race in this
> case is between sock_release and nvme_tcp_restore_sock_calls;
> sock_release is being called as part of nvme_tcp_free_queue, which is
> destroying the mutex

Maybe I'm not understanding the issue here. What is the scenario again?
stop_queue is called (ctx1), that triggers error_recovery (ctx2) which then calls free_queue before ctx1 gets to restore sock callbacks?

err_work will first stop the queues before freeing them, so it will serialize behind ctx1. What am I missing?


* Re: nvme_tcp BUG: unable to handle kernel NULL pointer dereference at 0000000000000230
  2021-06-09 11:14             ` Engel, Amit
  2021-06-10  8:44               ` Engel, Amit
@ 2021-06-10 20:03               ` Sagi Grimberg
  2021-06-13  8:35                 ` Engel, Amit
  1 sibling, 1 reply; 11+ messages in thread
From: Sagi Grimberg @ 2021-06-10 20:03 UTC (permalink / raw)
  To: Engel, Amit, linux-nvme; +Cc: Anner, Ran, Grupi, Elad


> Correct, free_queue is being called (sock->sk becomes NULL) before restore_sock_calls
> 
> When restore_sock_calls is called, we fail on 'write_lock_bh(&sock->sk->sk_callback_lock)'
> 
> NULL pointer dereference at 0x230 → 560 decimal
> crash> struct sock -o
> struct sock {
>     [0] struct sock_common __sk_common;
>     …
>     ...
>     …
>     [560] rwlock_t sk_callback_lock;
> 
> Stopping the queue in ctx2 does not really do anything, since the 'NVME_TCP_Q_LIVE' bit has already been cleared (by ctx1).
> Can you please explain how stopping the queue before freeing it helps serialize with ctx1?

What I understood from your description is:
1. ctx1 calls stop_queue - calls kernel_sock_shutdown
2. ctx1 gets to restore_sock_calls (just before)
3. ctx2 is triggered from state_change - scheduling err_work
4. ctx2 does stop_queues
5. ctx2 calls destroy_queues -> there does sock_release
6. ctx1 makes forward progress and accesses an already freed sk

Hence with the mutex protection, ctx2 is serialized at step (4) and
cannot get to step (5) until ctx1 releases the mutex, which happens
only after ctx1 has finished restoring the sock callbacks.

But maybe I'm not interpreting this correctly?



* RE: nvme_tcp BUG: unable to handle kernel NULL pointer dereference at 0000000000000230
  2021-06-10 20:03               ` Sagi Grimberg
@ 2021-06-13  8:35                 ` Engel, Amit
  0 siblings, 0 replies; 11+ messages in thread
From: Engel, Amit @ 2021-06-13  8:35 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme; +Cc: Anner, Ran, Grupi, Elad

Hi Sagi, after revisiting the upstream code, I agree with your analysis.
We are now trying to run an RHEL 8.3 nvme host with the missing mutex_lock commit applied.

Anyway, I applied a patch that takes the same mutex_lock in the start_queue failure case as well.

In that case, nvme_tcp_start_queue calls __nvme_tcp_stop_queue, which should be protected by the same mutex_lock.

Thank you for your help
Amit

-----Original Message-----
From: Sagi Grimberg <sagi@grimberg.me> 
Sent: Thursday, June 10, 2021 11:03 PM
To: Engel, Amit; linux-nvme@lists.infradead.org
Cc: Anner, Ran; Grupi, Elad
Subject: Re: nvme_tcp BUG: unable to handle kernel NULL pointer dereference at 0000000000000230


[EXTERNAL EMAIL] 


> Correct, free_queue is being called (sock->sk becomes NULL) before 
> restore_sock_calls
> 
> When restore_sock_calls is called, we fail on 'write_lock_bh(&sock->sk->sk_callback_lock)'
> 
> NULL pointer dereference at 0x230 → 560 decimal
> crash> struct sock -o
> struct sock {
>     [0] struct sock_common __sk_common;
>     …
>     ...
>     …
>     [560] rwlock_t sk_callback_lock;
> 
> Stopping the queue in ctx2 does not really do anything, since the 'NVME_TCP_Q_LIVE' bit has already been cleared (by ctx1).
> Can you please explain how stopping the queue before freeing it helps serialize with ctx1?

What I understood from your description is:
1. ctx1 calls stop_queue - calls kernel_sock_shutdown
2. ctx1 gets to restore_sock_calls (just before)
3. ctx2 is triggered from state_change - scheduling err_work
4. ctx2 does stop_queues
5. ctx2 calls destroy_queues -> there does sock_release
6. ctx1 makes forward progress and accesses an already freed sk

Hence with the mutex protection, ctx2 is serialized at step (4) and cannot get to step (5) until ctx1 releases the mutex, which happens only after ctx1 has finished restoring the sock callbacks.

But maybe I'm not interpreting this correctly?


end of thread, other threads:[~2021-06-13  8:36 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-01 17:51 nvme_tcp BUG: unable to handle kernel NULL pointer dereference at 0000000000000230 Engel, Amit
2021-06-02 12:28 ` Engel, Amit
2021-06-08 23:39   ` Sagi Grimberg
2021-06-09  7:48     ` Engel, Amit
2021-06-09  8:04       ` Sagi Grimberg
2021-06-09  8:39         ` Engel, Amit
2021-06-09  9:11           ` Sagi Grimberg
2021-06-09 11:14             ` Engel, Amit
2021-06-10  8:44               ` Engel, Amit
2021-06-10 20:03               ` Sagi Grimberg
2021-06-13  8:35                 ` Engel, Amit
