From: Stefan Majer <stefan.majer@gmail.com>
To: sagi grimberg <sagi@grimberg.me>
Cc: Keith Busch <kbusch@kernel.org>,
	linux-nvme <linux-nvme@lists.infradead.org>
Subject: Re: null pointer dereference in nvme_tcp_io_work
Date: Sat, 28 Dec 2019 18:53:10 +0100
Message-ID: <CADdPHGt+vLDp6hx0u3nabW7s6Ut11Jzbb4gx2NRD95zu3H9mvQ@mail.gmail.com>
In-Reply-To: <CADdPHGsT8JxqWN8KKnQgJvNFZXzq08pd5eR1RJeUN-cmhQYH_Q@mail.gmail.com>

I have to add:

./faddr2line /var/lib/debug/lib/modules/5.3.0-24-generic/kernel/drivers/nvme/host/nvme-tcp.ko nvme_tcp_io_work+0x341/0x7f0
nvme_tcp_io_work+0x341/0x7f0:
nvme_tcp_req_cur_length at
/build/linux-4AS01l/linux-5.3.0/drivers/nvme/host/tcp.c:189
(inlined by) nvme_tcp_try_send_data at
/build/linux-4AS01l/linux-5.3.0/drivers/nvme/host/tcp.c:854
(inlined by) nvme_tcp_try_send at
/build/linux-4AS01l/linux-5.3.0/drivers/nvme/host/tcp.c:1011
(inlined by) nvme_tcp_io_work at
/build/linux-4AS01l/linux-5.3.0/drivers/nvme/host/tcp.c:1048
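
A possible reading of this result, offered as a sketch and not a confirmed
diagnosis: in the oops quoted below, the trapping instruction is
mov 0x8(%rax),%ebx with RAX = 0, and the next two loads use offsets 0xc and
0x0 from the same register. That pattern would fit a NULL pointer to a
three-field structure such as the kernel's struct bio_vec (page pointer,
length, offset). The Python snippet below mirrors that layout with ctypes and
checks that reading the length field through a NULL pointer lands on address
0x8, the value reported in CR2. The BioVec class and the fault_address helper
are illustrative names, and the field layout is an assumption about x86-64,
not something taken from the trace.

```python
import ctypes


class BioVec(ctypes.Structure):
    # Assumed mirror of the kernel's struct bio_vec on x86-64.
    # The page pointer is modelled as a plain 64-bit integer so the
    # layout does not depend on the machine running this script.
    _fields_ = [
        ("bv_page", ctypes.c_uint64),
        ("bv_len", ctypes.c_uint32),
        ("bv_offset", ctypes.c_uint32),
    ]


def fault_address(base, displacement):
    """Address touched by `mov displacement(%reg), ...` when reg holds base."""
    return base + displacement


RAX = 0x0  # register value from the oops
CR2 = 0x8  # faulting address from the oops

# Byte offset of every field inside the mirrored structure.
offsets = {name: getattr(BioVec, name).offset for name, _ in BioVec._fields_}

print(offsets)
print(hex(fault_address(RAX, offsets["bv_len"])))
```

If the layout assumption holds, the structure pointer that
nvme_tcp_req_cur_length reads through was NULL at the time of the crash.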

On Sat, Dec 28, 2019 at 6:49 PM Stefan Majer <stefan.majer@gmail.com> wrote:
>
> Hi,
>
> It took a while, but I have now reproduced it with the ubuntu-19.10 kernel
> 5.3.x. I installed the debug symbols and ran decode_stacktrace.sh from the
> kernel sources, which gives me:
>
> [   29.266954] nvme nvme0: new ctrl: NQN
> "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.22.1:4420
> [   29.267477] nvme nvme0: Removing ctrl: NQN
> "nqn.2014-08.org.nvmexpress.discovery"
> [   29.285732] nvme nvme0: creating 1 I/O queues.
> [   29.286632] nvme nvme0: mapped 1/0 default/read queues.
> [   29.288565] nvme nvme0: new ctrl: NQN "nvmet-test", addr
> 192.168.22.1:4420
> [   29.293146] nvme0n1: detected capacity change from 0 to 1084227584
> [   39.196846] BUG: kernel NULL pointer dereference, address:
> 0000000000000008
> [   39.198524] #PF: supervisor read access in kernel mode
> [   39.199786] #PF: error_code(0x0000) - not-present page
> [   39.201198] PGD 0 P4D 0
> [   39.201849] Oops: 0000 [#1] SMP PTI
> [   39.202679] CPU: 0 PID: 223 Comm: kworker/0:1H Kdump: loaded Not
> tainted 5.3.0-24-generic #26-Ubuntu
> [   39.204830] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS 0.0.0 02/06/2015
> [   39.207205] Workqueue: nvme_tcp_wq nvme_tcp_io_work [nvme_tcp]
> [   39.209005] RIP: 0010:nvme_tcp_io_work+0x341/0x7f0 nvme_tcp
> [ 39.210686] Code: 8b 87 98 00 00 00 83 f8 02 0f 85 34 fd ff ff 49 8b
> 47 28 4d 89 fe 48 89 45 a8 49 8b 46 78 49 8b 56 68 45 8b 66 34 45 2b
> 66 38 <8b> 58 08 8b 48 0c 4c 8b 28 48 29 d3 48 8d 34 11 4c 39 e3 48 89
> 75
> All code
> ========
>    0:   8b 87 98 00 00 00       mov    0x98(%rdi),%eax
>    6:   83 f8 02                cmp    $0x2,%eax
>    9:   0f 85 34 fd ff ff       jne    0xfffffffffffffd43
>    f:   49 8b 47 28             mov    0x28(%r15),%rax
>   13:   4d 89 fe                mov    %r15,%r14
>   16:   48 89 45 a8             mov    %rax,-0x58(%rbp)
>   1a:   49 8b 46 78             mov    0x78(%r14),%rax
>   1e:   49 8b 56 68             mov    0x68(%r14),%rdx
>   22:   45 8b 66 34             mov    0x34(%r14),%r12d
>   26:   45 2b 66 38             sub    0x38(%r14),%r12d
>   2a:*  8b 58 08                mov    0x8(%rax),%ebx           <--
> trapping instruction
>   2d:   8b 48 0c                mov    0xc(%rax),%ecx
>   30:   4c 8b 28                mov    (%rax),%r13
>   33:   48 29 d3                sub    %rdx,%rbx
>   36:   48 8d 34 11             lea    (%rcx,%rdx,1),%rsi
>   3a:   4c 39 e3                cmp    %r12,%rbx
>   3d:   48                      rex.W
>   3e:   89                      .byte 0x89
>   3f:   75                      .byte 0x75
>
> Code starting with the faulting instruction
> ===========================================
>    0:   8b 58 08                mov    0x8(%rax),%ebx
>    3:   8b 48 0c                mov    0xc(%rax),%ecx
>    6:   4c 8b 28                mov    (%rax),%r13
>    9:   48 29 d3                sub    %rdx,%rbx
>    c:   48 8d 34 11             lea    (%rcx,%rdx,1),%rsi
>   10:   4c 39 e3                cmp    %r12,%rbx
>   13:   48                      rex.W
>   14:   89                      .byte 0x89
>   15:   75                      .byte 0x75
> [   39.216464] RSP: 0018:ffffb0f8c0453dd8 EFLAGS: 00010206
> [   39.218053] RAX: 0000000000000000 RBX: 00000000b4e42801 RCX: 0000000000000000
> [   39.219803] RDX: 0000000000000000 RSI: 0000000000000011 RDI: ffff9dd8e6e49478
> [   39.221766] RBP: ffffb0f8c0453e60 R08: 0000000000001000 R09: 0000000002800809
> [   39.223635] R10: 0000000000000009 R11: 0000000000000000 R12: 0000000000001000
> [   39.226010] R13: 0000000000000048 R14: ffff9dd8e6e49418 R15: ffff9dd8e6e49418
> [   39.228992] FS:  0000000000000000(0000) GS:ffff9dd8ff600000(0000)
> knlGS:0000000000000000
> [   39.233660] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   39.237863] CR2: 0000000000000008 CR3: 0000000067c6a005 CR4: 0000000000360ef0
> [   39.241807] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   39.244496] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   39.246569] Call Trace:
> [   39.247272] process_one_work
> (/build/linux-4AS01l/linux-5.3.0/arch/x86/include/asm/jump_label.h:25
> /build/linux-4AS01l/linux-5.3.0/include/linux/jump_label.h:200
> /build/linux-4AS01l/linux-5.3.0/include/trace/events/workqueue.h:114
> /build/linux-4AS01l/linux-5.3.0/kernel/workqueue.c:2274)
> [   39.248361] worker_thread
> (/build/linux-4AS01l/linux-5.3.0/include/linux/compiler.h:199
> /build/linux-4AS01l/linux-5.3.0/include/linux/list.h:268
> /build/linux-4AS01l/linux-5.3.0/kernel/workqueue.c:2416)
> [   39.249364] kthread (/build/linux-4AS01l/linux-5.3.0/kernel/kthread.c:255)
> [   39.250243] ? process_one_work
> (/build/linux-4AS01l/linux-5.3.0/kernel/workqueue.c:2358)
> [   39.251485] ? kthread_park
> (/build/linux-4AS01l/linux-5.3.0/kernel/kthread.c:215)
> [   39.252474] ret_from_fork
> (/build/linux-4AS01l/linux-5.3.0/arch/x86/entry/entry_64.S:358)
> [   39.253476] Modules linked in: nvme_tcp nvme_fabrics nvme nvme_core
> xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user
> xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack
> nf_defrag_ipv6 nf_defrag_ipv4
> libcrc32c bpfilter br_netfilter bridge stp llc aufs
> overlay intel_rapl_msr intel_rapl_common kvm_intel kvm irqbypass
> crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel
> aes_x86_64 crypto_simd cirrus nls_iso8859_1
> cryptd glue_helper drm_kms_helper drm input_leds joydev
> fb_sys_fops serio_raw syscopyarea sysfillrect sysimgblt mac_hid
> qemu_fw_cfg bonding sch_fq_codel ipmi_watchdog ipmi_devintf
> ipmi_msghandler virtio_rng ip_tables
> x_tables autofs4 psmouse virtio_net net_failover failover ahci libahci
> i2c_piix4 pata_acpi floppy
> [   39.269809] CR2: 0000000000000008
>
> greetings
> Stefan
>
> On Fri, Dec 27, 2019 at 8:54 AM Stefan Majer <stefan.majer@gmail.com> wrote:
> >
> > Hi,
> >
> > No problem, I am also on vacation.
> >
> > The issue is not reproducible in a pure bare-metal environment, where
> > target and host are both physical machines.
> > In the environment where it happens, both machines are KVM-based.
> >
> > I first have to figure out how to use gdb on the kernel crash. That's not
> > my daily job, so please be patient.
> >
> > Greetings
> > Stefan
> >
> > On Fri, Dec 27, 2019 at 8:49 AM sagi grimberg <sagi@grimberg.me> wrote:
> > >
> > > Hey,
> > >
> > > On vacation so not able to take a look right now, but can you provide a line info from gdb on the RIP line?
> > >
> > > Also, did you say that the issue is not reproducible when the host is on bare metal but only on kvm? ( You said the target, but I'm asking about the host).
> > >
> > > On Thu, Dec 26, 2019, 23:18 Stefan Majer <stefan.majer@gmail.com> wrote:
> > >>
> > >> Hi,
> > >>
> > >> I have to add that doing the same on bare metal works without any problems.
> > >> I suspect that this is caused by the fact that in the above
> > >> example my target is a qemu-kvm machine with an emulated NVMe device.
> > >> Greetings
> > >> Stefan
> > >>
> > >> On Thu, Dec 26, 2019 at 6:47 PM Keith Busch <kbusch@kernel.org> wrote:
> > >> >
> > >> > Adding Sagi.
> > >> >
> > >> > On Wed, Dec 25, 2019 at 11:06:17AM +0100, Stefan Majer wrote:
> > >> > > Hi,
> > >> > >
> > >> > > I'm trying to set up an NVMe-over-TCP test environment with a qemu-kvm
> > >> > > based nvmet-tcp target running ubuntu-19.10 and an ubuntu-19.10 host
> > >> > > with kernel 5.4.6 installed. The kernel was taken from
> > >> > > https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.4.6/ . The same panic
> > >> > > occurs with the ubuntu 19.10 kernel 5.3.x.
> > >> > >
> > >> > > After setting up the target, I can discover and connect the exported
> > >> > > nvme device on the host with:
> > >> > > modprobe nvme
> > >> > > modprobe nvme-tcp
> > >> > > nvme discover -t tcp -a 192.168.22.1 -s 4420
> > >> > > nvme connect -t tcp -n nvmet-test -a 192.168.22.1 -s 4420
> > >> > >
> > >> > > No errors so far, but when I try to format the device with:
> > >> > >
> > >> > > mkfs.ext4 /dev/nvme0n1
> > >> > >
> > >> > > The kernel panics with:
> > >> > > Writing inode tables:
> > >> > > [  692.651243] BUG: kernel NULL pointer dereference, address: 0000000000000008
> > >> > > [  692.653158] #PF: supervisor read access in kernel mode
> > >> > > [  692.653922] #PF: error_code(0x0000) - not-present page
> > >> > > [  692.653922] PGD 0 P4D 0
> > >> > > [  692.653922] Oops: 0000 [#1] SMP PTI
> > >> > > [  692.653922] CPU: 0 PID: 224 Comm: kworker/0:1H Not tainted
> > >> > > 5.4.6-050406-generic #201912211140
> > >> > > [  692.653922] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> > >> > > BIOS 0.0.0 02/06/2015
> > >> > > [  692.653922] Workqueue: nvme_tcp_wq nvme_tcp_io_work [nvme_tcp]
> > >> > > [  692.653922] RIP: 0010:nvme_tcp_io_work+0x308/0x790 [nvme_tcp]
> > >> > > [  692.653922] Code: 8b 86 98 00 00 00 83 f8 02 0f 85 6d fd ff ff 49
> > >> > > 8b 46 28 4d 89 f7 48 89 45 a8 49 8b 47 78 49 8b 57 68 45 8b 67 34 45
> > >> > > 2b 67 38 <8b> 58 08 8b 48 0c 4c 8b 28 48 29 d3 48 8d 34 11 4c 39 e3 48
> > >> > > 89 75
> > >> > > [  692.653922] RSP: 0018:ffffa49a00447dd8 EFLAGS: 00010206
> > >> > > [  692.653922] RAX: 0000000000000000 RBX: 0000000077bd3601 RCX: 0000000000000000
> > >> > > [  692.653922] RDX: 0000000000000000 RSI: 0000000000000011 RDI: ffff9376781c0500
> > >> > > [  692.653922] RBP: ffffa49a00447e60 R08: 0000000000001000 R09: 0000000005000809
> > >> > > [  692.653922] R10: 0000000000000009 R11: 0000000000000000 R12: 0000000000001000
> > >> > > [  692.653922] R13: 0000000000000048 R14: ffff9376781c04a0 R15: ffff9376781c04a0
> > >> > > [  692.653922] FS:  0000000000000000(0000) GS:ffff93767f600000(0000)
> > >> > > knlGS:0000000000000000
> > >> > > [  692.653922] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > >> > > [  692.653922] CR2: 0000000000000008 CR3: 000000007b488003 CR4: 0000000000360ef0
> > >> > > [  692.653922] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > >> > > [  692.653922] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > >> > > [  692.653922] Call Trace:
> > >> > > [  692.653922]  process_one_work+0x1ec/0x3a0
> > >> > > [  692.653922]  worker_thread+0x4d/0x400
> > >> > > [  692.653922]  kthread+0x104/0x140
> > >> > > [  692.653922]  ? process_one_work+0x3a0/0x3a0
> > >> > > [  692.653922]  ? kthread_park+0x90/0x90
> > >> > > [  692.653922]  ret_from_fork+0x35/0x40
> > >> > > [  692.653922] Modules linked in: binfmt_misc nvme_tcp nvme_fabrics
> > >> > > nvme nvme_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink
> > >> > > nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter iptable_nat
> > >> > > nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c bpfilter
> > >> > > br_netfilter bridge stp llc overlay intel_rapl_msr intel_rapl_common
> > >> > > kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul
> > >> > > ghash_clmulni_intel aesni_intel nls_iso8859_1 crypto_simd cryptd
> > >> > > cirrus glue_helper drm_kms_helper drm input_leds fb_sys_fops joydev
> > >> > > serio_raw syscopyarea sysfillrect sysimgblt mac_hid qemu_fw_cfg
> > >> > > bonding sch_fq_codel ipmi_watchdog ipmi_devintf ipmi_msghandler
> > >> > > virtio_rng ip_tables x_tables autofs4 ahci psmouse virtio_net
> > >> > > net_failover failover libahci i2c_piix4 pata_acpi floppy
> > >> > > [  692.653922] CR2: 0000000000000008
> > >> > > [  692.653922] ---[ end trace d688c2c182feef87 ]---
> > >> > > [  692.653922] RIP: 0010:nvme_tcp_io_work+0x308/0x790 [nvme_tcp]
> > >> > > [  692.653922] Code: 8b 86 98 00 00 00 83 f8 02 0f 85 6d fd ff ff 49
> > >> > > 8b 46 28 4d 89 f7 48 89 45 a8 49 8b 47 78 49 8b 57 68 45 8b 67 34 45
> > >> > > 2b 67 38 <8b> 58 08 8b 48 0c 4c 8b 28 48 29 d3 48 8d 34 11 4c 39 e3 48
> > >> > > 89 75
> > >> > > [  692.653922] RSP: 0018:ffffa49a00447dd8 EFLAGS: 00010206
> > >> > > [  692.653922] RAX: 0000000000000000 RBX: 0000000077bd3601 RCX: 0000000000000000
> > >> > > [  692.653922] RDX: 0000000000000000 RSI: 0000000000000011 RDI: ffff9376781c0500
> > >> > > [  692.653922] RBP: ffffa49a00447e60 R08: 0000000000001000 R09: 0000000005000809
> > >> > > [  692.653922] R10: 0000000000000009 R11: 0000000000000000 R12: 0000000000001000
> > >> > > [  692.653922] R13: 0000000000000048 R14: ffff9376781c04a0 R15: ffff9376781c04a0
> > >> > > [  692.653922] FS:  0000000000000000(0000) GS:ffff93767f600000(0000)
> > >> > > knlGS:0000000000000000
> > >> > > [  692.653922] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > >> > > [  692.653922] CR2: 0000000000000008 CR3: 000000007b488003 CR4: 0000000000360ef0
> > >> > > [  692.653922] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > >> > > [  692.653922] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > >> > >
> > >> > >
> > >> > > Any help appreciated.
> > >> > >
> > >> > > Greetings
> > >> > >
> > >> > > --
> > >> > > Stefan Majer
> > >>
> > >>
> > >>
> > >> --
> > >> Stefan Majer
> >
> >
> >
> > --
> > Stefan Majer
>
>
>
> --
> Stefan Majer



-- 
Stefan Majer


Thread overview: 10+ messages
2019-12-25 10:06 null pointer dereference in nvme_tcp_io_work Stefan Majer
2019-12-26 17:47 ` Keith Busch
2019-12-27  7:18   ` Stefan Majer
     [not found]     ` <CAB5Wxwco3KD1e_nRGQ_mWAMa_2d-wP2-1Aao4ZXtDeVgFQQM_w@mail.gmail.com>
2019-12-27  7:54       ` Stefan Majer
2019-12-28 17:49         ` Stefan Majer
2019-12-28 17:53           ` Stefan Majer [this message]
2020-01-07 15:41             ` Stefan Majer
2020-01-07 16:48               ` Nadolski, Edmund
2020-01-15 20:03               ` Sagi Grimberg
2020-01-16  6:28                 ` Stefan Majer
