From: "Marek Marczykowski-Górecki" <marmarek@invisiblethingslab.com>
To: "Roger Pau Monné" <roger.pau@citrix.com>,
	"Juergen Gross" <jgross@suse.com>
Cc: Sagi Grimberg <sagi@grimberg.me>,
	linux-nvme@lists.infradead.org, Jens Axboe <axboe@fb.com>,
	Keith Busch <kbusch@kernel.org>,
	xen-devel <xen-devel@lists.xenproject.org>,
	Christoph Hellwig <hch@lst.de>
Subject: Re: GPF on 0xdead000000000100 in nvme_map_data - Linux 5.9.9
Date: Fri, 4 Dec 2020 12:08:47 +0100	[thread overview]
Message-ID: <20201204110847.GU201140@mail-itl> (raw)
In-Reply-To: <20201202000642.GJ201140@mail-itl>


On Wed, Dec 02, 2020 at 01:06:46AM +0100, Marek Marczykowski-Górecki wrote:
> On Tue, Dec 01, 2020 at 01:40:10AM +0900, Keith Busch wrote:
> > On Sun, Nov 29, 2020 at 04:56:39AM +0100, Marek Marczykowski-Górecki wrote:
> > > > I can reliably hit a kernel panic in nvme_map_data() which looks like the
> > > > one below. It happens on Linux 5.9.9, while 5.4.75 works fine. I haven't
> > > > tried other versions on this hardware. Linux is running as Xen
> > > PV dom0, on top of nvme there is LUKS and then LVM with thin
> > > provisioning. The crash happens reliably when starting a Xen domU (which
> > > uses one of thin provisioned LVM volumes as its disk). But booting dom0
> > > works fine (even though it is using the same disk setup for its root
> > > filesystem).
> > > 
> > > I did a bit of debugging and found it's about this part:
> > > 
> > > drivers/nvme/host/pci.c:
> > >  800 static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
> > >  801         struct nvme_command *cmnd)
> > >  802 {
> > >  803     struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
> > >  804     blk_status_t ret = BLK_STS_RESOURCE;
> > >  805     int nr_mapped;
> > >  806 
> > >  807     if (blk_rq_nr_phys_segments(req) == 1) {
> > >  808         struct bio_vec bv = req_bvec(req);
> > >  809 
> > >  810         if (!is_pci_p2pdma_page(bv.bv_page)) {
> > > 
> > > Here, bv.bv_page->pgmap is LIST_POISON1, while page_zonenum(bv.bv_page)
> > > says ZONE_DEVICE. So, is_pci_p2pdma_page() crashes on accessing
> > > bv.bv_page->pgmap->type.
> > 
> > Something sounds off. I thought all ZONE_DEVICE pages require a pgmap
> > > because that's what holds a reference to the device's liveness. What
> > are you allocating this memory from that makes ZONE_DEVICE true without
> > a pgmap?
> 
> > Well, I don't allocate anything myself. I just try to start the system with
> > unmodified Linux 5.9.9 and an NVMe drive...
> > I didn't manage to find where this page is allocated, nor where it gets
> broken. I _suspect_ it gets allocated as ZONE_DEVICE page and then gets
> released as ZONE_NORMAL which sets another part of the union to
> LIST_POISON1. But I have absolutely no data to confirm/deny this theory.
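
A minimal userspace sketch of the mechanism suspected above, assuming a simplified model of struct page in which the lru list head and the ZONE_DEVICE pgmap pointer share one union slot. The names page_model, dev_pagemap_model, model_poison_next and pgmap_is_poisoned are hypothetical; the real layout is in include/linux/mm_types.h and has more members. It also assumes a 64-bit build and the x86_64 default LIST_POISON1 value, which matches RAX in the crash.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 64-bit pointers, as on x86_64. */
static_assert(sizeof(void *) == 8, "sketch assumes a 64-bit build");

/* x86_64 default poison value written to ->next by list_del(). */
#define LIST_POISON1 ((void *)0xdead000000000100ULL)

struct list_head {
    struct list_head *next, *prev;
};

/* Hypothetical stand-in for struct dev_pagemap; only 'type' is modelled. */
struct dev_pagemap_model {
    int type;
};

/* Simplified model of struct page: both views occupy the same storage. */
struct page_model {
    unsigned long flags;
    union {
        struct list_head lru;                 /* normal page on a list */
        struct {                              /* ZONE_DEVICE view */
            struct dev_pagemap_model *pgmap;
            void *zone_device_data;
        };
    };
};

/* Mimics only the 'entry->next = LIST_POISON1' step of list_del(). */
static void model_poison_next(struct page_model *page)
{
    page->lru.next = (struct list_head *)LIST_POISON1;
}

/* Returns nonzero when the ZONE_DEVICE view of the page reads the poison. */
static int pgmap_is_poisoned(const struct page_model *page)
{
    return (void *)page->pgmap == LIST_POISON1;
}
```

After model_poison_next(&p), reading p.pgmap yields 0xdead000000000100, so a p.pgmap->type access like the one in is_pci_p2pdma_page() would dereference the poison value. This only models the aliasing; it says nothing about where the real page gets released.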

I've bisected this (with a bit of scripting, PXE and git bisect run it
took a while, but was fairly painless) and identified this commit as the
culprit:

commit 9e2369c06c8a181478039258a4598c1ddd2cadfa
Author: Roger Pau Monne <roger.pau@citrix.com>
Date:   Tue Sep 1 10:33:26 2020 +0200

    xen: add helpers to allocate unpopulated memory
    
I'm adding relevant people and xen-devel to the thread.
For completeness, here is the original crash message:

general protection fault, probably for non-canonical address 0xdead000000000100: 0000 [#1] SMP NOPTI
CPU: 1 PID: 134 Comm: kworker/u12:2 Not tainted 5.9.9-1.qubes.x86_64 #1
Hardware name: LENOVO 20M9CTO1WW/20M9CTO1WW, BIOS N2CET50W (1.33 ) 01/15/2020
Workqueue: dm-thin do_worker [dm_thin_pool]
RIP: e030:nvme_map_data+0x300/0x3a0 [nvme]
Code: b8 fe ff ff e9 a8 fe ff ff 4c 8b 56 68 8b 5e 70 8b 76 74 49 8b 02 48 c1 e8 33 83 e0 07 83 f8 04 0f 85 f2 fe ff ff 49 8b 42 08 <83> b8 d0 00 00 00 04 0f 85 e1 fe ff ff e9 38 fd ff ff 8b 55 70 be
RSP: e02b:ffffc900010e7ad8 EFLAGS: 00010246
RAX: dead000000000100 RBX: 0000000000001000 RCX: ffff8881a58f5000
RDX: 0000000000001000 RSI: 0000000000000000 RDI: ffff8881a679e000
RBP: ffff8881a5ef4c80 R08: ffff8881a5ef4c80 R09: 0000000000000002
R10: ffffea0003dfff40 R11: 0000000000000008 R12: ffff8881a679e000
R13: ffffc900010e7b20 R14: ffff8881a70b5980 R15: ffff8881a679e000
FS:  0000000000000000(0000) GS:ffff8881b5440000(0000) knlGS:0000000000000000
CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000001d64408 CR3: 00000001aa2c0000 CR4: 0000000000050660
Call Trace:
 nvme_queue_rq+0xa7/0x1a0 [nvme]
 __blk_mq_try_issue_directly+0x11d/0x1e0
 ? add_wait_queue_exclusive+0x70/0x70
 blk_mq_try_issue_directly+0x35/0xc0
 blk_mq_submit_bio+0x58f/0x660
 __submit_bio_noacct+0x300/0x330
 process_shared_bio+0x126/0x1b0 [dm_thin_pool]
 process_cell+0x226/0x280 [dm_thin_pool]
 process_thin_deferred_cells+0x185/0x320 [dm_thin_pool]
 process_deferred_bios+0xa4/0x2a0 [dm_thin_pool]
 do_worker+0xcc/0x130 [dm_thin_pool]
 process_one_work+0x1b4/0x370
 worker_thread+0x4c/0x310
 ? process_one_work+0x370/0x370
 kthread+0x11b/0x140
 ? __kthread_bind_mask+0x60/0x60
 ret_from_fork+0x22/0x30
Modules linked in: loop snd_seq_dummy snd_hrtimer nf_tables nfnetlink vfat fat snd_sof_pci snd_sof_intel_byt snd_sof_intel_ipc snd_sof_intel_hda_common snd_soc_hdac_hda snd_sof_xtensa_dsp snd_sof_intel_hda snd_sof snd_soc_skl
snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine elan_i2c snd_hda_codec_hdmi mei_hdcp iTCO_wdt intel_powerclamp intel_pmc_bxt ee1004 intel_rapl_msr
iTCO_vendor_support joydev pcspkr intel_wmi_thunderbolt wmi_bmof thunderbolt ucsi_acpi idma64 typec_ucsi snd_hda_codec_realtek typec snd_hda_codec_generic snd_hda_intel snd_intel_dspcfg snd_hda_codec thinkpad_acpi snd_hda_core ledtrig_audio
int3403_thermal snd_hwdep snd_seq snd_seq_device snd_pcm iwlwifi snd_timer processor_thermal_device mei_me cfg80211 intel_rapl_common snd e1000e mei int3400_thermal int340x_thermal_zone i2c_i801 acpi_thermal_rel soundcore intel_soc_dts_iosf
i2c_smbus rfkill intel_pch_thermal xenfs ip_tables dm_thin_pool dm_persistent_data dm_bio_prison dm_crypt nouveau rtsx_pci_sdmmc mmc_core mxm_wmi crct10dif_pclmul ttm crc32_pclmul crc32c_intel i915 ghash_clmulni_intel i2c_algo_bit serio_raw nvme drm_kms_helper cec xhci_pci
nvme_core rtsx_pci xhci_pci_renesas drm xhci_hcd wmi video pinctrl_cannonlake pinctrl_intel xen_privcmd xen_pciback xen_blkback xen_gntalloc xen_gntdev xen_evtchn uinput
---[ end trace f8d47e4aa6724df4 ]---
RIP: e030:nvme_map_data+0x300/0x3a0 [nvme]
Code: b8 fe ff ff e9 a8 fe ff ff 4c 8b 56 68 8b 5e 70 8b 76 74 49 8b 02 48 c1 e8 33 83 e0 07 83 f8 04 0f 85 f2 fe ff ff 49 8b 42 08 <83> b8 d0 00 00 00 04 0f 85 e1 fe ff ff e9 38 fd ff ff 8b 55 70 be
RSP: e02b:ffffc900010e7ad8 EFLAGS: 00010246
RAX: dead000000000100 RBX: 0000000000001000 RCX: ffff8881a58f5000
RDX: 0000000000001000 RSI: 0000000000000000 RDI: ffff8881a679e000
RBP: ffff8881a5ef4c80 R08: ffff8881a5ef4c80 R09: 0000000000000002
R10: ffffea0003dfff40 R11: 0000000000000008 R12: ffff8881a679e000
R13: ffffc900010e7b20 R14: ffff8881a70b5980 R15: ffff8881a679e000
FS:  0000000000000000(0000) GS:ffff8881b5440000(0000) knlGS:0000000000000000
CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000001d64408 CR3: 00000001aa2c0000 CR4: 0000000000050660
Kernel panic - not syncing: Fatal exception
Kernel Offset: disabled
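
The fault shows up as a general protection fault rather than a page fault because 0xdead000000000100 is not a canonical x86-64 address. A small sketch of that check, assuming 48-bit virtual addresses (4-level paging); the helper name is_canonical_48 is hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * With 48-bit virtual addresses, bits 63..47 must all be equal
 * (all zeros or all ones) for an address to be canonical.
 */
static bool is_canonical_48(uint64_t addr)
{
    uint64_t top = addr >> 47;          /* the 17 bits 63..47 */

    return top == 0 || top == 0x1ffff;
}
```

Under this assumption the poison value in RAX fails the check, while the page pointer in R10 (0xffffea0003dfff40) passes it.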


-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


