From: "Marek Marczykowski-Górecki" <marmarek@invisiblethingslab.com>
To: "Jürgen Groß" <jgross@suse.com>
Cc: "Sagi Grimberg" <sagi@grimberg.me>,
	"Jason Andryuk" <jandryuk@gmail.com>,
	linux-nvme@lists.infradead.org, "Jens Axboe" <axboe@fb.com>,
	"Keith Busch" <kbusch@kernel.org>,
	xen-devel <xen-devel@lists.xenproject.org>,
	"Christoph Hellwig" <hch@lst.de>,
	"Roger Pau Monné" <roger.pau@citrix.com>
Subject: Re: GPF on 0xdead000000000100 in nvme_map_data - Linux 5.9.9
Date: Mon, 7 Dec 2020 12:48:05 +0100
Message-ID: <20201207114805.GF1244@mail-itl>
In-Reply-To: <293433c5-d23b-63e7-d607-9d24f06c46b4@suse.com>


On Mon, Dec 07, 2020 at 11:55:01AM +0100, Jürgen Groß wrote:
> Marek,
> 
> On 06.12.20 17:47, Jason Andryuk wrote:
> > On Sat, Dec 5, 2020 at 3:29 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> > > 
> > > On Fri, Dec 04, 2020 at 01:20:54PM +0100, Marek Marczykowski-Górecki wrote:
> > > > On Fri, Dec 04, 2020 at 01:08:03PM +0100, Christoph Hellwig wrote:
> > > > > On Fri, Dec 04, 2020 at 12:08:47PM +0100, Marek Marczykowski-Górecki wrote:
> > > > > > culprit:
> > > > > > 
> > > > > > commit 9e2369c06c8a181478039258a4598c1ddd2cadfa
> > > > > > Author: Roger Pau Monne <roger.pau@citrix.com>
> > > > > > Date:   Tue Sep 1 10:33:26 2020 +0200
> > > > > > 
> > > > > >      xen: add helpers to allocate unpopulated memory
> > > > > > 
> > > > > > I'm adding relevant people and xen-devel to the thread.
> > > > > > For completeness, here is the original crash message:
> > > > > 
> > > > > That commit definitively adds a new ZONE_DEVICE user, so it does look
> > > > > related.  But you are not running on Xen, are you?
> > > > 
> > > > I am. It is Xen dom0.
> > > 
> > > I'm afraid I'm on leave and won't be able to look into this until the
> > > beginning of January. I would guess it's some kind of bad
> > > interaction between blkback and NVMe drivers both using ZONE_DEVICE?
> > > 
> > > Maybe the best is to revert this change and I will look into it when
> > > I get back, unless someone is willing to debug this further.
> > 
> > Looking at commit 9e2369c06c8a and xen-blkback put_free_pages(), they
> > both use page->lru which is part of the anonymous union shared with
> > *pgmap.  That matches Marek's suspicion that the ZONE_DEVICE memory is
> > being used as ZONE_NORMAL.
> > 
> > memmap_init_zone_device() says:
> > * ZONE_DEVICE pages union ->lru with a ->pgmap back pointer
> > * and zone_device_data.  It is a bug if a ZONE_DEVICE page is
> > * ever freed or placed on a driver-private list.
> 
> Second try, now even tested to work on a test system (without NVMe).

It doesn't work for me:

[  526.023340] xen-blkback: backend/vbd/1/51712: using 2 queues, protocol 1 (x86_64-abi) persistent grants
[  526.030550] xen-blkback: backend/vbd/1/51728: using 2 queues, protocol 1 (x86_64-abi) persistent grants
[  526.034810] BUG: kernel NULL pointer dereference, address: 0000000000000010
[  526.034841] #PF: supervisor read access in kernel mode
[  526.034857] #PF: error_code(0x0000) - not-present page
[  526.034875] PGD 105428067 P4D 105428067 PUD 105b92067 PMD 0 
[  526.034896] Oops: 0000 [#1] SMP NOPTI
[  526.034909] CPU: 3 PID: 4007 Comm: 1.xvda-0 Tainted: G        W         5.10.0-rc6-1.qubes.x86_64+ #108
[  526.034933] Hardware name: LENOVO 20M9CTO1WW/20M9CTO1WW, BIOS N2CET50W (1.33 ) 01/15/2020
[  526.034974] RIP: e030:gnttab_page_cache_get+0x32/0x60
[  526.034990] Code: 89 f4 55 48 89 fd e8 4d e3 80 00 48 83 7d 08 00 48 89 c6 74 15 48 89 ef e8 5b e0 80 00 4c 89 e6 5d bf 01 00 00 00 41 5c eb 8e <48> 8b 04 25 10 00 00 00 48 89 ef 48 89 45 08 49 c7 04 24 00 00 00
[  526.035035] RSP: e02b:ffffc90003e27a40 EFLAGS: 00010046
[  526.035052] RAX: 0000000000000200 RBX: 0000000000000001 RCX: 0000000000000000
[  526.035072] RDX: 0000000000000001 RSI: 0000000000000200 RDI: ffff888104275518
[  526.035092] RBP: ffff888104275518 R08: 0000000000000000 R09: 0000000000000000
[  526.035113] R10: ffff888104275400 R11: 0000000000000000 R12: ffff888109b5d3a0
[  526.035133] R13: 0000000000000000 R14: 0000000000000000 R15: ffff888104275400
[  526.035159] FS:  0000000000000000(0000) GS:ffff8881b54c0000(0000) knlGS:0000000000000000
[  526.035194] CS:  10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[  526.035214] CR2: 0000000000000010 CR3: 0000000103b5a000 CR4: 0000000000050660
[  526.035239] Call Trace:
[  526.035253]  xen_blkbk_map+0x131/0x5a0
[  526.035268]  dispatch_rw_block_io+0x42a/0x9c0
[  526.035284]  ? xen_mc_flush+0xcb/0x190
[  526.035298]  __do_block_io_op+0x314/0x630
[  526.035312]  xen_blkif_schedule+0x182/0x790
[  526.035327]  ? finish_wait+0x80/0x80
[  526.035340]  ? xen_blkif_be_int+0x30/0x30
[  526.035355]  kthread+0xfe/0x140
[  526.035371]  ? kthread_park+0x90/0x90
[  526.035385]  ret_from_fork+0x22/0x30
[  526.035398] Modules linked in:
[  526.035410] CR2: 0000000000000010
[  526.035440] ---[ end trace 431ea72658d96c9d ]---
[  526.176390] RIP: e030:gnttab_page_cache_get+0x32/0x60
[  526.176460] Code: 89 f4 55 48 89 fd e8 4d e3 80 00 48 83 7d 08 00 48 89 c6 74 15 48 89 ef e8 5b e0 80 00 4c 89 e6 5d bf 01 00 00 00 41 5c eb 8e <48> 8b 04 25 10 00 00 00 48 89 ef 48 89 45 08 49 c7 04 24 00 00 00
[  526.250734] RSP: e02b:ffffc90003e27a40 EFLAGS: 00010046
[  526.250751] RAX: 0000000000000200 RBX: 0000000000000001 RCX: 0000000000000000
[  526.250771] RDX: 0000000000000001 RSI: 0000000000000200 RDI: ffff888104275518
[  526.250790] RBP: ffff888104275518 R08: 0000000000000000 R09: 0000000000000000
[  526.250808] R10: ffff888104275400 R11: 0000000000000000 R12: ffff888109b5d3a0
[  526.250827] R13: 0000000000000000 R14: 0000000000000000 R15: ffff888104275400
[  526.250863] FS:  0000000000000000(0000) GS:ffff8881b54c0000(0000) knlGS:0000000000000000
[  526.250884] CS:  10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[  526.250901] CR2: 0000000000000010 CR3: 0000000103b5a000 CR4: 0000000000050660
[  526.250924] Kernel panic - not syncing: Fatal exception
[  526.250972] Kernel Offset: disabled


This is 7059c2c00a2196865c2139083cbef47cd18109b6 with your patches on
top.
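
For readers following the thread, the aliasing described in the quoted
memmap_init_zone_device() comment can be modelled in userspace. The sketch
below is not kernel code: struct model_page, struct model_list_head and the
model_* helpers are hypothetical stand-ins, and the layout is heavily
simplified. It only shows that poisoning lru.next, as list_del() does, also
overwrites the aliased pgmap pointer. The resulting value, 0xdead000000000100
(LIST_POISON1 on x86_64), is consistent with the address in the subject line,
though the sketch does not prove that this is what happened in
nvme_map_data().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* LIST_POISON1 on x86_64: 0x100 + 0xdead000000000000. */
#define MODEL_LIST_POISON1 ((void *)(uintptr_t)0xdead000000000100ULL)

/* Hypothetical stand-in for the kernel's struct list_head. */
struct model_list_head {
	struct model_list_head *next, *prev;
};

/* Hypothetical, heavily simplified stand-in for struct page. */
struct model_page {
	unsigned long flags;
	union {
		struct model_list_head lru;	/* ordinary pages */
		struct {			/* ZONE_DEVICE pages */
			void *pgmap;
			void *zone_device_data;
		};
	};
};

/* The two views must start at the same offset for the aliasing to occur. */
static void model_check_layout(void)
{
	assert(offsetof(struct model_page, lru) ==
	       offsetof(struct model_page, pgmap));
}

/*
 * Mimics only the part of list_del() that poisons the next pointer.
 * The real list_del() also unlinks the entry and poisons prev.
 */
static void model_list_del(struct model_list_head *entry)
{
	entry->next = (struct model_list_head *)MODEL_LIST_POISON1;
}

/* Returns what a later reader of page->pgmap would see. */
static void *model_pgmap_after_list_del(void)
{
	static int fake_pgmap;		/* placeholder for a dev_pagemap */
	struct model_page page = { 0 };

	model_check_layout();
	page.pgmap = &fake_pgmap;	/* set up as a ZONE_DEVICE page */
	model_list_del(&page.lru);	/* driver treats it as a list entry */
	return page.pgmap;		/* now holds the poison value */
}
```

Calling model_pgmap_after_list_del() returns MODEL_LIST_POISON1 rather than
the address of fake_pgmap, so any later dereference of pgmap would fault on
the poison address.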

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
