On Tue, Dec 01, 2020 at 01:40:10AM +0900, Keith Busch wrote:
> On Sun, Nov 29, 2020 at 04:56:39AM +0100, Marek Marczykowski-Górecki wrote:
> > I can reliably hit a kernel panic in nvme_map_data() which looks like the
> > one below. It happens on Linux 5.9.9, while 5.4.75 works fine. I haven't
> > tried other versions on this hardware. Linux is running as Xen
> > PV dom0, on top of nvme there is LUKS and then LVM with thin
> > provisioning. The crash happens reliably when starting a Xen domU (which
> > uses one of the thin provisioned LVM volumes as its disk). But booting dom0
> > works fine (even though it is using the same disk setup for its root
> > filesystem).
> >
> > I did a bit of debugging and found it's about this part:
> >
> > drivers/nvme/host/pci.c:
> > static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
> >                 struct nvme_command *cmnd)
> > {
> >         struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
> >         blk_status_t ret = BLK_STS_RESOURCE;
> >         int nr_mapped;
> >
> >         if (blk_rq_nr_phys_segments(req) == 1) {
> >                 struct bio_vec bv = req_bvec(req);
> >
> >                 if (!is_pci_p2pdma_page(bv.bv_page)) {
> >
> > Here, bv.bv_page->pgmap is LIST_POISON1, while page_zonenum(bv.bv_page)
> > says ZONE_DEVICE. So, is_pci_p2pdma_page() crashes on accessing
> > bv.bv_page->pgmap->type.
>
> Something sounds off. I thought all ZONE_DEVICE pages require a pgmap
> because that's what holds a reference to the device's live-ness. What
> are you allocating this memory from that makes ZONE_DEVICE true without
> a pgmap?

Well, I don't allocate anything myself. I just try to start the system with
an unmodified Linux 5.9.9 and an NVMe drive...

I didn't manage to find where this page is allocated, nor where it gets
broken. I _suspect_ it gets allocated as a ZONE_DEVICE page and then gets
released as ZONE_NORMAL, which sets another part of the union to
LIST_POISON1. But I have absolutely no data to confirm or deny this theory.
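To make the suspected mechanism concrete, here is a minimal userspace sketch
of how a list poison value could end up being read as pgmap. It is not kernel
code: every fake_* type and function below is a hypothetical, simplified
stand-in, the poison constants are illustrative, and the only assumption
carried over from the kernel is that the list linkage and the pgmap pointer
share storage in a union inside struct page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative poison values; the kernel's real constants may differ. */
#define FAKE_LIST_POISON1 ((void *)(uintptr_t)0x100)
#define FAKE_LIST_POISON2 ((void *)(uintptr_t)0x122)

/* Hypothetical stand-in for a doubly linked list node. */
struct fake_list_head {
	struct fake_list_head *next, *prev;
};

/* Hypothetical stand-in for the per-device page map. */
struct fake_dev_pagemap {
	int type;
};

/* Hypothetical, heavily simplified page descriptor. */
struct fake_page {
	unsigned long flags;
	union {
		/* View used while the page sits on an ordinary list. */
		struct fake_list_head lru;
		/* View used while the page is a device page. */
		struct {
			struct fake_dev_pagemap *pgmap;
			void *zone_device_data;
		};
	};
};

/* The sketch only works if lru.next and pgmap occupy the same bytes. */
_Static_assert(offsetof(struct fake_page, lru) ==
	       offsetof(struct fake_page, pgmap),
	       "lru and pgmap must share storage");

static void fake_list_init(struct fake_list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void fake_list_add(struct fake_list_head *entry,
			  struct fake_list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Unlink the entry, then poison its pointers. */
static void fake_list_del(struct fake_list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = FAKE_LIST_POISON1;
	entry->prev = FAKE_LIST_POISON2;
}

static int pgmap_is_poisoned(const struct fake_page *page)
{
	return (void *)page->pgmap == FAKE_LIST_POISON1;
}

/* Returns 1 if pgmap reads back as the poison value after list removal. */
int demo_stale_pgmap(void)
{
	struct fake_list_head some_list;
	struct fake_dev_pagemap map = { .type = 1 };
	struct fake_page page = { .flags = 0 };

	page.pgmap = &map;		/* page set up as a device page */
	assert(!pgmap_is_poisoned(&page));

	fake_list_init(&some_list);
	fake_list_add(&page.lru, &some_list);	/* handled as a normal page */
	fake_list_del(&page.lru);		/* removal poisons lru.next */

	return pgmap_is_poisoned(&page);
}
```

If the page were still treated as ZONE_DEVICE after such a sequence, a
dereference like pgmap->type would fault on the poison address, which would
match the crash in is_pci_p2pdma_page(). Whether this sequence actually
happens on the affected system is unconfirmed.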
-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?