From: David Hildenbrand <david@redhat.com>
To: Dan Williams <dan.j.williams@intel.com>, Zhang Yi <yi.z.zhang@linux.intel.com>
Cc: "KVM list" <kvm@vger.kernel.org>, "Zhang, Yu C" <yu.c.zhang@intel.com>,
    linux-nvdimm <linux-nvdimm@lists.01.org>, "Jan Kara" <jack@suse.cz>,
    rkrcmar@redhat.com, "Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
    "Linux MM" <linux-mm@kvack.org>, "Jérôme Glisse" <jglisse@redhat.com>,
    "Paolo Bonzini" <pbonzini@redhat.com>, "Christoph Hellwig" <hch@lst.de>
Subject: Re: [PATCH V5 4/4] kvm: add a check if pfn is from NVDIMM pmem.
Date: Wed, 19 Sep 2018 09:20:25 +0200
Message-ID: <fefbd66e-623d-b6a5-7202-5309dd4f5b32@redhat.com>
In-Reply-To: <CAPcyv4ifg2BZMTNfu6mg0xxtPWs3BVgkfEj51v1CQ6jp2S70fw@mail.gmail.com>

On 19.09.18 at 04:53, Dan Williams wrote:
> On Fri, Sep 7, 2018 at 2:25 AM Zhang Yi <yi.z.zhang@linux.intel.com> wrote:
>>
>> For device specific memory space, when we move these area of pfn to
>> memory zone, we will set the page reserved flag at that time, some of
>> these reserved for device mmio, and some of these are not, such as
>> NVDIMM pmem.
>>
>> Now, we map these dev_dax or fs_dax pages to kvm for DIMM/NVDIMM
>> backend, since these pages are reserved, the check of
>> kvm_is_reserved_pfn() misconceives those pages as MMIO. Therefor, we
>> introduce 2 page map types, MEMORY_DEVICE_FS_DAX/MEMORY_DEVICE_DEV_DAX,
>> to identify these pages are from NVDIMM pmem and let kvm treat these
>> as normal pages.
>>
>> Without this patch, many operations will be missed due to this
>> mistreatment to pmem pages, for example, a page may not have chance to
>> be unpinned for KVM guest(in kvm_release_pfn_clean), not able to be
>> marked as dirty/accessed(in kvm_set_pfn_dirty/accessed) etc.
>>
>> Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
>> Acked-by: Pankaj Gupta <pagupta@redhat.com>
>> ---
>>  virt/kvm/kvm_main.c | 16 ++++++++++++++--
>>  1 file changed, 14 insertions(+), 2 deletions(-)
>>
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index c44c406..9c49634 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -147,8 +147,20 @@ __weak void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
>>
>>  bool kvm_is_reserved_pfn(kvm_pfn_t pfn)
>>  {
>> -       if (pfn_valid(pfn))
>> -               return PageReserved(pfn_to_page(pfn));
>> +       struct page *page;
>> +
>> +       if (pfn_valid(pfn)) {
>> +               page = pfn_to_page(pfn);
>> +
>> +               /*
>> +                * For device specific memory space, there is a case
>> +                * which we need pass MEMORY_DEVICE_FS[DEV]_DAX pages
>> +                * to kvm, these pages marked reserved flag as it is a
>> +                * zone device memory, we need to identify these pages
>> +                * and let kvm treat these as normal pages
>> +                */
>> +               return PageReserved(page) && !is_dax_page(page);
>
> Should we consider just not setting PageReserved for
> devm_memremap_pages()? Perhaps kvm is not be the only component making
> these assumptions about this flag?

I was asking the exact same question in v3 or so.

I was recently going through all PageReserved users, trying to clean up
and document how it is used.

PG_reserved used to be a marker for "not available for the page
allocator". This is only partially true and not really helpful, I think.
My current understanding:

"
PG_reserved is set for special pages. The struct pages of such pages
should in general not be touched except by their owner. Pages marked as
reserved include:
- Kernel image (including vDSO) and similar (e.g. BIOS, initrd)
- Pages allocated early during boot (bootmem, memblock)
- Zero pages
- Pages that have been associated with a zone but were not onlined
  (e.g. NVDIMM/pmem, online_page_callback used by XEN)
- Pages to exclude from the hibernation image (e.g. loaded kexec images)
- MCA (memory error) pages on ia64
- Offline pages
Some architectures don't allow ioremapping RAM pages that are not marked
as reserved. Allocated pages might have to be set reserved to allow for
that - if there is a good reason to enforce this. Consequently,
PG_reserved on a page mapped via a user space page table might indicate
the zero page, pmem or MMIO pages.
"

Swapping code does not care about PageReserved at all, as far as I
remember. This seems to be fine, as it only looks at the way pages have
been mapped into user space.

I don't really see a good reason to set pmem pages as reserved. One
question would be how (or whether) to exclude them from the hibernation
image. But that could also be solved differently (we would have to
double check how they are handled in the hibernation code).

A similar user of PageReserved to look at is:

drivers/vfio/vfio_iommu_type1.c:is_invalid_reserved_pfn()

It will not mark pages dirty if they are reserved. Similar to the KVM
code.

>
> Why is MEMORY_DEVICE_PUBLIC memory specifically excluded?
>
> This has less to do with "dax" pages and more to do with
> devm_memremap_pages() established ranges. P2PDMA is another producer
> of these pages. If either MEMORY_DEVICE_PUBLIC or P2PDMA pages can be
> used in these kvm paths then I think this points to consider clearing
> the Reserved flag.
>
> That said I haven't audited all the locations that test PageReserved().
>
> Sorry for not responding sooner I was on extended leave.
>

-- 

Thanks,

David / dhildenb
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
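
To make the behaviour under discussion concrete, here is a small userspace
model of the check before and after the quoted hunk. It is only a sketch:
none of the model_* types or functions exist in the kernel, a NULL pointer
stands in for !pfn_valid(pfn), and the "return true" fallback for invalid
pfns is an assumption, since the quoted hunk is trimmed before the end of
the function. It also shows the case Dan raises: a reserved
MEMORY_DEVICE_PUBLIC page would still be treated as reserved by the
patched check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the dev_pagemap memory types. */
enum model_pgmap_type {
	MODEL_NOT_DEVICE,	/* ordinary page, no dev_pagemap */
	MODEL_DEVICE_PUBLIC,
	MODEL_DEVICE_FS_DAX,
	MODEL_DEVICE_DEV_DAX,	/* type proposed in patch 2/4 */
};

/* Hypothetical stand-in for struct page; 'reserved' models PG_reserved. */
struct model_page {
	bool reserved;
	enum model_pgmap_type type;
};

/* Models the is_dax_page() helper proposed in patch 3/4. */
static bool model_is_dax_page(const struct model_page *page)
{
	return page->type == MODEL_DEVICE_FS_DAX ||
	       page->type == MODEL_DEVICE_DEV_DAX;
}

/* Check before the patch: every reserved page counts as reserved/MMIO. */
static bool model_kvm_is_reserved_pfn_old(const struct model_page *page)
{
	if (page)			/* models pfn_valid(pfn) */
		return page->reserved;
	return true;			/* assumed fallback for invalid pfns */
}

/* Check after the patch: reserved DAX pages are treated as normal pages. */
static bool model_kvm_is_reserved_pfn_new(const struct model_page *page)
{
	if (page)
		return page->reserved && !model_is_dax_page(page);
	return true;			/* assumed fallback for invalid pfns */
}

/* Walks through the cases discussed in the thread; returns 0 on success. */
static int model_demo(void)
{
	struct model_page pmem = { true, MODEL_DEVICE_FS_DAX };
	struct model_page pub = { true, MODEL_DEVICE_PUBLIC };

	assert(model_kvm_is_reserved_pfn_old(&pmem));	/* misread as MMIO */
	assert(!model_kvm_is_reserved_pfn_new(&pmem));	/* now a normal page */
	assert(model_kvm_is_reserved_pfn_new(&pub));	/* still reserved */
	return 0;
}
```

If PG_reserved were simply not set for devm_memremap_pages() ranges, as
suggested above, the old check would already return false for both the
pmem and the MEMORY_DEVICE_PUBLIC page in this model, without any special
casing.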