From: David Hildenbrand <david@redhat.com>
To: "Dan Williams" <dan.j.williams@intel.com>,
	"KVM list" <kvm@vger.kernel.org>,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
	linux-nvdimm <linux-nvdimm@lists.01.org>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Dave Jiang" <dave.jiang@intel.com>,
	"Zhang, Yu C" <yu.c.zhang@intel.com>,
	"Pankaj Gupta" <pagupta@redhat.com>, "Jan Kara" <jack@suse.cz>,
	"Christoph Hellwig" <hch@lst.de>, "Linux MM" <linux-mm@kvack.org>,
	rkrcmar@redhat.com, "Jérôme Glisse" <jglisse@redhat.com>,
	"Zhang, Yi Z" <yi.z.zhang@intel.com>
Subject: Re: [PATCH V5 4/4] kvm: add a check if pfn is from NVDIMM pmem.
Date: Fri, 21 Sep 2018 16:23:19 +0200
Message-ID: <c8ad8ed7-ca8c-4dd7-819b-8d9c856fbe04@redhat.com>
In-Reply-To: <20180921224739.GA33892@tiger-server>

On 22/09/2018 00:47, Yi Zhang wrote:
> On 2018-09-20 at 14:19:17 -0700, Dan Williams wrote:
>> On Thu, Sep 20, 2018 at 7:11 AM Yi Zhang <yi.z.zhang@linux.intel.com> wrote:
>>>
>>> On 2018-09-19 at 09:20:25 +0200, David Hildenbrand wrote:
>>>> Am 19.09.18 um 04:53 schrieb Dan Williams:
>>>>>
>>>>> Should we consider just not setting PageReserved for
>>>>> devm_memremap_pages()? Perhaps kvm is not the only component making
>>>>> these assumptions about this flag?
>>>>
>>>> I was asking the exact same question in v3 or so.
>>>>
>>>> I was recently going through all PageReserved users, trying to clean up
>>>> and document how it is used.
>>>>
>>>> PG_reserved used to be a marker "not available for the page allocator".
>>>> This is only partially true and not really helpful I think. My current
>>>> understanding:
>>>>
>>>> "
>>>> PG_reserved is set for special pages. The struct pages of such pages
>>>> should in general not be touched except by their owner. Pages marked as
>>>> reserved include:
>>>> - Kernel image (including vDSO) and similar (e.g. BIOS, initrd)
>>>> - Pages allocated early during boot (bootmem, memblock)
>>>> - Zero pages
>>>> - Pages that have been associated with a zone but were not onlined
>>>>   (e.g. NVDIMM/pmem, online_page_callback used by XEN)
>>>> - Pages to exclude from the hibernation image (e.g. loaded kexec images)
>>>> - MCA (memory error) pages on ia64
>>>> - Offline pages
>>>> Some architectures don't allow ioremapping RAM pages that are not marked
>>>> as reserved. Allocated pages might have to be set reserved to allow for
>>>> that - if there is a good reason to enforce this. Consequently, a
>>>> PG_reserved page in a user space page table might indicate the
>>>> zero page, pmem or MMIO pages.
>>>> "
>>>>
>>>> Swapping code does not care about PageReserved at all as far as I
>>>> remember. This seems to be fine as it only looks at the way pages have
>>>> been mapped into user space.
>>>>
>>>> I don't really see a good reason to set pmem pages as reserved. One
>>>> question would be, how/if to exclude them from the hibernation image.
>>>> But that could also be solved differently (we would have to double check
>>>> how they are handled in hibernation code).
>>>>
>>>>
>>>> A similar user of PageReserved to look at is:
>>>>
>>>> drivers/vfio/vfio_iommu_type1.c:is_invalid_reserved_pfn()
>>>>
>>>> It will not mark pages dirty if they are reserved. Similar to KVM code.
>>> Yes, kvm is not the only user of the dax reserved page.
>>>>
>>>>>
>>>>> Why is MEMORY_DEVICE_PUBLIC memory specifically excluded?
>>>>>
>>>>> This has less to do with "dax" pages and more to do with
>>>>> devm_memremap_pages() established ranges. P2PDMA is another producer
>>>>> of these pages. If either MEMORY_DEVICE_PUBLIC or P2PDMA pages can be
>>>>> used in these kvm paths then I think this is a reason to consider
>>>>> clearing the Reserved flag.
>>>
>>> Thanks for the comments, Dan and David.
>>> For MEMORY_DEVICE_PUBLIC memory, since the host driver could manage the
>>> memory resource it shares with the guest, Jerome says we can ignore it
>>> for now.
>>>
>>> As for p2pmem, it seems to be mapped in PCI BAR space, which is most
>>> likely MMIO. I think kvm should treat it as a reserved page.
>>
>> Ok, but the question you left unanswered is whether it would be better
>> for devm_memremap_pages() to clear the PageReserved flag for
>> MEMORY_DEVICE_{FS,DEV}_DAX rather than introduce a local kvm-only hack
>> for what looks like a global problem.
> 
> Removing the PageReserved flag sounds more reasonable.
> Could we still have a flag to identify that it is device private memory,
> or where these pages are coming from?

We could use a page type for that, or what you proposed. (As I said, we
might have to change the hibernation code to skip the pages once we drop
the reserved flag.)
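
To make the two options concrete, here is a minimal user-space sketch. It
is not kernel code: every mock_* name is made up for illustration, the
struct only models PG_reserved and the pagemap type, and a pfn without a
struct page is modeled as a NULL pointer. It contrasts the kvm-local
check from this series with dropping the reserved flag in
devm_memremap_pages(), and shows why hibernation would then need its own
test.

```c
#include <stdbool.h>
#include <stddef.h>

/* User-space stand-ins only; these are NOT the kernel's definitions. */
enum mock_memory_type {
	MOCK_MEMORY_NONE = 0,		/* ordinary RAM, no pagemap */
	MOCK_MEMORY_DEVICE_FS_DAX,
	MOCK_MEMORY_DEVICE_DEV_DAX,
	MOCK_MEMORY_DEVICE_PUBLIC,
};

struct mock_page {
	bool reserved;			/* models PG_reserved */
	enum mock_memory_type type;	/* models the pagemap type */
};

/* True for pages backed by filesystem DAX or device DAX memory. */
static bool mock_is_dax_page(const struct mock_page *page)
{
	return page && (page->type == MOCK_MEMORY_DEVICE_FS_DAX ||
			page->type == MOCK_MEMORY_DEVICE_DEV_DAX);
}

/*
 * Option A (kvm-local): keep PG_reserved on DAX pages and teach kvm to
 * look past it. Every other PageReserved user needs the same exception.
 */
static bool mock_kvm_is_reserved_local(const struct mock_page *page)
{
	if (!page)
		return true;	/* no struct page: treat as MMIO */
	return page->reserved && !mock_is_dax_page(page);
}

/*
 * Option B (global): the memremap path stops marking DAX pages as
 * reserved, so callers can keep testing the flag alone.
 */
static void mock_memremap_init_page(struct mock_page *page,
				    enum mock_memory_type type)
{
	page->type = type;
	page->reserved = !(type == MOCK_MEMORY_DEVICE_FS_DAX ||
			   type == MOCK_MEMORY_DEVICE_DEV_DAX);
}

static bool mock_kvm_is_reserved_plain(const struct mock_page *page)
{
	return !page || page->reserved;
}

/*
 * With option B the reserved flag no longer keeps DAX pages out of the
 * hibernation image, so the skip decision needs an explicit DAX test.
 */
static bool mock_hibernation_skips(const struct mock_page *page)
{
	return !page || page->reserved || mock_is_dax_page(page);
}
```

With option B, a page set up via mock_memremap_init_page() as FS_DAX is
not reserved for kvm, yet mock_hibernation_skips() still excludes it.
With option A, the same page stays reserved for every caller that has
not been taught the exception.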

-- 

Thanks,

David / dhildenb