LKML Archive on lore.kernel.org
 help / color / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Dan Williams <dan.j.williams@intel.com>,
	alexander.h.duyck@linux.intel.com
Cc: "Barret Rhoden" <brho@google.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Zhang Yi" <yi.z.zhang@linux.intel.com>,
	"KVM list" <kvm@vger.kernel.org>,
	linux-nvdimm <linux-nvdimm@lists.01.org>,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
	"Linux MM" <linux-mm@kvack.org>,
	"Dave Jiang" <dave.jiang@intel.com>,
	"Zhang, Yu C" <yu.c.zhang@intel.com>,
	"Pankaj Gupta" <pagupta@redhat.com>, "Jan Kara" <jack@suse.cz>,
	"Christoph Hellwig" <hch@lst.de>,
	rkrcmar@redhat.com, "Jérôme Glisse" <jglisse@redhat.com>
Subject: Re: [PATCH RFC 2/3] mm: Add support for exposing if dev_pagemap supports refcount pinning
Date: Wed, 5 Dec 2018 09:13:47 +0100
Message-ID: <ee8cc068-903c-d87e-f418-ade46786249e@redhat.com> (raw)
In-Reply-To: <CAPcyv4ix4aHyivwCiw0YNMxLjRJeqDX3x3m1q1JhyMPCEMOJtQ@mail.gmail.com>

On 05.12.18 01:26, Dan Williams wrote:
> On Tue, Dec 4, 2018 at 4:01 PM Alexander Duyck
> <alexander.h.duyck@linux.intel.com> wrote:
>>
>> On Tue, 2018-12-04 at 18:24 -0500, Barret Rhoden wrote:
>>> Hi -
>>>
>>> On 2018-12-04 at 14:51 Alexander Duyck
>>> <alexander.h.duyck@linux.intel.com> wrote:
>>>
>>> [snip]
>>>
>>>>> I think the confusion arises from the fact that there are a few MMIO
>>>>> resources with a struct page and all the rest MMIO resources without.
>>>>> The problem comes from the coarse definition of pfn_valid(), it may
>>>>> return 'true' for things that are not System-RAM, because pfn_valid()
>>>>> may be something as simplistic as a single "address < X" check. Then
>>>>> PageReserved is a fallback to clarify the pfn_valid() result. The
>>>>> typical case is that MMIO space is not caught up in this linear map
>>>>> confusion. An MMIO address may or may not have an associated 'struct
>>>>> page' and in most cases it does not.
>>>>
>>>> Okay. I think I understand this somewhat now. So the page might be
>>>> physically there, but with the reserved bit it is not supposed to be
>>>> touched.
>>>>
>>>> My main concern with just dropping the bit is that we start seeing some
>>>> other uses that I was not certain what the impact would be. For example
>>>> the functions like kvm_set_pfn_accessed start going in and manipulating
>>>> things that I am not sure should be messed with for a DAX page.
>>>
>>> One thing regarding the accessed and dirty bits is that we might want
>>> to have DAX pages marked dirty/accessed, even if we can't LRU-reclaim
>>> or swap them.  I don't have a real example and I'm fairly ignorant
>>> about the specifics here.  But one possibility would be using the A/D
>>> bits to detect changes to a guest's memory for VM migration.  Maybe
>>> there would be issues with KSM too.
>>>
>>> Barret
>>
>> I get that, but the issue is that the code associated with those bits
>> currently assumes you are working with either an anonymous swap backed
>> page or a page cache page. We should really be updating that logic now,
>> and then enabling DAX to access it rather than trying to do things the
>> other way around which is how this feels.
> 
> Agree. I understand the concern about unintended side effects of
> dropping PageReserved for dax pages, but they simply don't fit the
> definition of the intended use of PageReserved. We've already had
> fallout from legacy code paths doing the wrong thing with dax pages
> where PageReserved wouldn't have helped. For example, see commit
> 6e2608dfd934 "xfs, dax: introduce xfs_dax_aops", or commit
> 6100e34b2526 "mm, memory_failure: Teach memory_failure() about
> dev_pagemap pages". So formerly teaching kvm about these page
> semantics and dropping the reliance on a side effect of PageReserved()
> seems the right direction.
> 
> That said, for mark_page_accessed(), it does not look like it will
> have any effect on dax pages. PageLRU will be false,
> __lru_cache_activate_page() will not find a page on a percpu pagevec,
> and workingset_activation() won't find an associated memcg. I would
> not be surprised if mark_page_accessed() is already being called today
> via the ext4 + dax use case.
> 

I agree to what Dan says here. I'd vote for getting rid of the
PageReserved bit for these pages and rather fixing the fallout from that
(if any, I also doubt that there will be much). One thing I already
mentioned in another thread is hindering hibernation code from touching
ZONE_DEVICE memory is one thing to take care of.

PageReserved as part of a user space process can mean many things (and I
still have a patch pending for submission to document that). It can mean
zero pages, VDSO pages, MMIO pages and right now DAX pages. For the
first three, we don't want to touch the struct page ever
(->PageReserved). For DAX it should not harm (-> no need for PageReserved).

-- 

Thanks,

David / dhildenb

  reply index

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-03 19:25 [PATCH RFC 0/3] Fix KVM misinterpreting Reserved page as an MMIO page Alexander Duyck
2018-12-03 19:25 ` [PATCH RFC 1/3] kvm: Split use cases for kvm_is_reserved_pfn to kvm_is_refcounted_pfn Alexander Duyck
2018-12-03 19:25 ` [PATCH RFC 2/3] mm: Add support for exposing if dev_pagemap supports refcount pinning Alexander Duyck
2018-12-03 19:47   ` Dan Williams
2018-12-03 20:21     ` Alexander Duyck
2018-12-03 20:31       ` Dan Williams
2018-12-03 20:53         ` Alexander Duyck
2018-12-03 21:05           ` Dan Williams
2018-12-03 21:50             ` Alexander Duyck
2018-12-04 19:08               ` Dan Williams
2018-12-04 22:51                 ` Alexander Duyck
2018-12-04 23:24                   ` Barret Rhoden
2018-12-05  0:01                     ` Alexander Duyck
2018-12-05  0:26                       ` Dan Williams
2018-12-05  8:13                         ` David Hildenbrand [this message]
2018-12-03 19:25 ` [PATCH RFC 3/3] kvm: Add additional check to determine if a page is refcounted Alexander Duyck
2018-12-04  6:59 ` [PATCH RFC 0/3] Fix KVM misinterpreting Reserved page as an MMIO page Yi Zhang
2018-12-04 18:45   ` Alexander Duyck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ee8cc068-903c-d87e-f418-ade46786249e@redhat.com \
    --to=david@redhat.com \
    --cc=alexander.h.duyck@linux.intel.com \
    --cc=brho@google.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=hch@lst.de \
    --cc=jack@suse.cz \
    --cc=jglisse@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=pagupta@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=rkrcmar@redhat.com \
    --cc=yi.z.zhang@linux.intel.com \
    --cc=yu.c.zhang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git
	git clone --mirror https://lore.kernel.org/lkml/9 lkml/git/9.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git