LinuxPPC-Dev Archive on lore.kernel.org
From: Dan Williams <dan.j.williams@intel.com>
To: David Hildenbrand <david@redhat.com>
Cc: linux-hyperv@vger.kernel.org, "Michal Hocko" <mhocko@suse.com>,
	"Radim Krčmář" <rkrcmar@redhat.com>,
	"KVM list" <kvm@vger.kernel.org>,
	"Pavel Tatashin" <pavel.tatashin@microsoft.com>,
	"KarimAllah Ahmed" <karahmed@amazon.de>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	"Alexander Duyck" <alexander.duyck@gmail.com>,
	"Michal Hocko" <mhocko@kernel.org>,
	"Linux MM" <linux-mm@kvack.org>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Paul Mackerras" <paulus@samba.org>,
	"H. Peter Anvin" <hpa@zytor.com>,
	"Wanpeng Li" <wanpengli@tencent.com>,
	"Alexander Duyck" <alexander.h.duyck@linux.intel.com>,
	"K. Y. Srinivasan" <kys@microsoft.com>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Kees Cook" <keescook@chromium.org>,
	devel@driverdev.osuosl.org,
	"Stefano Stabellini" <sstabellini@kernel.org>,
	"Stephen Hemminger" <sthemmin@microsoft.com>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
	"Joerg Roedel" <joro@8bytes.org>, "X86 ML" <x86@kernel.org>,
	YueHaibing <yuehaibing@huawei.com>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	"Mike Rapoport" <rppt@linux.ibm.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Vlastimil Babka" <vbabka@suse.cz>,
	"Anthony Yznaga" <anthony.yznaga@oracle.com>,
	"Oscar Salvador" <osalvador@suse.de>,
	"Isaac J. Manjarres" <isaacm@codeaurora.org>,
	"Matt Sickler" <Matt.Sickler@daktronics.com>,
	"Juergen Gross" <jgross@suse.com>,
	"Anshuman Khandual" <anshuman.khandual@arm.com>,
	"Haiyang Zhang" <haiyangz@microsoft.com>,
	"Sasha Levin" <sashal@kernel.org>,
	kvm-ppc@vger.kernel.org, "Qian Cai" <cai@lca.pw>,
	"Alex Williamson" <alex.williamson@redhat.com>,
	"Mike Rapoport" <rppt@linux.vnet.ibm.com>,
	"David Hildenbrand" <dhildenb@redhat.com>,
	"Nicholas Piggin" <npiggin@gmail.com>,
	"Andy Lutomirski" <luto@kernel.org>,
	xen-devel <xen-devel@lists.xenproject.org>,
	"Boris Ostrovsky" <boris.ostrovsky@oracle.com>,
	"Vitaly Kuznetsov" <vkuznets@redhat.com>,
	"Allison Randal" <allison@lohutok.net>,
	"Jim Mattson" <jmattson@google.com>,
	"Mel Gorman" <mgorman@techsingularity.net>,
	"Cornelia Huck" <cohuck@redhat.com>,
	"Pavel Tatashin" <pasha.tatashin@soleen.com>,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
	"Sean Christopherson" <sean.j.christopherson@intel.com>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Borislav Petkov" <bp@alien8.de>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Subject: Re: [PATCH v1 04/10] vfio/type1: Prepare is_invalid_reserved_pfn() for PG_reserved changes
Date: Thu, 7 Nov 2019 21:09:46 -0800
Message-ID: <CAPcyv4h0yX4g6ETymQEpp52FFLaOmps_hO7w_yuYGk7BqQQcMQ@mail.gmail.com>
In-Reply-To: <0eb001e0-bb26-59bb-c514-d2f8a86a7eab@redhat.com>

On Thu, Nov 7, 2019 at 2:07 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 07.11.19 19:22, David Hildenbrand wrote:
> >
> >
> >> On 07.11.2019 at 16:40, Dan Williams <dan.j.williams@intel.com> wrote:
> >>
> >> On Thu, Oct 24, 2019 at 5:12 AM David Hildenbrand <david@redhat.com> wrote:
> >>>
> >>> Right now, ZONE_DEVICE memory is always set PG_reserved. We want to
> >>> change that.
> >>>
> >>> KVM has this weird use case that you can map anything from /dev/mem
> >>> into the guest. pfn_valid() is not a reliable check whether the memmap
> >>> was initialized and can be touched. pfn_to_online_page() makes sure
> >>> that we have an initialized memmap (and don't have ZONE_DEVICE memory).
> >>>
> >>> Rewrite is_invalid_reserved_pfn() similar to kvm_is_reserved_pfn() to make
> >>> sure the function produces the same result once we stop setting ZONE_DEVICE
> >>> pages PG_reserved.
> >>>
> >>> Cc: Alex Williamson <alex.williamson@redhat.com>
> >>> Cc: Cornelia Huck <cohuck@redhat.com>
> >>> Signed-off-by: David Hildenbrand <david@redhat.com>
> >>> ---
> >>> drivers/vfio/vfio_iommu_type1.c | 10 ++++++++--
> >>> 1 file changed, 8 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> >>> index 2ada8e6cdb88..f8ce8c408ba8 100644
> >>> --- a/drivers/vfio/vfio_iommu_type1.c
> >>> +++ b/drivers/vfio/vfio_iommu_type1.c
> >>> @@ -299,9 +299,15 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
> >>>   */
> >>> static bool is_invalid_reserved_pfn(unsigned long pfn)
> >>> {
> >>> -       if (pfn_valid(pfn))
> >>> -               return PageReserved(pfn_to_page(pfn));
> >>> +       struct page *page = pfn_to_online_page(pfn);
> >>
> >> Ugh, I just realized this is not a safe conversion until
> >> pfn_to_online_page() is moved over to subsection granularity. As it
> >> stands it will return true for any ZONE_DEVICE pages that share a
> >> section with boot memory.
> >
> > That should not happen right now and I commented back when you
> > introduced subsection support that I don’t want to have ZONE_DEVICE
> > mixed with online pages in a section. Having memory block devices
> > that partially span ZONE_DEVICE would be ... really weird. With
> > something like pfn_active() - as discussed - we could at least make
> > this check work - but I am not sure if we really want to go down
> > that path. In the worst case, some MB of RAM are lost ... I guess
> > this needs more thought.
> >
>
> I just realized the "boot memory" part. Is that a real thing? IOW, can
> we have ZONE_DEVICE falling into a memory block (with holes)? I somewhat
> have doubts that this would work ...

One of the real-world failure cases that started the subsection effort
is that Persistent Memory collides with System RAM on a 64MB boundary
on shipping platforms. System RAM ends on a 64MB boundary and, due to a
lack of memory controller resources, PMEM is mapped contiguously at the
end of that boundary. Some more details are in the subsection cover
letter / changelogs [1] [2]. It's not sufficient to just lose some
memory; that's the broken implementation that led to the subsection
work, because the lost memory may change from one boot to the next and
software can't reliably inject padding that conforms to the x86
128MB section constraint.
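
To make the collision concrete, here is a rough userspace sketch of the
arithmetic, not kernel code. It assumes 128MB sections and 2MB
subsections on x86-64; the helper names and the example addresses are
made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86-64 geometry: 128MB sections, 2MB subsections. */
#define SECTION_SIZE    (128ULL << 20)
#define SUBSECTION_SIZE (2ULL << 20)

/* Index of the section that contains a physical address. */
static uint64_t section_nr(uint64_t phys)
{
	return phys / SECTION_SIZE;
}

/* Index of the (global) subsection that contains a physical address. */
static uint64_t subsection_nr(uint64_t phys)
{
	return phys / SUBSECTION_SIZE;
}

/*
 * True if the last byte of System RAM (ram_end is exclusive) and the
 * first byte of PMEM land in the same section.
 */
static bool shares_section(uint64_t ram_end, uint64_t pmem_start)
{
	return section_nr(ram_end - 1) == section_nr(pmem_start);
}

/* Same question at subsection granularity. */
static bool shares_subsection(uint64_t ram_end, uint64_t pmem_start)
{
	return subsection_nr(ram_end - 1) == subsection_nr(pmem_start);
}
```

With System RAM ending at 4GB + 64MB (0x104000000) and PMEM starting
right there, shares_section() is true while shares_subsection() is
false, which is the case a section-granularity check cannot tell apart.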

Suffice it to say, I think we need your pfn_active() to get a
subsection-granularity pfn_to_online_page() before PageReserved() can
be removed.
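
As a sketch of what "subsection granularity" would buy us, here is a toy
userspace model, not the kernel's data structures. struct model_section,
its online_submap bitmap and both helper names are hypothetical; the
only assumptions taken from the real code are 4K pages, 128MB sections
and 2MB subsections.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed geometry: 4K pages, 128MB sections, 2MB subsections. */
#define PAGE_SHIFT              12
#define SECTION_SHIFT           27
#define SUBSECTION_SHIFT        21
#define SUBSECTIONS_PER_SECTION (1U << (SECTION_SHIFT - SUBSECTION_SHIFT))

/* Hypothetical per-section state for this model only. */
struct model_section {
	bool online;            /* section-level "online" flag */
	uint64_t online_submap; /* one bit per 2MB subsection */
};

/* Position of a pfn's subsection within its section (0..63). */
static unsigned int pfn_to_subsection_idx(uint64_t pfn)
{
	return (pfn >> (SUBSECTION_SHIFT - PAGE_SHIFT)) &
	       (SUBSECTIONS_PER_SECTION - 1);
}

/* Section granularity: every pfn in an online section looks online. */
static bool pfn_online_section_granular(const struct model_section *ms,
					 uint64_t pfn)
{
	(void)pfn;
	return ms->online;
}

/* Subsection granularity: additionally consult the per-subsection bit. */
static bool pfn_online_subsection_granular(const struct model_section *ms,
					    uint64_t pfn)
{
	return ms->online &&
	       ((ms->online_submap >> pfn_to_subsection_idx(pfn)) & 1);
}
```

For a section whose first 64MB is boot memory (online_submap ==
0xFFFFFFFF) and whose second half is ZONE_DEVICE, the section-granular
helper reports a PMEM pfn such as 0x104000 as online, while the
subsection-granular helper does not.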

[1]: https://lore.kernel.org/linux-mm/156092349300.979959.17603710711957735135.stgit@dwillia2-desk3.amr.corp.intel.com/
[2]: https://lore.kernel.org/linux-mm/156092354368.979959.6232443923440952359.stgit@dwillia2-desk3.amr.corp.intel.com/

