driverdev-devel.linuxdriverproject.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: "Kate Stewart" <kstewart@linuxfoundation.org>,
	linux-hyperv@vger.kernel.org, "Michal Hocko" <mhocko@suse.com>,
	"Radim Krčmář" <rkrcmar@redhat.com>,
	"KVM list" <kvm@vger.kernel.org>,
	"Pavel Tatashin" <pavel.tatashin@microsoft.com>,
	"KarimAllah Ahmed" <karahmed@amazon.de>,
	"Benjamin Herrenschmidt" <benh@kernel.crashing.org>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	"Alexander Duyck" <alexander.duyck@gmail.com>,
	"Michal Hocko" <mhocko@kernel.org>,
	"Paul Mackerras" <paulus@ozlabs.org>,
	"Linux MM" <linux-mm@kvack.org>,
	"Paul Mackerras" <paulus@samba.org>,
	"Michael Ellerman" <mpe@ellerman.id.au>,
	"H. Peter Anvin" <hpa@zytor.com>,
	"Wanpeng Li" <wanpengli@tencent.com>,
	"Pavel Tatashin" <pasha.tatashin@soleen.com>,
	devel@driverdev.osuosl.org,
	"Stefano Stabellini" <sstabellini@kernel.org>,
	"Stephen Hemminger" <sthemmin@microsoft.com>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
	"Joerg Roedel" <joro@8bytes.org>, "X86 ML" <x86@kernel.org>,
	YueHaibing <yuehaibing@huawei.com>,
	"Mike Rapoport" <rppt@linux.ibm.com>,
	"Madhumitha Prabakaran" <madhumithabiw@gmail.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Vlastimil Babka" <vbabka@suse.cz>,
	"Nishka Dasgupta" <nishkadg.linux@gmail.com>,
	"Anthony Yznaga" <anthony.yznaga@oracle.com>,
	"Oscar Salvador" <osalvador@suse.de>,
	"Dan Carpenter" <dan.carpenter@oracle.com>,
	"Isaac J. Manjarres" <isaacm@codeaurora.org>,
	"Kees Cook" <keescook@chromium.org>,
	"Anshuman Khandual" <anshuman.khandual@arm.com>,
	"Haiyang Zhang" <haiyangz@microsoft.com>,
	"Simon Sandström" <simon@nikanor.nu>,
	"Sasha Levin" <sashal@kernel.org>,
	"Juergen Gross" <jgross@suse.com>,
	kvm-ppc@vger.kernel.org, "Qian Cai" <cai@lca.pw>,
	"Alex Williamson" <alex.williamson@redhat.com>,
	"Mike Rapoport" <rppt@linux.vnet.ibm.com>,
	"Borislav Petkov" <bp@alien8.de>,
	"Nicholas Piggin" <npiggin@gmail.com>,
	"Andy Lutomirski" <luto@kernel.org>,
	xen-devel <xen-devel@lists.xenproject.org>,
	"Boris Ostrovsky" <boris.ostrovsky@oracle.com>,
	"Todd Poynor" <toddpoynor@google.com>,
	"Vitaly Kuznetsov" <vkuznets@redhat.com>,
	"Allison Randal" <allison@lohutok.net>,
	"Jim Mattson" <jmattson@google.com>,
	"Christophe Leroy" <christophe.leroy@c-s.fr>,
	"Vandana BN" <bnvandana@gmail.com>,
	"Mel Gorman" <mgorman@techsingularity.net>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	"Cornelia Huck" <cohuck@redhat.com>,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
	"Sean Christopherson" <sean.j.christopherson@intel.com>,
	"Rob Springer" <rspringer@google.com>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Subject: Re: [PATCH RFC v1 00/12] mm: Don't mark hotplugged pages PG_reserved (including ZONE_DEVICE)
Date: Wed, 23 Oct 2019 23:22:23 +0200	[thread overview]
Message-ID: <a1f53724-2a7b-048d-0790-a17c8b79c65a@redhat.com> (raw)
In-Reply-To: <CAPcyv4g1zBpD2i936wWB9Pn0OStUoksXXLCCdXeYjbHuri-j4Q@mail.gmail.com>

On 23.10.19 21:39, Dan Williams wrote:
> On Wed, Oct 23, 2019 at 10:28 AM David Hildenbrand <david@redhat.com> wrote:
>>
>>>> I dislike this for three reasons
>>>>
>>>> a) It does not protect against any races, really, it does not improve things.
>>>> b) We do have the exact same problem with pfn_to_online_page(). As long as we
>>>>    don't hold the memory hotplug lock, memory can get offlined and remove any time. Racy.
>>>
>>> True, we need to solve that problem too. That seems to want something
>>> lighter weight than the hotplug lock that can be held over pfn lookups
>>> +  use rather than requiring a page lookup in paths where it's not
>>> clear that a page reference would prevent unplug.
>>>
>>>> c) We mix in ZONE specific stuff into the core. It should be "just another zone"
>>>
>>> Not sure I grok this when the RFC is sprinkling zone-specific
>>> is_zone_device_page() throughout the core?
>>
>> Most users should not care about the zone. pfn_active() would be enough
>> in most situations, especially most PFN walkers - "this memmap is valid
>> and e.g., contains a valid zone ...".
> 
> Oh, I see, you're saying convert most users to pfn_active() (and some
> TBD rcu locking), but only pfn_to_online_page() users would need the
> zone lookup? I can get on board with that.

I guess my answer to that is simple: If we only care about "is this
memmap safe to touch", use pfn_active()

(well, with pfn_valid_within() similar as done in pfn_to_online_page()
due to memory holes, but these are details - e.g., pfn_active() can
check against pfn_valid_within() right away internally). (+locking TBD
to make sure it remains active)

However, if we want to special case in addition on zones (!ZONE_DEVICE
(a.k.a., onlined via memory blocks, managed by the buddy), ZONE_DEVICE,
whatever might come in the future, ...), also access the zone stored in
the memmap. E.g., by using pfn_to_online_page().

> 
>>
>>>
>>>>
>>>> What I propose instead (already discussed in https://lkml.org/lkml/2019/10/10/87)
>>>
>>> Sorry I missed this earlier...
>>>
>>>>
>>>> 1. Convert SECTION_IS_ONLINE to SECTION_IS_ACTIVE
>>>> 2. Convert SECTION_IS_ACTIVE to a subsection bitmap
>>>> 3. Introduce pfn_active() that checks against the subsection bitmap
>>>> 4. Once the memmap was initialized / prepared, set the subsection active
>>>>    (similar to SECTION_IS_ONLINE in the buddy right now)
>>>> 5. Before the memmap gets invalidated, set the subsection inactive
>>>>    (similar to SECTION_IS_ONLINE in the buddy right now)
>>>> 5. pfn_to_online_page() = pfn_active() && zone != ZONE_DEVICE
>>>> 6. pfn_to_device_page() = pfn_active() && zone == ZONE_DEVICE
>>>
>>> This does not seem to reduce any complexity because it still requires
>>> a pfn to zone lookup at the end of the process.
>>>
>>> I.e. converting pfn_to_online_page() to use a new pfn_active()
>>> subsection map plus looking up the zone from pfn_to_page() is more
>>> steps than just doing a direct pfn to zone lookup. What am I missing?
>>
>> That a real "pfn to zone" lookup without going via the struct page will
>> require to have more than just a single bitmap. IMHO, keeping the
>> information at a single place (memmap) is the clean thing to do (not
>> replicating it somewhere else). Going via the memmap might not be as
>> fast as a direct lookup, but do we actually care? We are already looking
>> at "random PFNs we are not sure if there is a valid memmap".
> 
> True, we only care about the validity of the check, and as you pointed
> out moving the check to the pfn level does not solve the validity
> race. It needs a lock.

Let's call pfn_active() "a pfn that is active in the system and has an
initialized memmap, which contains sane values" (valid memmap sounds
like pfn_valid(), which is actually "there is a memmap which might
contain garbage"). Yes we need some sort of lightweight locking as
discussed.

[...]

>>>> However, I think we also have to clarify if we need the change in 3 at all.
>>>> It comes from
>>>>
>>>> commit f5509cc18daa7f82bcc553be70df2117c8eedc16
>>>> Author: Kees Cook <keescook@chromium.org>
>>>> Date:   Tue Jun 7 11:05:33 2016 -0700
>>>>
>>>>     mm: Hardened usercopy
>>>>
>>>>     This is the start of porting PAX_USERCOPY into the mainline kernel. This
>>>>     is the first set of features, controlled by CONFIG_HARDENED_USERCOPY. The
>>>>     work is based on code by PaX Team and Brad Spengler, and an earlier port
>>>>     from Casey Schaufler. Additional non-slab page tests are from Rik van Riel.
>>>> [...]
>>>>     - otherwise, object must not span page allocations (excepting Reserved
>>>>       and CMA ranges)
>>>>
>>>> Not sure if we really have to care about ZONE_DEVICE at this point.
>>>
>>> That check needs to be careful to ignore ZONE_DEVICE pages. There's
>>> nothing wrong with a copy spanning ZONE_DEVICE and typical pages.
>>
>> Please note that the current check would *forbid* this (AFAIKs for a
>> single heap object). As discussed in the relevant patch, we might be
>> able to just stop doing that and limit it to real PG_reserved pages
>> (without ZONE_DEVICE). I'd be happy to not introduce new
>> is_zone_device_page() users.
> 
> At least for non-HMM ZONE_DEVICE usage, i.e. the dax + pmem stuff, is
> excluded from this path by:
> 
> 52f476a323f9 libnvdimm/pmem: Bypass CONFIG_HARDENED_USERCOPY overhead

Interesting, and very valuable information. So this sounds like patch #2
can go (or convert it to a documentation update).

> 
> So this case is one more to add to the pile of HMM auditing.

Sounds like HMM is some dangerous piece of software we have. This needs
auditing, fixing, and documentation.

BTW, do you have a good source of details about HMM? Especially about
these oddities you mentioned?

Also, can you have a look at patch #2 7/8 and confirm that doing a
SetPageDirty() on a ZONE_DEVICE page is okay (although not useful)? Thanks!

-- 

Thanks,

David / dhildenb

_______________________________________________
devel mailing list
devel@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

  reply	other threads:[~2019-10-23 21:24 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-22 17:12 [PATCH RFC v1 00/12] mm: Don't mark hotplugged pages PG_reserved (including ZONE_DEVICE) David Hildenbrand
2019-10-22 17:12 ` [PATCH RFC v1 01/12] mm/memory_hotplug: Don't allow to online/offline memory blocks with holes David Hildenbrand
2019-10-24  3:53   ` Anshuman Khandual
2019-10-24  7:55     ` David Hildenbrand
2019-10-22 17:12 ` [PATCH RFC v1 02/12] mm/usercopy.c: Prepare check_page_span() for PG_reserved changes David Hildenbrand
2019-10-23  8:20   ` David Hildenbrand
2019-10-23 16:25     ` Kees Cook
2019-10-23 16:32       ` David Hildenbrand
2019-10-22 17:12 ` [PATCH RFC v1 03/12] KVM: x86/mmu: Prepare kvm_is_mmio_pfn() " David Hildenbrand
2019-10-22 17:12 ` [PATCH RFC v1 04/12] KVM: Prepare kvm_is_reserved_pfn() " David Hildenbrand
2019-10-22 17:12 ` [PATCH RFC v1 05/12] vfio/type1: Prepare is_invalid_reserved_pfn() " David Hildenbrand
2019-10-22 17:12 ` [PATCH RFC v1 06/12] staging/gasket: Prepare gasket_release_page() " David Hildenbrand
2019-10-23  8:17   ` David Hildenbrand
2019-10-22 17:12 ` [PATCH RFC v1 07/12] staging: kpc2000: Prepare transfer_complete_cb() " David Hildenbrand
2019-10-22 17:55   ` Matt Sickler
2019-10-22 21:01     ` David Hildenbrand
2019-10-22 17:12 ` [PATCH RFC v1 08/12] powerpc/book3s: Prepare kvmppc_book3s_instantiate_page() " David Hildenbrand
2019-10-22 17:12 ` [PATCH RFC v1 09/12] powerpc/64s: Prepare hash_page_do_lazy_icache() " David Hildenbrand
2019-10-22 17:12 ` [PATCH RFC v1 10/12] powerpc/mm: Prepare maybe_pte_to_page() " David Hildenbrand
2019-10-22 17:12 ` [PATCH RFC v1 11/12] x86/mm: Prepare __ioremap_check_ram() " David Hildenbrand
2019-10-22 17:12 ` [PATCH RFC v1 12/12] mm/memory_hotplug: Don't mark pages PG_reserved when initializing the memmap David Hildenbrand
2019-10-22 21:54 ` [PATCH RFC v1 00/12] mm: Don't mark hotplugged pages PG_reserved (including ZONE_DEVICE) Dan Williams
2019-10-23  7:26   ` David Hildenbrand
2019-10-23 17:09     ` Dan Williams
2019-10-23 17:27       ` David Hildenbrand
2019-10-23 19:39         ` Dan Williams
2019-10-23 21:22           ` David Hildenbrand [this message]
2019-10-24 12:50     ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a1f53724-2a7b-048d-0790-a17c8b79c65a@redhat.com \
    --to=david@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=alex.williamson@redhat.com \
    --cc=alexander.duyck@gmail.com \
    --cc=allison@lohutok.net \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=anshuman.khandual@arm.com \
    --cc=anthony.yznaga@oracle.com \
    --cc=benh@kernel.crashing.org \
    --cc=bnvandana@gmail.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=bp@alien8.de \
    --cc=cai@lca.pw \
    --cc=christophe.leroy@c-s.fr \
    --cc=cohuck@redhat.com \
    --cc=dan.carpenter@oracle.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=devel@driverdev.osuosl.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=haiyangz@microsoft.com \
    --cc=hannes@cmpxchg.org \
    --cc=hpa@zytor.com \
    --cc=isaacm@codeaurora.org \
    --cc=jgross@suse.com \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=karahmed@amazon.de \
    --cc=keescook@chromium.org \
    --cc=kstewart@linuxfoundation.org \
    --cc=kvm-ppc@vger.kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=luto@kernel.org \
    --cc=madhumithabiw@gmail.com \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@kernel.org \
    --cc=mhocko@suse.com \
    --cc=mingo@redhat.com \
    --cc=mpe@ellerman.id.au \
    --cc=nishkadg.linux@gmail.com \
    --cc=npiggin@gmail.com \
    --cc=osalvador@suse.de \
    --cc=pasha.tatashin@soleen.com \
    --cc=paulus@ozlabs.org \
    --cc=paulus@samba.org \
    --cc=pavel.tatashin@microsoft.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rkrcmar@redhat.com \
    --cc=rppt@linux.ibm.com \
    --cc=rppt@linux.vnet.ibm.com \
    --cc=rspringer@google.com \
    --cc=sashal@kernel.org \
    --cc=sean.j.christopherson@intel.com \
    --cc=simon@nikanor.nu \
    --cc=sstabellini@kernel.org \
    --cc=sthemmin@microsoft.com \
    --cc=tglx@linutronix.de \
    --cc=toddpoynor@google.com \
    --cc=vbabka@suse.cz \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    --cc=x86@kernel.org \
    --cc=xen-devel@lists.xenproject.org \
    --cc=yuehaibing@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).