From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH v1 03/10] KVM: Prepare kvm_is_reserved_pfn() for PG_reserved changes
To: Dan Williams, Sean Christopherson
Cc: Linux Kernel Mailing List, Linux MM, Michal Hocko, Andrew Morton,
    kvm-ppc@vger.kernel.org, linuxppc-dev, KVM list,
    linux-hyperv@vger.kernel.org, devel@driverdev.osuosl.org, xen-devel,
    X86 ML, Alexander Duyck, Alexander Duyck, Alex Williamson,
    Allison Randal, Andy Lutomirski, "Aneesh Kumar K.V", Anshuman Khandual,
    Anthony Yznaga, Benjamin Herrenschmidt, Borislav Petkov, Boris Ostrovsky,
    Christophe Leroy, Cornelia Huck, Dave Hansen, Haiyang Zhang, "H.
    Peter Anvin", Ingo Molnar, "Isaac J. Manjarres", Jim Mattson,
    Joerg Roedel, Johannes Weiner, Juergen Gross, KarimAllah Ahmed,
    Kees Cook, "K. Y. Srinivasan", "Matthew Wilcox (Oracle)", Matt Sickler,
    Mel Gorman, Michael Ellerman, Michal Hocko, Mike Rapoport, Mike Rapoport,
    Nicholas Piggin, Oscar Salvador, Paolo Bonzini, Paul Mackerras,
    Paul Mackerras, Pavel Tatashin, Pavel Tatashin, Peter Zijlstra, Qian Cai,
    Radim Krčmář, Sasha Levin, Stefano Stabellini, Stephen Hemminger,
    Thomas Gleixner, Vitaly Kuznetsov, Vlastimil Babka, Wanpeng Li,
    YueHaibing, Adam Borowski
References: <01adb4cb-6092-638c-0bab-e61322be7cf5@redhat.com>
    <613f3606-748b-0e56-a3ad-1efaffa1a67b@redhat.com>
    <20191105160000.GC8128@linux.intel.com>
    <20191105231316.GE23297@linux.intel.com>
    <20191106000315.GI23297@linux.intel.com>
From: David Hildenbrand
Organization: Red Hat GmbH
Message-ID: <694202e7-d8e6-6ac8-6e47-3553b298bbcc@redhat.com>
Date: Wed, 6 Nov 2019 07:56:34 +0100

On 06.11.19 01:08, Dan Williams wrote:
> On Tue, Nov 5, 2019 at 4:03 PM Sean Christopherson wrote:
>>
>> On Tue, Nov 05, 2019 at 03:43:29PM -0800, Dan Williams wrote:
>>> On Tue, Nov 5, 2019 at 3:30 PM Dan Williams wrote:
>>>>
>>>> On Tue, Nov 5, 2019 at 3:13 PM Sean Christopherson wrote:
>>>>>
>>>>> On Tue, Nov 05, 2019 at 03:02:40PM -0800, Dan Williams wrote:
>>>>>> On Tue, Nov 5, 2019 at 12:31 PM David Hildenbrand wrote:
>>>>>>>> The scarier code (for me) is transparent_hugepage_adjust() and
>>>>>>>> kvm_mmu_zap_collapsible_spte(), as I don't at all understand the
>>>>>>>> interaction between THP and _PAGE_DEVMAP.
>>>>>>>
>>>>>>> The x86 KVM MMU code is some of the ugliest code I know (sorry, but it
>>>>>>> had to be said :/ ). Luckily, this should be independent of the
>>>>>>> PG_reserved thingy AFAIKS.
>>>>>>
>>>>>> Both transparent_hugepage_adjust() and kvm_mmu_zap_collapsible_spte()
>>>>>> are honoring kvm_is_reserved_pfn(), so again I'm missing where the
>>>>>> page count gets mismanaged and leads to the reported hang.
>>>>>
>>>>> When mapping pages into the guest, KVM gets the page via gup(), which
>>>>> increments the page count for ZONE_DEVICE pages. But KVM puts the page
>>>>> using kvm_release_pfn_clean(), which skips put_page() if PageReserved()
>>>>> and so never puts its reference to ZONE_DEVICE pages.
>>>>
>>>> Oh, yeah, that's busted.
>>>
>>> Ugh, it's extra busted because every other gup user in the kernel
>>> tracks the pages resulting from gup and puts them (put_page()) when
>>> they are done. KVM wants to forget about whether it did a gup to get
>>> the page and optionally trigger put_page() based purely on the pfn.
>>> Outside of VFIO device assignment that needs pages pinned for DMA, why
>>> does KVM itself need to pin pages? If pages are pinned over a return
>>> to userspace that needs to be a FOLL_LONGTERM gup.
>>
>> Short answer, KVM pins the page to ensure correctness with respect to the
>> primary MMU invalidating the associated host virtual address, e.g. when
>> the page is being migrated or unmapped from host userspace.
>>
>> The main use of gup() is to handle guest page faults and map pages into
>> the guest, i.e. into KVM's secondary MMU. KVM uses gup() to both get the
>> PFN and to temporarily pin the page. The pin is held just long enough to
>> guarantee that any invalidation via the mmu_notifier will be stalled
>> until after KVM finishes installing the page into the secondary MMU, i.e.
>> the pin is short-term and not held across a return to userspace or entry
>> into the guest. When a subsequent mmu_notifier invalidation occurs, KVM
>> pulls the PFN from the secondary MMU and uses that to update accessed
>> and dirty bits in the host.
>>
>> There are a few other KVM flows that eventually call into gup(), but those
>> are "traditional" short-term pins and use put_page() directly.
>
> Ok, I was misinterpreting the effect of the bug with what KVM is using
> the reference to do.
>
> To your other point:
>
>> But David's proposed fix for the above refcount bug is to omit the patch
>> so that KVM no longer treats ZONE_DEVICE pages as reserved. That seems
>> like the right thing to do, including for thp_adjust(), e.g. it would
>> naturally let KVM use 2MB pages for the guest when a ZONE_DEVICE page is
>> mapped with a huge page (2MB or above) in the host. The only hiccup is
>> figuring out how to correctly transfer the reference.
>
> That might not be the only hiccup. There's currently no such thing as
> huge pages for ZONE_DEVICE, there are huge *mappings* (pmd and pud),
> but the result of pfn_to_page() on such a mapping does not yield a
> huge 'struct page'. It seems there are other paths in KVM that assume
> that more typical page machinery is active like SetPageDirty() based
> on kvm_is_reserved_pfn(). While I told David that I did not want to
> see more usage of is_zone_device_page(), this patch below (untested)
> seems a cleaner path with fewer surprises:
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 4df0aa6b8e5c..fbea17c1810c 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1831,7 +1831,8 @@ EXPORT_SYMBOL_GPL(kvm_release_page_clean);
>
>  void kvm_release_pfn_clean(kvm_pfn_t pfn)
>  {
> -	if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn))
> +	if ((!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn)) ||
> +	    (pfn_valid(pfn) && is_zone_device_page(pfn_to_page(pfn))))
>  		put_page(pfn_to_page(pfn));
>  }
>  EXPORT_SYMBOL_GPL(kvm_release_pfn_clean);

I had the same thought, but I do wonder about the kvm_get_pfn() users,
e.g.:

hva_to_pfn_remapped():
	r = follow_pfn(vma, addr, &pfn);
	...
	kvm_get_pfn(pfn);
	...

We would not take a reference for ZONE_DEVICE, but later drop one
reference via kvm_release_pfn_clean(). IOW, kvm_get_pfn() gets *really*
dangerous to use. I can't tell if this can happen right now.

We do have 3 users of kvm_get_pfn() that we have to audit before this
change. Also, we should add a comment to kvm_get_pfn() that it should
never be used with possible ZONE_DEVICE pages.

Also, we should add a comment to kvm_release_pfn_clean(), describing why
we treat ZONE_DEVICE in a special way here.

We can then progress like this:
1. Get this fix upstream, it's somewhat unrelated to this series.
2. This patch here remains as is in this series (+/- documentation update)
3. Long term, rework KVM so that it no longer has to treat ZONE_DEVICE
   like reserved pages. E.g., get rid of kvm_get_pfn(). Then, this special
   zone check can go.

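To make the reference imbalance discussed above concrete, here is a minimal userspace sketch in C. It is not kernel code: struct toy_page and every toy_* function are hypothetical stand-ins for struct page, gup(), kvm_get_pfn() and kvm_release_pfn_clean(), and it assumes (as described in the thread) that ZONE_DEVICE pages are treated as reserved. It only models the counting, nothing else.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page; hypothetical, not the kernel structure. */
struct toy_page {
	int refcount;
	bool reserved;		/* models PageReserved() */
	bool zone_device;	/* models is_zone_device_page() */
};

/* Assumption: ZONE_DEVICE pages count as reserved, as in the thread. */
static bool toy_is_reserved(const struct toy_page *p)
{
	assert(p != NULL);
	return p->reserved || p->zone_device;
}

/* Models gup(): always takes a reference. */
static void toy_gup(struct toy_page *p)
{
	p->refcount++;
}

/* Models kvm_get_pfn(): takes a reference only for non-reserved pages. */
static void toy_get_pfn(struct toy_page *p)
{
	if (!toy_is_reserved(p))
		p->refcount++;
}

/*
 * Models kvm_release_pfn_clean(). With patched == true it also drops a
 * reference for ZONE_DEVICE pages, mirroring the quoted (untested) diff.
 */
static void toy_release_pfn_clean(struct toy_page *p, bool patched)
{
	if (!toy_is_reserved(p) || (patched && p->zone_device))
		p->refcount--;
}

/* Returns the net refcount change after one acquire/release cycle. */
static int toy_cycle(bool zone_device, bool via_gup, bool patched)
{
	struct toy_page page = { .refcount = 1, .zone_device = zone_device };

	if (via_gup)
		toy_gup(&page);
	else
		toy_get_pfn(&page);
	toy_release_pfn_clean(&page, patched);
	return page.refcount - 1;
}
```

In this model, toy_cycle(true, true, false) leaves one reference behind (the leak reported for gup() on ZONE_DEVICE pages), toy_cycle(true, true, true) is balanced, and toy_cycle(true, false, true) drops a reference that was never taken, which is the kvm_get_pfn() hazard raised above. Whether a real kvm_get_pfn() caller can hit that path is exactly what the audit would have to establish.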
--
Thanks,

David / dhildenb
In-Reply-To: Content-Language: en-US X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 X-MC-Unique: W8IUVLyUMSC0rC0dumrlPQ-1 X-Mimecast-Spam-Score: 0 Subject: Re: [Xen-devel] [PATCH v1 03/10] KVM: Prepare kvm_is_reserved_pfn() for PG_reserved changes X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: linux-hyperv@vger.kernel.org, Michal Hocko , =?UTF-8?B?UmFkaW0gS3LEjW3DocWZ?= , KVM list , Pavel Tatashin , KarimAllah Ahmed , Benjamin Herrenschmidt , Dave Hansen , Alexander Duyck , Michal Hocko , Paul Mackerras , Linux MM , Paul Mackerras , Michael Ellerman , "H. Peter Anvin" , Wanpeng Li , Alexander Duyck , "K. Y. Srinivasan" , Thomas Gleixner , Kees Cook , devel@driverdev.osuosl.org, Stefano Stabellini , Stephen Hemminger , "Aneesh Kumar K.V" , Joerg Roedel , X86 ML , YueHaibing , "Matthew Wilcox \(Oracle\)" , Mike Rapoport , Peter Zijlstra , Ingo Molnar , Vlastimil Babka , Anthony Yznaga , Oscar Salvador , "Isaac J. 
Manjarres" , Matt Sickler , Juergen Gross , Anshuman Khandual , Haiyang Zhang , Sasha Levin , kvm-ppc@vger.kernel.org, Qian Cai , Alex Williamson , Mike Rapoport , Borislav Petkov , Nicholas Piggin , Andy Lutomirski , xen-devel , Boris Ostrovsky , Vitaly Kuznetsov , Allison Randal , Jim Mattson , Christophe Leroy , Mel Gorman , Adam Borowski , Cornelia Huck , Pavel Tatashin , Linux Kernel Mailing List , Johannes Weiner , Paolo Bonzini , Andrew Morton , linuxppc-dev Content-Transfer-Encoding: base64 Content-Type: text/plain; charset="utf-8"; Format="flowed" Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" T24gMDYuMTEuMTkgMDE6MDgsIERhbiBXaWxsaWFtcyB3cm90ZToKPiBPbiBUdWUsIE5vdiA1LCAy MDE5IGF0IDQ6MDMgUE0gU2VhbiBDaHJpc3RvcGhlcnNvbgo+IDxzZWFuLmouY2hyaXN0b3BoZXJz b25AaW50ZWwuY29tPiB3cm90ZToKPj4KPj4gT24gVHVlLCBOb3YgMDUsIDIwMTkgYXQgMDM6NDM6 MjlQTSAtMDgwMCwgRGFuIFdpbGxpYW1zIHdyb3RlOgo+Pj4gT24gVHVlLCBOb3YgNSwgMjAxOSBh dCAzOjMwIFBNIERhbiBXaWxsaWFtcyA8ZGFuLmoud2lsbGlhbXNAaW50ZWwuY29tPiB3cm90ZToK Pj4+Pgo+Pj4+IE9uIFR1ZSwgTm92IDUsIDIwMTkgYXQgMzoxMyBQTSBTZWFuIENocmlzdG9waGVy c29uCj4+Pj4gPHNlYW4uai5jaHJpc3RvcGhlcnNvbkBpbnRlbC5jb20+IHdyb3RlOgo+Pj4+Pgo+ Pj4+PiBPbiBUdWUsIE5vdiAwNSwgMjAxOSBhdCAwMzowMjo0MFBNIC0wODAwLCBEYW4gV2lsbGlh bXMgd3JvdGU6Cj4+Pj4+PiBPbiBUdWUsIE5vdiA1LCAyMDE5IGF0IDEyOjMxIFBNIERhdmlkIEhp bGRlbmJyYW5kIDxkYXZpZEByZWRoYXQuY29tPiB3cm90ZToKPj4+Pj4+Pj4gVGhlIHNjYXJpZXIg Y29kZSAoZm9yIG1lKSBpcyB0cmFuc3BhcmVudF9odWdlcGFnZV9hZGp1c3QoKSBhbmQKPj4+Pj4+ Pj4ga3ZtX21tdV96YXBfY29sbGFwc2libGVfc3B0ZSgpLCBhcyBJIGRvbid0IGF0IGFsbCB1bmRl cnN0YW5kIHRoZQo+Pj4+Pj4+PiBpbnRlcmFjdGlvbiBiZXR3ZWVuIFRIUCBhbmQgX1BBR0VfREVW TUFQLgo+Pj4+Pj4+Cj4+Pj4+Pj4gVGhlIHg4NiBLVk0gTU1VIGNvZGUgaXMgb25lIG9mIHRoZSB1 Z2xpZXN0IGNvZGUgSSBrbm93IChzb3JyeSwgYnV0IGl0Cj4+Pj4+Pj4gaGFkIHRvIGJlIHNhaWQg Oi8gKS4gTHVja2lseSwgdGhpcyBzaG91bGQgYmUgaW5kZXBlbmRlbnQgb2YgdGhlCj4+Pj4+Pj4g UEdfcmVzZXJ2ZWQgdGhpbmd5IEFGQUlLcy4KPj4+Pj4+Cj4+Pj4+PiBCb3RoIHRyYW5zcGFyZW50 
X2h1Z2VwYWdlX2FkanVzdCgpIGFuZCBrdm1fbW11X3phcF9jb2xsYXBzaWJsZV9zcHRlKCkKPj4+ Pj4+IGFyZSBob25vcmluZyBrdm1faXNfcmVzZXJ2ZWRfcGZuKCksIHNvIGFnYWluIEknbSBtaXNz aW5nIHdoZXJlIHRoZQo+Pj4+Pj4gcGFnZSBjb3VudCBnZXRzIG1pc21hbmFnZWQgYW5kIGxlYWRz IHRvIHRoZSByZXBvcnRlZCBoYW5nLgo+Pj4+Pgo+Pj4+PiBXaGVuIG1hcHBpbmcgcGFnZXMgaW50 byB0aGUgZ3Vlc3QsIEtWTSBnZXRzIHRoZSBwYWdlIHZpYSBndXAoKSwgd2hpY2gKPj4+Pj4gaW5j cmVtZW50cyB0aGUgcGFnZSBjb3VudCBmb3IgWk9ORV9ERVZJQ0UgcGFnZXMuICBCdXQgS1ZNIHB1 dHMgdGhlIHBhZ2UKPj4+Pj4gdXNpbmcga3ZtX3JlbGVhc2VfcGZuX2NsZWFuKCksIHdoaWNoIHNr aXBzIHB1dF9wYWdlKCkgaWYgUGFnZVJlc2VydmVkKCkKPj4+Pj4gYW5kIHNvIG5ldmVyIHB1dHMg aXRzIHJlZmVyZW5jZSB0byBaT05FX0RFVklDRSBwYWdlcy4KPj4+Pgo+Pj4+IE9oLCB5ZWFoLCB0 aGF0J3MgYnVzdGVkLgo+Pj4KPj4+IFVnaCwgaXQncyBleHRyYSBidXN0ZWQgYmVjYXVzZSBldmVy eSBvdGhlciBndXAgdXNlciBpbiB0aGUga2VybmVsCj4+PiB0cmFja3MgdGhlIHBhZ2VzIHJlc3Vs dGluZyBmcm9tIGd1cCBhbmQgcHV0cyB0aGVtIChwdXRfcGFnZSgpKSB3aGVuCj4+PiB0aGV5IGFy ZSBkb25lLiBLVk0gd2FudHMgdG8gZm9yZ2V0IGFib3V0IHdoZXRoZXIgaXQgZGlkIGEgZ3VwIHRv IGdldAo+Pj4gdGhlIHBhZ2UgYW5kIG9wdGlvbmFsbHkgdHJpZ2dlciBwdXRfcGFnZSgpIGJhc2Vk IHB1cmVseSBvbiB0aGUgcGZuLgo+Pj4gT3V0c2lkZSBvZiBWRklPIGRldmljZSBhc3NpZ25tZW50 IHRoYXQgbmVlZHMgcGFnZXMgcGlubmVkIGZvciBETUEsIHdoeQo+Pj4gZG9lcyBLVk0gaXRzZWxm IG5lZWQgdG8gcGluIHBhZ2VzPyBJZiBwYWdlcyBhcmUgcGlubmVkIG92ZXIgYSByZXR1cm4KPj4+ IHRvIHVzZXJzcGFjZSB0aGF0IG5lZWRzIHRvIGJlIGEgRk9MTF9MT05HVEVSTSBndXAuCj4+Cj4+ IFNob3J0IGFuc3dlciwgS1ZNIHBpbnMgdGhlIHBhZ2UgdG8gZW5zdXJlIGNvcnJlY3RuZXNzIHdp dGggcmVzcGVjdCB0byB0aGUKPj4gcHJpbWFyeSBNTVUgaW52YWxpZGF0aW5nIHRoZSBhc3NvY2lh dGVkIGhvc3QgdmlydHVhbCBhZGRyZXNzLCBlLmcuIHdoZW4KPj4gdGhlIHBhZ2UgaXMgYmVpbmcg bWlncmF0ZWQgb3IgdW5tYXBwZWQgZnJvbSBob3N0IHVzZXJzcGFjZS4KPj4KPj4gVGhlIG1haW4g dXNlIG9mIGd1cCgpIGlzIHRvIGhhbmRsZSBndWVzdCBwYWdlIGZhdWx0cyBhbmQgbWFwIHBhZ2Vz IGludG8KPj4gdGhlIGd1ZXN0LCBpLmUuIGludG8gS1ZNJ3Mgc2Vjb25kYXJ5IE1NVS4gIEtWTSB1 c2VzIGd1cCgpIHRvIGJvdGggZ2V0IHRoZQo+PiBQRk4gYW5kIHRvIHRlbXBvcmFyaWx5IHBpbiB0 
aGUgcGFnZS4gIFRoZSBwaW4gaXMgaGVsZCBqdXN0IGxvbmcgZW5vdWdoIHRvCj4+IGd1YXJhbnRl ZWQgdGhhdCBhbnkgaW52YWxpZGF0aW9uIHZpYSB0aGUgbW11X25vdGlmaWVyIHdpbGwgYmUgc3Rh bGxlZAo+PiB1bnRpbCBhZnRlciBLVk0gZmluaXNoZXMgaW5zdGFsbGluZyB0aGUgcGFnZSBpbnRv IHRoZSBzZWNvbmRhcnkgTU1VLCBpLmUuCj4+IHRoZSBwaW4gaXMgc2hvcnQtdGVybSBhbmQgbm90 IGhlbGQgYWNyb3NzIGEgcmV0dXJuIHRvIHVzZXJzcGFjZSBvciBlbnRyeQo+PiBpbnRvIHRoZSBn dWVzdC4gIFdoZW4gYSBzdWJzZXF1ZW50IG1tdV9ub3RpZmllciBpbnZhbGlkYXRpb24gb2NjdXJz LCBLVk0KPj4gcHVsbHMgdGhlIFBGTiBmcm9tIHRoZSBzZWNvbmRhcnkgTU1VIGFuZCB1c2VzIHRo YXQgdG8gdXBkYXRlIGFjY2Vzc2VkCj4+IGFuZCBkaXJ0eSBiaXRzIGluIHRoZSBob3N0Lgo+Pgo+ PiBUaGVyZSBhcmUgYSBmZXcgb3RoZXIgS1ZNIGZsb3dzIHRoYXQgZXZlbnR1YWxseSBjYWxsIGlu dG8gZ3VwKCksIGJ1dCB0aG9zZQo+PiBhcmUgInRyYWRpdGlvbmFsIiBzaG9ydC10ZXJtIHBpbnMg YW5kIHVzZSBwdXRfcGFnZSgpIGRpcmVjdGx5Lgo+IAo+IE9rLCBJIHdhcyBtaXNpbnRlcnByZXRp bmcgdGhlIGVmZmVjdCBvZiB0aGUgYnVnIHdpdGggd2hhdCBLVk0gaXMgdXNpbmcKPiB0aGUgcmVm ZXJlbmNlIHRvIGRvLgo+IAo+IFRvIHlvdXIgb3RoZXIgcG9pbnQ6Cj4gCj4+IEJ1dCBEYXZpZCdz IHByb3Bvc2VkIGZpeCBmb3IgdGhlIGFib3ZlIHJlZmNvdW50IGJ1ZyBpcyB0byBvbWl0IHRoZSBw YXRjaAo+PiBzbyB0aGF0IEtWTSBubyBsb25nZXIgdHJlYXRzIFpPTkVfREVWSUNFIHBhZ2VzIGFz IHJlc2VydmVkLiAgVGhhdCBzZWVtcwo+PiBsaWtlIHRoZSByaWdodCB0aGluZyB0byBkbywgaW5j bHVkaW5nIGZvciB0aHBfYWRqdXN0KCksIGUuZy4gaXQgd291bGQKPj4gbmF0dXJhbGx5IGxldCBL Vk0gdXNlIDJtYiBwYWdlcyBmb3IgdGhlIGd1ZXN0IHdoZW4gYSBaT05FX0RFVklDRSBwYWdlIGlz Cj4+IG1hcHBlZCB3aXRoIGEgaHVnZSBwYWdlICgybWIgb3IgYWJvdmUpIGluIHRoZSBob3N0LiAg VGhlIG9ubHkgaGljY3VwIGlzCj4+IGZpZ3VyaW5nIG91dCBob3cgdG8gY29ycmVjdGx5IHRyYW5z ZmVyIHRoZSByZWZlcmVuY2UuCj4gCj4gVGhhdCBtaWdodCBub3QgYmUgdGhlIG9ubHkgaGljY3Vw LiBUaGVyZSdzIGN1cnJlbnRseSBubyBzdWNoIHRoaW5nIGFzCj4gaHVnZSBwYWdlcyBmb3IgWk9O RV9ERVZJQ0UsIHRoZXJlIGFyZSBodWdlICptYXBwaW5ncyogKHBtZCBhbmQgcHVkKSwKPiBidXQg dGhlIHJlc3VsdCBvZiBwZm5fdG9fcGFnZSgpIG9uIHN1Y2ggYSBtYXBwaW5nIGRvZXMgbm90IHlp ZWxkIGEKPiBodWdlICdzdHJ1Y3QgcGFnZScuIEl0IHNlZW1zIHRoZXJlIGFyZSBvdGhlciBwYXRo 
cyBpbiBLVk0gdGhhdCBhc3N1bWUKPiB0aGF0IG1vcmUgdHlwaWNhbCBwYWdlIG1hY2hpbmVyeSBp cyBhY3RpdmUgbGlrZSBTZXRQYWdlRGlydHkoKSBiYXNlZAo+IG9uIGt2bV9pc19yZXNlcnZlZF9w Zm4oKS4gV2hpbGUgSSB0b2xkIERhdmlkIHRoYXQgSSBkaWQgbm90IHdhbnQgdG8KPiBzZWUgbW9y ZSB1c2FnZSBvZiBpc196b25lX2RldmljZV9wYWdlKCksIHRoaXMgcGF0Y2ggYmVsb3cgKHVudGVz dGVkKQo+IHNlZW1zIGEgY2xlYW5lciBwYXRoIHdpdGggbGVzcyBzdXJwcmlzZXM6Cj4gCj4gZGlm ZiAtLWdpdCBhL3ZpcnQva3ZtL2t2bV9tYWluLmMgYi92aXJ0L2t2bS9rdm1fbWFpbi5jCj4gaW5k ZXggNGRmMGFhNmI4ZTVjLi5mYmVhMTdjMTgxMGMgMTAwNjQ0Cj4gLS0tIGEvdmlydC9rdm0va3Zt X21haW4uYwo+ICsrKyBiL3ZpcnQva3ZtL2t2bV9tYWluLmMKPiBAQCAtMTgzMSw3ICsxODMxLDgg QEAgRVhQT1JUX1NZTUJPTF9HUEwoa3ZtX3JlbGVhc2VfcGFnZV9jbGVhbik7Cj4gCj4gICB2b2lk IGt2bV9yZWxlYXNlX3Bmbl9jbGVhbihrdm1fcGZuX3QgcGZuKQo+ICAgewo+IC0gICAgICAgaWYg KCFpc19lcnJvcl9ub3Nsb3RfcGZuKHBmbikgJiYgIWt2bV9pc19yZXNlcnZlZF9wZm4ocGZuKSkK PiArICAgICAgIGlmICgoIWlzX2Vycm9yX25vc2xvdF9wZm4ocGZuKSAmJiAha3ZtX2lzX3Jlc2Vy dmVkX3BmbihwZm4pKSB8fAo+ICsgICAgICAgICAgIChwZm5fdmFsaWQocGZuKSAmJiBpc196b25l X2RldmljZV9wYWdlKHBmbl90b19wYWdlKHBmbikpKSkKPiAgICAgICAgICAgICAgICAgIHB1dF9w YWdlKHBmbl90b19wYWdlKHBmbikpOwo+ICAgfQo+ICAgRVhQT1JUX1NZTUJPTF9HUEwoa3ZtX3Jl bGVhc2VfcGZuX2NsZWFuKTsKCkkgaGFkIHRoZSBzYW1lIHRob3VnaHQsIGJ1dCBJIGRvIHdvbmRl ciBhYm91dCB0aGUga3ZtX2dldF9wZm4oKSB1c2VycywgCmUuZy4sOgoKaHZhX3RvX3Bmbl9yZW1h cHBlZCgpOgoJciA9IGZvbGxvd19wZm4odm1hLCBhZGRyLCAmcGZuKTsKCS4uLgoJa3ZtX2dldF9w Zm4ocGZuKTsKCS4uLgoKV2Ugd291bGQgbm90IHRha2UgYSByZWZlcmVuY2UgZm9yIFpPTkVfREVW SUNFLCBidXQgbGF0ZXIgZHJvcCBvbmUgCnJlZmVyZW5jZSB2aWEga3ZtX3JlbGVhc2VfcGZuX2Ns ZWFuKCkuIElPVywga3ZtX2dldF9wZm4oKSBnZXRzICpyZWFsbHkqIApkYW5nZXJvdXMgdG8gdXNl LiBJIGNhbid0IHRlbGwgaWYgdGhpcyBjYW4gaGFwcGVuIHJpZ2h0IG5vdy4KCldlIGRvIGhhdmUg MyB1c2VycyBvZiBrdm1fZ2V0X3BmbigpIHRoYXQgd2UgaGF2ZSB0byBhdWRpdCBiZWZvcmUgdGhp cyAKY2hhbmdlLiBBbHNvLCB3ZSBzaG91bGQgYWRkIGEgY29tbWVudCB0byBrdm1fZ2V0X3Bmbigp IHRoYXQgaXQgc2hvdWxkIApuZXZlciBiZSB1c2VkIHdpdGggcG9zc2libGUgWk9ORV9ERVZJQ0Ug 
cGFnZXMuCgpBbHNvLCB3ZSBzaG91bGQgYWRkIGEgY29tbWVudCB0byBrdm1fcmVsZWFzZV9wZm5f Y2xlYW4oKSwgZGVzY3JpYmluZyB3aHkgCndlIHRyZWF0IFpPTkVfREVWSUNFIGluIGEgc3BlY2lh bCB3YXkgaGVyZS4KCgpXZSBjYW4gdGhlbiBwcm9ncmVzcyBsaWtlIHRoaXMKCjEuIEdldCB0aGlz IGZpeCB1cHN0cmVhbSwgaXQncyBzb21ld2hhdCB1bnJlbGF0ZWQgdG8gdGhpcyBzZXJpZXMuCjIu IFRoaXMgcGF0Y2ggaGVyZSByZW1haW5zIGFzIGlzIGluIHRoaXMgc2VyaWVzICgrLy0gZG9jdW1l bnRhdGlvbiB1cGRhdGUpCjMuIExvbmcgdGVybSwgcmV3b3JrIEtWTSB0byBub3QgaGF2ZSB0byBu b3QgdHJlYXQgWk9ORV9ERVZJQ0UgbGlrZSAKcmVzZXJ2ZWQgcGFnZXMuIEUuZy4sIGdldCByaWQg b2Yga3ZtX2dldF9wZm4oKS4gVGhlbiwgdGhpcyBzcGVjaWFsIHpvbmUgCmNoZWNrIGNhbiBnby4K Ci0tIAoKVGhhbmtzLAoKRGF2aWQgLyBkaGlsZGVuYgoKCl9fX19fX19fX19fX19fX19fX19fX19f X19fX19fX19fX19fX19fX19fX19fX19fClhlbi1kZXZlbCBtYWlsaW5nIGxpc3QKWGVuLWRldmVs QGxpc3RzLnhlbnByb2plY3Qub3JnCmh0dHBzOi8vbGlzdHMueGVucHJvamVjdC5vcmcvbWFpbG1h bi9saXN0aW5mby94ZW4tZGV2ZWw=