From: Dan Williams
Date: Tue, 5 Nov 2019 16:08:32 -0800
Subject: Re: [PATCH v1 03/10] KVM: Prepare kvm_is_reserved_pfn() for PG_reserved changes
To: Sean Christopherson
Cc: David Hildenbrand, Linux Kernel Mailing List, Linux MM, Michal Hocko,
 Andrew Morton, kvm-ppc@vger.kernel.org, linuxppc-dev, KVM list,
 linux-hyperv@vger.kernel.org, devel@driverdev.osuosl.org, xen-devel,
 X86 ML, Alexander Duyck, Alexander Duyck, Alex Williamson,
 Allison Randal, Andy Lutomirski, "Aneesh Kumar K.V", Anshuman Khandual,
 Anthony Yznaga, Benjamin Herrenschmidt, Borislav Petkov, Boris Ostrovsky,
 Christophe Leroy, Cornelia Huck, Dave Hansen, Haiyang Zhang,
 "H. Peter Anvin", Ingo Molnar, "Isaac J. Manjarres", Jim Mattson,
 Joerg Roedel, Johannes Weiner, Juergen Gross, KarimAllah Ahmed, Kees Cook,
 "K. Y. Srinivasan", "Matthew Wilcox (Oracle)", Matt Sickler, Mel Gorman,
 Michael Ellerman, Michal Hocko, Mike Rapoport, Mike Rapoport,
 Nicholas Piggin, Oscar Salvador, Paolo Bonzini, Paul Mackerras,
 Paul Mackerras, Pavel Tatashin, Pavel Tatashin, Peter Zijlstra, Qian Cai,
 Radim Krčmář, Sasha Levin, Stefano Stabellini, Stephen Hemminger,
 Thomas Gleixner, Vitaly Kuznetsov, Vlastimil Babka, Wanpeng Li,
 YueHaibing, Adam Borowski
References: <01adb4cb-6092-638c-0bab-e61322be7cf5@redhat.com>
 <613f3606-748b-0e56-a3ad-1efaffa1a67b@redhat.com>
 <20191105160000.GC8128@linux.intel.com>
 <20191105231316.GE23297@linux.intel.com>
 <20191106000315.GI23297@linux.intel.com>
In-Reply-To: <20191106000315.GI23297@linux.intel.com>

On Tue, Nov 5, 2019 at 4:03 PM Sean Christopherson wrote:
>
> On Tue, Nov 05, 2019 at 03:43:29PM -0800, Dan Williams wrote:
> > On Tue, Nov 5, 2019 at 3:30 PM Dan Williams wrote:
> > >
> > > On Tue, Nov 5, 2019 at 3:13 PM Sean Christopherson wrote:
> > > >
> > > > On Tue, Nov 05, 2019 at 03:02:40PM -0800, Dan Williams wrote:
> > > > > On Tue, Nov 5, 2019 at 12:31 PM David Hildenbrand wrote:
> > > > > > > The scarier code (for me) is transparent_hugepage_adjust() and
> > > > > > > kvm_mmu_zap_collapsible_spte(), as I don't at all understand the
> > > > > > > interaction between THP and _PAGE_DEVMAP.
> > > > > >
> > > > > > The x86 KVM MMU code is some of the ugliest code I know (sorry, but it
> > > > > > had to be said :/ ). Luckily, this should be independent of the
> > > > > > PG_reserved thingy AFAIK.
> > > > >
> > > > > Both transparent_hugepage_adjust() and kvm_mmu_zap_collapsible_spte()
> > > > > are honoring kvm_is_reserved_pfn(), so again I'm missing where the
> > > > > page count gets mismanaged and leads to the reported hang.
> > > >
> > > > When mapping pages into the guest, KVM gets the page via gup(), which
> > > > increments the page count for ZONE_DEVICE pages. But KVM puts the page
> > > > using kvm_release_pfn_clean(), which skips put_page() if PageReserved()
> > > > and so never puts its reference to ZONE_DEVICE pages.
> > >
> > > Oh, yeah, that's busted.
> >
> > Ugh, it's extra busted because every other gup user in the kernel
> > tracks the pages resulting from gup and puts them (put_page()) when
> > they are done. KVM wants to forget about whether it did a gup to get
> > the page and optionally trigger put_page() based purely on the pfn.
> > Outside of VFIO device assignment that needs pages pinned for DMA, why
> > does KVM itself need to pin pages? If pages are pinned over a return
> > to userspace that needs to be a FOLL_LONGTERM gup.
>
> Short answer, KVM pins the page to ensure correctness with respect to the
> primary MMU invalidating the associated host virtual address, e.g. when
> the page is being migrated or unmapped from host userspace.
>
> The main use of gup() is to handle guest page faults and map pages into
> the guest, i.e. into KVM's secondary MMU. KVM uses gup() to both get the
> PFN and to temporarily pin the page.
> The pin is held just long enough to
> guarantee that any invalidation via the mmu_notifier will be stalled
> until after KVM finishes installing the page into the secondary MMU, i.e.
> the pin is short-term and not held across a return to userspace or entry
> into the guest. When a subsequent mmu_notifier invalidation occurs, KVM
> pulls the PFN from the secondary MMU and uses that to update accessed
> and dirty bits in the host.
>
> There are a few other KVM flows that eventually call into gup(), but those
> are "traditional" short-term pins and use put_page() directly.

OK, I was conflating the effect of the bug with what KVM is using
the reference to do.

To your other point:

> But David's proposed fix for the above refcount bug is to omit the patch
> so that KVM no longer treats ZONE_DEVICE pages as reserved. That seems
> like the right thing to do, including for thp_adjust(), e.g. it would
> naturally let KVM use 2mb pages for the guest when a ZONE_DEVICE page is
> mapped with a huge page (2mb or above) in the host. The only hiccup is
> figuring out how to correctly transfer the reference.

That might not be the only hiccup. There's currently no such thing as
huge pages for ZONE_DEVICE; there are only huge *mappings* (pmd and
pud), and the result of pfn_to_page() on such a mapping does not yield
a huge 'struct page'. It seems there are other paths in KVM that assume
the more typical page machinery is active, e.g. calling SetPageDirty()
based on kvm_is_reserved_pfn().

While I told David that I did not want to see more usage of
is_zone_device_page(), the patch below (untested) seems a cleaner path
with fewer surprises:

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4df0aa6b8e5c..fbea17c1810c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1831,7 +1831,8 @@ EXPORT_SYMBOL_GPL(kvm_release_page_clean);
 
 void kvm_release_pfn_clean(kvm_pfn_t pfn)
 {
-	if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn))
+	if ((!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn)) ||
+	    (pfn_valid(pfn) && is_zone_device_page(pfn_to_page(pfn))))
 		put_page(pfn_to_page(pfn));
 }
 EXPORT_SYMBOL_GPL(kvm_release_pfn_clean);

This is safe because the reference that KVM took earlier protects the
is_zone_device_page() lookup from racing device teardown. Otherwise,
if KVM does not have a reference, it's unsafe, but that's already even
more broken because KVM would be releasing a page that it never
referenced.

Every other KVM operation that assumes page allocator pages would
continue to honor kvm_is_reserved_pfn().
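
For readers following the refcount argument, here is a rough userspace
model of the imbalance and of the proposed condition. It is only a
sketch, not kernel code: struct model_page, model_gup(), model_put_page(),
release_old(), release_new() and leaked_refs() are hypothetical names
made up for illustration. It assumes that a ZONE_DEVICE page is reported
as reserved (as described in the thread) and that the page was obtained
via gup(). The error/noslot and pfn_valid() checks are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for 'struct page'; not the kernel type. */
struct model_page {
	int refcount;		/* models the page reference count */
	bool reserved;		/* models what kvm_is_reserved_pfn() reports */
	bool zone_device;	/* models is_zone_device_page() */
};

/* Models gup(): takes a reference whether or not the page is reserved. */
void model_gup(struct model_page *p)
{
	p->refcount++;
}

/* Models put_page(): drops one reference that must already be held. */
void model_put_page(struct model_page *p)
{
	assert(p->refcount > 0);
	p->refcount--;
}

/* Models the pre-patch release path: reserved pages are never put. */
void release_old(struct model_page *p)
{
	if (!p->reserved)
		model_put_page(p);
}

/* Models the proposed condition: ZONE_DEVICE pages are put as well. */
void release_new(struct model_page *p)
{
	if (!p->reserved || p->zone_device)
		model_put_page(p);
}

/* Runs one gup/release cycle and returns the references left behind. */
int leaked_refs(void (*release)(struct model_page *),
		bool reserved, bool zone_device)
{
	struct model_page p = {
		.refcount = 0,
		.reserved = reserved,
		.zone_device = zone_device,
	};

	model_gup(&p);
	release(&p);
	return p.refcount;
}
```

In this model, leaked_refs(release_old, true, true) leaves one reference
behind for a ZONE_DEVICE page, while leaked_refs(release_new, true, true)
returns to zero. Ordinary pages (reserved == false) balance under both
variants.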
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.5 required=3.0 tests=DKIM_INVALID,DKIM_SIGNED, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS,URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E82A2C5DF60 for ; Wed, 6 Nov 2019 00:09:08 +0000 (UTC) Received: from silver.osuosl.org (smtp3.osuosl.org [140.211.166.136]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8DDB821A49 for ; Wed, 6 Nov 2019 00:09:08 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=intel-com.20150623.gappssmtp.com header.i=@intel-com.20150623.gappssmtp.com header.b="e+Di3mlc" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8DDB821A49 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=driverdev-devel-bounces@linuxdriverproject.org Received: from localhost (localhost [127.0.0.1]) by silver.osuosl.org (Postfix) with ESMTP id 1BB2F2267B; Wed, 6 Nov 2019 00:09:08 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from silver.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id OxLZXdCe2I7J; Wed, 6 Nov 2019 00:09:07 +0000 (UTC) Received: from ash.osuosl.org (ash.osuosl.org [140.211.166.34]) by silver.osuosl.org (Postfix) with ESMTP id 6FB7422668; Wed, 6 Nov 2019 00:09:06 +0000 (UTC) Received: from silver.osuosl.org (smtp3.osuosl.org [140.211.166.136]) by ash.osuosl.org (Postfix) with ESMTP id D46171BF329 for ; Wed, 6 Nov 2019 00:09:01 +0000 (UTC) 
Received: from localhost (localhost [127.0.0.1]) by silver.osuosl.org (Postfix) with ESMTP id 9791722668 for ; Wed, 6 Nov 2019 00:08:55 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from silver.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Qp3cgdRarI44 for ; Wed, 6 Nov 2019 00:08:47 +0000 (UTC) X-Greylist: from auto-whitelisted by SQLgrey-1.7.6 Received: from mail-ot1-f65.google.com (mail-ot1-f65.google.com [209.85.210.65]) by silver.osuosl.org (Postfix) with ESMTPS id 0905722650 for ; Wed, 6 Nov 2019 00:08:44 +0000 (UTC) Received: by mail-ot1-f65.google.com with SMTP id b16so19287555otk.9 for ; Tue, 05 Nov 2019 16:08:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=intel-com.20150623.gappssmtp.com; s=20150623; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=NBnxU3butVFZM7AnAXfHT8dkMjtUFYfyCvUDW61SXV0=; b=e+Di3mlc5nguC0AjBP91KU4QLtXyLv60VHL8bcA2yTFqSx8GzUT3JPnO2SUvw9dJ3e x892tG+naCnJ7xx1/XyJqFBd5BAvIkbzvc8Hm+iANPYt1TTQlO54TnNTT2/Vr/wnnHwU cWH/uHsxgeOfe757EoaxLRwaf5ARnH5aT0zdOmRUm2qEb9I99F92KkPxOUH4REMi83wk kEyAkGq3/W1xKyxTMdH0KRUk/WXIRl4YbdL49pTFBgS4yeLctZvOTpoL9B1muQPVqJmY p3ipEL1tac4Bnfpc8lK1KzxSavaLkdb1VqqzcOJA3HvnvhhzQSrRj7AaXeANHjyy47rG tLMg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=NBnxU3butVFZM7AnAXfHT8dkMjtUFYfyCvUDW61SXV0=; b=g1ftzc/jpGigzKc9vjAUdTpfEXA34Xz1lo5OJBEOSso/hAv4KxGqy09oBacpCYc89F xJFHLULi1BbsdcvgO5IAhGAAxALt8o6wgbixiSjCNf45PsoD5/cugZwoz52UbUOre0YE 2ilubJYud4RdT5ieGvAkhyh2jKcAauLzpC20XTGSkgw9NqRYmnNmLD1rS9GUHvBfDlo4 UHopIwwYsJrTCXWnEYvFJKcG/FRhLuq8L0ePHEc3gIhAG4OsFJtcBWzgqLKJYzEwYBqT 03c5sjDTuV8I7sl1cjoYeSjJrzLab0kyWt0opaK2Kl81SpK9LBKNm+YTBVJ+gYKenKbu BOLg== X-Gm-Message-State: APjAAAXbWXO5RNoY5ZkRvZnNCIYypvxVejsBWNzkuwFtSHnmRZ75ODkb 
ctRcAGxrViNDm9rujLFQ4NsjdoL2t3v77yyBhTFGpIhWDg0= X-Google-Smtp-Source: APXvYqy6XKydDtiqLti7hhnEPqaloEXe8khYaaQC8o7B44In3kaSUL1URuHERGnTYAOJPrixGMDMzxSCyGGCNLsXEME= X-Received: by 2002:a9d:5f11:: with SMTP id f17mr24252119oti.207.1572998924064; Tue, 05 Nov 2019 16:08:44 -0800 (PST) MIME-Version: 1.0 References: <01adb4cb-6092-638c-0bab-e61322be7cf5@redhat.com> <613f3606-748b-0e56-a3ad-1efaffa1a67b@redhat.com> <20191105160000.GC8128@linux.intel.com> <20191105231316.GE23297@linux.intel.com> <20191106000315.GI23297@linux.intel.com> In-Reply-To: <20191106000315.GI23297@linux.intel.com> From: Dan Williams Date: Tue, 5 Nov 2019 16:08:32 -0800 Message-ID: Subject: Re: [PATCH v1 03/10] KVM: Prepare kvm_is_reserved_pfn() for PG_reserved changes To: Sean Christopherson X-BeenThere: driverdev-devel@linuxdriverproject.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux Driver Project Developer List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: linux-hyperv@vger.kernel.org, Michal Hocko , =?UTF-8?B?UmFkaW0gS3LEjW3DocWZ?= , KVM list , David Hildenbrand , KarimAllah Ahmed , Benjamin Herrenschmidt , Dave Hansen , Alexander Duyck , Michal Hocko , Paul Mackerras , Linux MM , Pavel Tatashin , Paul Mackerras , Michael Ellerman , "H. Peter Anvin" , Wanpeng Li , Alexander Duyck , Thomas Gleixner , Kees Cook , devel@driverdev.osuosl.org, Stefano Stabellini , Stephen Hemminger , "Aneesh Kumar K.V" , Joerg Roedel , X86 ML , YueHaibing , "Matthew Wilcox \(Oracle\)" , Mike Rapoport , Peter Zijlstra , Ingo Molnar , Vlastimil Babka , Anthony Yznaga , Oscar Salvador , "Isaac J. 
Manjarres" , Juergen Gross , Anshuman Khandual , Haiyang Zhang , Sasha Levin , kvm-ppc@vger.kernel.org, Qian Cai , Alex Williamson , Mike Rapoport , Borislav Petkov , Nicholas Piggin , Andy Lutomirski , xen-devel , Boris Ostrovsky , Vitaly Kuznetsov , Allison Randal , Jim Mattson , Christophe Leroy , Mel Gorman , Adam Borowski , Cornelia Huck , Pavel Tatashin , Linux Kernel Mailing List , Johannes Weiner , Paolo Bonzini , Andrew Morton , linuxppc-dev Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: driverdev-devel-bounces@linuxdriverproject.org Sender: "devel" On Tue, Nov 5, 2019 at 4:03 PM Sean Christopherson wrote: > > On Tue, Nov 05, 2019 at 03:43:29PM -0800, Dan Williams wrote: > > On Tue, Nov 5, 2019 at 3:30 PM Dan Williams wrote: > > > > > > On Tue, Nov 5, 2019 at 3:13 PM Sean Christopherson > > > wrote: > > > > > > > > On Tue, Nov 05, 2019 at 03:02:40PM -0800, Dan Williams wrote: > > > > > On Tue, Nov 5, 2019 at 12:31 PM David Hildenbrand wrote: > > > > > > > The scarier code (for me) is transparent_hugepage_adjust() and > > > > > > > kvm_mmu_zap_collapsible_spte(), as I don't at all understand the > > > > > > > interaction between THP and _PAGE_DEVMAP. > > > > > > > > > > > > The x86 KVM MMU code is one of the ugliest code I know (sorry, but it > > > > > > had to be said :/ ). Luckily, this should be independent of the > > > > > > PG_reserved thingy AFAIKs. > > > > > > > > > > Both transparent_hugepage_adjust() and kvm_mmu_zap_collapsible_spte() > > > > > are honoring kvm_is_reserved_pfn(), so again I'm missing where the > > > > > page count gets mismanaged and leads to the reported hang. > > > > > > > > When mapping pages into the guest, KVM gets the page via gup(), which > > > > increments the page count for ZONE_DEVICE pages. But KVM puts the page > > > > using kvm_release_pfn_clean(), which skips put_page() if PageReserved() > > > > and so never puts its reference to ZONE_DEVICE pages. 
> > > > > > Oh, yeah, that's busted. > > > > Ugh, it's extra busted because every other gup user in the kernel > > tracks the pages resulting from gup and puts them (put_page()) when > > they are done. KVM wants to forget about whether it did a gup to get > > the page and optionally trigger put_page() based purely on the pfn. > > Outside of VFIO device assignment that needs pages pinned for DMA, why > > does KVM itself need to pin pages? If pages are pinned over a return > > to userspace that needs to be a FOLL_LONGTERM gup. > > Short answer, KVM pins the page to ensure correctness with respect to the > primary MMU invalidating the associated host virtual address, e.g. when > the page is being migrated or unmapped from host userspace. > > The main use of gup() is to handle guest page faults and map pages into > the guest, i.e. into KVM's secondary MMU. KVM uses gup() to both get the > PFN and to temporarily pin the page. The pin is held just long enough to > guaranteed that any invalidation via the mmu_notifier will be stalled > until after KVM finishes installing the page into the secondary MMU, i.e. > the pin is short-term and not held across a return to userspace or entry > into the guest. When a subsequent mmu_notifier invalidation occurs, KVM > pulls the PFN from the secondary MMU and uses that to update accessed > and dirty bits in the host. > > There are a few other KVM flows that eventually call into gup(), but those > are "traditional" short-term pins and use put_page() directly. Ok, I was misinterpreting the effect of the bug with what KVM is using the reference to do. To your other point: > But David's proposed fix for the above refcount bug is to omit the patch > so that KVM no longer treats ZONE_DEVICE pages as reserved. That seems > like the right thing to do, including for thp_adjust(), e.g. it would > naturally let KVM use 2mb pages for the guest when a ZONE_DEVICE page is > mapped with a huge page (2mb or above) in the host. 
The only hiccup is > figuring out how to correctly transfer the reference. That might not be the only hiccup. There's currently no such thing as huge pages for ZONE_DEVICE, there are huge *mappings* (pmd and pud), but the result of pfn_to_page() on such a mapping does not yield a huge 'struct page'. It seems there are other paths in KVM that assume that more typical page machinery is active like SetPageDirty() based on kvm_is_reserved_pfn(). While I told David that I did not want to see more usage of is_zone_device_page(), this patch below (untested) seems a cleaner path with less surprises: diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 4df0aa6b8e5c..fbea17c1810c 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1831,7 +1831,8 @@ EXPORT_SYMBOL_GPL(kvm_release_page_clean); void kvm_release_pfn_clean(kvm_pfn_t pfn) { - if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn)) + if ((!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn)) || + (pfn_valid(pfn) && is_zone_device_page(pfn_to_page(pfn)))) put_page(pfn_to_page(pfn)); } EXPORT_SYMBOL_GPL(kvm_release_pfn_clean); This is safe because the reference that KVM took earlier protects the is_zone_device_page() lookup from racing device teardown. Otherwise, if KVM does not have a reference it's unsafe, but that's already even more broken because KVM would be releasing a page that it never referenced. Every other KVM operation that assumes page allocator pages would continue to honor kvm_is_reserved_pfn(). 
_______________________________________________ devel mailing list devel@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.5 required=3.0 tests=DKIM_INVALID,DKIM_SIGNED, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS,URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E1826C5DF60 for ; Wed, 6 Nov 2019 00:21:52 +0000 (UTC) Received: from lists.ozlabs.org (lists.ozlabs.org [203.11.71.2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 656E02087E for ; Wed, 6 Nov 2019 00:21:52 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=intel-com.20150623.gappssmtp.com header.i=@intel-com.20150623.gappssmtp.com header.b="e+Di3mlc" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 656E02087E Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=linuxppc-dev-bounces+linuxppc-dev=archiver.kernel.org@lists.ozlabs.org Received: from bilbo.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) by lists.ozlabs.org (Postfix) with ESMTP id 4776fQ03z0zF5Ht for ; Wed, 6 Nov 2019 11:21:50 +1100 (AEDT) Authentication-Results: lists.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=intel.com (client-ip=2607:f8b0:4864:20::341; helo=mail-ot1-x341.google.com; envelope-from=dan.j.williams@intel.com; receiver=) Authentication-Results: lists.ozlabs.org; dmarc=pass (p=none dis=none) header.from=intel.com 
Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=intel-com.20150623.gappssmtp.com header.i=@intel-com.20150623.gappssmtp.com header.b="e+Di3mlc"; dkim-atps=neutral Received: from mail-ot1-x341.google.com (mail-ot1-x341.google.com [IPv6:2607:f8b0:4864:20::341]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 4776ML46rCzF3wl for ; Wed, 6 Nov 2019 11:08:46 +1100 (AEDT) Received: by mail-ot1-x341.google.com with SMTP id f10so597017oto.3 for ; Tue, 05 Nov 2019 16:08:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=intel-com.20150623.gappssmtp.com; s=20150623; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=NBnxU3butVFZM7AnAXfHT8dkMjtUFYfyCvUDW61SXV0=; b=e+Di3mlc5nguC0AjBP91KU4QLtXyLv60VHL8bcA2yTFqSx8GzUT3JPnO2SUvw9dJ3e x892tG+naCnJ7xx1/XyJqFBd5BAvIkbzvc8Hm+iANPYt1TTQlO54TnNTT2/Vr/wnnHwU cWH/uHsxgeOfe757EoaxLRwaf5ARnH5aT0zdOmRUm2qEb9I99F92KkPxOUH4REMi83wk kEyAkGq3/W1xKyxTMdH0KRUk/WXIRl4YbdL49pTFBgS4yeLctZvOTpoL9B1muQPVqJmY p3ipEL1tac4Bnfpc8lK1KzxSavaLkdb1VqqzcOJA3HvnvhhzQSrRj7AaXeANHjyy47rG tLMg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=NBnxU3butVFZM7AnAXfHT8dkMjtUFYfyCvUDW61SXV0=; b=beS5ijNNYV/wJaFQD1C13wdJjYPdEzIi3qdREICCajoiW9wrC5lWAO0ZEqaBdxTnKy tEGr+PTzexxxpfVTdciPHYAiPsYNucYvM3OAz4HmwiR0G6z8UZXwPYggXcy0u7jdJga6 2vo+yG+lP4EJdV5dW7nCc/8iDEiIlb5+Dtty8v50KZRmOZGkfmRlx0DKaiKX6t7szOO0 W8Bb26sF5fyhsj14Ssbt2qFLZ2mpqI14/7ZRJISPvsXkvOY547dxPGlwtc6HLnkZYaCW LS05nTmfSbx9oyGzS7dJfSwUxD7cAcdN4poFPPpSzXIQUPfQ1BjAdNDJdPmwCSqE5mWf umiw== X-Gm-Message-State: APjAAAW35EdiY3kCHxKAC7+cNKa11uQKc43J1gGJ0j1LRrlLG0OK6bJM VwXyG+ZdY3EAiJ230AjYdnyt19nLXsFyhSU5nD6Dag== 
X-Google-Smtp-Source: APXvYqy6XKydDtiqLti7hhnEPqaloEXe8khYaaQC8o7B44In3kaSUL1URuHERGnTYAOJPrixGMDMzxSCyGGCNLsXEME= X-Received: by 2002:a9d:5f11:: with SMTP id f17mr24252119oti.207.1572998924064; Tue, 05 Nov 2019 16:08:44 -0800 (PST) MIME-Version: 1.0 References: <01adb4cb-6092-638c-0bab-e61322be7cf5@redhat.com> <613f3606-748b-0e56-a3ad-1efaffa1a67b@redhat.com> <20191105160000.GC8128@linux.intel.com> <20191105231316.GE23297@linux.intel.com> <20191106000315.GI23297@linux.intel.com> In-Reply-To: <20191106000315.GI23297@linux.intel.com> From: Dan Williams Date: Tue, 5 Nov 2019 16:08:32 -0800 Message-ID: Subject: Re: [PATCH v1 03/10] KVM: Prepare kvm_is_reserved_pfn() for PG_reserved changes To: Sean Christopherson Content-Type: text/plain; charset="UTF-8" X-Mailman-Approved-At: Wed, 06 Nov 2019 11:13:27 +1100 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: linux-hyperv@vger.kernel.org, Michal Hocko , =?UTF-8?B?UmFkaW0gS3LEjW3DocWZ?= , KVM list , David Hildenbrand , KarimAllah Ahmed , Dave Hansen , Alexander Duyck , Michal Hocko , Linux MM , Pavel Tatashin , Paul Mackerras , "H. Peter Anvin" , Wanpeng Li , Alexander Duyck , "K. Y. Srinivasan" , Thomas Gleixner , Kees Cook , devel@driverdev.osuosl.org, Stefano Stabellini , Stephen Hemminger , "Aneesh Kumar K.V" , Joerg Roedel , X86 ML , YueHaibing , "Matthew Wilcox \(Oracle\)" , Mike Rapoport , Peter Zijlstra , Ingo Molnar , Vlastimil Babka , Anthony Yznaga , Oscar Salvador , "Isaac J. 
Manjarres" , Matt Sickler , Juergen Gross , Anshuman Khandual , Haiyang Zhang , Sasha Levin , kvm-ppc@vger.kernel.org, Qian Cai , Alex Williamson , Mike Rapoport , Borislav Petkov , Nicholas Piggin , Andy Lutomirski , xen-devel , Boris Ostrovsky , Vitaly Kuznetsov , Allison Randal , Jim Mattson , Mel Gorman , Adam Borowski , Cornelia Huck , Pavel Tatashin , Linux Kernel Mailing List , Johannes Weiner , Paolo Bonzini , Andrew Morton , linuxppc-dev Errors-To: linuxppc-dev-bounces+linuxppc-dev=archiver.kernel.org@lists.ozlabs.org Sender: "Linuxppc-dev" On Tue, Nov 5, 2019 at 4:03 PM Sean Christopherson wrote: > > On Tue, Nov 05, 2019 at 03:43:29PM -0800, Dan Williams wrote: > > On Tue, Nov 5, 2019 at 3:30 PM Dan Williams wrote: > > > > > > On Tue, Nov 5, 2019 at 3:13 PM Sean Christopherson > > > wrote: > > > > > > > > On Tue, Nov 05, 2019 at 03:02:40PM -0800, Dan Williams wrote: > > > > > On Tue, Nov 5, 2019 at 12:31 PM David Hildenbrand wrote: > > > > > > > The scarier code (for me) is transparent_hugepage_adjust() and > > > > > > > kvm_mmu_zap_collapsible_spte(), as I don't at all understand the > > > > > > > interaction between THP and _PAGE_DEVMAP. > > > > > > > > > > > > The x86 KVM MMU code is one of the ugliest code I know (sorry, but it > > > > > > had to be said :/ ). Luckily, this should be independent of the > > > > > > PG_reserved thingy AFAIKs. > > > > > > > > > > Both transparent_hugepage_adjust() and kvm_mmu_zap_collapsible_spte() > > > > > are honoring kvm_is_reserved_pfn(), so again I'm missing where the > > > > > page count gets mismanaged and leads to the reported hang. > > > > > > > > When mapping pages into the guest, KVM gets the page via gup(), which > > > > increments the page count for ZONE_DEVICE pages. But KVM puts the page > > > > using kvm_release_pfn_clean(), which skips put_page() if PageReserved() > > > > and so never puts its reference to ZONE_DEVICE pages. > > > > > > Oh, yeah, that's busted. 
> > > > Ugh, it's extra busted because every other gup user in the kernel > > tracks the pages resulting from gup and puts them (put_page()) when > > they are done. KVM wants to forget about whether it did a gup to get > > the page and optionally trigger put_page() based purely on the pfn. > > Outside of VFIO device assignment that needs pages pinned for DMA, why > > does KVM itself need to pin pages? If pages are pinned over a return > > to userspace that needs to be a FOLL_LONGTERM gup. > > Short answer, KVM pins the page to ensure correctness with respect to the > primary MMU invalidating the associated host virtual address, e.g. when > the page is being migrated or unmapped from host userspace. > > The main use of gup() is to handle guest page faults and map pages into > the guest, i.e. into KVM's secondary MMU. KVM uses gup() to both get the > PFN and to temporarily pin the page. The pin is held just long enough to > guaranteed that any invalidation via the mmu_notifier will be stalled > until after KVM finishes installing the page into the secondary MMU, i.e. > the pin is short-term and not held across a return to userspace or entry > into the guest. When a subsequent mmu_notifier invalidation occurs, KVM > pulls the PFN from the secondary MMU and uses that to update accessed > and dirty bits in the host. > > There are a few other KVM flows that eventually call into gup(), but those > are "traditional" short-term pins and use put_page() directly. Ok, I was misinterpreting the effect of the bug with what KVM is using the reference to do. To your other point: > But David's proposed fix for the above refcount bug is to omit the patch > so that KVM no longer treats ZONE_DEVICE pages as reserved. That seems > like the right thing to do, including for thp_adjust(), e.g. it would > naturally let KVM use 2mb pages for the guest when a ZONE_DEVICE page is > mapped with a huge page (2mb or above) in the host. 
The only hiccup is > figuring out how to correctly transfer the reference. That might not be the only hiccup. There's currently no such thing as huge pages for ZONE_DEVICE, there are huge *mappings* (pmd and pud), but the result of pfn_to_page() on such a mapping does not yield a huge 'struct page'. It seems there are other paths in KVM that assume that more typical page machinery is active like SetPageDirty() based on kvm_is_reserved_pfn(). While I told David that I did not want to see more usage of is_zone_device_page(), this patch below (untested) seems a cleaner path with less surprises: diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 4df0aa6b8e5c..fbea17c1810c 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1831,7 +1831,8 @@ EXPORT_SYMBOL_GPL(kvm_release_page_clean); void kvm_release_pfn_clean(kvm_pfn_t pfn) { - if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn)) + if ((!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn)) || + (pfn_valid(pfn) && is_zone_device_page(pfn_to_page(pfn)))) put_page(pfn_to_page(pfn)); } EXPORT_SYMBOL_GPL(kvm_release_pfn_clean); This is safe because the reference that KVM took earlier protects the is_zone_device_page() lookup from racing device teardown. Otherwise, if KVM does not have a reference it's unsafe, but that's already even more broken because KVM would be releasing a page that it never referenced. Every other KVM operation that assumes page allocator pages would continue to honor kvm_is_reserved_pfn(). 