From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.2 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 09905C5DF60 for ; Wed, 6 Nov 2019 00:03:22 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 8F14021A49 for ; Wed, 6 Nov 2019 00:03:21 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8F14021A49 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 541876B0005; Tue, 5 Nov 2019 19:03:21 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 517716B0007; Tue, 5 Nov 2019 19:03:21 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 42DF06B0008; Tue, 5 Nov 2019 19:03:21 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0096.hostedemail.com [216.40.44.96]) by kanga.kvack.org (Postfix) with ESMTP id 2FE106B0005 for ; Tue, 5 Nov 2019 19:03:21 -0500 (EST) Received: from smtpin14.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with SMTP id CAE958249980 for ; Wed, 6 Nov 2019 00:03:20 +0000 (UTC) X-FDA: 76123902960.14.crime41_57496f7612117 X-HE-Tag: crime41_57496f7612117 X-Filterd-Recvd-Size: 7191 Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by imf48.hostedemail.com (Postfix) with ESMTP for ; Wed, 6 Nov 2019 00:03:19 +0000 (UTC) X-Amp-Result: UNKNOWN X-Amp-Original-Verdict: FILE UNKNOWN X-Amp-File-Uploaded: False Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 05 Nov 2019 16:03:17 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.68,271,1569308400"; d="scan'208";a="200547335" Received: from sjchrist-coffee.jf.intel.com (HELO linux.intel.com) ([10.54.74.41]) by fmsmga008.fm.intel.com with ESMTP; 05 Nov 2019 16:03:15 -0800 Date: Tue, 5 Nov 2019 16:03:15 -0800 From: Sean Christopherson To: Dan Williams Cc: David Hildenbrand , Linux Kernel Mailing List , Linux MM , Michal Hocko , Andrew Morton , kvm-ppc@vger.kernel.org, linuxppc-dev , KVM list , linux-hyperv@vger.kernel.org, devel@driverdev.osuosl.org, xen-devel , X86 ML , Alexander Duyck , Alexander Duyck , Alex Williamson , Allison Randal , Andy Lutomirski , "Aneesh Kumar K.V" , Anshuman Khandual , Anthony Yznaga , Benjamin Herrenschmidt , Borislav Petkov , Boris Ostrovsky , Christophe Leroy , Cornelia Huck , Dave Hansen , Haiyang Zhang , "H. Peter Anvin" , Ingo Molnar , "Isaac J. Manjarres" , Jim Mattson , Joerg Roedel , Johannes Weiner , Juergen Gross , KarimAllah Ahmed , Kees Cook , "K. Y. Srinivasan" , "Matthew Wilcox (Oracle)" , Matt Sickler , Mel Gorman , Michael Ellerman , Michal Hocko , Mike Rapoport , Mike Rapoport , Nicholas Piggin , Oscar Salvador , Paolo Bonzini , Paul Mackerras , Paul Mackerras , Pavel Tatashin , Pavel Tatashin , Peter Zijlstra , Qian Cai , Radim =?utf-8?B?S3LEjW3DocWZ?= , Sasha Levin , Stefano Stabellini , Stephen Hemminger , Thomas Gleixner , Vitaly Kuznetsov , Vlastimil Babka , Wanpeng Li , YueHaibing , Adam Borowski Subject: Re: [PATCH v1 03/10] KVM: Prepare kvm_is_reserved_pfn() for PG_reserved changes Message-ID: <20191106000315.GI23297@linux.intel.com> References: <01adb4cb-6092-638c-0bab-e61322be7cf5@redhat.com> <613f3606-748b-0e56-a3ad-1efaffa1a67b@redhat.com> <20191105160000.GC8128@linux.intel.com> <20191105231316.GE23297@linux.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.24 (2015-08-30) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue, Nov 05, 2019 at 03:43:29PM -0800, Dan Williams wrote: > On Tue, Nov 5, 2019 at 3:30 PM Dan Williams wrote: > > > > On Tue, Nov 5, 2019 at 3:13 PM Sean Christopherson > > wrote: > > > > > > On Tue, Nov 05, 2019 at 03:02:40PM -0800, Dan Williams wrote: > > > > On Tue, Nov 5, 2019 at 12:31 PM David Hildenbrand wrote: > > > > > > The scarier code (for me) is transparent_hugepage_adjust() and > > > > > > kvm_mmu_zap_collapsible_spte(), as I don't at all understand the > > > > > > interaction between THP and _PAGE_DEVMAP. > > > > > > > > > > The x86 KVM MMU code is one of the ugliest code I know (sorry, but it > > > > > had to be said :/ ). Luckily, this should be independent of the > > > > > PG_reserved thingy AFAIKs. > > > > > > > > Both transparent_hugepage_adjust() and kvm_mmu_zap_collapsible_spte() > > > > are honoring kvm_is_reserved_pfn(), so again I'm missing where the > > > > page count gets mismanaged and leads to the reported hang. > > > > > > When mapping pages into the guest, KVM gets the page via gup(), which > > > increments the page count for ZONE_DEVICE pages. But KVM puts the page > > > using kvm_release_pfn_clean(), which skips put_page() if PageReserved() > > > and so never puts its reference to ZONE_DEVICE pages. > > > > Oh, yeah, that's busted. > > Ugh, it's extra busted because every other gup user in the kernel > tracks the pages resulting from gup and puts them (put_page()) when > they are done. KVM wants to forget about whether it did a gup to get > the page and optionally trigger put_page() based purely on the pfn. > Outside of VFIO device assignment that needs pages pinned for DMA, why > does KVM itself need to pin pages? If pages are pinned over a return > to userspace that needs to be a FOLL_LONGTERM gup. Short answer, KVM pins the page to ensure correctness with respect to the primary MMU invalidating the associated host virtual address, e.g. when the page is being migrated or unmapped from host userspace. The main use of gup() is to handle guest page faults and map pages into the guest, i.e. into KVM's secondary MMU. KVM uses gup() to both get the PFN and to temporarily pin the page. The pin is held just long enough to guaranteed that any invalidation via the mmu_notifier will be stalled until after KVM finishes installing the page into the secondary MMU, i.e. the pin is short-term and not held across a return to userspace or entry into the guest. When a subsequent mmu_notifier invalidation occurs, KVM pulls the PFN from the secondary MMU and uses that to update accessed and dirty bits in the host. There are a few other KVM flows that eventually call into gup(), but those are "traditional" short-term pins and use put_page() directly.