In-Reply-To: <20191022171239.21487-1-david@redhat.com>
From: Dan Williams
Date: Tue, 22 Oct 2019 14:54:47 -0700
Subject: Re: [PATCH RFC v1 00/12] mm: Don't mark hotplugged pages PG_reserved (including ZONE_DEVICE)
To: David Hildenbrand
Cc: Kate Stewart, linux-hyperv@vger.kernel.org, Michal Hocko, Radim Krčmář, KVM list, Pavel Tatashin, KarimAllah Ahmed, Benjamin Herrenschmidt, Dave Hansen, Alexander Duyck, Michal Hocko, Paul Mackerras, Linux MM, Paul Mackerras, Michael Ellerman, "H. Peter Anvin", Wanpeng Li, Alexander Duyck, Kees Cook, devel@driverdev.osuosl.org, Stefano Stabellini, Stephen Hemminger, "Aneesh Kumar K.V", Joerg Roedel, X86 ML, YueHaibing, Mike Rapoport, Madhumitha Prabakaran, Peter Zijlstra, Ingo Molnar, Vlastimil Babka, Nishka Dasgupta, Anthony Yznaga, Oscar Salvador, Dan Carpenter, "Isaac J. Manjarres", Juergen Gross, Anshuman Khandual, Haiyang Zhang, Simon Sandström, Sasha Levin, kvm-ppc@vger.kernel.org, Qian Cai, Alex Williamson, Mike Rapoport, Borislav Petkov, Nicholas Piggin, Andy Lutomirski, xen-devel, Boris Ostrovsky, Todd Poynor, Vitaly Kuznetsov, Allison Randal, Jim Mattson, Christophe Leroy, Vandana BN, Mel Gorman, Greg Kroah-Hartman, Cornelia Huck, Pavel Tatashin, Linux Kernel Mailing List, Sean Christopherson, Rob Springer, Thomas Gleixner, Johannes Weiner, Paolo Bonzini, Andrew Morton, linuxppc-dev

Hi David,

Thanks for tackling this!

On Tue, Oct 22, 2019 at 10:13 AM David Hildenbrand wrote:
>
> This series is based on [2], which should pop up in linux/next soon:
> https://lkml.org/lkml/2019/10/21/1034
>
> This is the result of a recent discussion with Michal ([1], [2]). Right
> now we set all pages PG_reserved when initializing hotplugged memmaps. This
> includes ZONE_DEVICE memory. In case of system memory, PG_reserved is
> cleared again when onlining the memory, in case of ZONE_DEVICE memory
> never. In ancient times, we needed PG_reserved, because there was no way
> to tell whether the memmap was already properly initialized. We now have
> SECTION_IS_ONLINE for that in the case of !ZONE_DEVICE memory. ZONE_DEVICE
> memory is already initialized deferred, and there shouldn't be a visible
> change in that regard.
>
> I remember that some time ago, we already talked about stopping to set
> ZONE_DEVICE pages PG_reserved on the list, but I never saw any patches.
> Also, I forgot who was part of the discussion :)

You got me, Alex, and KVM folks on the Cc, so I'd say that was it.

> One of the biggest fear were side effects. I went ahead and audited all
> users of PageReserved(). The ones that don't need any care (patches)
> can be found below.
> I will double check and hope I am not missing something
> important.
>
> I am probably a little bit too careful (but I don't want to break things).
> In most places (besides KVM and vfio that are nuts), the
> pfn_to_online_page() check could most probably be avoided by a
> is_zone_device_page() check. However, I usually get suspicious when I see
> a pfn_valid() check (especially after I learned that people mmap parts of
> /dev/mem into user space, including memory without memmaps. Also, people
> could memmap offline memory blocks this way :/). As long as this does not
> hurt performance, I think we should rather do it the clean way.

I'm concerned about using is_zone_device_page() in places that are not
known to already hold a reference to the page. Here's an audit of the
current is_zone_device_page() call sites, including the ones I think
need to be cleaned up. The "unsafe" ones do not appear to have any
protection against the device page being removed, such as a
get_dev_pagemap() reference. Yes, some of these were added by me. The
"unsafe? HMM" ones need HMM eyes, because HMM leaks device pages into
anonymous memory paths and I'm not up to speed on how it guarantees
'struct page' validity vs. device shutdown without using
get_dev_pagemap().

smaps_pmd_entry(): unsafe
put_devmap_managed_page(): safe, page reference is held
is_device_private_page(): safe? gpu driver manages private page lifetime
is_pci_p2pdma_page(): safe, page reference is held
uncharge_page(): unsafe? HMM
add_to_kill(): safe, protected by get_dev_pagemap() and dax_lock_page()
soft_offline_page(): unsafe
remove_migration_pte(): unsafe? HMM
move_to_new_page(): unsafe? HMM
migrate_vma_pages() and helpers: unsafe? HMM
try_to_unmap_one(): unsafe? HMM
__put_page(): safe
release_pages(): safe

I'm hoping all the HMM ones can be converted to
is_device_private_page() directly and have that routine grow a nice
comment about how it knows it can always safely dereference its @page
argument.
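To make the "safe"/"unsafe" split above concrete, here is a minimal, single-threaded userspace model of the ordering a get_dev_pagemap() reference provides: a lookup by pfn succeeds only while the pagemap is live and pins it, and the memmap may only be torn down once those pins are dropped. struct pagemap_model, pagemap_tryget(), pagemap_put() and pagemap_begin_shutdown() are hypothetical names; the real code uses a percpu_ref and must handle concurrency, which this sketch deliberately ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, single-threaded model of a device pagemap. The kernel
 * uses a percpu_ref and get_dev_pagemap(); both are reduced to plain
 * fields here to show only the ordering rules. */
struct pagemap_model {
	unsigned long start_pfn;
	unsigned long end_pfn;	/* exclusive */
	bool live;		/* cleared once device shutdown starts */
	long refs;		/* pins that keep the memmap alive */
};

/* Pin the pagemap only if @pfn belongs to it and it is still live.
 * Returns NULL otherwise, in which case the caller must not touch the
 * struct page for @pfn. */
static struct pagemap_model *pagemap_tryget(struct pagemap_model *pgmap,
					    unsigned long pfn)
{
	if (!pgmap->live || pfn < pgmap->start_pfn || pfn >= pgmap->end_pfn)
		return NULL;
	pgmap->refs++;
	return pgmap;
}

/* Drop a pin taken by pagemap_tryget(). */
static void pagemap_put(struct pagemap_model *pgmap)
{
	assert(pgmap->refs > 0);
	pgmap->refs--;
}

/* Start shutdown: refuse new pins, and report whether the memmap may be
 * freed right away (true) or must wait for outstanding pins (false). */
static bool pagemap_begin_shutdown(struct pagemap_model *pgmap)
{
	pgmap->live = false;
	return pgmap->refs == 0;
}
```

In this model, an "unsafe" call site corresponds to dereferencing the struct page without first obtaining a non-NULL result from pagemap_tryget().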
For the rest, I'd like to propose that we add a facility to determine
ZONE_DEVICE by pfn rather than by page. The most straightforward way I
can think of would be to just add another bitmap to mem_section_usage
to indicate whether a subsection is ZONE_DEVICE or not.

>
> I only gave it a quick test with DIMMs on x86-64, but didn't test the
> ZONE_DEVICE part at all (any tips for a nice QEMU setup?). Compile-tested
> on x86-64 and PPC.

I'll give it a spin, but I don't think the kernel wants to grow more
is_zone_device_page() users.
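The pfn-based check proposed above can be modeled roughly as follows. This is a minimal userspace sketch, not kernel code: it assumes x86-64 sparsemem geometry (4 KiB pages, 128 MiB sections, 2 MiB subsections, so 64 subsections per section), and struct mem_section_usage_sketch, zone_device_map, mark_zone_device_range() and pfn_is_zone_device() are hypothetical names. The point it shows is that "is this pfn ZONE_DEVICE?" is answered from per-section metadata indexed by pfn, without dereferencing a possibly stale struct page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86-64 geometry: 4 KiB pages, 128 MiB sections, 2 MiB subsections. */
#define PAGE_SHIFT		12
#define SECTION_SIZE_BITS	27
#define SUBSECTION_SHIFT	21

#define PFN_SECTION_SHIFT	(SECTION_SIZE_BITS - PAGE_SHIFT)
#define PAGES_PER_SECTION	(1UL << PFN_SECTION_SHIFT)
#define PFN_SUBSECTION_SHIFT	(SUBSECTION_SHIFT - PAGE_SHIFT)
#define PAGES_PER_SUBSECTION	(1UL << PFN_SUBSECTION_SHIFT)
#define SUBSECTIONS_PER_SECTION	(1UL << (SECTION_SIZE_BITS - SUBSECTION_SHIFT))

/* Hypothetical tiny physical address space for the sketch. */
#define NR_SECTIONS_SKETCH	16

/* With 64 subsections per section, one uint64_t holds each map. */
static_assert(SUBSECTIONS_PER_SECTION <= 64, "subsection maps must fit in a uint64_t");

/* Simplified stand-in for struct mem_section_usage; zone_device_map is
 * the hypothetical extra bitmap. */
struct mem_section_usage_sketch {
	uint64_t subsection_map;	/* subsection is populated */
	uint64_t zone_device_map;	/* subsection is ZONE_DEVICE */
};

static struct mem_section_usage_sketch usage[NR_SECTIONS_SKETCH];

static unsigned long pfn_to_section_nr(unsigned long pfn)
{
	return pfn >> PFN_SECTION_SHIFT;
}

static unsigned int subsection_map_index(unsigned long pfn)
{
	return (unsigned int)((pfn & (PAGES_PER_SECTION - 1)) >> PFN_SUBSECTION_SHIFT);
}

/* Record [start_pfn, start_pfn + nr_pages) as populated ZONE_DEVICE memory,
 * one subsection at a time; stops at the first section beyond the sketch. */
static void mark_zone_device_range(unsigned long start_pfn, unsigned long nr_pages)
{
	unsigned long end_pfn = start_pfn + nr_pages;
	unsigned long pfn;

	for (pfn = start_pfn & ~(PAGES_PER_SUBSECTION - 1); pfn < end_pfn;
	     pfn += PAGES_PER_SUBSECTION) {
		unsigned long nr = pfn_to_section_nr(pfn);
		uint64_t bit = UINT64_C(1) << subsection_map_index(pfn);

		if (nr >= NR_SECTIONS_SKETCH)
			break;
		usage[nr].subsection_map |= bit;
		usage[nr].zone_device_map |= bit;
	}
}

/* Answer "is this pfn ZONE_DEVICE?" from section metadata only; no
 * struct page is dereferenced. */
static bool pfn_is_zone_device(unsigned long pfn)
{
	unsigned long nr = pfn_to_section_nr(pfn);
	uint64_t bit;

	if (nr >= NR_SECTIONS_SKETCH)
		return false;
	bit = UINT64_C(1) << subsection_map_index(pfn);
	return (usage[nr].subsection_map & bit) && (usage[nr].zone_device_map & bit);
}
```

In this sketch a device hot-add path would call mark_zone_device_range() for the new pfn range, after which pfn_is_zone_device() works for any pfn, including ones whose memmap is not safe to touch.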