From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dan Williams
Date: Mon, 5 Oct 2020 11:24:07 -0700
Subject: Re: [RFC PATCH v3 0/2] mm: remove extra ZONE_DEVICE struct page refcount
To: Ralph Campbell
Cc: Linux MM, kvm-ppc@vger.kernel.org, nouveau@lists.freedesktop.org,
 Linux Kernel Mailing List, Ira Weiny, Matthew Wilcox, Jerome Glisse,
 John Hubbard, Alistair Popple, Christoph Hellwig, Jason Gunthorpe,
 Bharata B Rao, Zi Yan, "Kirill A . Shutemov", Yang Shi, Paul Mackerras,
 Ben Skeggs, Andrew Morton
References: <20201001181715.17416-1-rcampbell@nvidia.com>
In-Reply-To: <20201001181715.17416-1-rcampbell@nvidia.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Oct 1, 2020 at 11:17 AM Ralph Campbell wrote:
>
> This is still an RFC because after looking at the pmem/dax code some
> more, I realized that the ZONE_DEVICE struct pages are being inserted
> into the process' page tables with vmf_insert_mixed() and a zero
> refcount on the ZONE_DEVICE struct page. This is sort of OK because
> insert_pfn() increments the reference count on the pgmap which is what
> prevents memunmap_pages() from freeing the struct pages and it doesn't
> check for a non-zero struct page reference count.
> But, any calls to get_page() will hit the VM_BUG_ON_PAGE() that
> checks for a reference count == 0.
>
> // mmap() an ext4 file that is mounted -o dax.
> ext4_dax_fault()
>   ext4_dax_huge_fault()
>     dax_iomap_fault(&ext4_iomap_ops)
>       dax_iomap_pte_fault()
>         ops->iomap_begin() // ext4_iomap_begin()
>           ext4_map_blocks()
>           ext4_set_iomap()
>         dax_iomap_pfn()
>         dax_insert_entry()
>         vmf_insert_mixed(pfn)
>           __vm_insert_mixed()
>             if (!IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL) &&
>                 !pfn_t_devmap(pfn) && pfn_t_valid(pfn))
>               insert_page()
>                 get_page(page) // XXX would trigger VM_BUG_ON_PAGE()
>                 page_add_file_rmap()
>                 set_pte_at()
>             else
>               insert_pfn()
>                 pte_mkdevmap()
>                 set_pte_at()
>
> Should pmem set the page reference count to one before inserting the
> pfn into the page tables (and decrement when removing devmap PTEs)?
> What about MEMORY_DEVICE_GENERIC and MEMORY_DEVICE_PCI_P2PDMA use cases?
> Where should they increment/decrement the page reference count?
> I don't know enough about how these are used to really know what to
> do at this point. If people want me to continue to work on this series,
> I will need some guidance.

fs/dax could take the reference when inserting, but that would mean
that ext4 and xfs would need to go back to checking for 1 to be page idle.
I think that's ok because the filesystem is actually not checking for page-idle it's checking for "get_user_pages()" idle.
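
To make the two conventions discussed above concrete, here is a minimal userspace sketch in C. It is not kernel code: `struct toy_page`, `toy_get_page()`, `toy_put_page()` and `toy_gup_idle()` are hypothetical names invented for illustration, and the model only assumes what the thread states, namely that get_page() treats a zero refcount as a bug and that the filesystem's "idle" test compares the refcount against whatever value the page has when no get_user_pages() reference is outstanding (0 if nothing takes a reference at insert time, 1 if fs/dax takes one).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; only the refcount is modeled. */
struct toy_page {
	int refcount;
};

/*
 * Models the rule quoted in the thread: get_page() on a page whose
 * refcount is zero is a bug. The kernel uses VM_BUG_ON_PAGE(); this
 * sketch reports failure instead of crashing.
 */
static bool toy_get_page(struct toy_page *page)
{
	if (page->refcount == 0)
		return false;
	page->refcount++;
	return true;
}

/* Drops one reference; dropping below zero is a caller bug. */
static void toy_put_page(struct toy_page *page)
{
	assert(page->refcount > 0);
	page->refcount--;
}

/*
 * "get_user_pages() idle" check. idle_count is the refcount the page
 * has with no extra references outstanding: 0 when insertion takes no
 * reference, 1 when fs/dax takes one at insert time (an assumption of
 * this sketch, following the suggestion in the reply).
 */
static bool toy_gup_idle(const struct toy_page *page, int idle_count)
{
	return page->refcount == idle_count;
}
```

With `refcount` starting at 0, `toy_get_page()` fails, which is the VM_BUG_ON_PAGE() case from the call trace. With fs/dax holding one reference, `toy_get_page()` succeeds and `toy_gup_idle(page, 1)` is false until the extra reference is dropped again, which is the "checking for 1" behavior the filesystems would return to.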