From: Peter Collingbourne
Date: Thu, 13 Aug 2020 15:03:41 -0700
Subject: Re: [PATCH] mm: introduce reference pages
To: John Hubbard
Cc: Catalin Marinas, Andrew Morton, Linux ARM, Evgenii Stepanov, linux-mm@kvack.org
In-Reply-To: <92fa4a71-d8dc-0f07-832c-cbceca43e537@nvidia.com>
References: <20200731203241.50427-1-pcc@google.com> <92fa4a71-d8dc-0f07-832c-cbceca43e537@nvidia.com>

Hi John,

Thanks for the review and suggestions.

On Sun, Aug 2, 2020 at 8:28 PM John Hubbard wrote:
>
> On 7/31/20 1:32 PM, Peter Collingbourne wrote:
> ...
>
> Hi,
>
> I can see why you want to do this. A few points to consider, below.
>
> btw, the patch would *not* apply for me, via `git am`. I finally used
> patch(1) and that worked. Probably good to mention which tree and branch
> this applies to, as a first step to avoiding that, but I'm not quite sure
> what else went wrong. Maybe use stock git, instead of
> 2.28.0.163.g6104cc2f0b6-goog? Just guessing.

Sorry about that. It might have been because I had another patch applied
underneath this one when I created the patch.
In the v2 that I'm about to send, I'm based directly on master.

> > @@ -1684,9 +1695,33 @@ static inline int accountable_mapping(struct file *file, vm_flags_t vm_flags)
> >  	return (vm_flags & (VM_NORESERVE | VM_SHARED | VM_WRITE)) == VM_WRITE;
> >  }
> >
> > +static vm_fault_t refpage_fault(struct vm_fault *vmf)
> > +{
> > +	struct page *page;
> > +
> > +	if (get_user_pages((unsigned long)vmf->vma->vm_private_data, 1, 0,
> > +			   &page, 0) != 1)
> > +		return VM_FAULT_SIGSEGV;
> > +
>
> This will end up overflowing the page->_refcount in some situations.
>
> Some thoughts:
>
> In order to implement this feature, the reference pages need to be made
> at least a little bit more special, and probably a little bit more like
> zero pages. At one extreme, for example, zero pages could be a special
> case of reference pages, although I'm not sure of a clean way to
> implement that.
>
> The reason that more special-ness is required is that things such as
> reference counting and locking can be special-cased with zero pages.
> Doing so allows avoiding page->_refcount overflows, for example. Your
> patch here, however, allows normal pages to be treated *almost* like a
> zero page, in that it's a page full of constant value data. But because
> a refpage can be any page, not just a special one that is defined at a
> single location, that leads to problems with refcounts.

You're right, there is a potential reference count issue here, but it
looks like the issue is not with _refcount but with _mapcount. For
example, a program could create a reference page mapping with 2^32
pages, fault every page in the mapping, and thereby overflow _mapcount.

It looks like we can avoid this issue by aligning the handling of
reference pages with that of the zero page, as you suggested. As with
the zero page, _mapcount is no longer modified on reference pages to
track PTEs (this is done by causing vm_normal_page() to return null for
these pages, as we do for the zero page).
Ownership is moved to the struct file created by the new refpage_create
syscall (bikeshed colors welcome), which returns a file descriptor, per
Kirill's suggestion. A struct file's reference count is an
atomic_long_t, which I assume cannot realistically overflow. A pointer
to the reference page is stored in the VMA's vm_private_data, but this
is mostly for convenience, because the page is kept alive by the VMA's
struct file reference. The VMA's vm_ops is now set to null, which causes
us to follow the code path for anonymous pages, and that path has been
modified to handle reference pages. That's all implemented in the v2
that I'm about to send.

I considered having reference page mappings continue to provide a custom
vm_ops, but this would require changes to the interface to preserve the
specialness of the reference page. For example, vm_normal_page() would
need to know to return null for the reference page in order to prevent
_mapcount from overflowing. That could probably be done by adding a new
interface to vm_ops, but it seemed more complicated than changing the
anonymous page code path.

> > +	vmf->page = page;
> > +	return VM_FAULT_LOCKED;
>
> Is the page really locked, or is this a case of "the page is special and
> we can safely claim it is locked"? Maybe I'm just confused about the use
> of VM_FAULT_LOCKED: I thought you only should set it after locking the
> page.

You're right, it isn't locked at this point. I had confused locking the
page with incrementing its _refcount via get_user_pages(). But with the
new implementation we no longer need this fault handler.

> > +}
> > +
> > +static void refpage_close(struct vm_area_struct *vma)
> > +{
> > +	/* This function exists only to prevent is_mergeable_vma from allowing a
> > +	 * reference page mapping to be merged with an anonymous mapping.
> > +	 */
>
> While it is true that implementing a vma's .close() method will prevent
> vma merging, this is an abuse of that function: it depends on how that
> function is implemented. And given that refpages represent significant
> new capability, I think they deserve their own "if" clause (and perhaps
> a VMA flag) in is_mergeable_vma(), instead of this kind of minor hack.

It turns out that with the change to use a file descriptor we do not
need a change to is_mergeable_vma(), because that function bails out if
the struct file pointers in the VMAs are different.

Thanks,
Peter

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel