Message-ID: <9005b167-db08-c967-463b-5e0e092cbb6c@suse.cz>
Date: Thu, 14 Apr 2022 19:15:02 +0200
To: David Hildenbrand, linux-kernel@vger.kernel.org
Cc: Andrew Morton, Hugh Dickins, Linus Torvalds, David Rientjes,
 Shakeel Butt, John Hubbard, Jason Gunthorpe, Mike Kravetz, Mike Rapoport,
 Yang Shi, "Kirill A . Shutemov", Matthew Wilcox, Jann Horn, Michal Hocko,
 Nadav Amit, Rik van Riel, Roman Gushchin, Andrea Arcangeli, Peter Xu,
 Donald Dutile, Christoph Hellwig, Oleg Nesterov, Jan Kara, Liang Zhang,
 Pedro Gomes, Oded Gabbay, linux-mm@kvack.org
References: <20220329160440.193848-1-david@redhat.com>
 <20220329160440.193848-15-david@redhat.com>
From: Vlastimil Babka
Subject: Re: [PATCH v3 14/16] mm: support GUP-triggered unsharing of anonymous pages
In-Reply-To: <20220329160440.193848-15-david@redhat.com>

On 3/29/22 18:04, David Hildenbrand wrote:
> Whenever GUP currently ends up taking a R/O pin on an anonymous page that
> might be shared -- mapped R/O and !PageAnonExclusive() -- any write fault
> on the page table entry will end up replacing the mapped anonymous page
> due to COW, resulting in the GUP pin no longer being consistent with the
> page actually mapped into the page table.
>
> The possible ways to deal with this situation are:
> (1) Ignore and pin -- what we do right now.
> (2) Fail to pin -- which would be rather surprising to callers and
>     could break user space.
> (3) Trigger unsharing and pin the now exclusive page -- reliable R/O
>     pins.
>
> We want to implement 3) because it provides the clearest semantics and
> allows for checking in unpin_user_pages() and friends for possible BUGs:
> when trying to unpin a page that's no longer exclusive, clearly
> something went very wrong and might result in memory corruptions that
> might be hard to debug. So we better have a nice way to spot such
> issues.
>
> To implement 3), we need a way for GUP to trigger unsharing:
> FAULT_FLAG_UNSHARE. FAULT_FLAG_UNSHARE is only applicable to R/O mapped
> anonymous pages and resembles COW logic during a write fault.
> However, in contrast to a write fault, GUP-triggered unsharing will, for
> example, still maintain the write protection.
>
> Let's implement FAULT_FLAG_UNSHARE by hooking into the existing write fault
> handlers for all applicable anonymous page types: ordinary pages, THP and
> hugetlb.
>
> * If FAULT_FLAG_UNSHARE finds a R/O-mapped anonymous page that has been
>   marked exclusive in the meantime by someone else, there is nothing to do.
> * If FAULT_FLAG_UNSHARE finds a R/O-mapped anonymous page that's not
>   marked exclusive, it will try detecting if the process is the exclusive
>   owner. If exclusive, it can be set exclusive similar to reuse logic
>   during write faults via page_move_anon_rmap() and there is nothing
>   else to do; otherwise, we either have to copy and map a fresh,
>   anonymous exclusive page R/O (ordinary pages, hugetlb), or split the
>   THP.
>
> This commit is heavily based on patches by Andrea.
>
> Co-developed-by: Andrea Arcangeli
> Signed-off-by: Andrea Arcangeli
> Signed-off-by: David Hildenbrand

Acked-by: Vlastimil Babka

Modulo a nit and a suspected logical bug below.

> @@ -3072,6 +3082,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
>  		 * mmu page tables (such as kvm shadow page tables), we want the
>  		 * new page to be mapped directly into the secondary page table.
>  		 */
> +		BUG_ON(unshare && pte_write(entry));
>  		set_pte_at_notify(mm, vmf->address, vmf->pte, entry);
>  		update_mmu_cache(vma, vmf->address, vmf->pte);
>  		if (old_page) {
> @@ -3121,7 +3132,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
>  			free_swap_cache(old_page);
>  		put_page(old_page);
>  	}
> -	return page_copied ? VM_FAULT_WRITE : 0;
> +	return page_copied && !unshare ? VM_FAULT_WRITE : 0;

Could be just me, but I would prefer (page_copied && !unshare) here, as I
see these operators together too rarely to remember their relative
priority well.
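
To illustrate the nit: in C, && binds tighter than ?:, so the expression as
written should already parse the same way as the parenthesized form, and the
parentheses would only help readability. A minimal userspace sketch of that,
assuming a placeholder DEMO_VM_FAULT_WRITE value and hypothetical helper
names that are not part of the patch:

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder value for this demo only; not the kernel's definition. */
#define DEMO_VM_FAULT_WRITE 0x8

/* The return expression in the form used by the patch. */
static int ret_as_written(bool page_copied, bool unshare)
{
	return page_copied && !unshare ? DEMO_VM_FAULT_WRITE : 0;
}

/* The same expression with the suggested explicit parentheses. */
static int ret_parenthesized(bool page_copied, bool unshare)
{
	return (page_copied && !unshare) ? DEMO_VM_FAULT_WRITE : 0;
}

/* Walk all four input combinations; both forms must agree on each. */
static void check_equivalence(void)
{
	for (int pc = 0; pc <= 1; pc++)
		for (int us = 0; us <= 1; us++)
			assert(ret_as_written(pc, us) ==
			       ret_parenthesized(pc, us));
}
```

Calling check_equivalence() from a test main() should pass without tripping
the assert, which is consistent with the change being purely cosmetic.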
>  oom_free_new:
>  	put_page(new_page);
>  oom:
> @@ -4515,8 +4550,11 @@ static inline vm_fault_t create_huge_pmd(struct vm_fault *vmf)
>  /* `inline' is required to avoid gcc 4.1.2 build error */
>  static inline vm_fault_t wp_huge_pmd(struct vm_fault *vmf)
>  {
> +	const bool unshare = vmf->flags & FAULT_FLAG_UNSHARE;
> +
>  	if (vma_is_anonymous(vmf->vma)) {
> -		if (userfaultfd_huge_pmd_wp(vmf->vma, vmf->orig_pmd))
> +		if (unlikely(unshare) &&

Is this condition flipped? Should it be "likely(!unshare)", as in the
similar code in do_wp_page()?

> +		    userfaultfd_huge_pmd_wp(vmf->vma, vmf->orig_pmd))
>  			return handle_userfault(vmf, VM_UFFD_WP);
>  		return do_huge_pmd_wp_page(vmf);
>  	}
> @@ -4651,10 +4689,11 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
>  		update_mmu_tlb(vmf->vma, vmf->address, vmf->pte);
>  		goto unlock;
>  	}
> -	if (vmf->flags & FAULT_FLAG_WRITE) {
> +	if (vmf->flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) {
>  		if (!pte_write(entry))
>  			return do_wp_page(vmf);
> -		entry = pte_mkdirty(entry);
> +		else if (likely(vmf->flags & FAULT_FLAG_WRITE))
> +			entry = pte_mkdirty(entry);
>  	}
>  	entry = pte_mkyoung(entry);
>  	if (ptep_set_access_flags(vmf->vma, vmf->address, vmf->pte, entry,