From: Andrea Arcangeli
Date: Sat, 9 Jan 2021 21:51:14 -0500
To: Linus Torvalds
Cc: Andrew Morton, Linux-MM,
    Linux Kernel Mailing List, Yu Zhao, Andy Lutomirski, Peter Xu,
    Pavel Emelyanov, Mike Kravetz, Mike Rapoport, Minchan Kim,
    Will Deacon, Peter Zijlstra, Hugh Dickins, "Kirill A. Shutemov",
    Matthew Wilcox, Oleg Nesterov, Jann Horn, Kees Cook, John Hubbard,
    Leon Romanovsky, Jason Gunthorpe, Jan Kara, Kirill Tkhai,
    Nadav Amit, Jens Axboe
Subject: Re: [PATCH 0/1] mm: restore full accuracy in COW page reuse
References: <20210110004435.26382-1-aarcange@redhat.com>

Hello Linus,

On Sat, Jan 09, 2021 at 05:19:51PM -0800, Linus Torvalds wrote:

> +#define is_cow_mapping(flags) (((flags) & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE)
> +
> +static inline bool pte_is_pinned(struct vm_area_struct *vma, unsigned long addr, pte_t pte)
> +{
> +	struct page *page;
> +
> +	if (!is_cow_mapping(vma->vm_flags))
> +		return false;
> +	if (likely(!atomic_read(&vma->vm_mm->has_pinned)))
> +		return false;
> +	page = vm_normal_page(vma, addr, pte);
> +	if (!page)
> +		return false;
> +	if (page_mapcount(page) != 1)
> +		return false;
> +	return page_maybe_dma_pinned(page);
> +}

I just don't see the simplification coming from
09854ba94c6aad7886996bfbee2530b3d8a7f4f4. Instead of checking
page_mapcount above as an optimization, to me it looks much simpler to
check it in a single place, in do_wp_page, which in addition to
optimizing away the superfluous copy would optimize away the above
complexity as well.

And I won't comment on whether it's actually safe to skip random pages
or not. All I know is that for mprotect and uffd-wp, the above approach
definitely wouldn't work.

In addition I dislike has_pinned and FOLL_PIN.

has_pinned   450 include/linux/mm_types.h   * for instance during page table copying for fork().
has_pinned  1021 kernel/fork.c              atomic64_set(&mm->pinned_vm, 0);
has_pinned  1239 mm/gup.c                   atomic_set(&mm->has_pinned, 1);
has_pinned  2579 mm/gup.c                   atomic_set(&current->mm->has_pinned, 1);
has_pinned  1099 mm/huge_memory.c           atomic_read(&src_mm->has_pinned) &&
has_pinned  1213 mm/huge_memory.c           atomic_read(&src_mm->has_pinned) &&
has_pinned   819 mm/memory.c                if (likely(!atomic_read(&src_mm->has_pinned)))

Are we spending 32 bits of an atomic_t in mm_struct just to call
atomic_set(1) on it? Why isn't it an MMF_HAS_PINNED flag that can
already be set atomically under mmap_read_lock too? There are bits
left free there; we haven't run out yet to justify wasting another 31
bits. I hope I'm overlooking something.

The existence of FOLL_LONGTERM is good and makes a difference at times
for writeback if it's on a MAP_SHARED mapping, or it makes a difference
during GUP to do a page migration before taking the pin, but for the
whole rest of the VM it's irrelevant whether the pin is long or short
term. So I'm also concerned by what Jason mentioned about long term
pins being treated differently within the VM. That to me looks as
fundamentally flawed as introducing inaccuracies in do_wp_page that,
even ignoring the performance implications caused by the inaccuracy,
simply prevent doing some useful things.
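For illustration only, a minimal sketch of the MMF_HAS_PINNED
alternative suggested above; the bit number and helper names are
assumptions made up for this example, not taken from any posted patch:

/*
 * Hypothetical sketch: keep the "a FOLL_PIN was ever taken on this mm"
 * hint in mm->flags instead of a dedicated atomic_t.
 */
#define MMF_HAS_PINNED	28	/* assumed free bit in mm->flags */

static inline void mm_set_has_pinned(struct mm_struct *mm)
{
	/* set_bit() is atomic, so holding mmap_read_lock() is enough */
	if (!test_bit(MMF_HAS_PINNED, &mm->flags))
		set_bit(MMF_HAS_PINNED, &mm->flags);
}

static inline bool mm_has_pinned(struct mm_struct *mm)
{
	return test_bit(MMF_HAS_PINNED, &mm->flags);
}

The GUP paths that currently do atomic_set(&mm->has_pinned, 1) would
call mm_set_has_pinned() instead, and the fork/copy paths would test
the bit, so the extra 32-bit field in mm_struct wouldn't be needed.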
I obviously agree that all the common workloads with no GUP pins, no
clear_refs and no uffd are way more important to keep optimal, but I
haven't seen a single benchmark showing the benefit of not taking the
PG_lock before a page copy, or any other runtime benefit coming from
page_count in do_wp_page. To the contrary, now I see additional
branches in fork and in various other paths.

The only place where I again see page_count providing a tangible
benefit is the vmsplice attack from child to parent, but that has not
been fully fixed, and if page_count is added to fix it in all COW
faults, it'll introduce extra inefficiency to the very common important
workloads, not only to the special GUP/clear_refs/uffd-wp workloads, as
your patch above shows.

Thanks,
Andrea