Date: Sat, 9 Jan 2021 22:24:52 -0500
From: Andrea Arcangeli
To: Linus Torvalds
Cc: Andrew Morton, Linux-MM, Linux Kernel Mailing List, Yu Zhao,
    Andy Lutomirski, Peter Xu, Pavel Emelyanov, Mike Kravetz,
    Mike Rapoport, Minchan Kim, Will Deacon, Peter Zijlstra,
    Hugh Dickins, "Kirill A. Shutemov", Matthew Wilcox, Oleg Nesterov,
    Jann Horn, Kees Cook, John Hubbard, Leon Romanovsky, Jason Gunthorpe,
    Jan Kara, Kirill Tkhai, Nadav Amit, Jens Axboe
Subject: Re: [PATCH 0/1] mm: restore full accuracy in COW page reuse
References: <20210110004435.26382-1-aarcange@redhat.com>

On Sat, Jan 09, 2021 at 05:37:09PM -0800, Linus Torvalds wrote:
> On Sat, Jan 9, 2021 at 5:19 PM Linus Torvalds wrote:
> >
> > And no, I didn't make the UFFDIO_WRITEPROTECT code take the mmap_sem
> > for writing. For whoever wants to look at that, it's
> > mwriteprotect_range() in mm/userfaultfd.c and the fix is literally to
> > turn the read-lock (and unlock) into a write-lock (and unlock).
>
> Oh, and if it wasn't obvious, we'll have to debate what to do with
> trying to mprotect() a pinned page. Do we just ignore the pinned page
> (the way my clear_refs patch did)? Or do we turn it into -EBUSY? Or
> what?

Agreed, I assume mprotect would have the same effect.

mprotect in parallel with a read or recvmsg may be undefined, so I
didn't bring it up, but it was pretty clear.

The moment the write bit is cleared (no matter why or by whom) and the
PG lock is released, if there's any GUP pin, GUP currently loses
synchrony.

In any case I intended to help exercise the new page_count logic with
the testcase, possibly to make it behave better somehow, no matter
how.

I admit I'm also wondering about the exact semantics of O_DIRECT on
clear_refs or uffd-wp tracking, but the point is that losing reads and
getting unexpected data in the page still doesn't look like good
behavior, and it had to be at least checked.

To me ultimately the useful use case that has become impossible with
page_count isn't even clear_refs or uffd-wp.

The useful case, in which I can see zero fundamental flaws, is RDMA or
some other device computing in pure readonly DMA on the data while a
program runs normally and produces it. It could even be a framebuffer
that doesn't care about coherency. You may want to occasionally
wrprotect the memory under a readonly long term GUP pin for
consistency even against bugs of the program itself.

Why should wrprotecting make the device lose synchrony?

And what kind of performance do we gain in the normal useful cases by
breaking the special case? Is there a benchmark showing it?

> So it's not *just* the locking that needs to be fixed. But just take a
> look at that suggested clear_refs patch of mine - it sure isn't
> complicated.

If we can skip the wrprotection it's fairly easy, I fully agree. Even
then it still looks more difficult than using page_mapcount in
do_wp_page in my view, so I also don't see the simplification. And
overall the amount of kernel code had a net increase as a result.

Thanks,
Andrea
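
The disagreement above turns on which counter decides whether a write
fault on a write-protected page reuses the page in place or copies it.
The following is only a toy model of that decision, not kernel code:
ToyPage, reuse_by_page_count, reuse_by_mapcount and pin_stays_in_sync
are hypothetical names, and the model assumes one reference per mapping
and one per GUP pin while ignoring locking, the swap cache, fork, and
every other factor the real do_wp_page has to consider.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyPage:
    """Hypothetical stand-in for a page; not the kernel's struct page."""
    mapcount: int = 1  # number of page-table mappings of the page
    pins: int = 0      # extra references held by GUP pins (assumption)

    @property
    def refcount(self) -> int:
        # Toy assumption: each mapping and each pin holds one reference.
        return self.mapcount + self.pins


def reuse_by_page_count(page: ToyPage) -> bool:
    """Reuse in place only if the faulting mapping holds the sole reference."""
    return page.refcount == 1


def reuse_by_mapcount(page: ToyPage) -> bool:
    """Reuse in place if exactly one mapping exists, ignoring pins."""
    return page.mapcount == 1


def pin_stays_in_sync(page: ToyPage, reuse: Callable[[ToyPage], bool]) -> bool:
    """After a write fault on a write-protected page, report whether a
    pin still refers to the page the process keeps writing to."""
    if page.pins == 0:
        return True  # nothing is pinned, so nothing can fall out of sync
    # If the fault copies, the process moves to the new copy while the
    # pin keeps pointing at the old page.
    return reuse(page)


if __name__ == "__main__":
    pinned = ToyPage(mapcount=1, pins=1)
    print("page_count rule keeps pin in sync:",
          pin_stays_in_sync(pinned, reuse_by_page_count))
    print("mapcount rule keeps pin in sync:",
          pin_stays_in_sync(pinned, reuse_by_mapcount))
```

In this model a single-mapped page with one pin is copied under the
page_count rule and reused under the mapcount rule, which is the "loses
synchrony" case described above. The model says nothing about which
rule is preferable overall.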