From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: x86@kernel.org, linux-alpha@vger.kernel.org,
    linux-arm-kernel@lists.infradead.org, linux-ia64@vger.kernel.org,
    linux-mips@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
    sparclinux@vger.kernel.org, linux-um@lists.infradead.org,
    etnaviv@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
    linux-samsung-soc@vger.kernel.org, linux-rdma@vger.kernel.org,
    linux-media@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    linux-mm@kvack.org, linux-perf-users@vger.kernel.org,
    linux-security-module@vger.kernel.org, linux-kselftest@vger.kernel.org,
    Linus Torvalds, Andrew Morton, Jason Gunthorpe, John Hubbard, Peter Xu,
    Greg Kroah-Hartman, Andrea Arcangeli, Hugh Dickins, Nadav Amit,
    Vlastimil Babka, Matthew Wilcox, Mike Kravetz, Muchun Song, Shuah Khan,
    Lucas Stach, David Airlie, Oded Gabbay, Arnd Bergmann,
    Christoph Hellwig, Alex Williamson, David Hildenbrand,
    Alexander Shishkin, Alexander Viro, Andy Walls, Anton Ivanov,
    Arnaldo Carvalho de Melo, Bernard Metzler, Borislav Petkov,
    Catalin Marinas, Christian Benvenuti, Christian Gmeiner,
    Christophe Leroy, Daniel Vetter, Daniel Vetter, Dave Hansen,
    "David S. Miller", Dennis Dalessandro, Eric Biederman, Hans Verkuil,
    "H. Peter Anvin", Ingo Molnar, Inki Dae, Ivan Kokshaysky, James Morris,
    Jiri Olsa, Johannes Berg, Kees Cook, Kentaro Takeda,
    Krzysztof Kozlowski, Kyungmin Park, Leon Romanovsky, Leon Romanovsky,
    Marek Szyprowski, Mark Rutland, Matt Turner, Mauro Carvalho Chehab,
    Michael Ellerman, Namhyung Kim, Nelson Escobar, Nicholas Piggin,
    Oleg Nesterov, Paul Moore, Peter Zijlstra, Richard Henderson,
    Richard Weinberger, Russell King, "Serge E. Hallyn", Seung-Woo Kim,
    Tetsuo Handa, Thomas Bogendoerfer, Thomas Gleixner, Tomasz Figa,
    Will Deacon
Subject: [PATCH mm-unstable v1 00/20] mm/gup: remove FOLL_FORCE usage from drivers (reliable R/O long-term pinning)
Date: Wed, 16 Nov 2022 11:26:39 +0100
Message-Id: <20221116102659.70287-1-david@redhat.com>

So far, we have not supported reliable R/O long-term pinning in COW
mappings. That means that if we trigger R/O long-term pinning in a
MAP_PRIVATE mapping, we can end up pinning the (R/O-mapped) shared
zeropage or a pagecache page.

The next write access would trigger a write fault and replace the pinned
page by an exclusive anonymous page in the process page table; whatever
the process writes to that private page copy would not be visible to the
owner of the previous page pin: for example, RDMA could read stale data.
The end result is essentially an unexpected and hard-to-debug memory
corruption.

Some drivers tried working around that limitation by using
"FOLL_FORCE|FOLL_WRITE|FOLL_LONGTERM" for R/O long-term pinning.
FOLL_WRITE would trigger a write fault, if required, and break COW before
pinning the page. FOLL_FORCE is required because the VMA might lack write
permissions, and drivers wanted to make that case work as well, just like
one would expect (no write access, but still triggering a write fault to
break COW).

However, that is not a practical solution, because
(1) Drivers that don't stick to that undocumented and debatable pattern
    would still run into that issue. For example, VFIO only uses
    FOLL_LONGTERM for R/O long-term pinning.
(2) Using FOLL_WRITE just to work around a COW mapping + page pinning
    limitation is unintuitive. FOLL_WRITE would, for example, mark the
    page softdirty or trigger uffd-wp, even though there isn't actually
    going to be any write access.
(3) The purpose of FOLL_FORCE is debug access, not access without VMA
    permissions by arbitrary drivers.

So instead, make R/O long-term pinning work as expected, by breaking COW
in a COW mapping early, such that we can remove any FOLL_FORCE usage from
drivers and make FOLL_FORCE ptrace-specific (renaming it to FOLL_PTRACE).
More details in patch #9.

Patches #1--#3 add COW tests for non-anonymous pages.
Patches #4--#8 prepare core MM for extended FAULT_FLAG_UNSHARE support in
COW mappings.
Patch #9 implements reliable R/O long-term pinning in COW mappings.
Patches #10--#19 remove any FOLL_FORCE usage from drivers.
Patch #20 renames FOLL_FORCE to FOLL_PTRACE.

I'm refraining from CCing all driver/arch maintainers on the whole patch
set, but only CC them on the cover letter and the applicable patch
(I know, I know, someone is always unhappy ... sorry).

RFC -> v1:
* Use term "ptrace" instead of "debuggers" in patch descriptions
* Added ACK/Tested-by
* "mm/frame-vector: remove FOLL_FORCE usage"
 -> Adjust description
* "mm: rename FOLL_FORCE to FOLL_PTRACE"
 -> Added

David Hildenbrand (20):
  selftests/vm: anon_cow: prepare for non-anonymous COW tests
  selftests/vm: cow: basic COW tests for non-anonymous pages
  selftests/vm: cow: R/O long-term pinning reliability tests for non-anon
    pages
  mm: add early FAULT_FLAG_UNSHARE consistency checks
  mm: add early FAULT_FLAG_WRITE consistency checks
  mm: rework handling in do_wp_page() based on private vs. shared mappings
  mm: don't call vm_ops->huge_fault() in wp_huge_pmd()/wp_huge_pud() for
    private mappings
  mm: extend FAULT_FLAG_UNSHARE support to anything in a COW mapping
  mm/gup: reliable R/O long-term pinning in COW mappings
  RDMA/umem: remove FOLL_FORCE usage
  RDMA/usnic: remove FOLL_FORCE usage
  RDMA/siw: remove FOLL_FORCE usage
  media: videobuf-dma-sg: remove FOLL_FORCE usage
  drm/etnaviv: remove FOLL_FORCE usage
  media: pci/ivtv: remove FOLL_FORCE usage
  mm/frame-vector: remove FOLL_FORCE usage
  drm/exynos: remove FOLL_FORCE usage
  RDMA/hw/qib/qib_user_pages: remove FOLL_FORCE usage
  habanalabs: remove FOLL_FORCE usage
  mm: rename FOLL_FORCE to FOLL_PTRACE

 arch/alpha/kernel/ptrace.c                    |   6 +-
 arch/arm64/kernel/mte.c                       |   2 +-
 arch/ia64/kernel/ptrace.c                     |  10 +-
 arch/mips/kernel/ptrace32.c                   |   4 +-
 arch/mips/math-emu/dsemul.c                   |   2 +-
 arch/powerpc/kernel/ptrace/ptrace32.c         |   4 +-
 arch/sparc/kernel/ptrace_32.c                 |   4 +-
 arch/sparc/kernel/ptrace_64.c                 |   8 +-
 arch/x86/kernel/step.c                        |   2 +-
 arch/x86/um/ptrace_32.c                       |   2 +-
 arch/x86/um/ptrace_64.c                       |   2 +-
 drivers/gpu/drm/etnaviv/etnaviv_gem.c         |   8 +-
 drivers/gpu/drm/exynos/exynos_drm_g2d.c       |   2 +-
 drivers/infiniband/core/umem.c                |   8 +-
 drivers/infiniband/hw/qib/qib_user_pages.c    |   2 +-
 drivers/infiniband/hw/usnic/usnic_uiom.c      |   9 +-
 drivers/infiniband/sw/siw/siw_mem.c           |   9 +-
 drivers/media/common/videobuf2/frame_vector.c |   2 +-
 drivers/media/pci/ivtv/ivtv-udma.c            |   2 +-
 drivers/media/pci/ivtv/ivtv-yuv.c             |   5 +-
 drivers/media/v4l2-core/videobuf-dma-sg.c     |  14 +-
 drivers/misc/habanalabs/common/memory.c       |   3 +-
 fs/exec.c                                     |   2 +-
 fs/proc/base.c                                |   2 +-
 include/linux/mm.h                            |  35 +-
 include/linux/mm_types.h                      |   8 +-
 kernel/events/uprobes.c                       |   4 +-
 kernel/ptrace.c                               |  12 +-
 mm/gup.c                                      |  38 +-
 mm/huge_memory.c                              |  13 +-
 mm/hugetlb.c                                  |  14 +-
 mm/memory.c                                   |  97 +++--
 mm/util.c                                     |   4 +-
 security/tomoyo/domain.c                      |   2 +-
 tools/testing/selftests/vm/.gitignore         |   2 +-
 tools/testing/selftests/vm/Makefile           |  10 +-
 tools/testing/selftests/vm/check_config.sh    |   4 +-
 .../selftests/vm/{anon_cow.c => cow.c}        | 387 +++++++++++++++++-
 tools/testing/selftests/vm/run_vmtests.sh     |   8 +-
 39 files changed, 575 insertions(+), 177 deletions(-)
 rename tools/testing/selftests/vm/{anon_cow.c => cow.c} (75%)

--
2.38.1
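
For readers who want to see the underlying COW behavior without a driver,
here is a minimal userspace sketch. It does not pin anything and does not
use the GUP interfaces touched by this series; it only shows that a write
through a MAP_PRIVATE file mapping ends up in a private page copy while
the pagecache page keeps the old data. Reading the file with pread()
stands in for "the owner of the previous page pin". The helper name
private_write_diverges_from_file() and the /tmp path are assumptions made
for this illustration only.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of the patch set.
 *
 * Returns 1 if, after writing through a MAP_PRIVATE mapping, the mapping
 * shows the new data while the file (pagecache) still holds the old data,
 * 0 if that is not the case, and -1 on any setup error.
 */
int private_write_diverges_from_file(void)
{
	char path[] = "/tmp/cow-demo-XXXXXX";
	char from_file[3];
	long page_size = sysconf(_SC_PAGESIZE);
	char *map;
	int fd, ret = -1;

	if (page_size <= 0)
		return -1;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);	/* the open fd keeps the file alive */

	/* One page of file data, starting with "old". */
	if (pwrite(fd, "old", 3, 0) != 3 || ftruncate(fd, page_size) != 0)
		goto out_close;

	map = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE, fd, 0);
	if (map == MAP_FAILED)
		goto out_close;

	/* Read access: the mapping shows the pagecache content. */
	if (memcmp(map, "old", 3) != 0)
		goto out_unmap;

	/* Write access: the write fault breaks COW for this page. */
	memcpy(map, "new", 3);

	/* Stand-in for the pin holder: look at the pagecache via the fd. */
	if (pread(fd, from_file, 3, 0) != 3)
		goto out_unmap;

	ret = (memcmp(from_file, "old", 3) == 0) &&
	      (memcmp(map, "new", 3) == 0);
out_unmap:
	munmap(map, (size_t)page_size);
out_close:
	close(fd);
	return ret;
}
```

Calling the helper from a small main() and checking for a return value of
1 is enough to observe the divergence. A R/O long-term pin taken before
the write would be in the position of the pread() caller here: it keeps
looking at the old page.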