From: Lorenzo Stoakes
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton
Cc: Mike Kravetz, Muchun Song, Alexander Viro, Christian Brauner, Matthew Wilcox, Hugh Dickins, Andy Lutomirski, Jan Kara, linux-fsdevel@vger.kernel.org, bpf@vger.kernel.org, Lorenzo Stoakes
Subject: [PATCH v3 0/3] permit write-sealed memfd read-only shared mappings
Date: Sat, 7 Oct 2023 21:50:58 +0100

The man page for fcntl() describing memfd file seals states the following
about F_SEAL_WRITE:

    Furthermore, trying to create new shared, writable memory-mappings via
    mmap(2) will also fail with EPERM.

With emphasis on 'writable'.
It turns out that the kernel currently disallows all new shared memory
mappings of a memfd with F_SEAL_WRITE applied, read-only ones included,
rendering this documentation inaccurate.

This matters because users are then unable to obtain any shared mapping of a
memfd once it has been write-sealed, which limits the usefulness of write
seals. This was reported in the discussion thread [1] that originated from a
bug report [2].

The behaviour is a product of two things: the kernel uses the struct
address_space->i_mmap_writable atomic counter to determine whether writing
may be permitted, and it adjusts this counter whenever any VM_SHARED mapping
is established, more generally assuming implicitly that VM_SHARED implies
writable.

It seems sensible to adjust this counter only when VM_MAYWRITE is also set,
i.e. when it is possible that the mapping could at some point be written to.

If we do so, then all that is needed for write seals to function as
documented is to clear VM_MAYWRITE when mapping read-only. This functionality
already exists for F_SEAL_FUTURE_WRITE, so we can simply adapt that logic to
do the same for F_SEAL_WRITE.

We then hit a chicken-and-egg situation in mmap_region(), where the check for
VM_MAYWRITE occurs before we are able to clear the flag. To work around this,
separate the check from its enforcement across call_mmap(), which gives that
function the opportunity to clear VM_MAYWRITE.

Thanks to Andy Lutomirski for the suggestion!

[1]:https://lore.kernel.org/all/20230324133646.16101dfa666f253c4715d965@linux-foundation.org/
[2]:https://bugzilla.kernel.org/show_bug.cgi?id=217238

v3:
- Don't defer the writable check until after call_mmap(), in case this breaks
  f_ops->mmap() callbacks which assume the check has already been done.
  Instead, separate the check and its enforcement across the call, allowing
  the callback to change vma->vm_flags in the meantime.
- Improve/correct commit messages and comments throughout.

v2:
- Removed RFC tag.
- Corrected an incorrect goto pointed out by Jan.
- Reworded cover letter as suggested by Jan.
https://lore.kernel.org/all/cover.1682890156.git.lstoakes@gmail.com/

v1:
https://lore.kernel.org/all/cover.1680560277.git.lstoakes@gmail.com/

Lorenzo Stoakes (3):
  mm: drop the assumption that VM_SHARED always implies writable
  mm: update memfd seal write check to include F_SEAL_WRITE
  mm: perform the mapping_map_writable() check after call_mmap()

 fs/hugetlbfs/inode.c |  2 +-
 include/linux/fs.h   |  4 ++--
 include/linux/mm.h   | 26 +++++++++++++++++++-------
 kernel/fork.c        |  2 +-
 mm/filemap.c         |  2 +-
 mm/madvise.c         |  2 +-
 mm/mmap.c            | 28 ++++++++++++++++++----------
 mm/shmem.c           |  2 +-
 8 files changed, 44 insertions(+), 24 deletions(-)

-- 
2.42.0