From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.6 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_PASS, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DB9FDC43441 for ; Thu, 22 Nov 2018 03:25:43 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 7BDC12075B for ; Thu, 22 Nov 2018 03:25:43 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=kernel.org header.i=@kernel.org header.b="FFSQJZJz" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7BDC12075B Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2392010AbeKVODE (ORCPT ); Thu, 22 Nov 2018 09:03:04 -0500 Received: from mail.kernel.org ([198.145.29.99]:49312 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2389222AbeKVODE (ORCPT ); Thu, 22 Nov 2018 09:03:04 -0500 Received: from mail-wr1-f42.google.com (mail-wr1-f42.google.com [209.85.221.42]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 7E52721104 for ; Thu, 22 Nov 2018 03:25:40 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1542857140; bh=VDpHnxe1N0gr8JRa8rVBfxztvnJGxwJYrL4W2Krb9B0=; h=References:In-Reply-To:From:Date:Subject:To:Cc:From; b=FFSQJZJzquz+ArGTAs7huyPM2rbYyYnF+nnWECpeR48A3eB9A6370Qa7ShO0c8q7R pqkkaW8TB7ZA8hjM65HDDj+gy/qTxnIkwbUGVTSV1HCcdb/4AhnbeOKh87JtDV6kIL ynNmyt3Wt8mddNHn3zQiCT5oigBaj9XFJlHzsR2M= Received: by mail-wr1-f42.google.com with SMTP id c14so2101996wrr.0 for ; Wed, 21 Nov 2018 19:25:40 -0800 (PST) X-Gm-Message-State: AA+aEWbHmMzQRSD7Xm1U359qTXaisAm5GSSvTFlPvqsxD0AE/dJCdBXV sQLCBPrFzOd2N9M3WXSJtdZgxW97nPLsyCS7V5Lo1A== X-Google-Smtp-Source: AFSGD/U+attBYTmsrp89Lo77RUzra5GnE7VLgbqrPXcs+EYMyppQ4Q8JYoUOFHzqELSaBa4g86WlPsNBhrnUOO+aQdU= X-Received: by 2002:adf:90af:: with SMTP id i44-v6mr7599783wri.77.1542857138819; Wed, 21 Nov 2018 19:25:38 -0800 (PST) MIME-Version: 1.0 References: <20181120052137.74317-1-joel@joelfernandes.org> <20181120183926.GA124387@google.com> <20181121070658.011d576d@canb.auug.org.au> <469B80CB-D982-4802-A81D-95AC493D7E87@amacapital.net> <20181120204710.GB22801@google.com> <20181120211335.GC22801@google.com> <20181121182701.0d8a775fda6af1f8d2be8f25@linux-foundation.org> In-Reply-To: <20181121182701.0d8a775fda6af1f8d2be8f25@linux-foundation.org> From: Andy Lutomirski Date: Wed, 21 Nov 2018 19:25:26 -0800 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH -next 1/2] mm/memfd: make F_SEAL_FUTURE_WRITE seal more robust To: Andrew Morton Cc: Joel Fernandes , Stephen Rothwell , Andrew Lutomirski , LKML , Hugh Dickins , Jann Horn , Khalid Aziz , Linux API , "open list:KERNEL SELFTEST FRAMEWORK" , Linux-MM , marcandre.lureau@redhat.com, Matthew Wilcox , Mike Kravetz , Shuah Khan Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Nov 21, 2018 at 6:27 PM Andrew Morton wrote: > > On Tue, 20 Nov 2018 13:13:35 -0800 Joel Fernandes wrote: > > > > > I am Ok with whatever Andrew wants to do, if it is better to squash it with > > > > the original, then I can do that and send another patch. > > > > > > > > > > > > > > From experience, Andrew will food in fixups on request :) > > > > Andrew, could you squash this patch into the one titled ("mm: Add an > > F_SEAL_FUTURE_WRITE seal to memfd")? > > Sure. > > I could of course queue them separately but I rarely do so - I don't > think that the intermediate development states are useful in the > infinite-term, and I make them available via additional Link: tags in > the changelog footers anyway. > > I think that the magnitude of these patches is such that John Stultz's > Reviewed-by is invalidated, so this series is now in the "unreviewed" > state. > > So can we have a re-review please? For convenience, here's the > folded-together [1/1] patch, as it will go to Linus. > > > From: "Joel Fernandes (Google)" > Subject: mm: Add an F_SEAL_FUTURE_WRITE seal to memfd > > Android uses ashmem for sharing memory regions. We are looking forward to > migrating all usecases of ashmem to memfd so that we can possibly remove > the ashmem driver in the future from staging while also benefiting from > using memfd and contributing to it. Note staging drivers are also not ABI > and generally can be removed at anytime. > > One of the main usecases Android has is the ability to create a region and > mmap it as writeable, then add protection against making any "future" > writes while keeping the existing already mmap'ed writeable-region active. > This allows us to implement a usecase where receivers of the shared > memory buffer can get a read-only view, while the sender continues to > write to the buffer. See CursorWindow documentation in Android for more > details: > https://developer.android.com/reference/android/database/CursorWindow > > This usecase cannot be implemented with the existing F_SEAL_WRITE seal. > To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal > which prevents any future mmap and write syscalls from succeeding while > keeping the existing mmap active. The following program shows the seal > working in action: > > #include > #include > #include > #include > #include > #include > #include > #define F_SEAL_FUTURE_WRITE 0x0010 > #define REGION_SIZE (5 * 1024 * 1024) > > int memfd_create_region(const char *name, size_t size) > { > int ret; > int fd = syscall(__NR_memfd_create, name, MFD_ALLOW_SEALING); > if (fd < 0) return fd; > ret = ftruncate(fd, size); > if (ret < 0) { close(fd); return ret; } > return fd; > } > > int main() { > int ret, fd; > void *addr, *addr2, *addr3, *addr1; > ret = memfd_create_region("test_region", REGION_SIZE); > printf("ret=%d\n", ret); > fd = ret; > > // Create map > addr = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); > if (addr == MAP_FAILED) > printf("map 0 failed\n"); > else > printf("map 0 passed\n"); > > if ((ret = write(fd, "test", 4)) != 4) > printf("write failed even though no future-write seal " > "(ret=%d errno =%d)\n", ret, errno); > else > printf("write passed\n"); > > addr1 = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); > if (addr1 == MAP_FAILED) > perror("map 1 prot-write failed even though no seal\n"); > else > printf("map 1 prot-write passed as expected\n"); > > ret = fcntl(fd, F_ADD_SEALS, F_SEAL_FUTURE_WRITE | > F_SEAL_GROW | > F_SEAL_SHRINK); > if (ret == -1) > printf("fcntl failed, errno: %d\n", errno); > else > printf("future-write seal now active\n"); > > if ((ret = write(fd, "test", 4)) != 4) > printf("write failed as expected due to future-write seal\n"); > else > printf("write passed (unexpected)\n"); > > addr2 = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); > if (addr2 == MAP_FAILED) > perror("map 2 prot-write failed as expected due to seal\n"); > else > printf("map 2 passed\n"); > > addr3 = mmap(0, REGION_SIZE, PROT_READ, MAP_SHARED, fd, 0); > if (addr3 == MAP_FAILED) > perror("map 3 failed\n"); > else > printf("map 3 prot-read passed as expected\n"); > } > > The output of running this program is as follows: > ret=3 > map 0 passed > write passed > map 1 prot-write passed as expected > future-write seal now active > write failed as expected due to future-write seal > map 2 prot-write failed as expected due to seal > : Permission denied > map 3 prot-read passed as expected > > [joel@joelfernandes.org: make F_SEAL_FUTURE_WRITE seal more robust] > Link: http://lkml.kernel.org/r/20181120052137.74317-1-joel@joelfernandes.org > Link: http://lkml.kernel.org/r/20181108041537.39694-1-joel@joelfernandes.org > Signed-off-by: Joel Fernandes (Google) > Cc: John Stultz > Cc: John Reck > Cc: Todd Kjos > Cc: Greg Kroah-Hartman > Cc: Christoph Hellwig > Cc: Al Viro > Cc: Daniel Colascione > Cc: J. Bruce Fields > Cc: Jeff Layton > Cc: Khalid Aziz > Cc: Lei Yang > Cc: Marc-Andr Lureau > Cc: Mike Kravetz > Cc: Minchan Kim > Cc: Shuah Khan > Cc: Valdis Kletnieks > Cc: Andy Lutomirski > Cc: Jann Horn > Signed-off-by: Andrew Morton > --- > > > --- a/include/uapi/linux/fcntl.h~mm-add-an-f_seal_future_write-seal-to-memfd > +++ a/include/uapi/linux/fcntl.h > @@ -41,6 +41,7 @@ > #define F_SEAL_SHRINK 0x0002 /* prevent file from shrinking */ > #define F_SEAL_GROW 0x0004 /* prevent file from growing */ > #define F_SEAL_WRITE 0x0008 /* prevent writes */ > +#define F_SEAL_FUTURE_WRITE 0x0010 /* prevent future writes while mapped */ > /* (1U << 31) is reserved for signed error codes */ > > /* > --- a/mm/memfd.c~mm-add-an-f_seal_future_write-seal-to-memfd > +++ a/mm/memfd.c > @@ -131,7 +131,8 @@ static unsigned int *memfd_file_seals_pt > #define F_ALL_SEALS (F_SEAL_SEAL | \ > F_SEAL_SHRINK | \ > F_SEAL_GROW | \ > - F_SEAL_WRITE) > + F_SEAL_WRITE | \ > + F_SEAL_FUTURE_WRITE) > > static int memfd_add_seals(struct file *file, unsigned int seals) > { > --- a/fs/hugetlbfs/inode.c~mm-add-an-f_seal_future_write-seal-to-memfd > +++ a/fs/hugetlbfs/inode.c > @@ -530,7 +530,7 @@ static long hugetlbfs_punch_hole(struct > inode_lock(inode); > > /* protected by i_mutex */ > - if (info->seals & F_SEAL_WRITE) { > + if (info->seals & (F_SEAL_WRITE | F_SEAL_FUTURE_WRITE)) { > inode_unlock(inode); > return -EPERM; > } > --- a/mm/shmem.c~mm-add-an-f_seal_future_write-seal-to-memfd > +++ a/mm/shmem.c > @@ -2119,6 +2119,23 @@ out_nomem: > > static int shmem_mmap(struct file *file, struct vm_area_struct *vma) > { > + struct shmem_inode_info *info = SHMEM_I(file_inode(file)); > + > + /* > + * New PROT_READ and MAP_SHARED mmaps are not allowed when "future PROT_WRITE, perhaps? > + * write" seal active. > + */ > + if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_WRITE) && > + (info->seals & F_SEAL_FUTURE_WRITE)) > + return -EPERM; > + > + /* > + * Since the F_SEAL_FUTURE_WRITE seals allow for a MAP_SHARED read-only > + * mapping, take care to not allow mprotect to revert protections. > + */ > + if (info->seals & F_SEAL_FUTURE_WRITE) > + vma->vm_flags &= ~(VM_MAYWRITE); > + This might all be clearer as: if (info->seals & F_SEAL_FUTURE_WRITE) { if (vma->vm_flags ...) return -EPERM; vma->vm_flags &= ~VM_MAYWRITE; } with appropriate comments inserted. From mboxrd@z Thu Jan 1 00:00:00 1970 From: luto at kernel.org (Andy Lutomirski) Date: Wed, 21 Nov 2018 19:25:26 -0800 Subject: [PATCH -next 1/2] mm/memfd: make F_SEAL_FUTURE_WRITE seal more robust In-Reply-To: <20181121182701.0d8a775fda6af1f8d2be8f25@linux-foundation.org> References: <20181120052137.74317-1-joel@joelfernandes.org> <20181120183926.GA124387@google.com> <20181121070658.011d576d@canb.auug.org.au> <469B80CB-D982-4802-A81D-95AC493D7E87@amacapital.net> <20181120204710.GB22801@google.com> <20181120211335.GC22801@google.com> <20181121182701.0d8a775fda6af1f8d2be8f25@linux-foundation.org> Message-ID: On Wed, Nov 21, 2018 at 6:27 PM Andrew Morton wrote: > > On Tue, 20 Nov 2018 13:13:35 -0800 Joel Fernandes wrote: > > > > > I am Ok with whatever Andrew wants to do, if it is better to squash it with > > > > the original, then I can do that and send another patch. > > > > > > > > > > > > > > From experience, Andrew will food in fixups on request :) > > > > Andrew, could you squash this patch into the one titled ("mm: Add an > > F_SEAL_FUTURE_WRITE seal to memfd")? > > Sure. > > I could of course queue them separately but I rarely do so - I don't > think that the intermediate development states are useful in the > infinite-term, and I make them available via additional Link: tags in > the changelog footers anyway. > > I think that the magnitude of these patches is such that John Stultz's > Reviewed-by is invalidated, so this series is now in the "unreviewed" > state. > > So can we have a re-review please? For convenience, here's the > folded-together [1/1] patch, as it will go to Linus. > > > From: "Joel Fernandes (Google)" > Subject: mm: Add an F_SEAL_FUTURE_WRITE seal to memfd > > Android uses ashmem for sharing memory regions. We are looking forward to > migrating all usecases of ashmem to memfd so that we can possibly remove > the ashmem driver in the future from staging while also benefiting from > using memfd and contributing to it. Note staging drivers are also not ABI > and generally can be removed at anytime. > > One of the main usecases Android has is the ability to create a region and > mmap it as writeable, then add protection against making any "future" > writes while keeping the existing already mmap'ed writeable-region active. > This allows us to implement a usecase where receivers of the shared > memory buffer can get a read-only view, while the sender continues to > write to the buffer. See CursorWindow documentation in Android for more > details: > https://developer.android.com/reference/android/database/CursorWindow > > This usecase cannot be implemented with the existing F_SEAL_WRITE seal. > To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal > which prevents any future mmap and write syscalls from succeeding while > keeping the existing mmap active. The following program shows the seal > working in action: > > #include > #include > #include > #include > #include > #include > #include > #define F_SEAL_FUTURE_WRITE 0x0010 > #define REGION_SIZE (5 * 1024 * 1024) > > int memfd_create_region(const char *name, size_t size) > { > int ret; > int fd = syscall(__NR_memfd_create, name, MFD_ALLOW_SEALING); > if (fd < 0) return fd; > ret = ftruncate(fd, size); > if (ret < 0) { close(fd); return ret; } > return fd; > } > > int main() { > int ret, fd; > void *addr, *addr2, *addr3, *addr1; > ret = memfd_create_region("test_region", REGION_SIZE); > printf("ret=%d\n", ret); > fd = ret; > > // Create map > addr = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); > if (addr == MAP_FAILED) > printf("map 0 failed\n"); > else > printf("map 0 passed\n"); > > if ((ret = write(fd, "test", 4)) != 4) > printf("write failed even though no future-write seal " > "(ret=%d errno =%d)\n", ret, errno); > else > printf("write passed\n"); > > addr1 = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); > if (addr1 == MAP_FAILED) > perror("map 1 prot-write failed even though no seal\n"); > else > printf("map 1 prot-write passed as expected\n"); > > ret = fcntl(fd, F_ADD_SEALS, F_SEAL_FUTURE_WRITE | > F_SEAL_GROW | > F_SEAL_SHRINK); > if (ret == -1) > printf("fcntl failed, errno: %d\n", errno); > else > printf("future-write seal now active\n"); > > if ((ret = write(fd, "test", 4)) != 4) > printf("write failed as expected due to future-write seal\n"); > else > printf("write passed (unexpected)\n"); > > addr2 = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); > if (addr2 == MAP_FAILED) > perror("map 2 prot-write failed as expected due to seal\n"); > else > printf("map 2 passed\n"); > > addr3 = mmap(0, REGION_SIZE, PROT_READ, MAP_SHARED, fd, 0); > if (addr3 == MAP_FAILED) > perror("map 3 failed\n"); > else > printf("map 3 prot-read passed as expected\n"); > } > > The output of running this program is as follows: > ret=3 > map 0 passed > write passed > map 1 prot-write passed as expected > future-write seal now active > write failed as expected due to future-write seal > map 2 prot-write failed as expected due to seal > : Permission denied > map 3 prot-read passed as expected > > [joel at joelfernandes.org: make F_SEAL_FUTURE_WRITE seal more robust] > Link: http://lkml.kernel.org/r/20181120052137.74317-1-joel at joelfernandes.org > Link: http://lkml.kernel.org/r/20181108041537.39694-1-joel at joelfernandes.org > Signed-off-by: Joel Fernandes (Google) > Cc: John Stultz > Cc: John Reck > Cc: Todd Kjos > Cc: Greg Kroah-Hartman > Cc: Christoph Hellwig > Cc: Al Viro > Cc: Daniel Colascione > Cc: J. Bruce Fields > Cc: Jeff Layton > Cc: Khalid Aziz > Cc: Lei Yang > Cc: Marc-Andr Lureau > Cc: Mike Kravetz > Cc: Minchan Kim > Cc: Shuah Khan > Cc: Valdis Kletnieks > Cc: Andy Lutomirski > Cc: Jann Horn > Signed-off-by: Andrew Morton > --- > > > --- a/include/uapi/linux/fcntl.h~mm-add-an-f_seal_future_write-seal-to-memfd > +++ a/include/uapi/linux/fcntl.h > @@ -41,6 +41,7 @@ > #define F_SEAL_SHRINK 0x0002 /* prevent file from shrinking */ > #define F_SEAL_GROW 0x0004 /* prevent file from growing */ > #define F_SEAL_WRITE 0x0008 /* prevent writes */ > +#define F_SEAL_FUTURE_WRITE 0x0010 /* prevent future writes while mapped */ > /* (1U << 31) is reserved for signed error codes */ > > /* > --- a/mm/memfd.c~mm-add-an-f_seal_future_write-seal-to-memfd > +++ a/mm/memfd.c > @@ -131,7 +131,8 @@ static unsigned int *memfd_file_seals_pt > #define F_ALL_SEALS (F_SEAL_SEAL | \ > F_SEAL_SHRINK | \ > F_SEAL_GROW | \ > - F_SEAL_WRITE) > + F_SEAL_WRITE | \ > + F_SEAL_FUTURE_WRITE) > > static int memfd_add_seals(struct file *file, unsigned int seals) > { > --- a/fs/hugetlbfs/inode.c~mm-add-an-f_seal_future_write-seal-to-memfd > +++ a/fs/hugetlbfs/inode.c > @@ -530,7 +530,7 @@ static long hugetlbfs_punch_hole(struct > inode_lock(inode); > > /* protected by i_mutex */ > - if (info->seals & F_SEAL_WRITE) { > + if (info->seals & (F_SEAL_WRITE | F_SEAL_FUTURE_WRITE)) { > inode_unlock(inode); > return -EPERM; > } > --- a/mm/shmem.c~mm-add-an-f_seal_future_write-seal-to-memfd > +++ a/mm/shmem.c > @@ -2119,6 +2119,23 @@ out_nomem: > > static int shmem_mmap(struct file *file, struct vm_area_struct *vma) > { > + struct shmem_inode_info *info = SHMEM_I(file_inode(file)); > + > + /* > + * New PROT_READ and MAP_SHARED mmaps are not allowed when "future PROT_WRITE, perhaps? > + * write" seal active. > + */ > + if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_WRITE) && > + (info->seals & F_SEAL_FUTURE_WRITE)) > + return -EPERM; > + > + /* > + * Since the F_SEAL_FUTURE_WRITE seals allow for a MAP_SHARED read-only > + * mapping, take care to not allow mprotect to revert protections. > + */ > + if (info->seals & F_SEAL_FUTURE_WRITE) > + vma->vm_flags &= ~(VM_MAYWRITE); > + This might all be clearer as: if (info->seals & F_SEAL_FUTURE_WRITE) { if (vma->vm_flags ...) return -EPERM; vma->vm_flags &= ~VM_MAYWRITE; } with appropriate comments inserted. From mboxrd@z Thu Jan 1 00:00:00 1970 From: luto@kernel.org (Andy Lutomirski) Date: Wed, 21 Nov 2018 19:25:26 -0800 Subject: [PATCH -next 1/2] mm/memfd: make F_SEAL_FUTURE_WRITE seal more robust In-Reply-To: <20181121182701.0d8a775fda6af1f8d2be8f25@linux-foundation.org> References: <20181120052137.74317-1-joel@joelfernandes.org> <20181120183926.GA124387@google.com> <20181121070658.011d576d@canb.auug.org.au> <469B80CB-D982-4802-A81D-95AC493D7E87@amacapital.net> <20181120204710.GB22801@google.com> <20181120211335.GC22801@google.com> <20181121182701.0d8a775fda6af1f8d2be8f25@linux-foundation.org> Message-ID: Content-Type: text/plain; charset="UTF-8" Message-ID: <20181122032526.niFpOjMQ6N75e-1ACPs8YwaJVgmVobyFx3BhMOW_vOQ@z> On Wed, Nov 21, 2018@6:27 PM Andrew Morton wrote: > > On Tue, 20 Nov 2018 13:13:35 -0800 Joel Fernandes wrote: > > > > > I am Ok with whatever Andrew wants to do, if it is better to squash it with > > > > the original, then I can do that and send another patch. > > > > > > > > > > > > > > From experience, Andrew will food in fixups on request :) > > > > Andrew, could you squash this patch into the one titled ("mm: Add an > > F_SEAL_FUTURE_WRITE seal to memfd")? > > Sure. > > I could of course queue them separately but I rarely do so - I don't > think that the intermediate development states are useful in the > infinite-term, and I make them available via additional Link: tags in > the changelog footers anyway. > > I think that the magnitude of these patches is such that John Stultz's > Reviewed-by is invalidated, so this series is now in the "unreviewed" > state. > > So can we have a re-review please? For convenience, here's the > folded-together [1/1] patch, as it will go to Linus. > > > From: "Joel Fernandes (Google)" > Subject: mm: Add an F_SEAL_FUTURE_WRITE seal to memfd > > Android uses ashmem for sharing memory regions. We are looking forward to > migrating all usecases of ashmem to memfd so that we can possibly remove > the ashmem driver in the future from staging while also benefiting from > using memfd and contributing to it. Note staging drivers are also not ABI > and generally can be removed at anytime. > > One of the main usecases Android has is the ability to create a region and > mmap it as writeable, then add protection against making any "future" > writes while keeping the existing already mmap'ed writeable-region active. > This allows us to implement a usecase where receivers of the shared > memory buffer can get a read-only view, while the sender continues to > write to the buffer. See CursorWindow documentation in Android for more > details: > https://developer.android.com/reference/android/database/CursorWindow > > This usecase cannot be implemented with the existing F_SEAL_WRITE seal. > To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal > which prevents any future mmap and write syscalls from succeeding while > keeping the existing mmap active. The following program shows the seal > working in action: > > #include > #include > #include > #include > #include > #include > #include > #define F_SEAL_FUTURE_WRITE 0x0010 > #define REGION_SIZE (5 * 1024 * 1024) > > int memfd_create_region(const char *name, size_t size) > { > int ret; > int fd = syscall(__NR_memfd_create, name, MFD_ALLOW_SEALING); > if (fd < 0) return fd; > ret = ftruncate(fd, size); > if (ret < 0) { close(fd); return ret; } > return fd; > } > > int main() { > int ret, fd; > void *addr, *addr2, *addr3, *addr1; > ret = memfd_create_region("test_region", REGION_SIZE); > printf("ret=%d\n", ret); > fd = ret; > > // Create map > addr = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); > if (addr == MAP_FAILED) > printf("map 0 failed\n"); > else > printf("map 0 passed\n"); > > if ((ret = write(fd, "test", 4)) != 4) > printf("write failed even though no future-write seal " > "(ret=%d errno =%d)\n", ret, errno); > else > printf("write passed\n"); > > addr1 = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); > if (addr1 == MAP_FAILED) > perror("map 1 prot-write failed even though no seal\n"); > else > printf("map 1 prot-write passed as expected\n"); > > ret = fcntl(fd, F_ADD_SEALS, F_SEAL_FUTURE_WRITE | > F_SEAL_GROW | > F_SEAL_SHRINK); > if (ret == -1) > printf("fcntl failed, errno: %d\n", errno); > else > printf("future-write seal now active\n"); > > if ((ret = write(fd, "test", 4)) != 4) > printf("write failed as expected due to future-write seal\n"); > else > printf("write passed (unexpected)\n"); > > addr2 = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); > if (addr2 == MAP_FAILED) > perror("map 2 prot-write failed as expected due to seal\n"); > else > printf("map 2 passed\n"); > > addr3 = mmap(0, REGION_SIZE, PROT_READ, MAP_SHARED, fd, 0); > if (addr3 == MAP_FAILED) > perror("map 3 failed\n"); > else > printf("map 3 prot-read passed as expected\n"); > } > > The output of running this program is as follows: > ret=3 > map 0 passed > write passed > map 1 prot-write passed as expected > future-write seal now active > write failed as expected due to future-write seal > map 2 prot-write failed as expected due to seal > : Permission denied > map 3 prot-read passed as expected > > [joel at joelfernandes.org: make F_SEAL_FUTURE_WRITE seal more robust] > Link: http://lkml.kernel.org/r/20181120052137.74317-1-joel at joelfernandes.org > Link: http://lkml.kernel.org/r/20181108041537.39694-1-joel at joelfernandes.org > Signed-off-by: Joel Fernandes (Google) > Cc: John Stultz > Cc: John Reck > Cc: Todd Kjos > Cc: Greg Kroah-Hartman > Cc: Christoph Hellwig > Cc: Al Viro > Cc: Daniel Colascione > Cc: J. Bruce Fields > Cc: Jeff Layton > Cc: Khalid Aziz > Cc: Lei Yang > Cc: Marc-Andr Lureau > Cc: Mike Kravetz > Cc: Minchan Kim > Cc: Shuah Khan > Cc: Valdis Kletnieks > Cc: Andy Lutomirski > Cc: Jann Horn > Signed-off-by: Andrew Morton > --- > > > --- a/include/uapi/linux/fcntl.h~mm-add-an-f_seal_future_write-seal-to-memfd > +++ a/include/uapi/linux/fcntl.h > @@ -41,6 +41,7 @@ > #define F_SEAL_SHRINK 0x0002 /* prevent file from shrinking */ > #define F_SEAL_GROW 0x0004 /* prevent file from growing */ > #define F_SEAL_WRITE 0x0008 /* prevent writes */ > +#define F_SEAL_FUTURE_WRITE 0x0010 /* prevent future writes while mapped */ > /* (1U << 31) is reserved for signed error codes */ > > /* > --- a/mm/memfd.c~mm-add-an-f_seal_future_write-seal-to-memfd > +++ a/mm/memfd.c > @@ -131,7 +131,8 @@ static unsigned int *memfd_file_seals_pt > #define F_ALL_SEALS (F_SEAL_SEAL | \ > F_SEAL_SHRINK | \ > F_SEAL_GROW | \ > - F_SEAL_WRITE) > + F_SEAL_WRITE | \ > + F_SEAL_FUTURE_WRITE) > > static int memfd_add_seals(struct file *file, unsigned int seals) > { > --- a/fs/hugetlbfs/inode.c~mm-add-an-f_seal_future_write-seal-to-memfd > +++ a/fs/hugetlbfs/inode.c > @@ -530,7 +530,7 @@ static long hugetlbfs_punch_hole(struct > inode_lock(inode); > > /* protected by i_mutex */ > - if (info->seals & F_SEAL_WRITE) { > + if (info->seals & (F_SEAL_WRITE | F_SEAL_FUTURE_WRITE)) { > inode_unlock(inode); > return -EPERM; > } > --- a/mm/shmem.c~mm-add-an-f_seal_future_write-seal-to-memfd > +++ a/mm/shmem.c > @@ -2119,6 +2119,23 @@ out_nomem: > > static int shmem_mmap(struct file *file, struct vm_area_struct *vma) > { > + struct shmem_inode_info *info = SHMEM_I(file_inode(file)); > + > + /* > + * New PROT_READ and MAP_SHARED mmaps are not allowed when "future PROT_WRITE, perhaps? > + * write" seal active. > + */ > + if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_WRITE) && > + (info->seals & F_SEAL_FUTURE_WRITE)) > + return -EPERM; > + > + /* > + * Since the F_SEAL_FUTURE_WRITE seals allow for a MAP_SHARED read-only > + * mapping, take care to not allow mprotect to revert protections. > + */ > + if (info->seals & F_SEAL_FUTURE_WRITE) > + vma->vm_flags &= ~(VM_MAYWRITE); > + This might all be clearer as: if (info->seals & F_SEAL_FUTURE_WRITE) { if (vma->vm_flags ...) return -EPERM; vma->vm_flags &= ~VM_MAYWRITE; } with appropriate comments inserted. From mboxrd@z Thu Jan 1 00:00:00 1970 From: Andy Lutomirski Subject: Re: [PATCH -next 1/2] mm/memfd: make F_SEAL_FUTURE_WRITE seal more robust Date: Wed, 21 Nov 2018 19:25:26 -0800 Message-ID: References: <20181120052137.74317-1-joel@joelfernandes.org> <20181120183926.GA124387@google.com> <20181121070658.011d576d@canb.auug.org.au> <469B80CB-D982-4802-A81D-95AC493D7E87@amacapital.net> <20181120204710.GB22801@google.com> <20181120211335.GC22801@google.com> <20181121182701.0d8a775fda6af1f8d2be8f25@linux-foundation.org> Mime-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Return-path: In-Reply-To: <20181121182701.0d8a775fda6af1f8d2be8f25@linux-foundation.org> Sender: linux-kernel-owner@vger.kernel.org To: Andrew Morton Cc: Joel Fernandes , Stephen Rothwell , Andrew Lutomirski , LKML , Hugh Dickins , Jann Horn , Khalid Aziz , Linux API , "open list:KERNEL SELFTEST FRAMEWORK" , Linux-MM , marcandre.lureau@redhat.com, Matthew Wilcox , Mike Kravetz , Shuah Khan List-Id: linux-api@vger.kernel.org On Wed, Nov 21, 2018 at 6:27 PM Andrew Morton wrote: > > On Tue, 20 Nov 2018 13:13:35 -0800 Joel Fernandes wrote: > > > > > I am Ok with whatever Andrew wants to do, if it is better to squash it with > > > > the original, then I can do that and send another patch. > > > > > > > > > > > > > > From experience, Andrew will food in fixups on request :) > > > > Andrew, could you squash this patch into the one titled ("mm: Add an > > F_SEAL_FUTURE_WRITE seal to memfd")? > > Sure. > > I could of course queue them separately but I rarely do so - I don't > think that the intermediate development states are useful in the > infinite-term, and I make them available via additional Link: tags in > the changelog footers anyway. > > I think that the magnitude of these patches is such that John Stultz's > Reviewed-by is invalidated, so this series is now in the "unreviewed" > state. > > So can we have a re-review please? For convenience, here's the > folded-together [1/1] patch, as it will go to Linus. > > > From: "Joel Fernandes (Google)" > Subject: mm: Add an F_SEAL_FUTURE_WRITE seal to memfd > > Android uses ashmem for sharing memory regions. We are looking forward to > migrating all usecases of ashmem to memfd so that we can possibly remove > the ashmem driver in the future from staging while also benefiting from > using memfd and contributing to it. Note staging drivers are also not ABI > and generally can be removed at anytime. > > One of the main usecases Android has is the ability to create a region and > mmap it as writeable, then add protection against making any "future" > writes while keeping the existing already mmap'ed writeable-region active. > This allows us to implement a usecase where receivers of the shared > memory buffer can get a read-only view, while the sender continues to > write to the buffer. See CursorWindow documentation in Android for more > details: > https://developer.android.com/reference/android/database/CursorWindow > > This usecase cannot be implemented with the existing F_SEAL_WRITE seal. > To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal > which prevents any future mmap and write syscalls from succeeding while > keeping the existing mmap active. The following program shows the seal > working in action: > > #include > #include > #include > #include > #include > #include > #include > #define F_SEAL_FUTURE_WRITE 0x0010 > #define REGION_SIZE (5 * 1024 * 1024) > > int memfd_create_region(const char *name, size_t size) > { > int ret; > int fd = syscall(__NR_memfd_create, name, MFD_ALLOW_SEALING); > if (fd < 0) return fd; > ret = ftruncate(fd, size); > if (ret < 0) { close(fd); return ret; } > return fd; > } > > int main() { > int ret, fd; > void *addr, *addr2, *addr3, *addr1; > ret = memfd_create_region("test_region", REGION_SIZE); > printf("ret=%d\n", ret); > fd = ret; > > // Create map > addr = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); > if (addr == MAP_FAILED) > printf("map 0 failed\n"); > else > printf("map 0 passed\n"); > > if ((ret = write(fd, "test", 4)) != 4) > printf("write failed even though no future-write seal " > "(ret=%d errno =%d)\n", ret, errno); > else > printf("write passed\n"); > > addr1 = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); > if (addr1 == MAP_FAILED) > perror("map 1 prot-write failed even though no seal\n"); > else > printf("map 1 prot-write passed as expected\n"); > > ret = fcntl(fd, F_ADD_SEALS, F_SEAL_FUTURE_WRITE | > F_SEAL_GROW | > F_SEAL_SHRINK); > if (ret == -1) > printf("fcntl failed, errno: %d\n", errno); > else > printf("future-write seal now active\n"); > > if ((ret = write(fd, "test", 4)) != 4) > printf("write failed as expected due to future-write seal\n"); > else > printf("write passed (unexpected)\n"); > > addr2 = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); > if (addr2 == MAP_FAILED) > perror("map 2 prot-write failed as expected due to seal\n"); > else > printf("map 2 passed\n"); > > addr3 = mmap(0, REGION_SIZE, PROT_READ, MAP_SHARED, fd, 0); > if (addr3 == MAP_FAILED) > perror("map 3 failed\n"); > else > printf("map 3 prot-read passed as expected\n"); > } > > The output of running this program is as follows: > ret=3 > map 0 passed > write passed > map 1 prot-write passed as expected > future-write seal now active > write failed as expected due to future-write seal > map 2 prot-write failed as expected due to seal > : Permission denied > map 3 prot-read passed as expected > > [joel@joelfernandes.org: make F_SEAL_FUTURE_WRITE seal more robust] > Link: http://lkml.kernel.org/r/20181120052137.74317-1-joel@joelfernandes.org > Link: http://lkml.kernel.org/r/20181108041537.39694-1-joel@joelfernandes.org > Signed-off-by: Joel Fernandes (Google) > Cc: John Stultz > Cc: John Reck > Cc: Todd Kjos > Cc: Greg Kroah-Hartman > Cc: Christoph Hellwig > Cc: Al Viro > Cc: Daniel Colascione > Cc: J. Bruce Fields > Cc: Jeff Layton > Cc: Khalid Aziz > Cc: Lei Yang > Cc: Marc-Andr Lureau > Cc: Mike Kravetz > Cc: Minchan Kim > Cc: Shuah Khan > Cc: Valdis Kletnieks > Cc: Andy Lutomirski > Cc: Jann Horn > Signed-off-by: Andrew Morton > --- > > > --- a/include/uapi/linux/fcntl.h~mm-add-an-f_seal_future_write-seal-to-memfd > +++ a/include/uapi/linux/fcntl.h > @@ -41,6 +41,7 @@ > #define F_SEAL_SHRINK 0x0002 /* prevent file from shrinking */ > #define F_SEAL_GROW 0x0004 /* prevent file from growing */ > #define F_SEAL_WRITE 0x0008 /* prevent writes */ > +#define F_SEAL_FUTURE_WRITE 0x0010 /* prevent future writes while mapped */ > /* (1U << 31) is reserved for signed error codes */ > > /* > --- a/mm/memfd.c~mm-add-an-f_seal_future_write-seal-to-memfd > +++ a/mm/memfd.c > @@ -131,7 +131,8 @@ static unsigned int *memfd_file_seals_pt > #define F_ALL_SEALS (F_SEAL_SEAL | \ > F_SEAL_SHRINK | \ > F_SEAL_GROW | \ > - F_SEAL_WRITE) > + F_SEAL_WRITE | \ > + F_SEAL_FUTURE_WRITE) > > static int memfd_add_seals(struct file *file, unsigned int seals) > { > --- a/fs/hugetlbfs/inode.c~mm-add-an-f_seal_future_write-seal-to-memfd > +++ a/fs/hugetlbfs/inode.c > @@ -530,7 +530,7 @@ static long hugetlbfs_punch_hole(struct > inode_lock(inode); > > /* protected by i_mutex */ > - if (info->seals & F_SEAL_WRITE) { > + if (info->seals & (F_SEAL_WRITE | F_SEAL_FUTURE_WRITE)) { > inode_unlock(inode); > return -EPERM; > } > --- a/mm/shmem.c~mm-add-an-f_seal_future_write-seal-to-memfd > +++ a/mm/shmem.c > @@ -2119,6 +2119,23 @@ out_nomem: > > static int shmem_mmap(struct file *file, struct vm_area_struct *vma) > { > + struct shmem_inode_info *info = SHMEM_I(file_inode(file)); > + > + /* > + * New PROT_READ and MAP_SHARED mmaps are not allowed when "future PROT_WRITE, perhaps? > + * write" seal active. > + */ > + if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_WRITE) && > + (info->seals & F_SEAL_FUTURE_WRITE)) > + return -EPERM; > + > + /* > + * Since the F_SEAL_FUTURE_WRITE seals allow for a MAP_SHARED read-only > + * mapping, take care to not allow mprotect to revert protections. > + */ > + if (info->seals & F_SEAL_FUTURE_WRITE) > + vma->vm_flags &= ~(VM_MAYWRITE); > + This might all be clearer as: if (info->seals & F_SEAL_FUTURE_WRITE) { if (vma->vm_flags ...) return -EPERM; vma->vm_flags &= ~VM_MAYWRITE; } with appropriate comments inserted.