linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 1/2] mm: Add an F_SEAL_FS_WRITE seal to memfd
@ 2018-10-09 22:20 Joel Fernandes (Google)
  2018-10-09 22:20 ` [PATCH v2 2/2] selftests/memfd: Add tests for F_SEAL_FS_WRITE seal Joel Fernandes (Google)
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Joel Fernandes (Google) @ 2018-10-09 22:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, Joel Fernandes (Google),
	jreck, john.stultz, tkjos, gregkh, Andrew Morton, dancol,
	J. Bruce Fields, Jeff Layton, Khalid Aziz, linux-fsdevel,
	linux-kselftest, linux-mm, Mike Kravetz, minchan, Shuah Khan

Android uses ashmem for sharing memory regions. We are looking forward
to migrating all usecases of ashmem to memfd so that we can possibly
remove the ashmem driver in the future from staging while also
benefiting from using memfd and contributing to it. Note staging drivers
are also not ABI and generally can be removed at anytime.

One of the main usecases Android has is the ability to create a region
and mmap it as writeable, then drop its protection for "future" writes
while keeping the existing already mmap'ed writeable-region active.
This allows us to implement a usecase where receivers of the shared
memory buffer can get a read-only view, while the sender continues to
write to the buffer. See CursorWindow in Android for more details:
https://developer.android.com/reference/android/database/CursorWindow

This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
To support the usecase, this patch adds a new F_SEAL_FS_WRITE seal which
prevents any future mmap and write syscalls from succeeding while
keeping the existing mmap active. The following program shows the seal
working in action:

int main() {
    int ret, fd;
    void *addr, *addr2, *addr3, *addr1;
    ret = memfd_create_region("test_region", REGION_SIZE);
    printf("ret=%d\n", ret);
    fd = ret;

    // Create map
    addr = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED)
	    printf("map 0 failed\n");
    else
	    printf("map 0 passed\n");

    if ((ret = write(fd, "test", 4)) != 4)
	    printf("write failed even though no fs-write seal "
		   "(ret=%d errno =%d)\n", ret, errno);
    else
	    printf("write passed\n");

    addr1 = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr1 == MAP_FAILED)
	    perror("map 1 prot-write failed even though no seal\n");
    else
	    printf("map 1 prot-write passed as expected\n");

    ret = fcntl(fd, F_ADD_SEALS, F_SEAL_FS_WRITE);
    if (ret == -1)
	    printf("fcntl failed, errno: %d\n", errno);
    else
	    printf("fs-write seal now active\n");

    if ((ret = write(fd, "test", 4)) != 4)
	    printf("write failed as expected due to fs-write seal\n");
    else
	    printf("write passed (unexpected)\n");

    addr2 = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr2 == MAP_FAILED)
	    perror("map 2 prot-write failed as expected due to seal\n");
    else
	    printf("map 2 passed\n");

    addr3 = mmap(0, REGION_SIZE, PROT_READ, MAP_SHARED, fd, 0);
    if (addr3 == MAP_FAILED)
	    perror("map 3 failed\n");
    else
	    printf("map 3 prot-read passed as expected\n");
}

The output of running this program is as follows:
ret=3
map 0 passed
write passed
map 1 prot-write passed as expected
fs-write seal now active
write failed as expected due to fs-write seal
map 2 prot-write failed as expected due to seal
: Permission denied
map 3 prot-read passed as expected

Note: This seal will also prevent growing and shrinking of the memfd.
This is not something we do in Android so it does not affect us, however
I have mentioned this behavior of the seal in the manpage.

Cc: jreck@google.com
Cc: john.stultz@linaro.org
Cc: tkjos@google.com
Cc: gregkh@linuxfoundation.org
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
v1->v2: No change, just added selftests to the series. manpages are
ready and I'll submit them once the patches are accepted.

 include/uapi/linux/fcntl.h | 1 +
 mm/memfd.c                 | 6 +++++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
index c98312fa78a5..fe44a2035edf 100644
--- a/include/uapi/linux/fcntl.h
+++ b/include/uapi/linux/fcntl.h
@@ -41,6 +41,7 @@
 #define F_SEAL_SHRINK	0x0002	/* prevent file from shrinking */
 #define F_SEAL_GROW	0x0004	/* prevent file from growing */
 #define F_SEAL_WRITE	0x0008	/* prevent writes */
+#define F_SEAL_FS_WRITE	0x0010  /* prevent all write-related syscalls */
 /* (1U << 31) is reserved for signed error codes */
 
 /*
diff --git a/mm/memfd.c b/mm/memfd.c
index 27069518e3c5..9b8855b80de9 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -150,7 +150,8 @@ static unsigned int *memfd_file_seals_ptr(struct file *file)
 #define F_ALL_SEALS (F_SEAL_SEAL | \
 		     F_SEAL_SHRINK | \
 		     F_SEAL_GROW | \
-		     F_SEAL_WRITE)
+		     F_SEAL_WRITE | \
+		     F_SEAL_FS_WRITE)
 
 static int memfd_add_seals(struct file *file, unsigned int seals)
 {
@@ -219,6 +220,9 @@ static int memfd_add_seals(struct file *file, unsigned int seals)
 		}
 	}
 
+	if ((seals & F_SEAL_FS_WRITE) && !(*file_seals & F_SEAL_FS_WRITE))
+		file->f_mode &= ~(FMODE_WRITE | FMODE_PWRITE);
+
 	*file_seals |= seals;
 	error = 0;
 
-- 
2.19.0.605.g01d371f741-goog



* [PATCH v2 2/2] selftests/memfd: Add tests for F_SEAL_FS_WRITE seal
  2018-10-09 22:20 [PATCH v2 1/2] mm: Add an F_SEAL_FS_WRITE seal to memfd Joel Fernandes (Google)
@ 2018-10-09 22:20 ` Joel Fernandes (Google)
  2018-10-09 22:34   ` Joel Fernandes
  2018-10-16 21:57 ` [PATCH v2 1/2] mm: Add an F_SEAL_FS_WRITE seal to memfd John Stultz
  2018-10-17  9:51 ` Christoph Hellwig
  2 siblings, 1 reply; 10+ messages in thread
From: Joel Fernandes (Google) @ 2018-10-09 22:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, Joel Fernandes (Google),
	dancol, minchan, Andrew Morton, gregkh, J. Bruce Fields,
	Jeff Layton, john.stultz, jreck, Khalid Aziz, linux-fsdevel,
	linux-kselftest, linux-mm, Mike Kravetz, Shuah Khan, tkjos

Add tests to verify sealing memfds with the F_SEAL_FS_WRITE works as
expected.

Cc: dancol@google.com
Cc: minchan@google.com
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 tools/testing/selftests/memfd/memfd_test.c | 51 +++++++++++++++++++++-
 1 file changed, 50 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/memfd/memfd_test.c b/tools/testing/selftests/memfd/memfd_test.c
index 10baa1652fc2..4bd2b6c87bb4 100644
--- a/tools/testing/selftests/memfd/memfd_test.c
+++ b/tools/testing/selftests/memfd/memfd_test.c
@@ -27,7 +27,7 @@
 
 #define MFD_DEF_SIZE 8192
 #define STACK_SIZE 65536
-
+#define F_SEAL_FS_WRITE         0x0010
 /*
  * Default is not to test hugetlbfs
  */
@@ -170,6 +170,24 @@ static void *mfd_assert_mmap_shared(int fd)
 	return p;
 }
 
+static void *mfd_fail_mmap_shared(int fd)
+{
+	void *p;
+
+	p = mmap(NULL,
+		 mfd_def_size,
+		 PROT_READ | PROT_WRITE,
+		 MAP_SHARED,
+		 fd,
+		 0);
+	if (p != MAP_FAILED) {
+		printf("mmap() didn't fail as expected\n");
+		abort();
+	}
+
+	return p;
+}
+
 static void *mfd_assert_mmap_private(int fd)
 {
 	void *p;
@@ -692,6 +710,36 @@ static void test_seal_write(void)
 	close(fd);
 }
 
+/*
+ * Test SEAL_FS_WRITE
+ * Test whether SEAL_FS_WRITE blocks write() while a mapping stays active.
+ */
+static void test_seal_fs_write(void)
+{
+	int fd;
+	void *p;
+
+	printf("%s SEAL-FS-WRITE\n", memfd_str);
+
+	fd = mfd_assert_new("kern_memfd_seal_fs_write",
+			    mfd_def_size,
+			    MFD_CLOEXEC | MFD_ALLOW_SEALING);
+
+	p = mfd_assert_mmap_shared(fd);
+
+	/* FS_WRITE seal can be added even with existing
+	 * writeable mappings */
+	mfd_assert_has_seals(fd, 0);
+	mfd_assert_add_seals(fd, F_SEAL_FS_WRITE);
+	mfd_assert_has_seals(fd, F_SEAL_FS_WRITE);
+
+	mfd_assert_read(fd);
+	mfd_fail_write(fd);
+
+	munmap(p, mfd_def_size);
+	close(fd);
+}
+
 /*
  * Test SEAL_SHRINK
  * Test whether SEAL_SHRINK actually prevents shrinking
@@ -945,6 +993,7 @@ int main(int argc, char **argv)
 	test_basic();
 
 	test_seal_write();
+	test_seal_fs_write();
 	test_seal_shrink();
 	test_seal_grow();
 	test_seal_resize();
-- 
2.19.0.605.g01d371f741-goog



* Re: [PATCH v2 2/2] selftests/memfd: Add tests for F_SEAL_FS_WRITE seal
  2018-10-09 22:20 ` [PATCH v2 2/2] selftests/memfd: Add tests for F_SEAL_FS_WRITE seal Joel Fernandes (Google)
@ 2018-10-09 22:34   ` Joel Fernandes
  0 siblings, 0 replies; 10+ messages in thread
From: Joel Fernandes @ 2018-10-09 22:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, dancol, minchan, Andrew Morton, gregkh,
	J. Bruce Fields, Jeff Layton, john.stultz, jreck, Khalid Aziz,
	linux-fsdevel, linux-kselftest, linux-mm, Mike Kravetz,
	Shuah Khan, tkjos

On Tue, Oct 09, 2018 at 03:20:42PM -0700, Joel Fernandes (Google) wrote:
> Add tests to verify sealing memfds with the F_SEAL_FS_WRITE works as
> expected.
> 
> Cc: dancol@google.com
> Cc: minchan@google.com
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  tools/testing/selftests/memfd/memfd_test.c | 51 +++++++++++++++++++++-
>  1 file changed, 50 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/memfd/memfd_test.c b/tools/testing/selftests/memfd/memfd_test.c
> index 10baa1652fc2..4bd2b6c87bb4 100644
> --- a/tools/testing/selftests/memfd/memfd_test.c
> +++ b/tools/testing/selftests/memfd/memfd_test.c
> @@ -27,7 +27,7 @@
>  
>  #define MFD_DEF_SIZE 8192
>  #define STACK_SIZE 65536
> -
> +#define F_SEAL_FS_WRITE         0x0010
>  /*
>   * Default is not to test hugetlbfs
>   */
> @@ -170,6 +170,24 @@ static void *mfd_assert_mmap_shared(int fd)
>  	return p;
>  }
>  
> +static void *mfd_fail_mmap_shared(int fd)
> +{
> +	void *p;
> +
> +	p = mmap(NULL,
> +		 mfd_def_size,
> +		 PROT_READ | PROT_WRITE,
> +		 MAP_SHARED,
> +		 fd,
> +		 0);
> +	if (p != MAP_FAILED) {
> +		printf("mmap() didn't fail as expected\n");
> +		abort();
> +	}
> +
> +	return p;
> +}
> +

Ah, this function is unused. I wrote it initially and used it but then
figured I didn't need it, and then forgot to remove it. It does not affect
the correctness of the patch. Anyway below is the updated patch.

thanks,

- Joel

------8<-----

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Subject: [PATCH v2.1] selftests/memfd: Add tests for F_SEAL_FS_WRITE seal

Add tests to verify sealing memfds with the F_SEAL_FS_WRITE works as
expected.

Cc: dancol@google.com
Cc: minchan@kernel.org
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 tools/testing/selftests/memfd/memfd_test.c | 33 +++++++++++++++++++++-
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/memfd/memfd_test.c b/tools/testing/selftests/memfd/memfd_test.c
index 10baa1652fc2..d074de568ba0 100644
--- a/tools/testing/selftests/memfd/memfd_test.c
+++ b/tools/testing/selftests/memfd/memfd_test.c
@@ -27,7 +27,7 @@
 
 #define MFD_DEF_SIZE 8192
 #define STACK_SIZE 65536
-
+#define F_SEAL_FS_WRITE         0x0010
 /*
  * Default is not to test hugetlbfs
  */
@@ -692,6 +692,36 @@ static void test_seal_write(void)
 	close(fd);
 }
 
+/*
+ * Test SEAL_FS_WRITE
+ * Test whether SEAL_FS_WRITE blocks write() while a mapping stays active.
+ */
+static void test_seal_fs_write(void)
+{
+	int fd;
+	void *p;
+
+	printf("%s SEAL-FS-WRITE\n", memfd_str);
+
+	fd = mfd_assert_new("kern_memfd_seal_fs_write",
+			    mfd_def_size,
+			    MFD_CLOEXEC | MFD_ALLOW_SEALING);
+
+	p = mfd_assert_mmap_shared(fd);
+
+	/* FS_WRITE seal can be added even with existing
+	 * writeable mappings */
+	mfd_assert_has_seals(fd, 0);
+	mfd_assert_add_seals(fd, F_SEAL_FS_WRITE);
+	mfd_assert_has_seals(fd, F_SEAL_FS_WRITE);
+
+	mfd_assert_read(fd);
+	mfd_fail_write(fd);
+
+	munmap(p, mfd_def_size);
+	close(fd);
+}
+
 /*
  * Test SEAL_SHRINK
  * Test whether SEAL_SHRINK actually prevents shrinking
@@ -945,6 +975,7 @@ int main(int argc, char **argv)
 	test_basic();
 
 	test_seal_write();
+	test_seal_fs_write();
 	test_seal_shrink();
 	test_seal_grow();
 	test_seal_resize();
-- 
2.19.0.605.g01d371f741-goog



* Re: [PATCH v2 1/2] mm: Add an F_SEAL_FS_WRITE seal to memfd
  2018-10-09 22:20 [PATCH v2 1/2] mm: Add an F_SEAL_FS_WRITE seal to memfd Joel Fernandes (Google)
  2018-10-09 22:20 ` [PATCH v2 2/2] selftests/memfd: Add tests for F_SEAL_FS_WRITE seal Joel Fernandes (Google)
@ 2018-10-16 21:57 ` John Stultz
  2018-10-17  9:51 ` Christoph Hellwig
  2 siblings, 0 replies; 10+ messages in thread
From: John Stultz @ 2018-10-16 21:57 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: lkml, Android Kernel Team, John Reck, Todd Kjos, Greg KH,
	Andrew Morton, Daniel Colascione, J. Bruce Fields, Jeff Layton,
	Khalid Aziz, linux-fsdevel, linux-kselftest, linux-mm,
	Mike Kravetz, Minchan Kim, Shuah Khan

On Tue, Oct 9, 2018 at 3:20 PM, Joel Fernandes (Google)
<joel@joelfernandes.org> wrote:
> Android uses ashmem for sharing memory regions. We are looking forward
> to migrating all usecases of ashmem to memfd so that we can possibly
> remove the ashmem driver in the future from staging while also
> benefiting from using memfd and contributing to it. Note staging drivers
> are also not ABI and generally can be removed at anytime.
>
> One of the main usecases Android has is the ability to create a region
> and mmap it as writeable, then drop its protection for "future" writes
> while keeping the existing already mmap'ed writeable-region active.
> This allows us to implement a usecase where receivers of the shared
> memory buffer can get a read-only view, while the sender continues to
> write to the buffer. See CursorWindow in Android for more details:
> https://developer.android.com/reference/android/database/CursorWindow
>
> This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
> To support the usecase, this patch adds a new F_SEAL_FS_WRITE seal which
> prevents any future mmap and write syscalls from succeeding while
> keeping the existing mmap active. The following program shows the seal
> working in action:
>
> int main() {
>     int ret, fd;
>     void *addr, *addr2, *addr3, *addr1;
>     ret = memfd_create_region("test_region", REGION_SIZE);
>     printf("ret=%d\n", ret);
>     fd = ret;
>
>     // Create map
>     addr = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
>     if (addr == MAP_FAILED)
>             printf("map 0 failed\n");
>     else
>             printf("map 0 passed\n");
>
>     if ((ret = write(fd, "test", 4)) != 4)
>             printf("write failed even though no fs-write seal "
>                    "(ret=%d errno =%d)\n", ret, errno);
>     else
>             printf("write passed\n");
>
>     addr1 = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
>     if (addr1 == MAP_FAILED)
>             perror("map 1 prot-write failed even though no seal\n");
>     else
>             printf("map 1 prot-write passed as expected\n");
>
>     ret = fcntl(fd, F_ADD_SEALS, F_SEAL_FS_WRITE);
>     if (ret == -1)
>             printf("fcntl failed, errno: %d\n", errno);
>     else
>             printf("fs-write seal now active\n");
>
>     if ((ret = write(fd, "test", 4)) != 4)
>             printf("write failed as expected due to fs-write seal\n");
>     else
>             printf("write passed (unexpected)\n");
>
>     addr2 = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
>     if (addr2 == MAP_FAILED)
>             perror("map 2 prot-write failed as expected due to seal\n");
>     else
>             printf("map 2 passed\n");
>
>     addr3 = mmap(0, REGION_SIZE, PROT_READ, MAP_SHARED, fd, 0);
>     if (addr3 == MAP_FAILED)
>             perror("map 3 failed\n");
>     else
>             printf("map 3 prot-read passed as expected\n");
> }
>
> The output of running this program is as follows:
> ret=3
> map 0 passed
> write passed
> map 1 prot-write passed as expected
> fs-write seal now active
> write failed as expected due to fs-write seal
> map 2 prot-write failed as expected due to seal
> : Permission denied
> map 3 prot-read passed as expected
>
> Note: This seal will also prevent growing and shrinking of the memfd.
> This is not something we do in Android so it does not affect us, however
> I have mentioned this behavior of the seal in the manpage.
>
> Cc: jreck@google.com
> Cc: john.stultz@linaro.org
> Cc: tkjos@google.com
> Cc: gregkh@linuxfoundation.org
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Reviewed-by: John Stultz <john.stultz@linaro.org>

thanks
-john


* Re: [PATCH v2 1/2] mm: Add an F_SEAL_FS_WRITE seal to memfd
  2018-10-09 22:20 [PATCH v2 1/2] mm: Add an F_SEAL_FS_WRITE seal to memfd Joel Fernandes (Google)
  2018-10-09 22:20 ` [PATCH v2 2/2] selftests/memfd: Add tests for F_SEAL_FS_WRITE seal Joel Fernandes (Google)
  2018-10-16 21:57 ` [PATCH v2 1/2] mm: Add an F_SEAL_FS_WRITE seal to memfd John Stultz
@ 2018-10-17  9:51 ` Christoph Hellwig
  2018-10-17 10:39   ` Joel Fernandes
  2 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2018-10-17  9:51 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, kernel-team, jreck, john.stultz, tkjos, gregkh,
	Andrew Morton, dancol, J. Bruce Fields, Jeff Layton, Khalid Aziz,
	linux-fsdevel, linux-kselftest, linux-mm, Mike Kravetz, minchan,
	Shuah Khan

On Tue, Oct 09, 2018 at 03:20:41PM -0700, Joel Fernandes (Google) wrote:
> One of the main usecases Android has is the ability to create a region
> and mmap it as writeable, then drop its protection for "future" writes
> while keeping the existing already mmap'ed writeable-region active.

s/drop/add/ ?

Otherwise this doesn't make much sense to me.

> This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
> To support the usecase, this patch adds a new F_SEAL_FS_WRITE seal which
> prevents any future mmap and write syscalls from succeeding while
> keeping the existing mmap active. The following program shows the seal
> working in action:

Where does the FS come from?  I'd rather expect this to be implemented
as a 'force' style flag that applies the seal even if the otherwise
required precondition is not met.

> Note: This seal will also prevent growing and shrinking of the memfd.
> This is not something we do in Android so it does not affect us, however
> I have mentioned this behavior of the seal in the manpage.

This seems odd, as that is otherwise split into the F_SEAL_SHRINK /
F_SEAL_GROW flags.

>  static int memfd_add_seals(struct file *file, unsigned int seals)
>  {
> @@ -219,6 +220,9 @@ static int memfd_add_seals(struct file *file, unsigned int seals)
>  		}
>  	}
>  
> +	if ((seals & F_SEAL_FS_WRITE) && !(*file_seals & F_SEAL_FS_WRITE))
> +		file->f_mode &= ~(FMODE_WRITE | FMODE_PWRITE);
> +

This seems to lack any synchronization for f_mode.


* Re: [PATCH v2 1/2] mm: Add an F_SEAL_FS_WRITE seal to memfd
  2018-10-17  9:51 ` Christoph Hellwig
@ 2018-10-17 10:39   ` Joel Fernandes
  2018-10-17 12:08     ` Christoph Hellwig
  0 siblings, 1 reply; 10+ messages in thread
From: Joel Fernandes @ 2018-10-17 10:39 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-kernel, kernel-team, jreck, john.stultz, tkjos, gregkh,
	Andrew Morton, dancol, J. Bruce Fields, Jeff Layton, Khalid Aziz,
	linux-fsdevel, linux-kselftest, linux-mm, Mike Kravetz, minchan,
	Shuah Khan

On Wed, Oct 17, 2018 at 02:51:55AM -0700, Christoph Hellwig wrote:
> On Tue, Oct 09, 2018 at 03:20:41PM -0700, Joel Fernandes (Google) wrote:
> > One of the main usecases Android has is the ability to create a region
> > and mmap it as writeable, then drop its protection for "future" writes
> > while keeping the existing already mmap'ed writeable-region active.
> 
> s/drop/add/ ?
> 
> Otherwise this doesn't make much sense to me.

Sure, you are right that "add" is more appropriate. I'll change it to that.

> > This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
> > To support the usecase, this patch adds a new F_SEAL_FS_WRITE seal which
> > prevents any future mmap and write syscalls from succeeding while
> > keeping the existing mmap active. The following program shows the seal
> > working in action:
> 
> Where does the FS come from?  I'd rather expect this to be implemented
> as a 'force' style flag that applies the seal even if the otherwise
> required precondition is not met.

The "FS" was meant to convey that the seal prevents writes at the VFS
layer itself. For example, vfs_write checks FMODE_WRITE and returns an error
instead of proceeding if the flag is not set. I could not find a better name
for it; I could call it F_SEAL_VFS_WRITE if you prefer.
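
As a rough illustration of the f_mode gating described above, the following
user-space model mimics the two checks involved. It is only a sketch, not
kernel code: struct model_file, the MODEL_* constants and the model_*
functions are hypothetical stand-ins. The error values are assumed to mirror
what the kernel returns on these paths (EBADF for a write without FMODE_WRITE,
EACCES for a shared writable mmap), the latter matching the "Permission
denied" in the program output from the patch description.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins for the kernel's FMODE_* bits (assumed values). */
#define MODEL_FMODE_READ   0x1u
#define MODEL_FMODE_WRITE  0x2u
#define MODEL_FMODE_PREAD  0x8u
#define MODEL_FMODE_PWRITE 0x10u

/* Hypothetical, heavily reduced model of struct file. */
struct model_file {
	unsigned int f_mode;
};

/* Models the mode check at the top of a write path. */
int model_write(const struct model_file *f)
{
	assert(f != NULL);
	return (f->f_mode & MODEL_FMODE_WRITE) ? 0 : -EBADF;
}

/* Models the MAP_SHARED permission check: only writable maps need WRITE. */
int model_mmap_shared(const struct model_file *f, bool want_write)
{
	assert(f != NULL);
	if (want_write && !(f->f_mode & MODEL_FMODE_WRITE))
		return -EACCES;
	return 0;
}

/* Models what the patch does when the seal is added. */
void model_add_fs_write_seal(struct model_file *f)
{
	assert(f != NULL);
	f->f_mode &= ~(MODEL_FMODE_WRITE | MODEL_FMODE_PWRITE);
}
```

Mappings created before model_add_fs_write_seal() are not represented here at
all, which is the point of the seal: nothing in these checks revisits them.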

> > Note: This seal will also prevent growing and shrinking of the memfd.
> > This is not something we do in Android so it does not affect us, however
> > I have mentioned this behavior of the seal in the manpage.
> 
> This seems odd, as that is otherwise split into the F_SEAL_SHRINK /
> F_SEAL_GROW flags.

I could make it such that this seal would not be allowed unless F_SEAL_SHRINK
and F_SEAL_GROW are either previously set, or they are passed along with this
seal. Would that make more sense to you?
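
To make the proposed precondition concrete, it could be expressed as a small
predicate like the one below. This is a sketch under assumptions: the DEMO_*
names are hypothetical copies of the seal values shown in the fcntl.h hunk of
this patch, and fs_write_seal_allowed() is an invented helper, not something
taken from memfd_add_seals().

```c
#include <stdbool.h>

/* Hypothetical copies of the seal bits from the patch's fcntl.h hunk. */
#define DEMO_F_SEAL_SHRINK   0x0002u
#define DEMO_F_SEAL_GROW     0x0004u
#define DEMO_F_SEAL_WRITE    0x0008u
#define DEMO_F_SEAL_FS_WRITE 0x0010u

/*
 * Returns true if the requested seals may be added under the proposed rule:
 * the new write seal is only accepted when SHRINK and GROW are both already
 * set or are part of the same request.
 */
bool fs_write_seal_allowed(unsigned int existing, unsigned int requested)
{
	const unsigned int resize = DEMO_F_SEAL_SHRINK | DEMO_F_SEAL_GROW;

	if (!(requested & DEMO_F_SEAL_FS_WRITE))
		return true;	/* rule only concerns the new seal */

	return ((existing | requested) & resize) == resize;
}
```

With this rule the resize behaviour stays tied to the SHRINK and GROW seals,
and the new seal no longer implies it on its own.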

> >  static int memfd_add_seals(struct file *file, unsigned int seals)
> >  {
> > @@ -219,6 +220,9 @@ static int memfd_add_seals(struct file *file, unsigned int seals)
> >  		}
> >  	}
> >  
> > +	if ((seals & F_SEAL_FS_WRITE) && !(*file_seals & F_SEAL_FS_WRITE))
> > +		file->f_mode &= ~(FMODE_WRITE | FMODE_PWRITE);
> > +
> 
> This seems to lack any synchronization for f_mode.

The f_mode is set when the struct file is first created and then memfd sets
additional flags in memfd_create. Then later we are changing it here at the
time of setting the seal. I do not see any possibility of a race since it is
impossible to set the seal before memfd_create returns. Could you provide
more details about what kind of synchronization is needed and what race
condition scenario you were thinking of?

thanks for the review,

 - Joel



* Re: [PATCH v2 1/2] mm: Add an F_SEAL_FS_WRITE seal to memfd
  2018-10-17 10:39   ` Joel Fernandes
@ 2018-10-17 12:08     ` Christoph Hellwig
  2018-10-17 15:44       ` Daniel Colascione
  2018-10-17 17:45       ` Joel Fernandes
  0 siblings, 2 replies; 10+ messages in thread
From: Christoph Hellwig @ 2018-10-17 12:08 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Christoph Hellwig, linux-kernel, kernel-team, jreck, john.stultz,
	tkjos, gregkh, Andrew Morton, dancol, J. Bruce Fields,
	Jeff Layton, Khalid Aziz, linux-fsdevel, linux-kselftest,
	linux-mm, Mike Kravetz, minchan, Shuah Khan

On Wed, Oct 17, 2018 at 03:39:58AM -0700, Joel Fernandes wrote:
> > > This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
> > > To support the usecase, this patch adds a new F_SEAL_FS_WRITE seal which
> > > prevents any future mmap and write syscalls from succeeding while
> > > keeping the existing mmap active. The following program shows the seal
> > > working in action:
> > 
> > Where does the FS come from?  I'd rather expect this to be implemented
> > as a 'force' style flag that applies the seal even if the otherwise
> > required precondition is not met.
> 
> The "FS" was meant to convey that the seal is preventing writes at the VFS
> layer itself, for example vfs_write checks FMODE_WRITE and does not proceed,
> it instead returns an error if the flag is not set. I could not find a better
> name for it, I could call it F_SEAL_VFS_WRITE if you prefer?

I don't think there is anything VFS or FS about that - at best that
is an implementation detail.

Either do something like the force flag I suggested in the last mail,
or give it a name that matches the intention, e.g. F_SEAL_FUTURE_WRITE.

> I could make it such that this seal would not be allowed unless F_SEAL_SHRINK
> and F_SEAL_GROW are either previously set, or they are passed along with this
> seal. Would that make more sense to you?

Yes.

> > >  static int memfd_add_seals(struct file *file, unsigned int seals)
> > >  {
> > > @@ -219,6 +220,9 @@ static int memfd_add_seals(struct file *file, unsigned int seals)
> > >  		}
> > >  	}
> > >  
> > > +	if ((seals & F_SEAL_FS_WRITE) && !(*file_seals & F_SEAL_FS_WRITE))
> > > +		file->f_mode &= ~(FMODE_WRITE | FMODE_PWRITE);
> > > +
> > 
> > This seems to lack any synchronization for f_mode.
> 
> The f_mode is set when the struct file is first created and then memfd sets
> additional flags in memfd_create. Then later we are changing it here at the
> time of setting the seal. I do not see any possibility of a race since it is
> impossible to set the seal before memfd_create returns. Could you provide
> more details about what kind of synchronization is needed and what race
> condition scenario you were thinking of?

Even if no one changes these specific flags we still need a lock due
to rmw cycles on the field.  For example fadvise can set or clear
FMODE_RANDOM.  It seems to use file->f_lock for synchronization.
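
The read-modify-write hazard can be shown without threads by writing out one
bad interleaving by hand. The sketch below is a deterministic user-space
illustration only: the DEMO_FMODE_* values are assumed stand-ins for the
kernel flags, and unlocked_interleaving() is a hypothetical function that
replays "both writers load, then both store" on a plain integer.

```c
/* Illustrative stand-ins for the kernel's FMODE_* bits (assumed values). */
#define DEMO_FMODE_WRITE  0x2u
#define DEMO_FMODE_PWRITE 0x10u
#define DEMO_FMODE_RANDOM 0x1000u

/*
 * Replays one possible interleaving of two unsynchronized updates to the
 * same word and returns the final value.
 */
unsigned int unlocked_interleaving(unsigned int f_mode)
{
	/* Writer A (the sealing path) loads the word. */
	unsigned int a = f_mode;
	/* Writer B (an fadvise-like path) loads the same, still unmodified word. */
	unsigned int b = f_mode;

	/* A clears the write bits and stores its result. */
	a &= ~(DEMO_FMODE_WRITE | DEMO_FMODE_PWRITE);
	f_mode = a;

	/* B sets its own bit on the stale copy and stores: A's update is lost. */
	b |= DEMO_FMODE_RANDOM;
	f_mode = b;

	return f_mode;
}
```

Starting from a mode word with the write bits set, the returned value still
has them set even though writer A cleared them, although the two writers never
touched each other's bits.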


* Re: [PATCH v2 1/2] mm: Add an F_SEAL_FS_WRITE seal to memfd
  2018-10-17 12:08     ` Christoph Hellwig
@ 2018-10-17 15:44       ` Daniel Colascione
  2018-10-17 16:19         ` Christoph Hellwig
  2018-10-17 17:45       ` Joel Fernandes
  1 sibling, 1 reply; 10+ messages in thread
From: Daniel Colascione @ 2018-10-17 15:44 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Joel Fernandes, linux-kernel, kernel-team, John Reck,
	John Stultz, Todd Kjos, Greg KH, Andrew Morton, J. Bruce Fields,
	Jeff Layton, Khalid Aziz, linux-fsdevel, linux-kselftest,
	linux-mm, Mike Kravetz, Minchan Kim, Shuah Khan

On Wed, Oct 17, 2018 at 5:08 AM, Christoph Hellwig <hch@infradead.org> wrote:
> On Wed, Oct 17, 2018 at 03:39:58AM -0700, Joel Fernandes wrote:
>> > > This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
>> > > To support the usecase, this patch adds a new F_SEAL_FS_WRITE seal which
>> > > prevents any future mmap and write syscalls from succeeding while
>> > > keeping the existing mmap active. The following program shows the seal
>> > > working in action:
>> >
>> > Where does the FS come from?  I'd rather expect this to be implemented
>> > as a 'force' style flag that applies the seal even if the otherwise
>> > required precondition is not met.
>>
>> The "FS" was meant to convey that the seal is preventing writes at the VFS
>> layer itself, for example vfs_write checks FMODE_WRITE and does not proceed,
>> it instead returns an error if the flag is not set. I could not find a better
>> name for it, I could call it F_SEAL_VFS_WRITE if you prefer?
>
> I don't think there is anything VFS or FS about that - at best that
> is an implementation detail.
>
> Either do something like the force flag I suggested in the last mail,
> or give it a name that matches the intention, e.g F_SEAL_FUTURE_WRITE.

+1

>> > This seems to lack any synchronization for f_mode.
>>
>> The f_mode is set when the struct file is first created and then memfd sets
>> additional flags in memfd_create. Then later we are changing it here at the
>> time of setting the seal. I do not see any possibility of a race since it is
>> impossible to set the seal before memfd_create returns. Could you provide
>> more details about what kind of synchronization is needed and what race
>> condition scenario you were thinking of?
>
> Even if no one changes these specific flags we still need a lock due
> to rmw cycles on the field.  For example fadvise can set or clear
> FMODE_RANDOM.  It seems to use file->f_lock for synchronization.

Compare-and-exchange will suffice, right?


* Re: [PATCH v2 1/2] mm: Add an F_SEAL_FS_WRITE seal to memfd
  2018-10-17 15:44       ` Daniel Colascione
@ 2018-10-17 16:19         ` Christoph Hellwig
  0 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2018-10-17 16:19 UTC (permalink / raw)
  To: Daniel Colascione
  Cc: Christoph Hellwig, Joel Fernandes, linux-kernel, kernel-team,
	John Reck, John Stultz, Todd Kjos, Greg KH, Andrew Morton,
	J. Bruce Fields, Jeff Layton, Khalid Aziz, linux-fsdevel,
	linux-kselftest, linux-mm, Mike Kravetz, Minchan Kim, Shuah Khan

On Wed, Oct 17, 2018 at 08:44:01AM -0700, Daniel Colascione wrote:
> > Even if no one changes these specific flags we still need a lock due
> > to rmw cycles on the field.  For example fadvise can set or clear
> > FMODE_RANDOM.  It seems to use file->f_lock for synchronization.
> 
> Compare-and-exchange will suffice, right?

Only if all users use the compare and exchange, and right now they
don't.
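
A small C11 sketch of why compare-and-exchange only helps when every writer
uses it. Everything here is illustrative and assumed: cas_clear_bits() and
mixed_writers_demo() are hypothetical helpers, the bit values are stand-ins,
and the "plain" writer is replayed by hand on a single thread so the outcome
is deterministic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Clears bits with a compare-and-exchange retry loop; returns the new value. */
unsigned int cas_clear_bits(_Atomic unsigned int *word, unsigned int bits)
{
	assert(word != NULL);

	unsigned int old = atomic_load(word);

	/* On failure, old is refreshed with the current value and we retry. */
	while (!atomic_compare_exchange_weak(word, &old, old & ~bits))
		;

	return old & ~bits;
}

/*
 * Replays a writer that does NOT use compare-and-exchange racing with one
 * that does. Returns the final word.
 */
unsigned int mixed_writers_demo(void)
{
	_Atomic unsigned int word = 0x1bu;	/* write bits 0x2 and 0x10 set */

	/* Plain writer loads the word, intending to set bit 0x1000. */
	unsigned int stale = atomic_load(&word);

	/* CAS writer clears the write bits in between. */
	cas_clear_bits(&word, 0x2u | 0x10u);

	/* Plain writer stores its stale copy, undoing the clear. */
	atomic_store(&word, stale | 0x1000u);

	return atomic_load(&word);
}
```

The retry loop protects the CAS writer against others, but it cannot protect
its result from a later blind store by a writer that skipped the comparison.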


* Re: [PATCH v2 1/2] mm: Add an F_SEAL_FS_WRITE seal to memfd
  2018-10-17 12:08     ` Christoph Hellwig
  2018-10-17 15:44       ` Daniel Colascione
@ 2018-10-17 17:45       ` Joel Fernandes
  1 sibling, 0 replies; 10+ messages in thread
From: Joel Fernandes @ 2018-10-17 17:45 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-kernel, kernel-team, jreck, john.stultz, tkjos, gregkh,
	Andrew Morton, dancol, J. Bruce Fields, Jeff Layton, Khalid Aziz,
	linux-fsdevel, linux-kselftest, linux-mm, Mike Kravetz, minchan,
	Shuah Khan

On Wed, Oct 17, 2018 at 05:08:29AM -0700, Christoph Hellwig wrote:
> On Wed, Oct 17, 2018 at 03:39:58AM -0700, Joel Fernandes wrote:
> > > > This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
> > > > To support the usecase, this patch adds a new F_SEAL_FS_WRITE seal which
> > > > prevents any future mmap and write syscalls from succeeding while
> > > > keeping the existing mmap active. The following program shows the seal
> > > > working in action:
> > > 
> > > Where does the FS come from?  I'd rather expect this to be implemented
> > > as a 'force' style flag that applies the seal even if the otherwise
> > > required precondition is not met.
> > 
> > The "FS" was meant to convey that the seal is preventing writes at the VFS
> > layer itself, for example vfs_write checks FMODE_WRITE and does not proceed,
> > it instead returns an error if the flag is not set. I could not find a better
> > name for it, I could call it F_SEAL_VFS_WRITE if you prefer?
> 
> I don't think there is anything VFS or FS about that - at best that
> is an implementation detail.
> 
> Either do something like the force flag I suggested in the last mail,
> or give it a name that matches the intention, e.g F_SEAL_FUTURE_WRITE.
> 

Ok, I agree. I like the name F_SEAL_FUTURE_WRITE you are proposing so I will
use that.
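
For readers who want to try the behaviour from user space, here is a
self-contained variant of the test program from the patch description, written
as a probe function. It is a sketch under assumptions: it uses the
F_SEAL_FUTURE_WRITE name agreed on above with the 0x0010 value from this
patch (defined locally if the system headers lack it), the PROBE_* bits and
probe_future_write_seal() are hypothetical names, and on a kernel without the
seal the fcntl() call is expected to fail, in which case the probe returns -1.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Seal value from this series; defined here if the headers lack it. */
#ifndef F_SEAL_FUTURE_WRITE
#define F_SEAL_FUTURE_WRITE 0x0010
#endif

/* Result bits (hypothetical names, used only by this sketch). */
#define PROBE_OLD_MAP_WRITABLE 0x1
#define PROBE_WRITE_REFUSED    0x2
#define PROBE_NEW_WMAP_REFUSED 0x4
#define PROBE_READ_MAP_OK      0x8

/* Returns a mask of PROBE_* bits, or -1 if the memfd or seal is unavailable. */
int probe_future_write_seal(void)
{
	const size_t size = 4096;
	char *old_map = MAP_FAILED;
	char byte = 0;
	int result = -1;
	void *p;

	int fd = memfd_create("seal_probe", MFD_CLOEXEC | MFD_ALLOW_SEALING);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)size) < 0)
		goto out;

	/* Writable shared mapping created before the seal is added. */
	old_map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (old_map == MAP_FAILED)
		goto out;

	if (fcntl(fd, F_ADD_SEALS, F_SEAL_FUTURE_WRITE) < 0)
		goto out;	/* seal not supported by this kernel */
	result = 0;

	/* The pre-existing mapping should still accept stores. */
	old_map[0] = 'x';
	if (pread(fd, &byte, 1, 0) == 1 && byte == 'x')
		result |= PROBE_OLD_MAP_WRITABLE;

	/* write() on the sealed fd should now be refused. */
	if (write(fd, "test", 4) < 0)
		result |= PROBE_WRITE_REFUSED;

	/* A new writable shared mapping should be refused. */
	p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		result |= PROBE_NEW_WMAP_REFUSED;
	else
		munmap(p, size);

	/* A new read-only shared mapping should still work. */
	p = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
	if (p != MAP_FAILED) {
		result |= PROBE_READ_MAP_OK;
		munmap(p, size);
	}
out:
	if (old_map != MAP_FAILED)
		munmap(old_map, size);
	close(fd);
	return result;
}
```

On a kernel that implements the seal as described in this thread, all four
bits would be expected in the result; any other value shows which of the
four properties did not hold on the running system.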

> > I could make it such that this seal would not be allowed unless F_SEAL_SHRINK
> > and F_SEAL_GROW are either previously set, or they are passed along with this
> > seal. Would that make more sense to you?
> 
> Yes.

Cool.

> > > >  static int memfd_add_seals(struct file *file, unsigned int seals)
> > > >  {
> > > > @@ -219,6 +220,9 @@ static int memfd_add_seals(struct file *file, unsigned int seals)
> > > >  		}
> > > >  	}
> > > >  
> > > > +	if ((seals & F_SEAL_FS_WRITE) && !(*file_seals & F_SEAL_FS_WRITE))
> > > > +		file->f_mode &= ~(FMODE_WRITE | FMODE_PWRITE);
> > > > +
> > > 
> > > This seems to lack any synchronization for f_mode.
> > 
> > The f_mode is set when the struct file is first created and then memfd sets
> > additional flags in memfd_create. Then later we are changing it here at the
> > time of setting the seal. I do not see any possibility of a race since it is
> > impossible to set the seal before memfd_create returns. Could you provide
> > more details about what kind of synchronization is needed and what race
> > condition scenario you were thinking of?
> 
> Even if no one changes these specific flags we still need a lock due
> to rmw cycles on the field.  For example fadvise can set or clear
> FMODE_RANDOM.  It seems to use file->f_lock for synchronization.

Ok, I will acquire the f_lock before setting these, thanks for the
explanation. Will post updated patches today.
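
For illustration, the locking pattern could look like the following user-space
model, where every writer of the mode word takes the same lock around its
read-modify-write. This is an assumption-laden sketch and not the updated
patch: struct demo_file and the demo_* helpers are hypothetical, and a pthread
mutex stands in for the kernel's file->f_lock spinlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, reduced model of struct file with its lock. */
struct demo_file {
	pthread_mutex_t f_lock;	/* stands in for the f_lock spinlock */
	unsigned int f_mode;
};

/* Clears mode bits; the whole read-modify-write happens under f_lock. */
void demo_clear_mode(struct demo_file *f, unsigned int bits)
{
	assert(f != NULL);
	pthread_mutex_lock(&f->f_lock);
	f->f_mode &= ~bits;
	pthread_mutex_unlock(&f->f_lock);
}

/* Sets mode bits under the same lock, as an fadvise-like path would. */
void demo_set_mode(struct demo_file *f, unsigned int bits)
{
	assert(f != NULL);
	pthread_mutex_lock(&f->f_lock);
	f->f_mode |= bits;
	pthread_mutex_unlock(&f->f_lock);
}
```

Because both helpers serialize on f_lock, neither can store a value computed
from a stale copy of f_mode, whichever order the two updates run in.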

 - Joel


