* [RFC PATCH 0/3] fat: add support for the renameat2 RENAME_EXCHANGE flag
@ 2022-05-19  9:23 Javier Martinez Canillas
From: Javier Martinez Canillas @ 2022-05-19  9:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: Colin Walters, Peter Jones, Alexander Larsson, Alberto Ruiz,
	Christian Kellner, Lennart Poettering, Chung-Chiang Cheng,
	Javier Martinez Canillas, OGAWA Hirofumi, Shuah Khan,
	linux-kselftest

Hello,

This series adds support for the renameat2 system call RENAME_EXCHANGE flag
(which allows two paths to be exchanged atomically) to the vfat filesystem code.

There are many use cases for this, but we are particularly interested in
making it possible for vfat filesystems to be part of OSTree [0] deployments.

Currently OSTree relies on symbolic links to make deployment updates an
atomic transactional operation. Since vfat has no symbolic links,
RENAME_EXCHANGE could be used [1] to achieve a similar level of robustness
on a vfat filesystem.

Patch #1 is a preparatory change for the RENAME_EXCHANGE support added in
patch #2, and patch #3 adds kselftests for it.

This is my first contribution to the fs/* subsystem, so I'm marking this
set as RFC in case I got anything wrong. The patches do work correctly in
my local testing, though.

[0]: https://github.com/ostreedev/ostree
[1]: https://github.com/ostreedev/ostree/issues/1649


Javier Martinez Canillas (3):
  fat: add a vfat_rename2() and make existing .rename callback a helper
  fat: add renameat2 RENAME_EXCHANGE flag support
  selftests/filesystems: add a vfat RENAME_EXCHANGE test

 MAINTAINERS                                   |   1 +
 fs/fat/namei_vfat.c                           | 172 +++++++++++++++++-
 tools/testing/selftests/Makefile              |   1 +
 .../selftests/filesystems/fat/Makefile        |   7 +
 .../testing/selftests/filesystems/fat/config  |   2 +
 .../filesystems/fat/rename_exchange.c         |  37 ++++
 .../filesystems/fat/run_fat_tests.sh          |  80 ++++++++
 7 files changed, 293 insertions(+), 7 deletions(-)
 create mode 100644 tools/testing/selftests/filesystems/fat/Makefile
 create mode 100644 tools/testing/selftests/filesystems/fat/config
 create mode 100644 tools/testing/selftests/filesystems/fat/rename_exchange.c
 create mode 100755 tools/testing/selftests/filesystems/fat/run_fat_tests.sh

-- 
2.35.1



* [RFC PATCH 1/3] fat: add a vfat_rename2() and make existing .rename callback a helper
@ 2022-05-19  9:23 ` Javier Martinez Canillas
From: Javier Martinez Canillas @ 2022-05-19  9:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: Colin Walters, Peter Jones, Alexander Larsson, Alberto Ruiz,
	Christian Kellner, Lennart Poettering, Chung-Chiang Cheng,
	Javier Martinez Canillas, OGAWA Hirofumi

Currently vfat only supports the RENAME_NOREPLACE flag, which is handled by
the virtual file system layer, but not the RENAME_EXCHANGE flag.

Add a vfat_rename2() function to be used as the .rename callback and move
the current vfat_rename() handler to a helper. This is in preparation for
implementing the RENAME_EXCHANGE flag using a different helper function.

Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
---

 fs/fat/namei_vfat.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c
index c573314806cf..88ccb2ee3537 100644
--- a/fs/fat/namei_vfat.c
+++ b/fs/fat/namei_vfat.c
@@ -889,9 +889,8 @@ static int vfat_mkdir(struct user_namespace *mnt_userns, struct inode *dir,
 	return err;
 }
 
-static int vfat_rename(struct user_namespace *mnt_userns, struct inode *old_dir,
-		       struct dentry *old_dentry, struct inode *new_dir,
-		       struct dentry *new_dentry, unsigned int flags)
+static int vfat_rename(struct inode *old_dir, struct dentry *old_dentry,
+		       struct inode *new_dir, struct dentry *new_dentry)
 {
 	struct buffer_head *dotdot_bh;
 	struct msdos_dir_entry *dotdot_de;
@@ -902,9 +901,6 @@ static int vfat_rename(struct user_namespace *mnt_userns, struct inode *old_dir,
 	int err, is_dir, update_dotdot, corrupt = 0;
 	struct super_block *sb = old_dir->i_sb;
 
-	if (flags & ~RENAME_NOREPLACE)
-		return -EINVAL;
-
 	old_sinfo.bh = sinfo.bh = dotdot_bh = NULL;
 	old_inode = d_inode(old_dentry);
 	new_inode = d_inode(new_dentry);
@@ -1021,13 +1017,24 @@ static int vfat_rename(struct user_namespace *mnt_userns, struct inode *old_dir,
 	goto out;
 }
 
+static int vfat_rename2(struct user_namespace *mnt_userns, struct inode *old_dir,
+			struct dentry *old_dentry, struct inode *new_dir,
+			struct dentry *new_dentry, unsigned int flags)
+{
+	if (flags & ~RENAME_NOREPLACE)
+		return -EINVAL;
+
+	/* VFS already handled RENAME_NOREPLACE, handle it as a normal rename */
+	return vfat_rename(old_dir, old_dentry, new_dir, new_dentry);
+}
+
 static const struct inode_operations vfat_dir_inode_operations = {
 	.create		= vfat_create,
 	.lookup		= vfat_lookup,
 	.unlink		= vfat_unlink,
 	.mkdir		= vfat_mkdir,
 	.rmdir		= vfat_rmdir,
-	.rename		= vfat_rename,
+	.rename		= vfat_rename2,
 	.setattr	= fat_setattr,
 	.getattr	= fat_getattr,
 	.update_time	= fat_update_time,
-- 
2.35.1
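The flag routing that vfat_rename2() performs can be modelled as a small user-space function. This is a minimal sketch, not kernel code: `rename_dispatch()` and `enum rename_path` are hypothetical names, and the model assumes (as the comment in the patch says) that the VFS has already enforced RENAME_NOREPLACE, and that it rejects RENAME_EXCHANGE combined with RENAME_NOREPLACE before calling into the filesystem. The `supported` mask lets one function describe the behaviour both after patch 1 and after patch 2.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>

/* Fallback values matching <linux/fs.h>, in case libc lacks them. */
#ifndef RENAME_NOREPLACE
#define RENAME_NOREPLACE (1 << 0)
#endif
#ifndef RENAME_EXCHANGE
#define RENAME_EXCHANGE (1 << 1)
#endif

/* Hypothetical labels for the helper the dispatcher would call. */
enum rename_path {
	PATH_PLAIN_RENAME,	/* models vfat_rename() */
	PATH_EXCHANGE,		/* models vfat_rename_exchange() */
};

/*
 * "supported" is the mask the filesystem accepts: RENAME_NOREPLACE after
 * patch 1, RENAME_NOREPLACE | RENAME_EXCHANGE after patch 2.
 * Returns -EINVAL for unsupported flags, otherwise stores the chosen
 * path in *out and returns 0.
 */
static int rename_dispatch(unsigned int flags, unsigned int supported,
			   enum rename_path *out)
{
	if (flags & ~supported)
		return -EINVAL;

	if (flags & RENAME_EXCHANGE)
		*out = PATH_EXCHANGE;
	else
		/* RENAME_NOREPLACE was already enforced by the VFS. */
		*out = PATH_PLAIN_RENAME;
	return 0;
}
```

With `supported = RENAME_NOREPLACE` (patch 1 only), RENAME_EXCHANGE yields -EINVAL; with `RENAME_NOREPLACE | RENAME_EXCHANGE` (after patch 2), it selects the exchange path.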



* [RFC PATCH 2/3] fat: add renameat2 RENAME_EXCHANGE flag support
@ 2022-05-19  9:23 ` Javier Martinez Canillas
From: Javier Martinez Canillas @ 2022-05-19  9:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: Colin Walters, Peter Jones, Alexander Larsson, Alberto Ruiz,
	Christian Kellner, Lennart Poettering, Chung-Chiang Cheng,
	Javier Martinez Canillas, OGAWA Hirofumi

The renameat2 RENAME_EXCHANGE flag allows two paths to be exchanged atomically
but is currently not supported by the Linux vfat filesystem driver.

Add a vfat_rename_exchange() helper function that implements this support.

The super block lock is acquired during the operation to ensure atomicity,
and on the error path the actions taken are reverted, also with the mutex held,
making the whole operation transactional.

Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
---

 fs/fat/namei_vfat.c | 153 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 152 insertions(+), 1 deletion(-)

diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c
index 88ccb2ee3537..6415a59eed13 100644
--- a/fs/fat/namei_vfat.c
+++ b/fs/fat/namei_vfat.c
@@ -1017,13 +1017,164 @@ static int vfat_rename(struct inode *old_dir, struct dentry *old_dentry,
 	goto out;
 }
 
+static int vfat_rename_exchange(struct inode *old_dir, struct dentry *old_dentry,
+				struct inode *new_dir, struct dentry *new_dentry)
+{
+	struct buffer_head *old_dotdot_bh = NULL, *new_dotdot_bh = NULL;
+	struct msdos_dir_entry *old_dotdot_de = NULL, *new_dotdot_de = NULL;
+	struct inode *old_inode, *new_inode;
+	struct timespec64 ts = current_time(old_dir);
+	loff_t old_i_pos, new_i_pos;
+	int err = 0, corrupt = 0;
+	struct super_block *sb = old_dir->i_sb;
+
+	old_inode = d_inode(old_dentry);
+	new_inode = d_inode(new_dentry);
+
+	/* Acquire super block lock for the operation to be atomic */
+	mutex_lock(&MSDOS_SB(sb)->s_lock);
+
+	/* if directories are not the same, get ".." info to update */
+	if (old_dir != new_dir) {
+		if (S_ISDIR(old_inode->i_mode))
+			if (fat_get_dotdot_entry(old_inode, &old_dotdot_bh, &old_dotdot_de)) {
+				err = -EIO;
+				goto out;
+			}
+
+		if (S_ISDIR(new_inode->i_mode))
+			if (fat_get_dotdot_entry(new_inode, &new_dotdot_bh, &new_dotdot_de)) {
+				err = -EIO;
+				goto out;
+			}
+	}
+
+	/* exchange the two dentries */
+	old_i_pos = MSDOS_I(old_inode)->i_pos;
+	new_i_pos = MSDOS_I(new_inode)->i_pos;
+
+	fat_detach(old_inode);
+	fat_detach(new_inode);
+
+	fat_attach(old_inode, new_i_pos);
+	fat_attach(new_inode, old_i_pos);
+
+	if (IS_DIRSYNC(old_dir)) {
+		err = fat_sync_inode(new_inode);
+		if (err)
+			goto error_exchange;
+	} else {
+		mark_inode_dirty(new_inode);
+	}
+
+	if (IS_DIRSYNC(new_dir)) {
+		err = fat_sync_inode(old_inode);
+		if (err)
+			goto error_exchange;
+	} else {
+		mark_inode_dirty(old_inode);
+	}
+
+	/* update ".." directory entry info */
+	if (old_dotdot_de) {
+		fat_set_start(old_dotdot_de, MSDOS_I(new_dir)->i_logstart);
+		mark_buffer_dirty_inode(old_dotdot_bh, old_inode);
+		if (IS_DIRSYNC(new_dir)) {
+			err = sync_dirty_buffer(old_dotdot_bh);
+			if (err)
+				goto error_old_dotdot;
+		}
+		drop_nlink(old_dir);
+		inc_nlink(new_dir);
+	}
+
+	if (new_dotdot_de) {
+		fat_set_start(new_dotdot_de, MSDOS_I(old_dir)->i_logstart);
+		mark_buffer_dirty_inode(new_dotdot_bh, new_inode);
+		if (IS_DIRSYNC(old_dir)) {
+			err = sync_dirty_buffer(new_dotdot_bh);
+			if (err)
+				goto error_new_dotdot;
+		}
+		drop_nlink(new_dir);
+		inc_nlink(old_dir);
+	}
+
+	/* update inode version and timestamps */
+	inode_inc_iversion(old_dir);
+	inode_inc_iversion(new_dir);
+	inode_inc_iversion(old_inode);
+	inode_inc_iversion(new_inode);
+
+	fat_truncate_time(old_dir, &ts, S_CTIME | S_MTIME);
+	fat_truncate_time(new_dir, &ts, S_CTIME | S_MTIME);
+
+	if (IS_DIRSYNC(old_dir))
+		(void)fat_sync_inode(old_dir);
+	else
+		mark_inode_dirty(old_dir);
+
+	if (IS_DIRSYNC(new_dir))
+		(void)fat_sync_inode(new_dir);
+	else
+		mark_inode_dirty(new_dir);
+out:
+	brelse(old_dotdot_bh);
+	brelse(new_dotdot_bh);
+	mutex_unlock(&MSDOS_SB(sb)->s_lock);
+
+	return err;
+
+error_new_dotdot:
+	/* data cluster is shared, serious corruption */
+	corrupt = 1;
+
+	if (new_dotdot_de) {
+		fat_set_start(new_dotdot_de, MSDOS_I(new_dir)->i_logstart);
+		mark_buffer_dirty_inode(new_dotdot_bh, new_inode);
+		corrupt |= sync_dirty_buffer(new_dotdot_bh);
+	}
+
+error_old_dotdot:
+	/* data cluster is shared, serious corruption */
+	corrupt = 1;
+
+	if (old_dotdot_de) {
+		fat_set_start(old_dotdot_de, MSDOS_I(old_dir)->i_logstart);
+		mark_buffer_dirty_inode(old_dotdot_bh, old_inode);
+		corrupt |= sync_dirty_buffer(old_dotdot_bh);
+	}
+
+error_exchange:
+	fat_detach(old_inode);
+	fat_detach(new_inode);
+
+	fat_attach(old_inode, old_i_pos);
+	fat_attach(new_inode, new_i_pos);
+
+	if (corrupt) {
+		corrupt |= fat_sync_inode(old_inode);
+		corrupt |= fat_sync_inode(new_inode);
+	}
+
+	if (corrupt < 0) {
+		fat_fs_error(new_dir->i_sb,
+			     "%s: Filesystem corrupted (i_pos %lld, %lld)",
+			     __func__, old_i_pos, new_i_pos);
+	}
+	goto out;
+}
+
 static int vfat_rename2(struct user_namespace *mnt_userns, struct inode *old_dir,
 			struct dentry *old_dentry, struct inode *new_dir,
 			struct dentry *new_dentry, unsigned int flags)
 {
-	if (flags & ~RENAME_NOREPLACE)
+	if (flags & ~(RENAME_NOREPLACE | RENAME_EXCHANGE))
 		return -EINVAL;
 
+	if (flags & RENAME_EXCHANGE)
+		return vfat_rename_exchange(old_dir, old_dentry, new_dir, new_dentry);
+
 	/* VFS already handled RENAME_NOREPLACE, handle it as a normal rename */
 	return vfat_rename(old_dir, old_dentry, new_dir, new_dentry);
 }
-- 
2.35.1
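The transactional behaviour described in the commit message (swap the two directory-entry positions under the superblock lock, and swap them back if a sync fails) can be modelled in user space. This is a simplified sketch under stated assumptions: a pthread mutex stands in for MSDOS_SB(sb)->s_lock, and `struct model_inode`, `model_sync()` and `exchange_positions()` are hypothetical names, with sync failures injected through a flag. It models only the in-memory state plus a pretend "on-disk" copy.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for an inode and its directory-entry position. */
struct model_inode {
	long long i_pos;	/* in-memory position (like MSDOS_I()->i_pos) */
	long long synced_pos;	/* last position "written to disk" */
	bool fail_sync;		/* inject a write-back failure for testing */
};

/* Stand-in for the superblock lock (MSDOS_SB(sb)->s_lock). */
static pthread_mutex_t sb_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical write-back: fails with -EIO when fail_sync is set. */
static int model_sync(struct model_inode *inode)
{
	if (inode->fail_sync)
		return -EIO;
	inode->synced_pos = inode->i_pos;
	return 0;
}

/*
 * Swap the positions of a and b. If either sync fails, restore the
 * original positions before dropping the lock, so other lock holders
 * never observe a half-done exchange.
 */
static int exchange_positions(struct model_inode *a, struct model_inode *b)
{
	long long a_pos, b_pos;
	int err;

	pthread_mutex_lock(&sb_lock);
	a_pos = a->i_pos;
	b_pos = b->i_pos;

	a->i_pos = b_pos;
	b->i_pos = a_pos;

	err = model_sync(b);
	if (!err)
		err = model_sync(a);
	if (err) {
		/* Error path: undo the exchange while still holding the lock. */
		a->i_pos = a_pos;
		b->i_pos = b_pos;
		/*
		 * Best effort: write the restored positions back. If this also
		 * fails, the "on-disk" state may be inconsistent.
		 */
		(void)model_sync(b);
		(void)model_sync(a);
	}
	pthread_mutex_unlock(&sb_lock);
	return err;
}
```

Keeping the rollback inside the critical section is the key design point: another lock holder sees either the original or the exchanged positions, never a mix. It does not make the on-disk update atomic across a power failure, which is the question raised later in this thread.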



* [RFC PATCH 3/3] selftests/filesystems: add a vfat RENAME_EXCHANGE test
@ 2022-05-19  9:23 ` Javier Martinez Canillas
From: Javier Martinez Canillas @ 2022-05-19  9:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: Colin Walters, Peter Jones, Alexander Larsson, Alberto Ruiz,
	Christian Kellner, Lennart Poettering, Chung-Chiang Cheng,
	Javier Martinez Canillas, OGAWA Hirofumi, Shuah Khan,
	linux-kselftest

Add a test for the renameat2 RENAME_EXCHANGE support in vfat, split into
a tool that just does the rename exchange and a script that is run by
the kselftest framework on `make TARGETS="filesystems/fat" kselftest`.

That way the script can be easily extended to test other file operations.

The script creates a 1 MiB disk image that is then formatted with a vfat
filesystem and mounted using a loop device, so all file operations are
done on an ephemeral filesystem.

Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
---

 MAINTAINERS                                   |  1 +
 tools/testing/selftests/Makefile              |  1 +
 .../selftests/filesystems/fat/Makefile        |  7 ++
 .../testing/selftests/filesystems/fat/config  |  2 +
 .../filesystems/fat/rename_exchange.c         | 37 +++++++++
 .../filesystems/fat/run_fat_tests.sh          | 80 +++++++++++++++++++
 6 files changed, 128 insertions(+)
 create mode 100644 tools/testing/selftests/filesystems/fat/Makefile
 create mode 100644 tools/testing/selftests/filesystems/fat/config
 create mode 100644 tools/testing/selftests/filesystems/fat/rename_exchange.c
 create mode 100755 tools/testing/selftests/filesystems/fat/run_fat_tests.sh

diff --git a/MAINTAINERS b/MAINTAINERS
index 4fdbbd6c1984..158771bb7755 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -20841,6 +20841,7 @@ M:	OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
 S:	Maintained
 F:	Documentation/filesystems/vfat.rst
 F:	fs/fat/
+F:	tools/testing/selftests/filesystems/fat/
 
 VFIO DRIVER
 M:	Alex Williamson <alex.williamson@redhat.com>
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 0aedcd76cf0f..fc59ad849a90 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -16,6 +16,7 @@ TARGETS += exec
 TARGETS += filesystems
 TARGETS += filesystems/binderfs
 TARGETS += filesystems/epoll
+TARGETS += filesystems/fat
 TARGETS += firmware
 TARGETS += fpu
 TARGETS += ftrace
diff --git a/tools/testing/selftests/filesystems/fat/Makefile b/tools/testing/selftests/filesystems/fat/Makefile
new file mode 100644
index 000000000000..93ee73c16828
--- /dev/null
+++ b/tools/testing/selftests/filesystems/fat/Makefile
@@ -0,0 +1,7 @@
+# SPDX-License-Identifier: GPL-2.0
+
+TEST_PROGS := run_fat_tests.sh
+TEST_GEN_PROGS_EXTENDED := rename_exchange
+CFLAGS += -O2 -g -Wall -I../../../../usr/include/
+
+include ../../lib.mk
diff --git a/tools/testing/selftests/filesystems/fat/config b/tools/testing/selftests/filesystems/fat/config
new file mode 100644
index 000000000000..6cf95e787a17
--- /dev/null
+++ b/tools/testing/selftests/filesystems/fat/config
@@ -0,0 +1,2 @@
+CONFIG_BLK_DEV_LOOP=y
+CONFIG_VFAT_FS=y
diff --git a/tools/testing/selftests/filesystems/fat/rename_exchange.c b/tools/testing/selftests/filesystems/fat/rename_exchange.c
new file mode 100644
index 000000000000..e488ad354fce
--- /dev/null
+++ b/tools/testing/selftests/filesystems/fat/rename_exchange.c
@@ -0,0 +1,37 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Program that atomically exchanges two paths using
+ * the renameat2() system call RENAME_EXCHANGE flag.
+ *
+ * Copyright 2022 Red Hat Inc.
+ * Author: Javier Martinez Canillas <javierm@redhat.com>
+ */
+
+#define _GNU_SOURCE
+#include <fcntl.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+void print_usage(const char *program)
+{
+	printf("Usage: %s [oldpath] [newpath]\n", program);
+	printf("Atomically exchange oldpath and newpath\n");
+}
+
+int main(int argc, char *argv[])
+{
+	int ret;
+
+	if (argc != 3) {
+		print_usage(argv[0]);
+		exit(EXIT_FAILURE);
+	}
+
+	ret = renameat2(AT_FDCWD, argv[1], AT_FDCWD, argv[2], RENAME_EXCHANGE);
+	if (ret) {
+		perror("rename exchange failed");
+		exit(EXIT_FAILURE);
+	}
+
+	exit(EXIT_SUCCESS);
+}
diff --git a/tools/testing/selftests/filesystems/fat/run_fat_tests.sh b/tools/testing/selftests/filesystems/fat/run_fat_tests.sh
new file mode 100755
index 000000000000..8db49624409f
--- /dev/null
+++ b/tools/testing/selftests/filesystems/fat/run_fat_tests.sh
@@ -0,0 +1,80 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Run filesystem operations tests on a 1 MiB disk image that is formatted with
+# a vfat filesystem and mounted in a temporary directory using a loop device.
+#
+# Copyright 2022 Red Hat Inc.
+# Author: Javier Martinez Canillas <javierm@redhat.com>
+
+set -e
+set -u
+set -o pipefail
+
+BASE_DIR="$(dirname $0)"
+TMP_DIR="$(mktemp -d /tmp/fat_tests_tmp.XXXX)"
+IMG_PATH="${TMP_DIR}/fat.img"
+MNT_PATH="${TMP_DIR}/mnt"
+
+cleanup()
+{
+    mountpoint -q "${MNT_PATH}" && unmount_image
+    rm -rf "${TMP_DIR}"
+}
+trap cleanup SIGINT SIGTERM EXIT
+
+create_loopback()
+{
+    touch "${IMG_PATH}"
+    chattr +C "${IMG_PATH}" >/dev/null 2>&1 || true
+
+    truncate -s 1M "${IMG_PATH}"
+    mkfs.vfat "${IMG_PATH}" >/dev/null 2>&1
+}
+
+mount_image()
+{
+    mkdir -p "${MNT_PATH}"
+    sudo mount -o loop "${IMG_PATH}" "${MNT_PATH}"
+}
+
+rename_exchange_test()
+{
+    local rename_exchange="${BASE_DIR}/rename_exchange"
+    local old_path="${MNT_PATH}/old_file"
+    local new_path="${MNT_PATH}/new_file"
+
+    echo old | sudo tee "${old_path}" >/dev/null 2>&1
+    echo new | sudo tee "${new_path}" >/dev/null 2>&1
+    sudo "${rename_exchange}" "${old_path}" "${new_path}" >/dev/null 2>&1
+    grep new "${old_path}" >/dev/null 2>&1
+    grep old "${new_path}" >/dev/null 2>&1
+}
+
+rename_exchange_subdir_test()
+{
+    local rename_exchange="${BASE_DIR}/rename_exchange"
+    local dir_path="${MNT_PATH}/subdir"
+    local old_path="${MNT_PATH}/old_file"
+    local new_path="${dir_path}/new_file"
+
+    sudo mkdir -p "${dir_path}"
+    echo old | sudo tee "${old_path}" >/dev/null 2>&1
+    echo new | sudo tee "${new_path}" >/dev/null 2>&1
+    sudo "${rename_exchange}" "${old_path}" "${new_path}" >/dev/null 2>&1
+    grep new "${old_path}" >/dev/null 2>&1
+    grep old "${new_path}" >/dev/null 2>&1
+}
+
+unmount_image()
+{
+    sudo umount "${MNT_PATH}" &> /dev/null
+}
+
+create_loopback
+mount_image
+rename_exchange_test
+rename_exchange_subdir_test
+unmount_image
+
+exit 0
-- 
2.35.1
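The check performed by rename_exchange_test() in the script (write "old" and "new", exchange the two paths, then verify that each path now holds the other's contents) can also be sketched as a single C function. This is an illustration, not part of the series: `check_exchange()`, `write_file()` and `read_file()` are hypothetical helpers, the KSFT_* values are assumed to match tools/testing/selftests/kselftest.h, and an unsupported RENAME_EXCHANGE is reported as a skip rather than a failure. It assumes glibc 2.28 or later for the renameat2() wrapper. The directory is a parameter, so it can point at the mounted vfat image or at any other filesystem.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

#ifndef RENAME_EXCHANGE
#define RENAME_EXCHANGE (1 << 1)
#endif

/* Assumed to match the exit codes defined in kselftest.h. */
#define KSFT_PASS 0
#define KSFT_FAIL 1
#define KSFT_SKIP 4

/* Hypothetical helper: replace path's contents with data. */
static int write_file(const char *path, const char *data)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fputs(data, f) == EOF) {
		fclose(f);
		return -1;
	}
	return fclose(f) ? -1 : 0;
}

/* Hypothetical helper: read at most len - 1 bytes into buf. */
static int read_file(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");
	size_t n;

	if (!f)
		return -1;
	n = fread(buf, 1, len - 1, f);
	buf[n] = '\0';
	fclose(f);
	return 0;
}

/* C version of rename_exchange_test(), run inside directory dir. */
static int check_exchange(const char *dir)
{
	char old_path[PATH_MAX], new_path[PATH_MAX], buf[16];

	snprintf(old_path, sizeof(old_path), "%s/old_file", dir);
	snprintf(new_path, sizeof(new_path), "%s/new_file", dir);

	if (write_file(old_path, "old\n") || write_file(new_path, "new\n"))
		return KSFT_FAIL;

	if (renameat2(AT_FDCWD, old_path, AT_FDCWD, new_path, RENAME_EXCHANGE)) {
		/*
		 * EINVAL: the filesystem does not support the flag;
		 * ENOSYS/EPERM: the syscall is missing or filtered.
		 */
		if (errno == EINVAL || errno == ENOSYS || errno == EPERM)
			return KSFT_SKIP;
		return KSFT_FAIL;
	}

	/* After the exchange each path must hold the other's contents. */
	if (read_file(old_path, buf, sizeof(buf)) || strcmp(buf, "new\n"))
		return KSFT_FAIL;
	if (read_file(new_path, buf, sizeof(buf)) || strcmp(buf, "old\n"))
		return KSFT_FAIL;
	return KSFT_PASS;
}
```

Unlike the shell script, which stops with a failure under `set -e` if the rename fails, this sketch distinguishes "feature missing" from "feature broken".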



* Re: [RFC PATCH 3/3] selftests/filesystems: add a vfat RENAME_EXCHANGE test
@ 2022-05-21 13:38   ` Muhammad Usama Anjum
From: Muhammad Usama Anjum @ 2022-05-21 13:38 UTC (permalink / raw)
  To: Javier Martinez Canillas, linux-kernel
  Cc: usama.anjum, Colin Walters, Peter Jones, Alexander Larsson,
	Alberto Ruiz, Christian Kellner, Lennart Poettering,
	Chung-Chiang Cheng, OGAWA Hirofumi, Shuah Khan, linux-kselftest

On 5/19/22 2:23 PM, Javier Martinez Canillas wrote:
> diff --git a/tools/testing/selftests/filesystems/fat/Makefile b/tools/testing/selftests/filesystems/fat/Makefile
> new file mode 100644
> index 000000000000..93ee73c16828
> --- /dev/null
> +++ b/tools/testing/selftests/filesystems/fat/Makefile
> @@ -0,0 +1,7 @@
> +# SPDX-License-Identifier: GPL-2.0
> +
> +TEST_PROGS := run_fat_tests.sh
> +TEST_GEN_PROGS_EXTENDED := rename_exchange
Create a .gitignore file and add rename_exchange to it, like
other tests do.

> +CFLAGS += -O2 -g -Wall -I../../../../usr/include/
> +
Include $(KHDR_INCLUDES) instead of "-I../../../../usr/include/" here.

> +include ../../lib.mk

-- 
Muhammad Usama Anjum


* Re: [RFC PATCH 2/3] fat: add renameat2 RENAME_EXCHANGE flag support
@ 2022-05-22 17:42   ` OGAWA Hirofumi
From: OGAWA Hirofumi @ 2022-05-22 17:42 UTC (permalink / raw)
  To: Javier Martinez Canillas
  Cc: linux-kernel, Colin Walters, Peter Jones, Alexander Larsson,
	Alberto Ruiz, Christian Kellner, Lennart Poettering,
	Chung-Chiang Cheng

Javier Martinez Canillas <javierm@redhat.com> writes:

> The renameat2 RENAME_EXCHANGE flag allows two paths to be exchanged atomically
> but is currently not supported by the Linux vfat filesystem driver.
>
> Add a vfat_rename_exchange() helper function that implements this support.
>
> The super block lock is acquired during the operation to ensure atomicity,
> and on the error path the actions taken are reverted, also with the mutex held,
> making the whole operation transactional.

I haven't fully reviewed it yet (write order and races), but basically
it looks good.

> +	/* if directories are not the same, get ".." info to update */
> +	if (old_dir != new_dir) {
> +		if (S_ISDIR(old_inode->i_mode))
> +			if (fat_get_dotdot_entry(old_inode, &old_dotdot_bh, &old_dotdot_de)) {
> +				err = -EIO;
> +				goto out;
> +			}
> +		if (S_ISDIR(new_inode->i_mode))
> +			if (fat_get_dotdot_entry(new_inode, &new_dotdot_bh, &new_dotdot_de)) {
> +				err = -EIO;
> +				goto out;
> +			}
> +	}

It may not be required by the Linux coding style, but please add {}

	if () {
        	...
	}

for a body that is not a one-liner.

> +	/* update ".." directory entry info */
> +	if (old_dotdot_de) {
> +		fat_set_start(old_dotdot_de, MSDOS_I(new_dir)->i_logstart);
> +		mark_buffer_dirty_inode(old_dotdot_bh, old_inode);
> +		if (IS_DIRSYNC(new_dir)) {
> +			err = sync_dirty_buffer(old_dotdot_bh);
> +			if (err)
> +				goto error_old_dotdot;
> +		}
> +		drop_nlink(old_dir);
> +		inc_nlink(new_dir);
> +	}
> +
> +	if (new_dotdot_de) {
> +		fat_set_start(new_dotdot_de, MSDOS_I(old_dir)->i_logstart);
> +		mark_buffer_dirty_inode(new_dotdot_bh, new_inode);
> +		if (IS_DIRSYNC(old_dir)) {
> +			err = sync_dirty_buffer(new_dotdot_bh);
> +			if (err)
> +				goto error_new_dotdot;
> +		}
> +		drop_nlink(new_dir);
> +		inc_nlink(old_dir);
> +	}

There is some copy&paste code, for example above. Maybe it would be better
to consolidate it into a helper function? If that was intentional, it is ok though.

Thanks.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>


* Re: [RFC PATCH 2/3] fat: add renameat2 RENAME_EXCHANGE flag support
@ 2022-05-23 10:40   ` Colin Walters
From: Colin Walters @ 2022-05-23 10:40 UTC (permalink / raw)
  To: Javier Martinez Canillas, linux-kernel
  Cc: Peter Jones, Alexander Larsson, Alberto Ruiz, Christian Kellner,
	Lennart Poettering, Chung-Chiang Cheng, OGAWA Hirofumi

On Thu, May 19, 2022, at 5:23 AM, Javier Martinez Canillas wrote:
> The renameat2 RENAME_EXCHANGE flag allows two paths to be exchanged atomically
> but is currently not supported by the Linux vfat filesystem driver.
>
> Add a vfat_rename_exchange() helper function that implements this support.
>
> The super block lock is acquired during the operation to ensure atomicity,
> and on the error path the actions taken are reverted, also with the mutex held,
> making the whole operation transactional.

Transactional with respect to the mounted kernel, but AIUI because vfat does not have journaling, the semantics on hard failure are...unspecified?  Is it possible for example we could see no file at all in the destination path?

This relates to https://github.com/ostreedev/ostree/issues/1951

TL;DR I'd been thinking that in order to have things be maximally robust we need to:

1. Write new desired bootloader config
2. fsync it
3. fsync containing directory (I guess for vfat really, syncfs())
4. remove old config, syncfs()

And here the bootloader would know to prefer the "new" file if it exists, and to delete the old one if it's still present on the next boot.
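The four steps and the bootloader-side rule above might be sketched in C as follows. This is a sketch under assumptions, not OSTree code: `write_and_fsync()`, `syncfs_path()`, `commit_new_config()` and `pick_config()` are hypothetical helpers, syncfs(2) (Linux-specific, needs _GNU_SOURCE) is used for the "sync the containing directory" step as suggested above for vfat, and the bootloader rule is reduced to a plain existence check.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Steps 1 and 2: write the new config and fsync it. */
static int write_and_fsync(const char *path, const char *data)
{
	size_t len = strlen(data);
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (fd < 0)
		return -1;
	if (write(fd, data, len) != (ssize_t)len || fsync(fd)) {
		close(fd);
		return -1;
	}
	return close(fd);
}

/* Flush the whole filesystem containing directory dir. */
static int syncfs_path(const char *dir)
{
	int fd = open(dir, O_RDONLY | O_DIRECTORY);
	int ret;

	if (fd < 0)
		return -1;
	ret = syncfs(fd);
	close(fd);
	return ret;
}

/* Hypothetical two-phase commit of a bootloader config living in dir. */
static int commit_new_config(const char *dir, const char *new_path,
			     const char *old_path, const char *data)
{
	if (write_and_fsync(new_path, data))	/* steps 1 and 2 */
		return -1;
	if (syncfs_path(dir))			/* step 3 */
		return -1;
	if (unlink(old_path) && errno != ENOENT)	/* step 4 */
		return -1;
	return syncfs_path(dir);
}

/* Bootloader side: prefer the new file if it exists, else the old one. */
static const char *pick_config(const char *new_path, const char *old_path)
{
	return access(new_path, F_OK) == 0 ? new_path : old_path;
}
```

A crash at any point leaves either the old file, the new file, or both, and pick_config() resolves each of those states; the cost is that every bootloader has to implement that rule.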

(Now obviously this is a small patch which will surely be generally useful, e.g. for tools that operate on things like mounted USB sticks, being able to do an atomic exchange at least from the running kernel PoV is just as useful as it is on other "regular" (and journaled) mounted filesystems)

So assuming we have this, I guess the flow could be:

1. rename_exchange(old, new)
2. syncfs()

?  But that's assuming that the implementation of this doesn't e.g. have any "holes" where in theory we could flush an intermediate state.
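The proposed two-step flow maps onto two system calls. A minimal sketch, assuming glibc 2.28 or later for the renameat2() wrapper; `exchange_and_sync()` is a hypothetical helper. As Ogawa points out later in the thread, the syncfs() step only narrows the window in which a crash can leave an intermediate state on vfat; it does not make the exchange atomic on disk.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#ifndef RENAME_EXCHANGE
#define RENAME_EXCHANGE (1 << 1)
#endif

/*
 * Hypothetical helper for the flow above:
 *   1. swap old_name and new_name with renameat2(RENAME_EXCHANGE);
 *   2. flush the filesystem that holds them with syncfs().
 * dirfd must be a real directory fd (not AT_FDCWD) because it is also
 * passed to syncfs(). Returns 0 on success, -1 with errno set otherwise;
 * errno is EINVAL when the filesystem does not support the flag.
 */
static int exchange_and_sync(int dirfd, const char *old_name,
			     const char *new_name)
{
	if (renameat2(dirfd, old_name, dirfd, new_name, RENAME_EXCHANGE))
		return -1;
	return syncfs(dirfd);
}
```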




* Re: [RFC PATCH 2/3] fat: add renameat2 RENAME_EXCHANGE flag support
@ 2022-05-23 15:34     ` Javier Martinez Canillas
From: Javier Martinez Canillas @ 2022-05-23 15:34 UTC (permalink / raw)
  To: Colin Walters, linux-kernel
  Cc: Peter Jones, Alexander Larsson, Alberto Ruiz, Christian Kellner,
	Lennart Poettering, Chung-Chiang Cheng, OGAWA Hirofumi

Hello Colin,

Thanks for your feedback.

On 5/23/22 12:40, Colin Walters wrote:
> On Thu, May 19, 2022, at 5:23 AM, Javier Martinez Canillas wrote:
>> The renameat2 RENAME_EXCHANGE flag allows two paths to be exchanged atomically
>> but is currently not supported by the Linux vfat filesystem driver.
>>
>> Add a vfat_rename_exchange() helper function that implements this support.
>>
>> The super block lock is acquired during the operation to ensure atomicity,
>> and on the error path the actions taken are reverted, also with the mutex held,
>> making the whole operation transactional.
> 
> Transactional with respect to the mounted kernel, but AIUI because vfat does not have journaling, the semantics on hard failure are...unspecified?  Is it possible for example we could see no file at all in the destination path?
>

That's correct, it's transactional within the constraints imposed by vfat.
That is, there's no journal that could be replayed if something gets
corrupted in the filesystem.

But I believe that's also true for any journaled filesystem read by GRUB?
GRUB doesn't mount filesystems; it just attempts to read them without
doing any journal replay. Even if it is able to detect that something
is wrong with the filesystem, it just tries on a best-effort basis, e.g.:

https://git.savannah.gnu.org/cgit/grub.git/commit/?id=777276063e2

About the semantics on a hard failure: that's not documented in the
renameat(2) man page, but what most filesystems do AFAICT is revert the
operation if possible and print an error.

I don't think that having no file at all at the destination is a possible
outcome of a failure, since the function does a detach, an attach and a
sync, and only the sync can fail.

If the sync fails, then the detach/attach are reverted and another sync
is attempted. If this succeeds, the old state is preserved, and if it
fails, then no sync was made, so it should be fine too, I think.

But I'm not a filesystem expert so maybe someone else more familiar with
vfat and filesystems in general could chime in.

> This relates to https://github.com/ostreedev/ostree/issues/1951
> 
> TL;DR I'd been thinking that in order to have things be maximally robust we need to:
> 
> 1. Write new desired bootloader config
> 2. fsync it
> 3. fsync containing directory (I guess for vfat really, syncfs())
> 4. remove old config, syncfs()
>

Yes, I've seen that issue before, but I (wrongly) understood that it was a
way to work around the lack of renameat2(..., RENAME_EXCHANGE) in vfat. On
a second read I see that you also mention the problem of journaled fs
writes with no journal replay in the bootloader that I mentioned above. So
it makes sense to do the two-phase commit even for journaled filesystems.
 
> And here the bootloader would know to prefer the "new" file if it exists, and to delete the old one if it's still present on the next boot.
>

The disadvantage of this approach is that we will then need to make all
bootloaders aware of the two-phase commit as well. I'm OK with that, but
then I believe that we should document the expectations clearly as part
of the https://systemd.io/BOOT_LOADER_SPECIFICATION/.

Anyway, I don't think this is the place to discuss it, and we should
just focus on the actual kernel patches :) 
 
> (Now obviously this is a small patch which will surely be generally useful, e.g. for tools that operate on things like mounted USB sticks, being able to do an atomic exchange at least from the running kernel PoV is just as useful as it is on other "regular" (and journaled) mounted filesystems)
>

Agreed. I think that it wouldn't hurt to have this implementation in vfat.
 
> So assuming we have this, I guess the flow could be:
> 
> 1. rename_exchange(old, new)
> 2. syncfs()
>

Correct. In fact, Alex pointed out to me that I should also do a sync in the
test before checking that the rename succeeded. I was mostly interested in
whether the logic worked even if only the in-memory representation or page
cache was used. But I've added a `sudo sync -f "${MNT_PATH}"` for the next iteration.
 
> ?  But that's assuming that the implementation of this doesn't e.g. have any "holes" where in theory we could flush an intermediate state.
> 

Ogawa said that he hasn't fully reviewed it yet, but he gave useful feedback
that I will also address in the next version. As I said, this is my first
contribution to a filesystem driver, so it would be good if people with more
experience could let me know if there are holes in the implementation.

-- 
Best regards,

Javier Martinez Canillas
Linux Engineering
Red Hat



* Re: [RFC PATCH 2/3] fat: add renameat2 RENAME_EXCHANGE flag support
@ 2022-05-23 15:35     ` Javier Martinez Canillas
From: Javier Martinez Canillas @ 2022-05-23 15:35 UTC (permalink / raw)
  To: OGAWA Hirofumi
  Cc: linux-kernel, Colin Walters, Peter Jones, Alexander Larsson,
	Alberto Ruiz, Christian Kellner, Lennart Poettering,
	Chung-Chiang Cheng

Hello OGAWA,

Thanks a lot for your feedback.

On 5/22/22 19:42, OGAWA Hirofumi wrote:
> Javier Martinez Canillas <javierm@redhat.com> writes:
> 
>> The renameat2 RENAME_EXCHANGE flag allows two paths to be exchanged atomically
>> but is currently not supported by the Linux vfat filesystem driver.
>>
>> Add a vfat_rename_exchange() helper function that implements this support.
>>
>> The super block lock is acquired during the operation to ensure atomicity,
>> and on the error path the actions taken are reverted, also with the mutex held,
>> making the whole operation transactional.
> 
> I haven't fully reviewed it yet (write order and races), but basically
> it looks good.
> 

Thanks for looking at the patch. I agree with all your remarks and will
address them in v2. Please let me know, once you have reviewed it, whether
it is OK from a write order and race point of view.

-- 
Best regards,

Javier Martinez Canillas
Linux Engineering
Red Hat



* Re: [RFC PATCH 2/3] fat: add renameat2 RENAME_EXCHANGE flag support
@ 2022-05-23 17:05       ` OGAWA Hirofumi
From: OGAWA Hirofumi @ 2022-05-23 17:05 UTC (permalink / raw)
  To: Javier Martinez Canillas
  Cc: Colin Walters, linux-kernel, Peter Jones, Alexander Larsson,
	Alberto Ruiz, Christian Kellner, Lennart Poettering,
	Chung-Chiang Cheng

Javier Martinez Canillas <javierm@redhat.com> writes:

>> So assuming we have this, I guess the flow could be:
>> 
>> 1. rename_exchange(old, new)
>> 2. syncfs()
>>
>
> Correct. In fact, Alex pointed out to me that I should also do a sync in the
> test before checking that the rename succeeded. I was mostly interested in
> whether the logic worked even if only the in-memory representation or page
> cache was used. But I've added a `sudo sync -f "${MNT_PATH}"` for the next iteration.
>  
>> ?  But that's assuming that the implementation of this doesn't e.g. have any "holes" where in theory we could flush an intermediate state.
>> 
>
> Ogawa said that he hasn't fully reviewed it yet, but he gave useful feedback
> that I will also address in the next version. As I said, this is my first
> contribution to a filesystem driver, so it would be good if people with more
> experience could let me know if there are holes in the implementation.

I haven't been following the emails about ostree and related topics, so I
may not understand the issue. If you are expecting atomicity on disk (not
in-core), rename exchange can't provide it on vfat without a non-standard
extension such as adding a journal. And syncfs(2) can't prevent rename
corruption either; syncfs(2) can only minimize the race window.

If a power failure happens during a rename exchange, the file may be lost in
the worst case. (With a journal, the file could be recovered to its state
before or after the rename exchange during journal replay, but as you know
vfat can't do that.)

Thanks.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

