* [PATCH v2 0/3] erofs: iomap support for uncompressed cases
@ 2021-07-30 19:46 Gao Xiang
  2021-07-30 19:46 ` [PATCH v2 1/3] erofs: iomap support for non-tailpacking DIO Gao Xiang
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Gao Xiang @ 2021-07-30 19:46 UTC (permalink / raw)
  To: linux-erofs
  Cc: nvdimm, Darrick J. Wong, LKML, Joseph Qi, Liu Bo, Tao Ma,
	linux-fsdevel, Liu Jiang

Hi folks,

This patchset mainly adds EROFS iomap support for uncompressed cases,
which I've planned for the next merge window.

The first 2 patches mainly deal with 2 new cases:
1) Direct I/O is useful in certain scenarios for uncompressed files.
For example, double page caching can be avoided by using direct I/O
when a loop device is used for uncompressed files containing an
upper-layer compressed filesystem (see the sketch after these two
cases).

2) DAX is quite useful for some container use cases in order to save
guest memory significantly by using a minimal, lightweight EROFS image.
BTW, somewhat off the iomap topic, chunk-deduplicated regular file
support (blob data support) is almost ready for multi-layer container
image use cases (aka RAFS v6; nydus [1] will support RAFS v6, an
EROFS-compatible format, in the future and form a unified
high-performance container image solution, to be announced formally
and independently), which is also a small independent update.
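
To make case 1) above more concrete, here is a minimal userspace sketch
(not part of this series; the paths are hypothetical and error handling
is omitted) of attaching such an uncompressed file to a loop device
with direct I/O enabled, which is what the DIO support in patch 1
makes possible:

/*
 * Illustrative sketch only: attach an uncompressed file (e.g. an
 * upper-layer filesystem image stored inside an EROFS mount) to a
 * loop device and enable direct I/O on the backing file, so the
 * lower page cache is bypassed.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/loop.h>

int main(void)
{
	char path[32];
	int ctl = open("/dev/loop-control", O_RDWR);
	int num = ioctl(ctl, LOOP_CTL_GET_FREE);	/* pick a free loopN */
	int loopfd, imgfd;

	snprintf(path, sizeof(path), "/dev/loop%d", num);
	loopfd = open(path, O_RDWR);
	/* hypothetical path: an uncompressed image inside an EROFS mount */
	imgfd = open("/mnt/erofs/upper.img", O_RDONLY);

	ioctl(loopfd, LOOP_SET_FD, imgfd);	/* attach the backing file */
	ioctl(loopfd, LOOP_SET_DIRECT_IO, 1UL);	/* relies on this DIO support */
	printf("attached %s\n", path);
	return 0;
}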

The last patch relies on the previous iomap core update to convert
tail-packing inline files to iomap, so that all uncompressed cases are
handled properly with iomap.

Comments are welcome. Thanks for your time reading this!

Thanks,
Gao Xiang

[1] https://github.com/dragonflyoss/image-service

Changes since v1:
 - mainly a resend with commit message & comment updates.

Gao Xiang (2):
  erofs: dax support for non-tailpacking regular file
  erofs: convert all uncompressed cases to iomap

Huang Jianan (1):
  erofs: iomap support for non-tailpacking DIO

 fs/erofs/Kconfig    |   1 +
 fs/erofs/data.c     | 342 +++++++++++++++++++-------------------------
 fs/erofs/inode.c    |   9 +-
 fs/erofs/internal.h |   4 +
 fs/erofs/super.c    |  60 +++++++-
 5 files changed, 217 insertions(+), 199 deletions(-)

-- 
2.24.4



* [PATCH v2 1/3] erofs: iomap support for non-tailpacking DIO
  2021-07-30 19:46 [PATCH v2 0/3] erofs: iomap support for uncompressed cases Gao Xiang
@ 2021-07-30 19:46 ` Gao Xiang
  2021-08-04  2:57   ` Chao Yu
  2021-07-30 19:46 ` [PATCH v2 2/3] erofs: dax support for non-tailpacking regular file Gao Xiang
  2021-07-30 19:46 ` [PATCH v2 3/3] erofs: convert all uncompressed cases to iomap Gao Xiang
  2 siblings, 1 reply; 11+ messages in thread
From: Gao Xiang @ 2021-07-30 19:46 UTC (permalink / raw)
  To: linux-erofs
  Cc: nvdimm, Darrick J. Wong, LKML, Joseph Qi, Liu Bo, Tao Ma,
	linux-fsdevel, Liu Jiang

From: Huang Jianan <huangjianan@oppo.com>

Add iomap support for non-tailpacking uncompressed data in order to
support DIO and DAX.

Direct I/O is useful in certain scenarios for uncompressed files.
For example, double page caching can be avoided by using direct I/O
when a loop device is used for uncompressed files containing an
upper-layer compressed filesystem.

This adds iomap DIO support for non-tailpacking cases first;
tail-packing inline files are handled in a follow-up patch.
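
For reference, the following minimal user-space sketch (not part of
this patch; the file path and the 4096-byte alignment here are only
illustrative assumptions) shows how an application would issue such a
direct read, matching the alignment checks in erofs_prepare_dio()
below:

#define _GNU_SOURCE	/* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
	const size_t align = 4096, len = 16 * 4096;
	void *buf;
	ssize_t ret;
	/* hypothetical path of an uncompressed file in an EROFS mount */
	int fd = open("/mnt/erofs/file", O_RDONLY | O_DIRECT);

	if (fd < 0 || posix_memalign(&buf, align, len))
		return 1;
	/*
	 * Offset, length and buffer are all block-aligned; misaligned
	 * direct reads are rejected with -EINVAL in erofs_prepare_dio().
	 */
	ret = pread(fd, buf, len, 0);
	printf("read %zd bytes\n", ret);
	free(buf);
	close(fd);
	return 0;
}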

Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
 fs/erofs/Kconfig    |   1 +
 fs/erofs/data.c     | 102 ++++++++++++++++++++++++++++++++++++++++++++
 fs/erofs/inode.c    |   5 ++-
 fs/erofs/internal.h |   1 +
 4 files changed, 108 insertions(+), 1 deletion(-)

diff --git a/fs/erofs/Kconfig b/fs/erofs/Kconfig
index 906af0c1998c..14b747026742 100644
--- a/fs/erofs/Kconfig
+++ b/fs/erofs/Kconfig
@@ -3,6 +3,7 @@
 config EROFS_FS
 	tristate "EROFS filesystem support"
 	depends on BLOCK
+	select FS_IOMAP
 	select LIBCRC32C
 	help
 	  EROFS (Enhanced Read-Only File System) is a lightweight
diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index 3787a5fb0a42..1f97151a9f90 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -5,6 +5,7 @@
  */
 #include "internal.h"
 #include <linux/prefetch.h>
+#include <linux/iomap.h>
 
 #include <trace/events/erofs.h>
 
@@ -308,9 +309,110 @@ static sector_t erofs_bmap(struct address_space *mapping, sector_t block)
 	return 0;
 }
 
+static int erofs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
+		unsigned int flags, struct iomap *iomap, struct iomap *srcmap)
+{
+	int ret;
+	struct erofs_map_blocks map;
+
+	map.m_la = offset;
+	map.m_llen = length;
+
+	ret = erofs_map_blocks_flatmode(inode, &map, EROFS_GET_BLOCKS_RAW);
+	if (ret < 0)
+		return ret;
+
+	iomap->bdev = inode->i_sb->s_bdev;
+	iomap->offset = map.m_la;
+	iomap->length = map.m_llen;
+	iomap->flags = 0;
+
+	if (!(map.m_flags & EROFS_MAP_MAPPED)) {
+		iomap->type = IOMAP_HOLE;
+		iomap->addr = IOMAP_NULL_ADDR;
+		if (!iomap->length)
+			iomap->length = length;
+		return 0;
+	}
+
+	/* that shouldn't happen for now */
+	if (map.m_flags & EROFS_MAP_META) {
+		DBG_BUGON(1);
+		return -ENOTBLK;
+	}
+	iomap->type = IOMAP_MAPPED;
+	iomap->addr = map.m_pa;
+	return 0;
+}
+
+const struct iomap_ops erofs_iomap_ops = {
+	.iomap_begin = erofs_iomap_begin,
+};
+
+static int erofs_prepare_dio(struct kiocb *iocb, struct iov_iter *to)
+{
+	struct inode *inode = file_inode(iocb->ki_filp);
+	loff_t align = iocb->ki_pos | iov_iter_count(to) |
+		iov_iter_alignment(to);
+	struct block_device *bdev = inode->i_sb->s_bdev;
+	unsigned int blksize_mask;
+
+	if (bdev)
+		blksize_mask = (1 << ilog2(bdev_logical_block_size(bdev))) - 1;
+	else
+		blksize_mask = (1 << inode->i_blkbits) - 1;
+
+	if (align & blksize_mask)
+		return -EINVAL;
+
+	/*
+	 * Temporarily fall back tail-packing inline to buffered I/O instead
+	 * since tail-packing inline support relies on an iomap core update.
+	 */
+	if (EROFS_I(inode)->datalayout == EROFS_INODE_FLAT_INLINE &&
+	    iocb->ki_pos + iov_iter_count(to) >
+			rounddown(inode->i_size, EROFS_BLKSIZ))
+		return 1;
+	return 0;
+}
+
+static ssize_t erofs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
+{
+	/* no need taking (shared) inode lock since it's a ro filesystem */
+	if (!iov_iter_count(to))
+		return 0;
+
+	if (iocb->ki_flags & IOCB_DIRECT) {
+		int err = erofs_prepare_dio(iocb, to);
+
+		if (!err)
+			return iomap_dio_rw(iocb, to, &erofs_iomap_ops,
+					    NULL, 0);
+		if (err < 0)
+			return err;
+		/*
+		 * Fallback to buffered I/O if the operation being performed on
+		 * the inode is not supported by direct I/O. The IOCB_DIRECT
+		 * flag needs to be cleared here in order to ensure that the
+		 * direct I/O path within generic_file_read_iter() is not
+		 * taken.
+		 */
+		iocb->ki_flags &= ~IOCB_DIRECT;
+	}
+	return generic_file_read_iter(iocb, to);
+}
+
 /* for uncompressed (aligned) files and raw access for other files */
 const struct address_space_operations erofs_raw_access_aops = {
 	.readpage = erofs_raw_access_readpage,
 	.readahead = erofs_raw_access_readahead,
 	.bmap = erofs_bmap,
+	.direct_IO = noop_direct_IO,
+};
+
+const struct file_operations erofs_file_fops = {
+	.llseek		= generic_file_llseek,
+	.read_iter	= erofs_file_read_iter,
+	.mmap		= generic_file_readonly_mmap,
+	.splice_read	= generic_file_splice_read,
 };
diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c
index aa8a0d770ba3..00edb7562fea 100644
--- a/fs/erofs/inode.c
+++ b/fs/erofs/inode.c
@@ -247,7 +247,10 @@ static int erofs_fill_inode(struct inode *inode, int isdir)
 	switch (inode->i_mode & S_IFMT) {
 	case S_IFREG:
 		inode->i_op = &erofs_generic_iops;
-		inode->i_fop = &generic_ro_fops;
+		if (!erofs_inode_is_data_compressed(vi->datalayout))
+			inode->i_fop = &erofs_file_fops;
+		else
+			inode->i_fop = &generic_ro_fops;
 		break;
 	case S_IFDIR:
 		inode->i_op = &erofs_dir_iops;
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 543c2ff97d30..2669c785d548 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -371,6 +371,7 @@ static inline int z_erofs_map_blocks_iter(struct inode *inode,
 #endif	/* !CONFIG_EROFS_FS_ZIP */
 
 /* data.c */
+extern const struct file_operations erofs_file_fops;
 struct page *erofs_get_meta_page(struct super_block *sb, erofs_blk_t blkaddr);
 
 /* inode.c */
-- 
2.24.4



* [PATCH v2 2/3] erofs: dax support for non-tailpacking regular file
  2021-07-30 19:46 [PATCH v2 0/3] erofs: iomap support for uncompressed cases Gao Xiang
  2021-07-30 19:46 ` [PATCH v2 1/3] erofs: iomap support for non-tailpacking DIO Gao Xiang
@ 2021-07-30 19:46 ` Gao Xiang
  2021-08-04  7:14   ` Chao Yu
  2021-07-30 19:46 ` [PATCH v2 3/3] erofs: convert all uncompressed cases to iomap Gao Xiang
  2 siblings, 1 reply; 11+ messages in thread
From: Gao Xiang @ 2021-07-30 19:46 UTC (permalink / raw)
  To: linux-erofs
  Cc: nvdimm, Darrick J. Wong, LKML, Joseph Qi, Liu Bo, Tao Ma,
	linux-fsdevel, Liu Jiang

DAX is quite useful for some VM use cases in order to save guest
memory significantly with a minimal, lightweight EROFS image.

In order to prepare for such use cases, add preliminary dax support
for non-tailpacking regular files for now.

Tested with the DRAM-emulated PMEM and the EROFS image generated by
"mkfs.erofs -Enoinline_data enwik9.fsdax.img enwik9"

Cc: nvdimm@lists.linux.dev
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
 fs/erofs/data.c     | 42 +++++++++++++++++++++++++++++--
 fs/erofs/inode.c    |  4 +++
 fs/erofs/internal.h |  3 +++
 fs/erofs/super.c    | 60 +++++++++++++++++++++++++++++++++++++++++++--
 4 files changed, 105 insertions(+), 4 deletions(-)

diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index 1f97151a9f90..911521293b20 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -6,7 +6,7 @@
 #include "internal.h"
 #include <linux/prefetch.h>
 #include <linux/iomap.h>
-
+#include <linux/dax.h>
 #include <trace/events/erofs.h>
 
 static void erofs_readendio(struct bio *bio)
@@ -323,6 +323,7 @@ static int erofs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 		return ret;
 
 	iomap->bdev = inode->i_sb->s_bdev;
+	iomap->dax_dev = EROFS_I_SB(inode)->dax_dev;
 	iomap->offset = map.m_la;
 	iomap->length = map.m_llen;
 	iomap->flags = 0;
@@ -382,6 +383,10 @@ static ssize_t erofs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	if (!iov_iter_count(to))
 		return 0;
 
+#ifdef CONFIG_FS_DAX
+	if (IS_DAX(iocb->ki_filp->f_mapping->host))
+		return dax_iomap_rw(iocb, to, &erofs_iomap_ops);
+#endif
 	if (iocb->ki_flags & IOCB_DIRECT) {
 		int err = erofs_prepare_dio(iocb, to);
 
@@ -410,9 +415,42 @@ const struct address_space_operations erofs_raw_access_aops = {
 	.direct_IO = noop_direct_IO,
 };
 
+#ifdef CONFIG_FS_DAX
+static vm_fault_t erofs_dax_huge_fault(struct vm_fault *vmf,
+		enum page_entry_size pe_size)
+{
+	return dax_iomap_fault(vmf, pe_size, NULL, NULL, &erofs_iomap_ops);
+}
+
+static vm_fault_t erofs_dax_fault(struct vm_fault *vmf)
+{
+	return erofs_dax_huge_fault(vmf, PE_SIZE_PTE);
+}
+
+static const struct vm_operations_struct erofs_dax_vm_ops = {
+	.fault		= erofs_dax_fault,
+	.huge_fault	= erofs_dax_huge_fault,
+};
+
+static int erofs_file_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	if (!IS_DAX(file_inode(file)))
+		return generic_file_readonly_mmap(file, vma);
+
+	if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_MAYWRITE))
+		return -EINVAL;
+
+	vma->vm_ops = &erofs_dax_vm_ops;
+	vma->vm_flags |= VM_HUGEPAGE;
+	return 0;
+}
+#else
+#define erofs_file_mmap	generic_file_readonly_mmap
+#endif
+
 const struct file_operations erofs_file_fops = {
 	.llseek		= generic_file_llseek,
 	.read_iter	= erofs_file_read_iter,
-	.mmap		= generic_file_readonly_mmap,
+	.mmap		= erofs_file_mmap,
 	.splice_read	= generic_file_splice_read,
 };
diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c
index 00edb7562fea..e875fba18159 100644
--- a/fs/erofs/inode.c
+++ b/fs/erofs/inode.c
@@ -174,6 +174,10 @@ static struct page *erofs_read_inode(struct inode *inode,
 	inode->i_mtime.tv_nsec = inode->i_ctime.tv_nsec;
 	inode->i_atime.tv_nsec = inode->i_ctime.tv_nsec;
 
+	inode->i_flags &= ~S_DAX;
+	if (test_opt(&sbi->ctx, DAX_ALWAYS) && S_ISREG(inode->i_mode) &&
+	    vi->datalayout == EROFS_INODE_FLAT_PLAIN)
+		inode->i_flags |= S_DAX;
 	if (!nblks)
 		/* measure inode.i_blocks as generic filesystems */
 		inode->i_blocks = roundup(inode->i_size, EROFS_BLKSIZ) >> 9;
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 2669c785d548..7c9abfc93109 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -83,6 +83,7 @@ struct erofs_sb_info {
 
 	struct erofs_sb_lz4_info lz4;
 #endif	/* CONFIG_EROFS_FS_ZIP */
+	struct dax_device *dax_dev;
 	u32 blocks;
 	u32 meta_blkaddr;
 #ifdef CONFIG_EROFS_FS_XATTR
@@ -115,6 +116,8 @@ struct erofs_sb_info {
 /* Mount flags set via mount options or defaults */
 #define EROFS_MOUNT_XATTR_USER		0x00000010
 #define EROFS_MOUNT_POSIX_ACL		0x00000020
+#define EROFS_MOUNT_DAX_ALWAYS		0x00000040
+#define EROFS_MOUNT_DAX_NEVER		0x00000080
 
 #define clear_opt(ctx, option)	((ctx)->mount_opt &= ~EROFS_MOUNT_##option)
 #define set_opt(ctx, option)	((ctx)->mount_opt |= EROFS_MOUNT_##option)
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 8fc6c04b54f4..d5b110fd365d 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -11,6 +11,7 @@
 #include <linux/crc32c.h>
 #include <linux/fs_context.h>
 #include <linux/fs_parser.h>
+#include <linux/dax.h>
 #include "xattr.h"
 
 #define CREATE_TRACE_POINTS
@@ -355,6 +356,8 @@ enum {
 	Opt_user_xattr,
 	Opt_acl,
 	Opt_cache_strategy,
+	Opt_dax,
+	Opt_dax_enum,
 	Opt_err
 };
 
@@ -365,14 +368,47 @@ static const struct constant_table erofs_param_cache_strategy[] = {
 	{}
 };
 
+static const struct constant_table erofs_dax_param_enums[] = {
+	{"always",	EROFS_MOUNT_DAX_ALWAYS},
+	{"never",	EROFS_MOUNT_DAX_NEVER},
+	{}
+};
+
 static const struct fs_parameter_spec erofs_fs_parameters[] = {
 	fsparam_flag_no("user_xattr",	Opt_user_xattr),
 	fsparam_flag_no("acl",		Opt_acl),
 	fsparam_enum("cache_strategy",	Opt_cache_strategy,
 		     erofs_param_cache_strategy),
+	fsparam_flag("dax",             Opt_dax),
+	fsparam_enum("dax",		Opt_dax_enum, erofs_dax_param_enums),
 	{}
 };
 
+static bool erofs_fc_set_dax_mode(struct fs_context *fc, unsigned int mode)
+{
+#ifdef CONFIG_FS_DAX
+	struct erofs_fs_context *ctx = fc->fs_private;
+
+	switch (mode) {
+	case EROFS_MOUNT_DAX_ALWAYS:
+		warnfc(fc, "DAX enabled. Warning: EXPERIMENTAL, use at your own risk");
+		set_opt(ctx, DAX_ALWAYS);
+		clear_opt(ctx, DAX_NEVER);
+		return true;
+	case EROFS_MOUNT_DAX_NEVER:
+		set_opt(ctx, DAX_NEVER);
+		clear_opt(ctx, DAX_ALWAYS);
+		return true;
+	default:
+		DBG_BUGON(1);
+		return false;
+	}
+#else
+	errorfc(fc, "dax options not supported");
+	return false;
+#endif
+}
+
 static int erofs_fc_parse_param(struct fs_context *fc,
 				struct fs_parameter *param)
 {
@@ -412,6 +448,14 @@ static int erofs_fc_parse_param(struct fs_context *fc,
 		errorfc(fc, "compression not supported, cache_strategy ignored");
 #endif
 		break;
+	case Opt_dax:
+		if (!erofs_fc_set_dax_mode(fc, EROFS_MOUNT_DAX_ALWAYS))
+			return -EINVAL;
+		break;
+	case Opt_dax_enum:
+		if (!erofs_fc_set_dax_mode(fc, result.uint_32))
+			return -EINVAL;
+		break;
 	default:
 		return -ENOPARAM;
 	}
@@ -496,10 +540,16 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
 		return -ENOMEM;
 
 	sb->s_fs_info = sbi;
+	sbi->dax_dev = fs_dax_get_by_bdev(sb->s_bdev);
 	err = erofs_read_superblock(sb);
 	if (err)
 		return err;
 
+	if (test_opt(ctx, DAX_ALWAYS) &&
+	    !bdev_dax_supported(sb->s_bdev, EROFS_BLKSIZ)) {
+		errorfc(fc, "DAX unsupported by block device. Turning off DAX.");
+		clear_opt(ctx, DAX_ALWAYS);
+	}
 	sb->s_flags |= SB_RDONLY | SB_NOATIME;
 	sb->s_maxbytes = MAX_LFS_FILESIZE;
 	sb->s_time_gran = 1;
@@ -609,6 +659,8 @@ static void erofs_kill_sb(struct super_block *sb)
 	sbi = EROFS_SB(sb);
 	if (!sbi)
 		return;
+	if (sbi->dax_dev)
+		fs_put_dax(sbi->dax_dev);
 	kfree(sbi);
 	sb->s_fs_info = NULL;
 }
@@ -711,8 +763,8 @@ static int erofs_statfs(struct dentry *dentry, struct kstatfs *buf)
 
 static int erofs_show_options(struct seq_file *seq, struct dentry *root)
 {
-	struct erofs_sb_info *sbi __maybe_unused = EROFS_SB(root->d_sb);
-	struct erofs_fs_context *ctx __maybe_unused = &sbi->ctx;
+	struct erofs_sb_info *sbi = EROFS_SB(root->d_sb);
+	struct erofs_fs_context *ctx = &sbi->ctx;
 
 #ifdef CONFIG_EROFS_FS_XATTR
 	if (test_opt(ctx, XATTR_USER))
@@ -734,6 +786,10 @@ static int erofs_show_options(struct seq_file *seq, struct dentry *root)
 	else if (ctx->cache_strategy == EROFS_ZIP_CACHE_READAROUND)
 		seq_puts(seq, ",cache_strategy=readaround");
 #endif
+	if (test_opt(ctx, DAX_ALWAYS))
+		seq_puts(seq, ",dax=always");
+	if (test_opt(ctx, DAX_NEVER))
+		seq_puts(seq, ",dax=never");
 	return 0;
 }
 
-- 
2.24.4



* [PATCH v2 3/3] erofs: convert all uncompressed cases to iomap
  2021-07-30 19:46 [PATCH v2 0/3] erofs: iomap support for uncompressed cases Gao Xiang
  2021-07-30 19:46 ` [PATCH v2 1/3] erofs: iomap support for non-tailpacking DIO Gao Xiang
  2021-07-30 19:46 ` [PATCH v2 2/3] erofs: dax support for non-tailpacking regular file Gao Xiang
@ 2021-07-30 19:46 ` Gao Xiang
  2021-08-04  7:17   ` Chao Yu
  2 siblings, 1 reply; 11+ messages in thread
From: Gao Xiang @ 2021-07-30 19:46 UTC (permalink / raw)
  To: linux-erofs
  Cc: nvdimm, Darrick J. Wong, LKML, Joseph Qi, Liu Bo, Tao Ma,
	linux-fsdevel, Liu Jiang

Since tail-packing inline is now supported by iomap, let's convert
all EROFS uncompressed data I/O to iomap, which is pretty
straightforward.

Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
 fs/erofs/data.c | 288 ++++++++----------------------------------------
 1 file changed, 49 insertions(+), 239 deletions(-)

diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index 911521293b20..6b98156bb5ca 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -9,29 +9,6 @@
 #include <linux/dax.h>
 #include <trace/events/erofs.h>
 
-static void erofs_readendio(struct bio *bio)
-{
-	struct bio_vec *bvec;
-	blk_status_t err = bio->bi_status;
-	struct bvec_iter_all iter_all;
-
-	bio_for_each_segment_all(bvec, bio, iter_all) {
-		struct page *page = bvec->bv_page;
-
-		/* page is already locked */
-		DBG_BUGON(PageUptodate(page));
-
-		if (err)
-			SetPageError(page);
-		else
-			SetPageUptodate(page);
-
-		unlock_page(page);
-		/* page could be reclaimed now */
-	}
-	bio_put(bio);
-}
-
 struct page *erofs_get_meta_page(struct super_block *sb, erofs_blk_t blkaddr)
 {
 	struct address_space *const mapping = sb->s_bdev->bd_inode->i_mapping;
@@ -109,206 +86,6 @@ static int erofs_map_blocks_flatmode(struct inode *inode,
 	return err;
 }
 
-static inline struct bio *erofs_read_raw_page(struct bio *bio,
-					      struct address_space *mapping,
-					      struct page *page,
-					      erofs_off_t *last_block,
-					      unsigned int nblocks,
-					      unsigned int *eblks,
-					      bool ra)
-{
-	struct inode *const inode = mapping->host;
-	struct super_block *const sb = inode->i_sb;
-	erofs_off_t current_block = (erofs_off_t)page->index;
-	int err;
-
-	DBG_BUGON(!nblocks);
-
-	if (PageUptodate(page)) {
-		err = 0;
-		goto has_updated;
-	}
-
-	/* note that for readpage case, bio also equals to NULL */
-	if (bio &&
-	    (*last_block + 1 != current_block || !*eblks)) {
-submit_bio_retry:
-		submit_bio(bio);
-		bio = NULL;
-	}
-
-	if (!bio) {
-		struct erofs_map_blocks map = {
-			.m_la = blknr_to_addr(current_block),
-		};
-		erofs_blk_t blknr;
-		unsigned int blkoff;
-
-		err = erofs_map_blocks_flatmode(inode, &map, EROFS_GET_BLOCKS_RAW);
-		if (err)
-			goto err_out;
-
-		/* zero out the holed page */
-		if (!(map.m_flags & EROFS_MAP_MAPPED)) {
-			zero_user_segment(page, 0, PAGE_SIZE);
-			SetPageUptodate(page);
-
-			/* imply err = 0, see erofs_map_blocks */
-			goto has_updated;
-		}
-
-		/* for RAW access mode, m_plen must be equal to m_llen */
-		DBG_BUGON(map.m_plen != map.m_llen);
-
-		blknr = erofs_blknr(map.m_pa);
-		blkoff = erofs_blkoff(map.m_pa);
-
-		/* deal with inline page */
-		if (map.m_flags & EROFS_MAP_META) {
-			void *vsrc, *vto;
-			struct page *ipage;
-
-			DBG_BUGON(map.m_plen > PAGE_SIZE);
-
-			ipage = erofs_get_meta_page(inode->i_sb, blknr);
-
-			if (IS_ERR(ipage)) {
-				err = PTR_ERR(ipage);
-				goto err_out;
-			}
-
-			vsrc = kmap_atomic(ipage);
-			vto = kmap_atomic(page);
-			memcpy(vto, vsrc + blkoff, map.m_plen);
-			memset(vto + map.m_plen, 0, PAGE_SIZE - map.m_plen);
-			kunmap_atomic(vto);
-			kunmap_atomic(vsrc);
-			flush_dcache_page(page);
-
-			SetPageUptodate(page);
-			/* TODO: could we unlock the page earlier? */
-			unlock_page(ipage);
-			put_page(ipage);
-
-			/* imply err = 0, see erofs_map_blocks */
-			goto has_updated;
-		}
-
-		/* pa must be block-aligned for raw reading */
-		DBG_BUGON(erofs_blkoff(map.m_pa));
-
-		/* max # of continuous pages */
-		if (nblocks > DIV_ROUND_UP(map.m_plen, PAGE_SIZE))
-			nblocks = DIV_ROUND_UP(map.m_plen, PAGE_SIZE);
-
-		*eblks = bio_max_segs(nblocks);
-		bio = bio_alloc(GFP_NOIO, *eblks);
-
-		bio->bi_end_io = erofs_readendio;
-		bio_set_dev(bio, sb->s_bdev);
-		bio->bi_iter.bi_sector = (sector_t)blknr <<
-			LOG_SECTORS_PER_BLOCK;
-		bio->bi_opf = REQ_OP_READ | (ra ? REQ_RAHEAD : 0);
-	}
-
-	err = bio_add_page(bio, page, PAGE_SIZE, 0);
-	/* out of the extent or bio is full */
-	if (err < PAGE_SIZE)
-		goto submit_bio_retry;
-	--*eblks;
-	*last_block = current_block;
-	return bio;
-
-err_out:
-	/* for sync reading, set page error immediately */
-	if (!ra) {
-		SetPageError(page);
-		ClearPageUptodate(page);
-	}
-has_updated:
-	unlock_page(page);
-
-	/* if updated manually, continuous pages has a gap */
-	if (bio)
-		submit_bio(bio);
-	return err ? ERR_PTR(err) : NULL;
-}
-
-/*
- * since we dont have write or truncate flows, so no inode
- * locking needs to be held at the moment.
- */
-static int erofs_raw_access_readpage(struct file *file, struct page *page)
-{
-	erofs_off_t last_block;
-	unsigned int eblks;
-	struct bio *bio;
-
-	trace_erofs_readpage(page, true);
-
-	bio = erofs_read_raw_page(NULL, page->mapping,
-				  page, &last_block, 1, &eblks, false);
-
-	if (IS_ERR(bio))
-		return PTR_ERR(bio);
-
-	if (bio)
-		submit_bio(bio);
-	return 0;
-}
-
-static void erofs_raw_access_readahead(struct readahead_control *rac)
-{
-	erofs_off_t last_block;
-	unsigned int eblks;
-	struct bio *bio = NULL;
-	struct page *page;
-
-	trace_erofs_readpages(rac->mapping->host, readahead_index(rac),
-			readahead_count(rac), true);
-
-	while ((page = readahead_page(rac))) {
-		prefetchw(&page->flags);
-
-		bio = erofs_read_raw_page(bio, rac->mapping, page, &last_block,
-				readahead_count(rac), &eblks, true);
-
-		/* all the page errors are ignored when readahead */
-		if (IS_ERR(bio)) {
-			pr_err("%s, readahead error at page %lu of nid %llu\n",
-			       __func__, page->index,
-			       EROFS_I(rac->mapping->host)->nid);
-
-			bio = NULL;
-		}
-
-		put_page(page);
-	}
-
-	if (bio)
-		submit_bio(bio);
-}
-
-static sector_t erofs_bmap(struct address_space *mapping, sector_t block)
-{
-	struct inode *inode = mapping->host;
-	struct erofs_map_blocks map = {
-		.m_la = blknr_to_addr(block),
-	};
-
-	if (EROFS_I(inode)->datalayout == EROFS_INODE_FLAT_INLINE) {
-		erofs_blk_t blks = i_size_read(inode) >> LOG_BLOCK_SIZE;
-
-		if (block >> LOG_SECTORS_PER_BLOCK >= blks)
-			return 0;
-	}
-
-	if (!erofs_map_blocks_flatmode(inode, &map, EROFS_GET_BLOCKS_RAW))
-		return erofs_blknr(map.m_pa);
-
-	return 0;
-}
-
 static int erofs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 		unsigned int flags, struct iomap *iomap, struct iomap *srcmap)
 {
@@ -327,6 +104,7 @@ static int erofs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 	iomap->offset = map.m_la;
 	iomap->length = map.m_llen;
 	iomap->flags = 0;
+	iomap->private = NULL;
 
 	if (!(map.m_flags & EROFS_MAP_MAPPED)) {
 		iomap->type = IOMAP_HOLE;
@@ -336,20 +114,61 @@ static int erofs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 		return 0;
 	}
 
-	/* that shouldn't happen for now */
 	if (map.m_flags & EROFS_MAP_META) {
-		DBG_BUGON(1);
-		return -ENOTBLK;
+		struct page *ipage;
+
+		iomap->type = IOMAP_INLINE;
+		ipage = erofs_get_meta_page(inode->i_sb,
+					    erofs_blknr(map.m_pa));
+		iomap->inline_data = page_address(ipage) +
+					erofs_blkoff(map.m_pa);
+		iomap->private = ipage;
+	} else {
+		iomap->type = IOMAP_MAPPED;
+		iomap->addr = map.m_pa;
 	}
-	iomap->type = IOMAP_MAPPED;
-	iomap->addr = map.m_pa;
 	return 0;
 }
 
+static int erofs_iomap_end(struct inode *inode, loff_t pos, loff_t length,
+		ssize_t written, unsigned flags, struct iomap *iomap)
+{
+	struct page *ipage = iomap->private;
+
+	if (ipage) {
+		DBG_BUGON(iomap->type != IOMAP_INLINE);
+		unlock_page(ipage);
+		put_page(ipage);
+	} else {
+		DBG_BUGON(iomap->type == IOMAP_INLINE);
+	}
+	return written;
+}
+
 const struct iomap_ops erofs_iomap_ops = {
 	.iomap_begin = erofs_iomap_begin,
+	.iomap_end = erofs_iomap_end,
 };
 
+/*
+ * since we dont have write or truncate flows, so no inode
+ * locking needs to be held at the moment.
+ */
+static int erofs_readpage(struct file *file, struct page *page)
+{
+	return iomap_readpage(page, &erofs_iomap_ops);
+}
+
+static void erofs_readahead(struct readahead_control *rac)
+{
+	return iomap_readahead(rac, &erofs_iomap_ops);
+}
+
+static sector_t erofs_bmap(struct address_space *mapping, sector_t block)
+{
+	return iomap_bmap(mapping, block, &erofs_iomap_ops);
+}
+
 static int erofs_prepare_dio(struct kiocb *iocb, struct iov_iter *to)
 {
 	struct inode *inode = file_inode(iocb->ki_filp);
@@ -365,15 +184,6 @@ static int erofs_prepare_dio(struct kiocb *iocb, struct iov_iter *to)
 
 	if (align & blksize_mask)
 		return -EINVAL;
-
-	/*
-	 * Temporarily fall back tail-packing inline to buffered I/O instead
-	 * since tail-packing inline support relies on an iomap core update.
-	 */
-	if (EROFS_I(inode)->datalayout == EROFS_INODE_FLAT_INLINE &&
-	    iocb->ki_pos + iov_iter_count(to) >
-			rounddown(inode->i_size, EROFS_BLKSIZ))
-		return 1;
 	return 0;
 }
 
@@ -409,8 +219,8 @@ static ssize_t erofs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 
 /* for uncompressed (aligned) files and raw access for other files */
 const struct address_space_operations erofs_raw_access_aops = {
-	.readpage = erofs_raw_access_readpage,
-	.readahead = erofs_raw_access_readahead,
+	.readpage = erofs_readpage,
+	.readahead = erofs_readahead,
 	.bmap = erofs_bmap,
 	.direct_IO = noop_direct_IO,
 };
-- 
2.24.4



* Re: [PATCH v2 1/3] erofs: iomap support for non-tailpacking DIO
  2021-07-30 19:46 ` [PATCH v2 1/3] erofs: iomap support for non-tailpacking DIO Gao Xiang
@ 2021-08-04  2:57   ` Chao Yu
  2021-08-04  4:30     ` Gao Xiang
  0 siblings, 1 reply; 11+ messages in thread
From: Chao Yu @ 2021-08-04  2:57 UTC (permalink / raw)
  To: Gao Xiang, linux-erofs
  Cc: nvdimm, Darrick J. Wong, LKML, Joseph Qi, Liu Bo, Tao Ma,
	linux-fsdevel, Liu Jiang

On 2021/7/31 3:46, Gao Xiang wrote:
> From: Huang Jianan <huangjianan@oppo.com>
> 
> Add iomap support for non-tailpacking uncompressed data in order to
> support DIO and DAX.
> 
> Direct I/O is useful in certain scenarios for uncompressed files.
> For example, double page caching can be avoided by using direct I/O
> when a loop device is used for uncompressed files containing an
> upper-layer compressed filesystem.
> 
> This adds iomap DIO support for non-tailpacking cases first;
> tail-packing inline files are handled in a follow-up patch.
> 
> Cc: linux-fsdevel@vger.kernel.org
> Signed-off-by: Huang Jianan <huangjianan@oppo.com>
> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> ---
>   fs/erofs/Kconfig    |   1 +
>   fs/erofs/data.c     | 102 ++++++++++++++++++++++++++++++++++++++++++++
>   fs/erofs/inode.c    |   5 ++-
>   fs/erofs/internal.h |   1 +
>   4 files changed, 108 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/erofs/Kconfig b/fs/erofs/Kconfig
> index 906af0c1998c..14b747026742 100644
> --- a/fs/erofs/Kconfig
> +++ b/fs/erofs/Kconfig
> @@ -3,6 +3,7 @@
>   config EROFS_FS
>   	tristate "EROFS filesystem support"
>   	depends on BLOCK
> +	select FS_IOMAP
>   	select LIBCRC32C
>   	help
>   	  EROFS (Enhanced Read-Only File System) is a lightweight
> diff --git a/fs/erofs/data.c b/fs/erofs/data.c
> index 3787a5fb0a42..1f97151a9f90 100644
> --- a/fs/erofs/data.c
> +++ b/fs/erofs/data.c
> @@ -5,6 +5,7 @@
>    */
>   #include "internal.h"
>   #include <linux/prefetch.h>
> +#include <linux/iomap.h>
>   
>   #include <trace/events/erofs.h>
>   
> @@ -308,9 +309,110 @@ static sector_t erofs_bmap(struct address_space *mapping, sector_t block)
>   	return 0;
>   }
>   
> +static int erofs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
> +		unsigned int flags, struct iomap *iomap, struct iomap *srcmap)
> +{
> +	int ret;
> +	struct erofs_map_blocks map;
> +
> +	map.m_la = offset;
> +	map.m_llen = length;
> +
> +	ret = erofs_map_blocks_flatmode(inode, &map, EROFS_GET_BLOCKS_RAW);
> +	if (ret < 0)
> +		return ret;
> +
> +	iomap->bdev = inode->i_sb->s_bdev;
> +	iomap->offset = map.m_la;
> +	iomap->length = map.m_llen;
> +	iomap->flags = 0;
> +
> +	if (!(map.m_flags & EROFS_MAP_MAPPED)) {
> +		iomap->type = IOMAP_HOLE;
> +		iomap->addr = IOMAP_NULL_ADDR;
> +		if (!iomap->length)
> +			iomap->length = length;

This only happens in the case where the offset exceeds i_size?

> +		return 0;
> +	}
> +
> +	/* that shouldn't happen for now */
> +	if (map.m_flags & EROFS_MAP_META) {
> +		DBG_BUGON(1);
> +		return -ENOTBLK;
> +	}
> +	iomap->type = IOMAP_MAPPED;
> +	iomap->addr = map.m_pa;
> +	return 0;
> +}
> +
> +const struct iomap_ops erofs_iomap_ops = {
> +	.iomap_begin = erofs_iomap_begin,
> +};
> +
> +static int erofs_prepare_dio(struct kiocb *iocb, struct iov_iter *to)
> +{
> +	struct inode *inode = file_inode(iocb->ki_filp);
> +	loff_t align = iocb->ki_pos | iov_iter_count(to) |
> +		iov_iter_alignment(to);
> +	struct block_device *bdev = inode->i_sb->s_bdev;
> +	unsigned int blksize_mask;
> +
> +	if (bdev)
> +		blksize_mask = (1 << ilog2(bdev_logical_block_size(bdev))) - 1;
> +	else
> +		blksize_mask = (1 << inode->i_blkbits) - 1;
> +
> +	if (align & blksize_mask)
> +		return -EINVAL;
> +
> +	/*
> +	 * Temporarily fall back tail-packing inline to buffered I/O instead
> +	 * since tail-packing inline support relies on an iomap core update.
> +	 */
> +	if (EROFS_I(inode)->datalayout == EROFS_INODE_FLAT_INLINE &&
> +	    iocb->ki_pos + iov_iter_count(to) >
> +			rounddown(inode->i_size, EROFS_BLKSIZ))
> +		return 1;
> +	return 0;
> +}
> +
> +static ssize_t erofs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
> +{
> +	/* no need taking (shared) inode lock since it's a ro filesystem */
> +	if (!iov_iter_count(to))
> +		return 0;
> +
> +	if (iocb->ki_flags & IOCB_DIRECT) {
> +		int err = erofs_prepare_dio(iocb, to);
> +
> +		if (!err)
> +			return iomap_dio_rw(iocb, to, &erofs_iomap_ops,
> +					    NULL, 0);
> +		if (err < 0)
> +			return err;
> +		/*
> +		 * Fallback to buffered I/O if the operation being performed on
> +		 * the inode is not supported by direct I/O. The IOCB_DIRECT
> +		 * flag needs to be cleared here in order to ensure that the
> +		 * direct I/O path within generic_file_read_iter() is not
> +		 * taken.
> +		 */
> +		iocb->ki_flags &= ~IOCB_DIRECT;
> +	}
> +	return generic_file_read_iter(iocb, to);

It looks like it's fine to call filemap_read() directly since the code
above has already covered the DIO case; then we don't need to change the
iocb->ki_flags flag. It's minor though.
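
Something like the following untested sketch is what I mean:

static ssize_t erofs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
{
	if (!iov_iter_count(to))
		return 0;

	if (iocb->ki_flags & IOCB_DIRECT) {
		int err = erofs_prepare_dio(iocb, to);

		if (!err)
			return iomap_dio_rw(iocb, to, &erofs_iomap_ops,
					    NULL, 0);
		if (err < 0)
			return err;
		/* tail-packing inline: fall back to buffered I/O */
	}
	/* buffered path only, so no need to clear IOCB_DIRECT here */
	return filemap_read(iocb, to, 0);
}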

> +}
> +
>   /* for uncompressed (aligned) files and raw access for other files */
>   const struct address_space_operations erofs_raw_access_aops = {
>   	.readpage = erofs_raw_access_readpage,
>   	.readahead = erofs_raw_access_readahead,
>   	.bmap = erofs_bmap,
> +	.direct_IO = noop_direct_IO,
> +};
> +
> +const struct file_operations erofs_file_fops = {
> +	.llseek		= generic_file_llseek,
> +	.read_iter	= erofs_file_read_iter,
> +	.mmap		= generic_file_readonly_mmap,
> +	.splice_read	= generic_file_splice_read,
>   };
> diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c
> index aa8a0d770ba3..00edb7562fea 100644
> --- a/fs/erofs/inode.c
> +++ b/fs/erofs/inode.c
> @@ -247,7 +247,10 @@ static int erofs_fill_inode(struct inode *inode, int isdir)
>   	switch (inode->i_mode & S_IFMT) {
>   	case S_IFREG:
>   		inode->i_op = &erofs_generic_iops;
> -		inode->i_fop = &generic_ro_fops;
> +		if (!erofs_inode_is_data_compressed(vi->datalayout))
> +			inode->i_fop = &erofs_file_fops;
> +		else
> +			inode->i_fop = &generic_ro_fops;

if (erofs_inode_is_data_compressed(vi->datalayout))
	inode->i_fop = &generic_ro_fops;
else
	inode->i_fop = &erofs_file_fops;

Otherwise, it looks good to me.

Reviewed-by: Chao Yu <chao@kernel.org>

Thanks

>   		break;
>   	case S_IFDIR:
>   		inode->i_op = &erofs_dir_iops;
> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
> index 543c2ff97d30..2669c785d548 100644
> --- a/fs/erofs/internal.h
> +++ b/fs/erofs/internal.h
> @@ -371,6 +371,7 @@ static inline int z_erofs_map_blocks_iter(struct inode *inode,
>   #endif	/* !CONFIG_EROFS_FS_ZIP */
>   
>   /* data.c */
> +extern const struct file_operations erofs_file_fops;
>   struct page *erofs_get_meta_page(struct super_block *sb, erofs_blk_t blkaddr);
>   
>   /* inode.c */
> 


* Re: [PATCH v2 1/3] erofs: iomap support for non-tailpacking DIO
  2021-08-04  2:57   ` Chao Yu
@ 2021-08-04  4:30     ` Gao Xiang
  2021-08-04  4:52       ` Gao Xiang
  0 siblings, 1 reply; 11+ messages in thread
From: Gao Xiang @ 2021-08-04  4:30 UTC (permalink / raw)
  To: Chao Yu
  Cc: nvdimm, Darrick J. Wong, LKML, Joseph Qi, Liu Bo, Tao Ma,
	linux-fsdevel, Liu Jiang, linux-erofs

Hi Chao,

On Wed, Aug 04, 2021 at 10:57:08AM +0800, Chao Yu wrote:
> On 2021/7/31 3:46, Gao Xiang wrote:

...

> >   }
> > +static int erofs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
> > +		unsigned int flags, struct iomap *iomap, struct iomap *srcmap)
> > +{
> > +	int ret;
> > +	struct erofs_map_blocks map;
> > +
> > +	map.m_la = offset;
> > +	map.m_llen = length;
> > +
> > +	ret = erofs_map_blocks_flatmode(inode, &map, EROFS_GET_BLOCKS_RAW);
> > +	if (ret < 0)
> > +		return ret;
> > +
> > +	iomap->bdev = inode->i_sb->s_bdev;
> > +	iomap->offset = map.m_la;
> > +	iomap->length = map.m_llen;
> > +	iomap->flags = 0;
> > +
> > +	if (!(map.m_flags & EROFS_MAP_MAPPED)) {
> > +		iomap->type = IOMAP_HOLE;
> > +		iomap->addr = IOMAP_NULL_ADDR;
> > +		if (!iomap->length)
> > +			iomap->length = length;
> 
> This only happens in the case where the offset exceeds i_size?

Thanks for the review.

Yeah, this is a convention (length 0 with !EROFS_MAP_MAPPED) for post-EOF
in erofs_map_blocks_flatmode(); we need to follow the iomap rule here as
well.

> 
> > +		return 0;
> > +	}
> > +
> > +	/* that shouldn't happen for now */
> > +	if (map.m_flags & EROFS_MAP_META) {
> > +		DBG_BUGON(1);
> > +		return -ENOTBLK;
> > +	}
> > +	iomap->type = IOMAP_MAPPED;
> > +	iomap->addr = map.m_pa;
> > +	return 0;
> > +}
> > +
> > +const struct iomap_ops erofs_iomap_ops = {
> > +	.iomap_begin = erofs_iomap_begin,
> > +};
> > +
> > +static int erofs_prepare_dio(struct kiocb *iocb, struct iov_iter *to)
> > +{
> > +	struct inode *inode = file_inode(iocb->ki_filp);
> > +	loff_t align = iocb->ki_pos | iov_iter_count(to) |
> > +		iov_iter_alignment(to);
> > +	struct block_device *bdev = inode->i_sb->s_bdev;
> > +	unsigned int blksize_mask;
> > +
> > +	if (bdev)
> > +		blksize_mask = (1 << ilog2(bdev_logical_block_size(bdev))) - 1;
> > +	else
> > +		blksize_mask = (1 << inode->i_blkbits) - 1;
> > +
> > +	if (align & blksize_mask)
> > +		return -EINVAL;
> > +
> > +	/*
> > +	 * Temporarily fall back tail-packing inline to buffered I/O instead
> > +	 * since tail-packing inline support relies on an iomap core update.
> > +	 */
> > +	if (EROFS_I(inode)->datalayout == EROFS_INODE_FLAT_INLINE &&
> > +	    iocb->ki_pos + iov_iter_count(to) >
> > +			rounddown(inode->i_size, EROFS_BLKSIZ))
> > +		return 1;
> > +	return 0;
> > +}
> > +
> > +static ssize_t erofs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
> > +{
> > +	/* no need taking (shared) inode lock since it's a ro filesystem */
> > +	if (!iov_iter_count(to))
> > +		return 0;
> > +
> > +	if (iocb->ki_flags & IOCB_DIRECT) {
> > +		int err = erofs_prepare_dio(iocb, to);
> > +
> > +		if (!err)
> > +			return iomap_dio_rw(iocb, to, &erofs_iomap_ops,
> > +					    NULL, 0);
> > +		if (err < 0)
> > +			return err;
> > +		/*
> > +		 * Fallback to buffered I/O if the operation being performed on
> > +		 * the inode is not supported by direct I/O. The IOCB_DIRECT
> > +		 * flag needs to be cleared here in order to ensure that the
> > +		 * direct I/O path within generic_file_read_iter() is not
> > +		 * taken.
> > +		 */
> > +		iocb->ki_flags &= ~IOCB_DIRECT;
> > +	}
> > +	return generic_file_read_iter(iocb, to);
> 
> It looks like it's fine to call filemap_read() directly since the code
> above has already covered the DIO case; then we don't need to change the
> iocb->ki_flags flag. It's minor though.

Yeah, we could use filemap_read() here instead. Yet IMO, it might be
better to drop IOCB_DIRECT too to keep the iocb consistent with the real
semantics (even if it's not used internally).

> 
> > +}
> > +
> >   /* for uncompressed (aligned) files and raw access for other files */
> >   const struct address_space_operations erofs_raw_access_aops = {
> >   	.readpage = erofs_raw_access_readpage,
> >   	.readahead = erofs_raw_access_readahead,
> >   	.bmap = erofs_bmap,
> > +	.direct_IO = noop_direct_IO,
> > +};
> > +
> > +const struct file_operations erofs_file_fops = {
> > +	.llseek		= generic_file_llseek,
> > +	.read_iter	= erofs_file_read_iter,
> > +	.mmap		= generic_file_readonly_mmap,
> > +	.splice_read	= generic_file_splice_read,
> >   };
> > diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c
> > index aa8a0d770ba3..00edb7562fea 100644
> > --- a/fs/erofs/inode.c
> > +++ b/fs/erofs/inode.c
> > @@ -247,7 +247,10 @@ static int erofs_fill_inode(struct inode *inode, int isdir)
> >   	switch (inode->i_mode & S_IFMT) {
> >   	case S_IFREG:
> >   		inode->i_op = &erofs_generic_iops;
> > -		inode->i_fop = &generic_ro_fops;
> > +		if (!erofs_inode_is_data_compressed(vi->datalayout))
> > +			inode->i_fop = &erofs_file_fops;
> > +		else
> > +			inode->i_fop = &generic_ro_fops;
> 
> if (erofs_inode_is_data_compressed(vi->datalayout))
> 	inode->i_fop = &generic_ro_fops;
> else
> 	inode->i_fop = &erofs_file_fops;
> 
> Otherwise, it looks good to me.

ok, will fix in the next version.

Thanks,
Gao Xiang

> 
> Reviewed-by: Chao Yu <chao@kernel.org>
> 
> Thanks
> 
> >   		break;
> >   	case S_IFDIR:
> >   		inode->i_op = &erofs_dir_iops;
> > diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
> > index 543c2ff97d30..2669c785d548 100644
> > --- a/fs/erofs/internal.h
> > +++ b/fs/erofs/internal.h
> > @@ -371,6 +371,7 @@ static inline int z_erofs_map_blocks_iter(struct inode *inode,
> >   #endif	/* !CONFIG_EROFS_FS_ZIP */
> >   /* data.c */
> > +extern const struct file_operations erofs_file_fops;
> >   struct page *erofs_get_meta_page(struct super_block *sb, erofs_blk_t blkaddr);
> >   /* inode.c */
> > 


* Re: [PATCH v2 1/3] erofs: iomap support for non-tailpacking DIO
  2021-08-04  4:30     ` Gao Xiang
@ 2021-08-04  4:52       ` Gao Xiang
  0 siblings, 0 replies; 11+ messages in thread
From: Gao Xiang @ 2021-08-04  4:52 UTC (permalink / raw)
  To: Chao Yu
  Cc: nvdimm, Darrick J. Wong, LKML, Joseph Qi, Liu Bo, Tao Ma,
	linux-fsdevel, Liu Jiang, linux-erofs

On Wed, Aug 04, 2021 at 12:30:35PM +0800, Gao Xiang wrote:
> Hi Chao,
> 
> On Wed, Aug 04, 2021 at 10:57:08AM +0800, Chao Yu wrote:
> > On 2021/7/31 3:46, Gao Xiang wrote:
> 
> ...
> 
> > >   }
> > > +static int erofs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
> > > +		unsigned int flags, struct iomap *iomap, struct iomap *srcmap)
> > > +{
> > > +	int ret;
> > > +	struct erofs_map_blocks map;
> > > +
> > > +	map.m_la = offset;
> > > +	map.m_llen = length;
> > > +
> > > +	ret = erofs_map_blocks_flatmode(inode, &map, EROFS_GET_BLOCKS_RAW);
> > > +	if (ret < 0)
> > > +		return ret;
> > > +
> > > +	iomap->bdev = inode->i_sb->s_bdev;
> > > +	iomap->offset = map.m_la;
> > > +	iomap->length = map.m_llen;
> > > +	iomap->flags = 0;
> > > +
> > > +	if (!(map.m_flags & EROFS_MAP_MAPPED)) {
> > > +		iomap->type = IOMAP_HOLE;
> > > +		iomap->addr = IOMAP_NULL_ADDR;
> > > +		if (!iomap->length)
> > > +			iomap->length = length;
> > 
> > This only happens in the case where the offset exceeds i_size?
> 
> Thanks for the review.
> 
> Yeah, this is a convention (length 0 with !EROFS_MAP_MAPPED) for post-EOF
> in erofs_map_blocks_flatmode(); we need to follow the iomap rule here as
> well.
> 
> > 
> > > +		return 0;
> > > +	}
> > > +
> > > +	/* that shouldn't happen for now */
> > > +	if (map.m_flags & EROFS_MAP_META) {
> > > +		DBG_BUGON(1);
> > > +		return -ENOTBLK;
> > > +	}
> > > +	iomap->type = IOMAP_MAPPED;
> > > +	iomap->addr = map.m_pa;
> > > +	return 0;
> > > +}
> > > +
> > > +const struct iomap_ops erofs_iomap_ops = {
> > > +	.iomap_begin = erofs_iomap_begin,
> > > +};
> > > +
> > > +static int erofs_prepare_dio(struct kiocb *iocb, struct iov_iter *to)
> > > +{
> > > +	struct inode *inode = file_inode(iocb->ki_filp);
> > > +	loff_t align = iocb->ki_pos | iov_iter_count(to) |
> > > +		iov_iter_alignment(to);
> > > +	struct block_device *bdev = inode->i_sb->s_bdev;
> > > +	unsigned int blksize_mask;
> > > +
> > > +	if (bdev)
> > > +		blksize_mask = (1 << ilog2(bdev_logical_block_size(bdev))) - 1;
> > > +	else
> > > +		blksize_mask = (1 << inode->i_blkbits) - 1;
> > > +
> > > +	if (align & blksize_mask)
> > > +		return -EINVAL;
> > > +
> > > +	/*
> > > +	 * Temporarily fall back tail-packing inline to buffered I/O instead
> > > +	 * since tail-packing inline support relies on an iomap core update.
> > > +	 */
> > > +	if (EROFS_I(inode)->datalayout == EROFS_INODE_FLAT_INLINE &&
> > > +	    iocb->ki_pos + iov_iter_count(to) >
> > > +			rounddown(inode->i_size, EROFS_BLKSIZ))
> > > +		return 1;
> > > +	return 0;
> > > +}
> > > +
> > > +static ssize_t erofs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
> > > +{
> > > +	/* no need taking (shared) inode lock since it's a ro filesystem */
> > > +	if (!iov_iter_count(to))
> > > +		return 0;
> > > +
> > > +	if (iocb->ki_flags & IOCB_DIRECT) {
> > > +		int err = erofs_prepare_dio(iocb, to);
> > > +
> > > +		if (!err)
> > > +			return iomap_dio_rw(iocb, to, &erofs_iomap_ops,
> > > +					    NULL, 0);
> > > +		if (err < 0)
> > > +			return err;
> > > +		/*
> > > +		 * Fallback to buffered I/O if the operation being performed on
> > > +		 * the inode is not supported by direct I/O. The IOCB_DIRECT
> > > +		 * flag needs to be cleared here in order to ensure that the
> > > +		 * direct I/O path within generic_file_read_iter() is not
> > > +		 * taken.
> > > +		 */
> > > +		iocb->ki_flags &= ~IOCB_DIRECT;
> > > +	}
> > > +	return generic_file_read_iter(iocb, to);
> > 
> > It looks like it's fine to call filemap_read() directly since the code
> > above has already covered the DIO case; then we don't need to change the
> > iocb->ki_flags flag. It's minor though.
> 
> Yeah, we could use filemap_read() here instead. Yet IMO, it might be
> better to drop IOCB_DIRECT too to keep the iocb consistent with the real
> semantics (even if it's not used internally).

After checking the other users of filemap_read(), I'm fine with leaving
IOCB_DIRECT as-is. Will update.

Thanks,
Gao Xiang


* Re: [PATCH v2 2/3] erofs: dax support for non-tailpacking regular file
  2021-07-30 19:46 ` [PATCH v2 2/3] erofs: dax support for non-tailpacking regular file Gao Xiang
@ 2021-08-04  7:14   ` Chao Yu
  2021-08-04 11:21     ` Gao Xiang
  0 siblings, 1 reply; 11+ messages in thread
From: Chao Yu @ 2021-08-04  7:14 UTC (permalink / raw)
  To: Gao Xiang, linux-erofs
  Cc: nvdimm, Darrick J. Wong, LKML, Joseph Qi, Liu Bo, Tao Ma,
	linux-fsdevel, Liu Jiang

On 2021/7/31 3:46, Gao Xiang wrote:
> DAX is quite useful for some VM use cases in order to save guest
> memory significantly with a minimal, lightweight EROFS image.
> 
> In order to prepare for such use cases, add preliminary dax support
> for non-tailpacking regular files for now.
> 
> Tested with the DRAM-emulated PMEM and the EROFS image generated by
> "mkfs.erofs -Enoinline_data enwik9.fsdax.img enwik9"
> 
> Cc: nvdimm@lists.linux.dev
> Cc: linux-fsdevel@vger.kernel.org
> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> ---
>   fs/erofs/data.c     | 42 +++++++++++++++++++++++++++++--
>   fs/erofs/inode.c    |  4 +++
>   fs/erofs/internal.h |  3 +++
>   fs/erofs/super.c    | 60 +++++++++++++++++++++++++++++++++++++++++++--
>   4 files changed, 105 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/erofs/data.c b/fs/erofs/data.c
> index 1f97151a9f90..911521293b20 100644
> --- a/fs/erofs/data.c
> +++ b/fs/erofs/data.c
> @@ -6,7 +6,7 @@
>   #include "internal.h"
>   #include <linux/prefetch.h>
>   #include <linux/iomap.h>
> -
> +#include <linux/dax.h>
>   #include <trace/events/erofs.h>
>   
>   static void erofs_readendio(struct bio *bio)
> @@ -323,6 +323,7 @@ static int erofs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
>   		return ret;
>   
>   	iomap->bdev = inode->i_sb->s_bdev;
> +	iomap->dax_dev = EROFS_I_SB(inode)->dax_dev;
>   	iomap->offset = map.m_la;
>   	iomap->length = map.m_llen;
>   	iomap->flags = 0;
> @@ -382,6 +383,10 @@ static ssize_t erofs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
>   	if (!iov_iter_count(to))
>   		return 0;
>   
> +#ifdef CONFIG_FS_DAX
> +	if (IS_DAX(iocb->ki_filp->f_mapping->host))
> +		return dax_iomap_rw(iocb, to, &erofs_iomap_ops);
> +#endif
>   	if (iocb->ki_flags & IOCB_DIRECT) {
>   		int err = erofs_prepare_dio(iocb, to);
>   
> @@ -410,9 +415,42 @@ const struct address_space_operations erofs_raw_access_aops = {
>   	.direct_IO = noop_direct_IO,
>   };
>   
> +#ifdef CONFIG_FS_DAX
> +static vm_fault_t erofs_dax_huge_fault(struct vm_fault *vmf,
> +		enum page_entry_size pe_size)
> +{
> +	return dax_iomap_fault(vmf, pe_size, NULL, NULL, &erofs_iomap_ops);
> +}
> +
> +static vm_fault_t erofs_dax_fault(struct vm_fault *vmf)
> +{
> +	return erofs_dax_huge_fault(vmf, PE_SIZE_PTE);
> +}
> +
> +static const struct vm_operations_struct erofs_dax_vm_ops = {
> +	.fault		= erofs_dax_fault,
> +	.huge_fault	= erofs_dax_huge_fault,
> +};
> +
> +static int erofs_file_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> +	if (!IS_DAX(file_inode(file)))
> +		return generic_file_readonly_mmap(file, vma);
> +
> +	if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_MAYWRITE))
> +		return -EINVAL;
> +
> +	vma->vm_ops = &erofs_dax_vm_ops;
> +	vma->vm_flags |= VM_HUGEPAGE;
> +	return 0;
> +}
> +#else
> +#define erofs_file_mmap	generic_file_readonly_mmap
> +#endif
> +
>   const struct file_operations erofs_file_fops = {
>   	.llseek		= generic_file_llseek,
>   	.read_iter	= erofs_file_read_iter,
> -	.mmap		= generic_file_readonly_mmap,
> +	.mmap		= erofs_file_mmap,
>   	.splice_read	= generic_file_splice_read,
>   };
> diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c
> index 00edb7562fea..e875fba18159 100644
> --- a/fs/erofs/inode.c
> +++ b/fs/erofs/inode.c
> @@ -174,6 +174,10 @@ static struct page *erofs_read_inode(struct inode *inode,
>   	inode->i_mtime.tv_nsec = inode->i_ctime.tv_nsec;
>   	inode->i_atime.tv_nsec = inode->i_ctime.tv_nsec;
>   
> +	inode->i_flags &= ~S_DAX;
> +	if (test_opt(&sbi->ctx, DAX_ALWAYS) && S_ISREG(inode->i_mode) &&
> +	    vi->datalayout == EROFS_INODE_FLAT_PLAIN)
> +		inode->i_flags |= S_DAX;
>   	if (!nblks)
>   		/* measure inode.i_blocks as generic filesystems */
>   		inode->i_blocks = roundup(inode->i_size, EROFS_BLKSIZ) >> 9;
> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
> index 2669c785d548..7c9abfc93109 100644
> --- a/fs/erofs/internal.h
> +++ b/fs/erofs/internal.h
> @@ -83,6 +83,7 @@ struct erofs_sb_info {
>   
>   	struct erofs_sb_lz4_info lz4;
>   #endif	/* CONFIG_EROFS_FS_ZIP */
> +	struct dax_device *dax_dev;
>   	u32 blocks;
>   	u32 meta_blkaddr;
>   #ifdef CONFIG_EROFS_FS_XATTR
> @@ -115,6 +116,8 @@ struct erofs_sb_info {
>   /* Mount flags set via mount options or defaults */
>   #define EROFS_MOUNT_XATTR_USER		0x00000010
>   #define EROFS_MOUNT_POSIX_ACL		0x00000020
> +#define EROFS_MOUNT_DAX_ALWAYS		0x00000040
> +#define EROFS_MOUNT_DAX_NEVER		0x00000080
>   
>   #define clear_opt(ctx, option)	((ctx)->mount_opt &= ~EROFS_MOUNT_##option)
>   #define set_opt(ctx, option)	((ctx)->mount_opt |= EROFS_MOUNT_##option)
> diff --git a/fs/erofs/super.c b/fs/erofs/super.c
> index 8fc6c04b54f4..d5b110fd365d 100644
> --- a/fs/erofs/super.c
> +++ b/fs/erofs/super.c
> @@ -11,6 +11,7 @@
>   #include <linux/crc32c.h>
>   #include <linux/fs_context.h>
>   #include <linux/fs_parser.h>
> +#include <linux/dax.h>
>   #include "xattr.h"
>   
>   #define CREATE_TRACE_POINTS
> @@ -355,6 +356,8 @@ enum {
>   	Opt_user_xattr,
>   	Opt_acl,
>   	Opt_cache_strategy,
> +	Opt_dax,
> +	Opt_dax_enum,

We need to update the documentation for these new dax mount options.

>   	Opt_err
>   };
>   
> @@ -365,14 +368,47 @@ static const struct constant_table erofs_param_cache_strategy[] = {
>   	{}
>   };
>   
> +static const struct constant_table erofs_dax_param_enums[] = {
> +	{"always",	EROFS_MOUNT_DAX_ALWAYS},
> +	{"never",	EROFS_MOUNT_DAX_NEVER},
> +	{}
> +};
> +
>   static const struct fs_parameter_spec erofs_fs_parameters[] = {
>   	fsparam_flag_no("user_xattr",	Opt_user_xattr),
>   	fsparam_flag_no("acl",		Opt_acl),
>   	fsparam_enum("cache_strategy",	Opt_cache_strategy,
>   		     erofs_param_cache_strategy),
> +	fsparam_flag("dax",             Opt_dax),
> +	fsparam_enum("dax",		Opt_dax_enum, erofs_dax_param_enums),
>   	{}
>   };
>   
> +static bool erofs_fc_set_dax_mode(struct fs_context *fc, unsigned int mode)
> +{
> +#ifdef CONFIG_FS_DAX
> +	struct erofs_fs_context *ctx = fc->fs_private;
> +
> +	switch (mode) {
> +	case EROFS_MOUNT_DAX_ALWAYS:
> +		warnfc(fc, "DAX enabled. Warning: EXPERIMENTAL, use at your own risk");
> +		set_opt(ctx, DAX_ALWAYS);
> +		clear_opt(ctx, DAX_NEVER);
> +		return true;
> +	case EROFS_MOUNT_DAX_NEVER:
> +		set_opt(ctx, DAX_NEVER);
> +		clear_opt(ctx, DAX_ALWAYS);
> +		return true;
> +	default:
> +		DBG_BUGON(1);
> +		return false;
> +	}
> +#else
> +	errorfc(fc, "dax options not supported");
> +	return false;
> +#endif
> +}
> +
>   static int erofs_fc_parse_param(struct fs_context *fc,
>   				struct fs_parameter *param)
>   {
> @@ -412,6 +448,14 @@ static int erofs_fc_parse_param(struct fs_context *fc,
>   		errorfc(fc, "compression not supported, cache_strategy ignored");
>   #endif
>   		break;
> +	case Opt_dax:
> +		if (!erofs_fc_set_dax_mode(fc, EROFS_MOUNT_DAX_ALWAYS))
> +			return -EINVAL;
> +		break;
> +	case Opt_dax_enum:
> +		if (!erofs_fc_set_dax_mode(fc, result.uint_32))
> +			return -EINVAL;
> +		break;
>   	default:
>   		return -ENOPARAM;
>   	}
> @@ -496,10 +540,16 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
>   		return -ENOMEM;
>   
>   	sb->s_fs_info = sbi;
> +	sbi->dax_dev = fs_dax_get_by_bdev(sb->s_bdev);
>   	err = erofs_read_superblock(sb);
>   	if (err)
>   		return err;
>   
> +	if (test_opt(ctx, DAX_ALWAYS) &&
> +	    !bdev_dax_supported(sb->s_bdev, EROFS_BLKSIZ)) {
> +		errorfc(fc, "DAX unsupported by block device. Turning off DAX.");
> +		clear_opt(ctx, DAX_ALWAYS);
> +	}
>   	sb->s_flags |= SB_RDONLY | SB_NOATIME;
>   	sb->s_maxbytes = MAX_LFS_FILESIZE;
>   	sb->s_time_gran = 1;
> @@ -609,6 +659,8 @@ static void erofs_kill_sb(struct super_block *sb)
>   	sbi = EROFS_SB(sb);
>   	if (!sbi)
>   		return;
> +	if (sbi->dax_dev)
> +		fs_put_dax(sbi->dax_dev);

fs_put_dax() already handles a NULL dax_dev internally, so the check
can be dropped:

	fs_put_dax(sbi->dax_dev);

Thanks,

>   	kfree(sbi);
>   	sb->s_fs_info = NULL;
>   }
> @@ -711,8 +763,8 @@ static int erofs_statfs(struct dentry *dentry, struct kstatfs *buf)
>   
>   static int erofs_show_options(struct seq_file *seq, struct dentry *root)
>   {
> -	struct erofs_sb_info *sbi __maybe_unused = EROFS_SB(root->d_sb);
> -	struct erofs_fs_context *ctx __maybe_unused = &sbi->ctx;
> +	struct erofs_sb_info *sbi = EROFS_SB(root->d_sb);
> +	struct erofs_fs_context *ctx = &sbi->ctx;
>   
>   #ifdef CONFIG_EROFS_FS_XATTR
>   	if (test_opt(ctx, XATTR_USER))
> @@ -734,6 +786,10 @@ static int erofs_show_options(struct seq_file *seq, struct dentry *root)
>   	else if (ctx->cache_strategy == EROFS_ZIP_CACHE_READAROUND)
>   		seq_puts(seq, ",cache_strategy=readaround");
>   #endif
> +	if (test_opt(ctx, DAX_ALWAYS))
> +		seq_puts(seq, ",dax=always");
> +	if (test_opt(ctx, DAX_NEVER))
> +		seq_puts(seq, ",dax=never");
>   	return 0;
>   }
>   
> 


* Re: [PATCH v2 3/3] erofs: convert all uncompressed cases to iomap
  2021-07-30 19:46 ` [PATCH v2 3/3] erofs: convert all uncompressed cases to iomap Gao Xiang
@ 2021-08-04  7:17   ` Chao Yu
  2021-08-04 11:22     ` Gao Xiang
  0 siblings, 1 reply; 11+ messages in thread
From: Chao Yu @ 2021-08-04  7:17 UTC (permalink / raw)
  To: Gao Xiang, linux-erofs
  Cc: nvdimm, Darrick J. Wong, LKML, Joseph Qi, Liu Bo, Tao Ma,
	linux-fsdevel, Liu Jiang

On 2021/7/31 3:46, Gao Xiang wrote:
> Since tail-packing inline is now supported by iomap, let's convert
> all EROFS uncompressed data I/O to iomap, which is pretty
> straightforward.
> 
> Cc: linux-fsdevel@vger.kernel.org
> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> ---
>   fs/erofs/data.c | 288 ++++++++----------------------------------------
>   1 file changed, 49 insertions(+), 239 deletions(-)
> 
> diff --git a/fs/erofs/data.c b/fs/erofs/data.c
> index 911521293b20..6b98156bb5ca 100644
> --- a/fs/erofs/data.c
> +++ b/fs/erofs/data.c
> @@ -9,29 +9,6 @@
>   #include <linux/dax.h>
>   #include <trace/events/erofs.h>
>   
> -static void erofs_readendio(struct bio *bio)
> -{
> -	struct bio_vec *bvec;
> -	blk_status_t err = bio->bi_status;
> -	struct bvec_iter_all iter_all;
> -
> -	bio_for_each_segment_all(bvec, bio, iter_all) {
> -		struct page *page = bvec->bv_page;
> -
> -		/* page is already locked */
> -		DBG_BUGON(PageUptodate(page));
> -
> -		if (err)
> -			SetPageError(page);
> -		else
> -			SetPageUptodate(page);
> -
> -		unlock_page(page);
> -		/* page could be reclaimed now */
> -	}
> -	bio_put(bio);
> -}
> -
>   struct page *erofs_get_meta_page(struct super_block *sb, erofs_blk_t blkaddr)
>   {
>   	struct address_space *const mapping = sb->s_bdev->bd_inode->i_mapping;
> @@ -109,206 +86,6 @@ static int erofs_map_blocks_flatmode(struct inode *inode,
>   	return err;
>   }
>   
> -static inline struct bio *erofs_read_raw_page(struct bio *bio,
> -					      struct address_space *mapping,
> -					      struct page *page,
> -					      erofs_off_t *last_block,
> -					      unsigned int nblocks,
> -					      unsigned int *eblks,
> -					      bool ra)
> -{
> -	struct inode *const inode = mapping->host;
> -	struct super_block *const sb = inode->i_sb;
> -	erofs_off_t current_block = (erofs_off_t)page->index;
> -	int err;
> -
> -	DBG_BUGON(!nblocks);
> -
> -	if (PageUptodate(page)) {
> -		err = 0;
> -		goto has_updated;
> -	}
> -
> -	/* note that for readpage case, bio also equals to NULL */
> -	if (bio &&
> -	    (*last_block + 1 != current_block || !*eblks)) {
> -submit_bio_retry:
> -		submit_bio(bio);
> -		bio = NULL;
> -	}
> -
> -	if (!bio) {
> -		struct erofs_map_blocks map = {
> -			.m_la = blknr_to_addr(current_block),
> -		};
> -		erofs_blk_t blknr;
> -		unsigned int blkoff;
> -
> -		err = erofs_map_blocks_flatmode(inode, &map, EROFS_GET_BLOCKS_RAW);
> -		if (err)
> -			goto err_out;
> -
> -		/* zero out the holed page */
> -		if (!(map.m_flags & EROFS_MAP_MAPPED)) {
> -			zero_user_segment(page, 0, PAGE_SIZE);
> -			SetPageUptodate(page);
> -
> -			/* imply err = 0, see erofs_map_blocks */
> -			goto has_updated;
> -		}
> -
> -		/* for RAW access mode, m_plen must be equal to m_llen */
> -		DBG_BUGON(map.m_plen != map.m_llen);
> -
> -		blknr = erofs_blknr(map.m_pa);
> -		blkoff = erofs_blkoff(map.m_pa);
> -
> -		/* deal with inline page */
> -		if (map.m_flags & EROFS_MAP_META) {
> -			void *vsrc, *vto;
> -			struct page *ipage;
> -
> -			DBG_BUGON(map.m_plen > PAGE_SIZE);
> -
> -			ipage = erofs_get_meta_page(inode->i_sb, blknr);
> -
> -			if (IS_ERR(ipage)) {
> -				err = PTR_ERR(ipage);
> -				goto err_out;
> -			}
> -
> -			vsrc = kmap_atomic(ipage);
> -			vto = kmap_atomic(page);
> -			memcpy(vto, vsrc + blkoff, map.m_plen);
> -			memset(vto + map.m_plen, 0, PAGE_SIZE - map.m_plen);
> -			kunmap_atomic(vto);
> -			kunmap_atomic(vsrc);
> -			flush_dcache_page(page);
> -
> -			SetPageUptodate(page);
> -			/* TODO: could we unlock the page earlier? */
> -			unlock_page(ipage);
> -			put_page(ipage);
> -
> -			/* imply err = 0, see erofs_map_blocks */
> -			goto has_updated;
> -		}
> -
> -		/* pa must be block-aligned for raw reading */
> -		DBG_BUGON(erofs_blkoff(map.m_pa));
> -
> -		/* max # of continuous pages */
> -		if (nblocks > DIV_ROUND_UP(map.m_plen, PAGE_SIZE))
> -			nblocks = DIV_ROUND_UP(map.m_plen, PAGE_SIZE);
> -
> -		*eblks = bio_max_segs(nblocks);
> -		bio = bio_alloc(GFP_NOIO, *eblks);
> -
> -		bio->bi_end_io = erofs_readendio;
> -		bio_set_dev(bio, sb->s_bdev);
> -		bio->bi_iter.bi_sector = (sector_t)blknr <<
> -			LOG_SECTORS_PER_BLOCK;
> -		bio->bi_opf = REQ_OP_READ | (ra ? REQ_RAHEAD : 0);
> -	}
> -
> -	err = bio_add_page(bio, page, PAGE_SIZE, 0);
> -	/* out of the extent or bio is full */
> -	if (err < PAGE_SIZE)
> -		goto submit_bio_retry;
> -	--*eblks;
> -	*last_block = current_block;
> -	return bio;
> -
> -err_out:
> -	/* for sync reading, set page error immediately */
> -	if (!ra) {
> -		SetPageError(page);
> -		ClearPageUptodate(page);
> -	}
> -has_updated:
> -	unlock_page(page);
> -
> -	/* if updated manually, continuous pages has a gap */
> -	if (bio)
> -		submit_bio(bio);
> -	return err ? ERR_PTR(err) : NULL;
> -}
> -
> -/*
> - * since we dont have write or truncate flows, so no inode
> - * locking needs to be held at the moment.
> - */
> -static int erofs_raw_access_readpage(struct file *file, struct page *page)
> -{
> -	erofs_off_t last_block;
> -	unsigned int eblks;
> -	struct bio *bio;
> -
> -	trace_erofs_readpage(page, true);
> -
> -	bio = erofs_read_raw_page(NULL, page->mapping,
> -				  page, &last_block, 1, &eblks, false);
> -
> -	if (IS_ERR(bio))
> -		return PTR_ERR(bio);
> -
> -	if (bio)
> -		submit_bio(bio);
> -	return 0;
> -}
> -
> -static void erofs_raw_access_readahead(struct readahead_control *rac)
> -{
> -	erofs_off_t last_block;
> -	unsigned int eblks;
> -	struct bio *bio = NULL;
> -	struct page *page;
> -
> -	trace_erofs_readpages(rac->mapping->host, readahead_index(rac),
> -			readahead_count(rac), true);
> -
> -	while ((page = readahead_page(rac))) {
> -		prefetchw(&page->flags);
> -
> -		bio = erofs_read_raw_page(bio, rac->mapping, page, &last_block,
> -				readahead_count(rac), &eblks, true);
> -
> -		/* all the page errors are ignored when readahead */
> -		if (IS_ERR(bio)) {
> -			pr_err("%s, readahead error at page %lu of nid %llu\n",
> -			       __func__, page->index,
> -			       EROFS_I(rac->mapping->host)->nid);
> -
> -			bio = NULL;
> -		}
> -
> -		put_page(page);
> -	}
> -
> -	if (bio)
> -		submit_bio(bio);
> -}
> -
> -static sector_t erofs_bmap(struct address_space *mapping, sector_t block)
> -{
> -	struct inode *inode = mapping->host;
> -	struct erofs_map_blocks map = {
> -		.m_la = blknr_to_addr(block),
> -	};
> -
> -	if (EROFS_I(inode)->datalayout == EROFS_INODE_FLAT_INLINE) {
> -		erofs_blk_t blks = i_size_read(inode) >> LOG_BLOCK_SIZE;
> -
> -		if (block >> LOG_SECTORS_PER_BLOCK >= blks)
> -			return 0;
> -	}
> -
> -	if (!erofs_map_blocks_flatmode(inode, &map, EROFS_GET_BLOCKS_RAW))
> -		return erofs_blknr(map.m_pa);
> -
> -	return 0;
> -}
> -
>   static int erofs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
>   		unsigned int flags, struct iomap *iomap, struct iomap *srcmap)
>   {
> @@ -327,6 +104,7 @@ static int erofs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
>   	iomap->offset = map.m_la;
>   	iomap->length = map.m_llen;
>   	iomap->flags = 0;
> +	iomap->private = NULL;
>   
>   	if (!(map.m_flags & EROFS_MAP_MAPPED)) {
>   		iomap->type = IOMAP_HOLE;
> @@ -336,20 +114,61 @@ static int erofs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
>   		return 0;
>   	}
>   
> -	/* that shouldn't happen for now */
>   	if (map.m_flags & EROFS_MAP_META) {
> -		DBG_BUGON(1);
> -		return -ENOTBLK;
> +		struct page *ipage;
> +
> +		iomap->type = IOMAP_INLINE;
> +		ipage = erofs_get_meta_page(inode->i_sb,
> +					    erofs_blknr(map.m_pa));

Error handling for erofs_get_meta_page()?
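
Something along these lines (untested, just to illustrate the idea)
would mirror the IS_ERR() handling that the deleted
erofs_read_raw_page() had for the same call:

		ipage = erofs_get_meta_page(inode->i_sb,
					    erofs_blknr(map.m_pa));
		/* propagate the errno if the meta page can't be read */
		if (IS_ERR(ipage))
			return PTR_ERR(ipage);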

Thanks

> +		iomap->inline_data = page_address(ipage) +
> +					erofs_blkoff(map.m_pa);
> +		iomap->private = ipage;
> +	} else {
> +		iomap->type = IOMAP_MAPPED;
> +		iomap->addr = map.m_pa;
>   	}
> -	iomap->type = IOMAP_MAPPED;
> -	iomap->addr = map.m_pa;
>   	return 0;
>   }
>   
> +static int erofs_iomap_end(struct inode *inode, loff_t pos, loff_t length,
> +		ssize_t written, unsigned flags, struct iomap *iomap)
> +{
> +	struct page *ipage = iomap->private;
> +
> +	if (ipage) {
> +		DBG_BUGON(iomap->type != IOMAP_INLINE);
> +		unlock_page(ipage);
> +		put_page(ipage);
> +	} else {
> +		DBG_BUGON(iomap->type == IOMAP_INLINE);
> +	}
> +	return written;
> +}
> +
>   const struct iomap_ops erofs_iomap_ops = {
>   	.iomap_begin = erofs_iomap_begin,
> +	.iomap_end = erofs_iomap_end,
>   };
>   
> +/*
> + * since we don't have write or truncate flows, no inode
> + * locking needs to be held at the moment.
> + */
> +static int erofs_readpage(struct file *file, struct page *page)
> +{
> +	return iomap_readpage(page, &erofs_iomap_ops);
> +}
> +
> +static void erofs_readahead(struct readahead_control *rac)
> +{
> +	return iomap_readahead(rac, &erofs_iomap_ops);
> +}
> +
> +static sector_t erofs_bmap(struct address_space *mapping, sector_t block)
> +{
> +	return iomap_bmap(mapping, block, &erofs_iomap_ops);
> +}
> +
>   static int erofs_prepare_dio(struct kiocb *iocb, struct iov_iter *to)
>   {
>   	struct inode *inode = file_inode(iocb->ki_filp);
> @@ -365,15 +184,6 @@ static int erofs_prepare_dio(struct kiocb *iocb, struct iov_iter *to)
>   
>   	if (align & blksize_mask)
>   		return -EINVAL;
> -
> -	/*
> -	 * Temporarily fall back tail-packing inline to buffered I/O instead
> -	 * since tail-packing inline support relies on an iomap core update.
> -	 */
> -	if (EROFS_I(inode)->datalayout == EROFS_INODE_FLAT_INLINE &&
> -	    iocb->ki_pos + iov_iter_count(to) >
> -			rounddown(inode->i_size, EROFS_BLKSIZ))
> -		return 1;
>   	return 0;
>   }
>   
> @@ -409,8 +219,8 @@ static ssize_t erofs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
>   
>   /* for uncompressed (aligned) files and raw access for other files */
>   const struct address_space_operations erofs_raw_access_aops = {
> -	.readpage = erofs_raw_access_readpage,
> -	.readahead = erofs_raw_access_readahead,
> +	.readpage = erofs_readpage,
> +	.readahead = erofs_readahead,
>   	.bmap = erofs_bmap,
>   	.direct_IO = noop_direct_IO,
>   };
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/3] erofs: dax support for non-tailpacking regular file
  2021-08-04  7:14   ` Chao Yu
@ 2021-08-04 11:21     ` Gao Xiang
  0 siblings, 0 replies; 11+ messages in thread
From: Gao Xiang @ 2021-08-04 11:21 UTC (permalink / raw)
  To: Chao Yu
  Cc: nvdimm, Darrick J. Wong, LKML, Joseph Qi, Liu Bo, Tao Ma,
	linux-fsdevel, Liu Jiang, linux-erofs

On Wed, Aug 04, 2021 at 03:14:34PM +0800, Chao Yu wrote:
> On 2021/7/31 3:46, Gao Xiang wrote:
> > DAX is quite useful for some VM use cases, since it can save a great
> > deal of guest memory when a minimal lightweight EROFS image is used.
> > 
> > In order to prepare for such use cases, add preliminary dax support
> > for non-tailpacking regular files for now.
> > 
> > Tested with the DRAM-emulated PMEM and the EROFS image generated by
> > "mkfs.erofs -Enoinline_data enwik9.fsdax.img enwik9"
> > 
> > Cc: nvdimm@lists.linux.dev
> > Cc: linux-fsdevel@vger.kernel.org
> > Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> > ---
> >   fs/erofs/data.c     | 42 +++++++++++++++++++++++++++++--
> >   fs/erofs/inode.c    |  4 +++
> >   fs/erofs/internal.h |  3 +++
> >   fs/erofs/super.c    | 60 +++++++++++++++++++++++++++++++++++++++++++--
> >   4 files changed, 105 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/erofs/data.c b/fs/erofs/data.c
> > index 1f97151a9f90..911521293b20 100644
> > --- a/fs/erofs/data.c
> > +++ b/fs/erofs/data.c
> > @@ -6,7 +6,7 @@
> >   #include "internal.h"
> >   #include <linux/prefetch.h>
> >   #include <linux/iomap.h>
> > -
> > +#include <linux/dax.h>
> >   #include <trace/events/erofs.h>
> >   static void erofs_readendio(struct bio *bio)
> > @@ -323,6 +323,7 @@ static int erofs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
> >   		return ret;
> >   	iomap->bdev = inode->i_sb->s_bdev;
> > +	iomap->dax_dev = EROFS_I_SB(inode)->dax_dev;
> >   	iomap->offset = map.m_la;
> >   	iomap->length = map.m_llen;
> >   	iomap->flags = 0;
> > @@ -382,6 +383,10 @@ static ssize_t erofs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
> >   	if (!iov_iter_count(to))
> >   		return 0;
> > +#ifdef CONFIG_FS_DAX
> > +	if (IS_DAX(iocb->ki_filp->f_mapping->host))
> > +		return dax_iomap_rw(iocb, to, &erofs_iomap_ops);
> > +#endif
> >   	if (iocb->ki_flags & IOCB_DIRECT) {
> >   		int err = erofs_prepare_dio(iocb, to);
> > @@ -410,9 +415,42 @@ const struct address_space_operations erofs_raw_access_aops = {
> >   	.direct_IO = noop_direct_IO,
> >   };
> > +#ifdef CONFIG_FS_DAX
> > +static vm_fault_t erofs_dax_huge_fault(struct vm_fault *vmf,
> > +		enum page_entry_size pe_size)
> > +{
> > +	return dax_iomap_fault(vmf, pe_size, NULL, NULL, &erofs_iomap_ops);
> > +}
> > +
> > +static vm_fault_t erofs_dax_fault(struct vm_fault *vmf)
> > +{
> > +	return erofs_dax_huge_fault(vmf, PE_SIZE_PTE);
> > +}
> > +
> > +static const struct vm_operations_struct erofs_dax_vm_ops = {
> > +	.fault		= erofs_dax_fault,
> > +	.huge_fault	= erofs_dax_huge_fault,
> > +};
> > +
> > +static int erofs_file_mmap(struct file *file, struct vm_area_struct *vma)
> > +{
> > +	if (!IS_DAX(file_inode(file)))
> > +		return generic_file_readonly_mmap(file, vma);
> > +
> > +	if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_MAYWRITE))
> > +		return -EINVAL;
> > +
> > +	vma->vm_ops = &erofs_dax_vm_ops;
> > +	vma->vm_flags |= VM_HUGEPAGE;
> > +	return 0;
> > +}
> > +#else
> > +#define erofs_file_mmap	generic_file_readonly_mmap
> > +#endif
> > +
> >   const struct file_operations erofs_file_fops = {
> >   	.llseek		= generic_file_llseek,
> >   	.read_iter	= erofs_file_read_iter,
> > -	.mmap		= generic_file_readonly_mmap,
> > +	.mmap		= erofs_file_mmap,
> >   	.splice_read	= generic_file_splice_read,
> >   };
> > diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c
> > index 00edb7562fea..e875fba18159 100644
> > --- a/fs/erofs/inode.c
> > +++ b/fs/erofs/inode.c
> > @@ -174,6 +174,10 @@ static struct page *erofs_read_inode(struct inode *inode,
> >   	inode->i_mtime.tv_nsec = inode->i_ctime.tv_nsec;
> >   	inode->i_atime.tv_nsec = inode->i_ctime.tv_nsec;
> > +	inode->i_flags &= ~S_DAX;
> > +	if (test_opt(&sbi->ctx, DAX_ALWAYS) && S_ISREG(inode->i_mode) &&
> > +	    vi->datalayout == EROFS_INODE_FLAT_PLAIN)
> > +		inode->i_flags |= S_DAX;
> >   	if (!nblks)
> >   		/* measure inode.i_blocks as generic filesystems */
> >   		inode->i_blocks = roundup(inode->i_size, EROFS_BLKSIZ) >> 9;
> > diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
> > index 2669c785d548..7c9abfc93109 100644
> > --- a/fs/erofs/internal.h
> > +++ b/fs/erofs/internal.h
> > @@ -83,6 +83,7 @@ struct erofs_sb_info {
> >   	struct erofs_sb_lz4_info lz4;
> >   #endif	/* CONFIG_EROFS_FS_ZIP */
> > +	struct dax_device *dax_dev;
> >   	u32 blocks;
> >   	u32 meta_blkaddr;
> >   #ifdef CONFIG_EROFS_FS_XATTR
> > @@ -115,6 +116,8 @@ struct erofs_sb_info {
> >   /* Mount flags set via mount options or defaults */
> >   #define EROFS_MOUNT_XATTR_USER		0x00000010
> >   #define EROFS_MOUNT_POSIX_ACL		0x00000020
> > +#define EROFS_MOUNT_DAX_ALWAYS		0x00000040
> > +#define EROFS_MOUNT_DAX_NEVER		0x00000080
> >   #define clear_opt(ctx, option)	((ctx)->mount_opt &= ~EROFS_MOUNT_##option)
> >   #define set_opt(ctx, option)	((ctx)->mount_opt |= EROFS_MOUNT_##option)
> > diff --git a/fs/erofs/super.c b/fs/erofs/super.c
> > index 8fc6c04b54f4..d5b110fd365d 100644
> > --- a/fs/erofs/super.c
> > +++ b/fs/erofs/super.c
> > @@ -11,6 +11,7 @@
> >   #include <linux/crc32c.h>
> >   #include <linux/fs_context.h>
> >   #include <linux/fs_parser.h>
> > +#include <linux/dax.h>
> >   #include "xattr.h"
> >   #define CREATE_TRACE_POINTS
> > @@ -355,6 +356,8 @@ enum {
> >   	Opt_user_xattr,
> >   	Opt_acl,
> >   	Opt_cache_strategy,
> > +	Opt_dax,
> > +	Opt_dax_enum,
> 
> We need to update the documentation for these new dax mount options.

Yeah, I agree, let me update this soon.
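
For reference, something along these lines might do (exact wording to
be finalized; assuming it goes into the mount option list of
Documentation/filesystems/erofs.rst):

  dax={always,never}     Use direct access (no page cache) if the
                         underlying block device supports DAX.
  dax                    Equivalent to dax=always.

A DAX-capable device could then be mounted with something like
"mount -t erofs -o dax /dev/pmem0 /mnt/erofs" (device name just for
illustration).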

> 
> >   	Opt_err
> >   };
> > @@ -365,14 +368,47 @@ static const struct constant_table erofs_param_cache_strategy[] = {
> >   	{}
> >   };
> > +static const struct constant_table erofs_dax_param_enums[] = {
> > +	{"always",	EROFS_MOUNT_DAX_ALWAYS},
> > +	{"never",	EROFS_MOUNT_DAX_NEVER},
> > +	{}
> > +};
> > +
> >   static const struct fs_parameter_spec erofs_fs_parameters[] = {
> >   	fsparam_flag_no("user_xattr",	Opt_user_xattr),
> >   	fsparam_flag_no("acl",		Opt_acl),
> >   	fsparam_enum("cache_strategy",	Opt_cache_strategy,
> >   		     erofs_param_cache_strategy),
> > +	fsparam_flag("dax",             Opt_dax),
> > +	fsparam_enum("dax",		Opt_dax_enum, erofs_dax_param_enums),
> >   	{}
> >   };
> > +static bool erofs_fc_set_dax_mode(struct fs_context *fc, unsigned int mode)
> > +{
> > +#ifdef CONFIG_FS_DAX
> > +	struct erofs_fs_context *ctx = fc->fs_private;
> > +
> > +	switch (mode) {
> > +	case EROFS_MOUNT_DAX_ALWAYS:
> > +		warnfc(fc, "DAX enabled. Warning: EXPERIMENTAL, use at your own risk");
> > +		set_opt(ctx, DAX_ALWAYS);
> > +		clear_opt(ctx, DAX_NEVER);
> > +		return true;
> > +	case EROFS_MOUNT_DAX_NEVER:
> > +		set_opt(ctx, DAX_NEVER);
> > +		clear_opt(ctx, DAX_ALWAYS);
> > +		return true;
> > +	default:
> > +		DBG_BUGON(1);
> > +		return false;
> > +	}
> > +#else
> > +	errorfc(fc, "dax options not supported");
> > +	return false;
> > +#endif
> > +}
> > +
> >   static int erofs_fc_parse_param(struct fs_context *fc,
> >   				struct fs_parameter *param)
> >   {
> > @@ -412,6 +448,14 @@ static int erofs_fc_parse_param(struct fs_context *fc,
> >   		errorfc(fc, "compression not supported, cache_strategy ignored");
> >   #endif
> >   		break;
> > +	case Opt_dax:
> > +		if (!erofs_fc_set_dax_mode(fc, EROFS_MOUNT_DAX_ALWAYS))
> > +			return -EINVAL;
> > +		break;
> > +	case Opt_dax_enum:
> > +		if (!erofs_fc_set_dax_mode(fc, result.uint_32))
> > +			return -EINVAL;
> > +		break;
> >   	default:
> >   		return -ENOPARAM;
> >   	}
> > @@ -496,10 +540,16 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
> >   		return -ENOMEM;
> >   	sb->s_fs_info = sbi;
> > +	sbi->dax_dev = fs_dax_get_by_bdev(sb->s_bdev);
> >   	err = erofs_read_superblock(sb);
> >   	if (err)
> >   		return err;
> > +	if (test_opt(ctx, DAX_ALWAYS) &&
> > +	    !bdev_dax_supported(sb->s_bdev, EROFS_BLKSIZ)) {
> > +		errorfc(fc, "DAX unsupported by block device. Turning off DAX.");
> > +		clear_opt(ctx, DAX_ALWAYS);
> > +	}
> >   	sb->s_flags |= SB_RDONLY | SB_NOATIME;
> >   	sb->s_maxbytes = MAX_LFS_FILESIZE;
> >   	sb->s_time_gran = 1;
> > @@ -609,6 +659,8 @@ static void erofs_kill_sb(struct super_block *sb)
> >   	sbi = EROFS_SB(sb);
> >   	if (!sbi)
> >   		return;
> > +	if (sbi->dax_dev)
> > +		fs_put_dax(sbi->dax_dev);
> 
> fs_put_dax(sbi->dax_dev);

Will update.
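
That is, the check can likely go away entirely (assuming put_dax()
keeps treating a NULL dax_dev as a no-op), so erofs_kill_sb() would
simply do:

	/* fs_put_dax()/put_dax() already handle a NULL dax_dev */
	fs_put_dax(sbi->dax_dev);
	kfree(sbi);
	sb->s_fs_info = NULL;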

Thanks,
Gao Xiang

> 
> Thanks,
> 
> >   	kfree(sbi);
> >   	sb->s_fs_info = NULL;
> >   }
> > @@ -711,8 +763,8 @@ static int erofs_statfs(struct dentry *dentry, struct kstatfs *buf)
> >   static int erofs_show_options(struct seq_file *seq, struct dentry *root)
> >   {
> > -	struct erofs_sb_info *sbi __maybe_unused = EROFS_SB(root->d_sb);
> > -	struct erofs_fs_context *ctx __maybe_unused = &sbi->ctx;
> > +	struct erofs_sb_info *sbi = EROFS_SB(root->d_sb);
> > +	struct erofs_fs_context *ctx = &sbi->ctx;
> >   #ifdef CONFIG_EROFS_FS_XATTR
> >   	if (test_opt(ctx, XATTR_USER))
> > @@ -734,6 +786,10 @@ static int erofs_show_options(struct seq_file *seq, struct dentry *root)
> >   	else if (ctx->cache_strategy == EROFS_ZIP_CACHE_READAROUND)
> >   		seq_puts(seq, ",cache_strategy=readaround");
> >   #endif
> > +	if (test_opt(ctx, DAX_ALWAYS))
> > +		seq_puts(seq, ",dax=always");
> > +	if (test_opt(ctx, DAX_NEVER))
> > +		seq_puts(seq, ",dax=never");
> >   	return 0;
> >   }
> > 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 3/3] erofs: convert all uncompressed cases to iomap
  2021-08-04  7:17   ` Chao Yu
@ 2021-08-04 11:22     ` Gao Xiang
  0 siblings, 0 replies; 11+ messages in thread
From: Gao Xiang @ 2021-08-04 11:22 UTC (permalink / raw)
  To: Chao Yu
  Cc: nvdimm, Darrick J. Wong, LKML, Joseph Qi, Liu Bo, Tao Ma,
	linux-fsdevel, Liu Jiang, linux-erofs

On Wed, Aug 04, 2021 at 03:17:50PM +0800, Chao Yu wrote:
> On 2021/7/31 3:46, Gao Xiang wrote:
> > Now that tail-packing inline is supported by iomap, let's convert
> > all EROFS uncompressed data I/O to iomap, which is pretty
> > straightforward.
> > 
> > Cc: linux-fsdevel@vger.kernel.org
> > Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> > ---
> >   fs/erofs/data.c | 288 ++++++++----------------------------------------
> >   1 file changed, 49 insertions(+), 239 deletions(-)
> > 
> > diff --git a/fs/erofs/data.c b/fs/erofs/data.c
> > index 911521293b20..6b98156bb5ca 100644
> > --- a/fs/erofs/data.c
> > +++ b/fs/erofs/data.c
> > @@ -9,29 +9,6 @@
> >   #include <linux/dax.h>
> >   #include <trace/events/erofs.h>
> > -static void erofs_readendio(struct bio *bio)
> > -{
> > -	struct bio_vec *bvec;
> > -	blk_status_t err = bio->bi_status;
> > -	struct bvec_iter_all iter_all;
> > -
> > -	bio_for_each_segment_all(bvec, bio, iter_all) {
> > -		struct page *page = bvec->bv_page;
> > -
> > -		/* page is already locked */
> > -		DBG_BUGON(PageUptodate(page));
> > -
> > -		if (err)
> > -			SetPageError(page);
> > -		else
> > -			SetPageUptodate(page);
> > -
> > -		unlock_page(page);
> > -		/* page could be reclaimed now */
> > -	}
> > -	bio_put(bio);
> > -}
> > -
> >   struct page *erofs_get_meta_page(struct super_block *sb, erofs_blk_t blkaddr)
> >   {
> >   	struct address_space *const mapping = sb->s_bdev->bd_inode->i_mapping;
> > @@ -109,206 +86,6 @@ static int erofs_map_blocks_flatmode(struct inode *inode,
> >   	return err;
> >   }
> > -static inline struct bio *erofs_read_raw_page(struct bio *bio,
> > -					      struct address_space *mapping,
> > -					      struct page *page,
> > -					      erofs_off_t *last_block,
> > -					      unsigned int nblocks,
> > -					      unsigned int *eblks,
> > -					      bool ra)
> > -{
> > -	struct inode *const inode = mapping->host;
> > -	struct super_block *const sb = inode->i_sb;
> > -	erofs_off_t current_block = (erofs_off_t)page->index;
> > -	int err;
> > -
> > -	DBG_BUGON(!nblocks);
> > -
> > -	if (PageUptodate(page)) {
> > -		err = 0;
> > -		goto has_updated;
> > -	}
> > -
> > -	/* note that for readpage case, bio also equals to NULL */
> > -	if (bio &&
> > -	    (*last_block + 1 != current_block || !*eblks)) {
> > -submit_bio_retry:
> > -		submit_bio(bio);
> > -		bio = NULL;
> > -	}
> > -
> > -	if (!bio) {
> > -		struct erofs_map_blocks map = {
> > -			.m_la = blknr_to_addr(current_block),
> > -		};
> > -		erofs_blk_t blknr;
> > -		unsigned int blkoff;
> > -
> > -		err = erofs_map_blocks_flatmode(inode, &map, EROFS_GET_BLOCKS_RAW);
> > -		if (err)
> > -			goto err_out;
> > -
> > -		/* zero out the holed page */
> > -		if (!(map.m_flags & EROFS_MAP_MAPPED)) {
> > -			zero_user_segment(page, 0, PAGE_SIZE);
> > -			SetPageUptodate(page);
> > -
> > -			/* imply err = 0, see erofs_map_blocks */
> > -			goto has_updated;
> > -		}
> > -
> > -		/* for RAW access mode, m_plen must be equal to m_llen */
> > -		DBG_BUGON(map.m_plen != map.m_llen);
> > -
> > -		blknr = erofs_blknr(map.m_pa);
> > -		blkoff = erofs_blkoff(map.m_pa);
> > -
> > -		/* deal with inline page */
> > -		if (map.m_flags & EROFS_MAP_META) {
> > -			void *vsrc, *vto;
> > -			struct page *ipage;
> > -
> > -			DBG_BUGON(map.m_plen > PAGE_SIZE);
> > -
> > -			ipage = erofs_get_meta_page(inode->i_sb, blknr);
> > -
> > -			if (IS_ERR(ipage)) {
> > -				err = PTR_ERR(ipage);
> > -				goto err_out;
> > -			}
> > -
> > -			vsrc = kmap_atomic(ipage);
> > -			vto = kmap_atomic(page);
> > -			memcpy(vto, vsrc + blkoff, map.m_plen);
> > -			memset(vto + map.m_plen, 0, PAGE_SIZE - map.m_plen);
> > -			kunmap_atomic(vto);
> > -			kunmap_atomic(vsrc);
> > -			flush_dcache_page(page);
> > -
> > -			SetPageUptodate(page);
> > -			/* TODO: could we unlock the page earlier? */
> > -			unlock_page(ipage);
> > -			put_page(ipage);
> > -
> > -			/* imply err = 0, see erofs_map_blocks */
> > -			goto has_updated;
> > -		}
> > -
> > -		/* pa must be block-aligned for raw reading */
> > -		DBG_BUGON(erofs_blkoff(map.m_pa));
> > -
> > -		/* max # of continuous pages */
> > -		if (nblocks > DIV_ROUND_UP(map.m_plen, PAGE_SIZE))
> > -			nblocks = DIV_ROUND_UP(map.m_plen, PAGE_SIZE);
> > -
> > -		*eblks = bio_max_segs(nblocks);
> > -		bio = bio_alloc(GFP_NOIO, *eblks);
> > -
> > -		bio->bi_end_io = erofs_readendio;
> > -		bio_set_dev(bio, sb->s_bdev);
> > -		bio->bi_iter.bi_sector = (sector_t)blknr <<
> > -			LOG_SECTORS_PER_BLOCK;
> > -		bio->bi_opf = REQ_OP_READ | (ra ? REQ_RAHEAD : 0);
> > -	}
> > -
> > -	err = bio_add_page(bio, page, PAGE_SIZE, 0);
> > -	/* out of the extent or bio is full */
> > -	if (err < PAGE_SIZE)
> > -		goto submit_bio_retry;
> > -	--*eblks;
> > -	*last_block = current_block;
> > -	return bio;
> > -
> > -err_out:
> > -	/* for sync reading, set page error immediately */
> > -	if (!ra) {
> > -		SetPageError(page);
> > -		ClearPageUptodate(page);
> > -	}
> > -has_updated:
> > -	unlock_page(page);
> > -
> > -	/* if updated manually, continuous pages has a gap */
> > -	if (bio)
> > -		submit_bio(bio);
> > -	return err ? ERR_PTR(err) : NULL;
> > -}
> > -
> > -/*
> > - * since we dont have write or truncate flows, so no inode
> > - * locking needs to be held at the moment.
> > - */
> > -static int erofs_raw_access_readpage(struct file *file, struct page *page)
> > -{
> > -	erofs_off_t last_block;
> > -	unsigned int eblks;
> > -	struct bio *bio;
> > -
> > -	trace_erofs_readpage(page, true);
> > -
> > -	bio = erofs_read_raw_page(NULL, page->mapping,
> > -				  page, &last_block, 1, &eblks, false);
> > -
> > -	if (IS_ERR(bio))
> > -		return PTR_ERR(bio);
> > -
> > -	if (bio)
> > -		submit_bio(bio);
> > -	return 0;
> > -}
> > -
> > -static void erofs_raw_access_readahead(struct readahead_control *rac)
> > -{
> > -	erofs_off_t last_block;
> > -	unsigned int eblks;
> > -	struct bio *bio = NULL;
> > -	struct page *page;
> > -
> > -	trace_erofs_readpages(rac->mapping->host, readahead_index(rac),
> > -			readahead_count(rac), true);
> > -
> > -	while ((page = readahead_page(rac))) {
> > -		prefetchw(&page->flags);
> > -
> > -		bio = erofs_read_raw_page(bio, rac->mapping, page, &last_block,
> > -				readahead_count(rac), &eblks, true);
> > -
> > -		/* all the page errors are ignored when readahead */
> > -		if (IS_ERR(bio)) {
> > -			pr_err("%s, readahead error at page %lu of nid %llu\n",
> > -			       __func__, page->index,
> > -			       EROFS_I(rac->mapping->host)->nid);
> > -
> > -			bio = NULL;
> > -		}
> > -
> > -		put_page(page);
> > -	}
> > -
> > -	if (bio)
> > -		submit_bio(bio);
> > -}
> > -
> > -static sector_t erofs_bmap(struct address_space *mapping, sector_t block)
> > -{
> > -	struct inode *inode = mapping->host;
> > -	struct erofs_map_blocks map = {
> > -		.m_la = blknr_to_addr(block),
> > -	};
> > -
> > -	if (EROFS_I(inode)->datalayout == EROFS_INODE_FLAT_INLINE) {
> > -		erofs_blk_t blks = i_size_read(inode) >> LOG_BLOCK_SIZE;
> > -
> > -		if (block >> LOG_SECTORS_PER_BLOCK >= blks)
> > -			return 0;
> > -	}
> > -
> > -	if (!erofs_map_blocks_flatmode(inode, &map, EROFS_GET_BLOCKS_RAW))
> > -		return erofs_blknr(map.m_pa);
> > -
> > -	return 0;
> > -}
> > -
> >   static int erofs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
> >   		unsigned int flags, struct iomap *iomap, struct iomap *srcmap)
> >   {
> > @@ -327,6 +104,7 @@ static int erofs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
> >   	iomap->offset = map.m_la;
> >   	iomap->length = map.m_llen;
> >   	iomap->flags = 0;
> > +	iomap->private = NULL;
> >   	if (!(map.m_flags & EROFS_MAP_MAPPED)) {
> >   		iomap->type = IOMAP_HOLE;
> > @@ -336,20 +114,61 @@ static int erofs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
> >   		return 0;
> >   	}
> > -	/* that shouldn't happen for now */
> >   	if (map.m_flags & EROFS_MAP_META) {
> > -		DBG_BUGON(1);
> > -		return -ENOTBLK;
> > +		struct page *ipage;
> > +
> > +		iomap->type = IOMAP_INLINE;
> > +		ipage = erofs_get_meta_page(inode->i_sb,
> > +					    erofs_blknr(map.m_pa));
> 
> Error handling for erofs_get_meta_page()?

Yes, will update. Thanks for pointing out.

Thanks,
Gao Xiang

> 
> Thanks
> 
> > +		iomap->inline_data = page_address(ipage) +
> > +					erofs_blkoff(map.m_pa);
> > +		iomap->private = ipage;
> > +	} else {
> > +		iomap->type = IOMAP_MAPPED;
> > +		iomap->addr = map.m_pa;
> >   	}
> > -	iomap->type = IOMAP_MAPPED;
> > -	iomap->addr = map.m_pa;
> >   	return 0;
> >   }
> > +static int erofs_iomap_end(struct inode *inode, loff_t pos, loff_t length,
> > +		ssize_t written, unsigned flags, struct iomap *iomap)
> > +{
> > +	struct page *ipage = iomap->private;
> > +
> > +	if (ipage) {
> > +		DBG_BUGON(iomap->type != IOMAP_INLINE);
> > +		unlock_page(ipage);
> > +		put_page(ipage);
> > +	} else {
> > +		DBG_BUGON(iomap->type == IOMAP_INLINE);
> > +	}
> > +	return written;
> > +}
> > +
> >   const struct iomap_ops erofs_iomap_ops = {
> >   	.iomap_begin = erofs_iomap_begin,
> > +	.iomap_end = erofs_iomap_end,
> >   };
> > +/*
> > + * since we don't have write or truncate flows, no inode
> > + * locking needs to be held at the moment.
> > + */
> > +static int erofs_readpage(struct file *file, struct page *page)
> > +{
> > +	return iomap_readpage(page, &erofs_iomap_ops);
> > +}
> > +
> > +static void erofs_readahead(struct readahead_control *rac)
> > +{
> > +	return iomap_readahead(rac, &erofs_iomap_ops);
> > +}
> > +
> > +static sector_t erofs_bmap(struct address_space *mapping, sector_t block)
> > +{
> > +	return iomap_bmap(mapping, block, &erofs_iomap_ops);
> > +}
> > +
> >   static int erofs_prepare_dio(struct kiocb *iocb, struct iov_iter *to)
> >   {
> >   	struct inode *inode = file_inode(iocb->ki_filp);
> > @@ -365,15 +184,6 @@ static int erofs_prepare_dio(struct kiocb *iocb, struct iov_iter *to)
> >   	if (align & blksize_mask)
> >   		return -EINVAL;
> > -
> > -	/*
> > -	 * Temporarily fall back tail-packing inline to buffered I/O instead
> > -	 * since tail-packing inline support relies on an iomap core update.
> > -	 */
> > -	if (EROFS_I(inode)->datalayout == EROFS_INODE_FLAT_INLINE &&
> > -	    iocb->ki_pos + iov_iter_count(to) >
> > -			rounddown(inode->i_size, EROFS_BLKSIZ))
> > -		return 1;
> >   	return 0;
> >   }
> > @@ -409,8 +219,8 @@ static ssize_t erofs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
> >   /* for uncompressed (aligned) files and raw access for other files */
> >   const struct address_space_operations erofs_raw_access_aops = {
> > -	.readpage = erofs_raw_access_readpage,
> > -	.readahead = erofs_raw_access_readahead,
> > +	.readpage = erofs_readpage,
> > +	.readahead = erofs_readahead,
> >   	.bmap = erofs_bmap,
> >   	.direct_IO = noop_direct_IO,
> >   };
> > 

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread

Thread overview: 11+ messages
2021-07-30 19:46 [PATCH v2 0/3] erofs: iomap support for uncompressed cases Gao Xiang
2021-07-30 19:46 ` [PATCH v2 1/3] erofs: iomap support for non-tailpacking DIO Gao Xiang
2021-08-04  2:57   ` Chao Yu
2021-08-04  4:30     ` Gao Xiang
2021-08-04  4:52       ` Gao Xiang
2021-07-30 19:46 ` [PATCH v2 2/3] erofs: dax support for non-tailpacking regular file Gao Xiang
2021-08-04  7:14   ` Chao Yu
2021-08-04 11:21     ` Gao Xiang
2021-07-30 19:46 ` [PATCH v2 3/3] erofs: convert all uncompressed cases to iomap Gao Xiang
2021-08-04  7:17   ` Chao Yu
2021-08-04 11:22     ` Gao Xiang
