All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC 0/8] xfs: introduce inode data inline feature
@ 2018-07-06  3:12 Shan Hai
  2018-07-06  3:12 ` [PATCH RFC 1/8] xfs: introduce inline data superblock feature bit Shan Hai
                   ` (10 more replies)
  0 siblings, 11 replies; 50+ messages in thread
From: Shan Hai @ 2018-07-06  3:12 UTC (permalink / raw)
  To: linux-xfs


This series implements xfs inode data inlining feature.

Refered below link during development:
https://marc.info/?l=linux-xfs&m=120493585731509&w=2


How it works:
- the data inlining happens at:
  write_iter: for DIO/DAX write
  writeback: for buffered write
- extents to local format conversion is done in writeback but not in write_iter
- local to extents format conversion is done in the write_iter and writeback
- log the whole inode (core/data) on extents to local conversion
- add a new mkfs.xfs option to enable data inline feature

How to test:
mkfs.xfs -f -i size=2048,inline=1 -m crc=1,finobt=1,rmapbt=1,reflink=1 /dev/sdX
mount /dev/sdX mnt_pnt

Test results:
mkfs.xfs -f -i size=2048,inline=1 -m crc=1,finobt=1,rmapbt=1,reflink=1 /dev/sdb
mkfs.xfs -f -i size=2048          -m crc=1,finobt=1,rmapbt=1,reflink=1 /dev/sdd
mount /dev/sdb /mnt/sdb
mount /dev/sdd /mnt/sdd

Copy the linux v4.18-rc3 source code to /mnt/sdb and /mnt/sdd respectively:
Filesystem                        Size  Used Avail Use% Mounted on
/dev/sdb                           10G  2.4G  7.7G  24% /mnt/sdb
/dev/sdd                           10G  3.1G  6.9G  31% /mnt/sdd

mkfs.xfs -f -i size=2048,inline=1 -m crc=1,finobt=1,rmapbt=1,reflink=1 /dev/sdc
mkfs.xfs -f -i size=2048          -m crc=1,finobt=1,rmapbt=1,reflink=1 /dev/sde
mount /dev/sdc /mnt/sdc
mount /dev/sde /mnt/sde

Build the v4.18-rc3 kernel on /dev/sdc and /dev/sde respectively:
make x86_64_defconfig

Filesystem                        Size  Used Avail Use% Mounted on
/dev/sdc                           10G  689M  9.4G   7% /mnt/sdc
/dev/sde                           10G  694M  9.4G   7% /mnt/sde

Kernel part:
0001-xfs-introduce-inline-data-superblock-feature-bit.patch
0002-xfs-introduce-extents-to-local-conversion-helper.patch
0003-xfs-convert-inode-from-extents-to-local-format.patch
0004-xfs-implement-inline-data-read-write-code.patch
0005-xfs-consider-the-local-format-inode-in-misc-operatio.patch
0006-xfs-fix-imbalanced-locking.patch
0007-xfs-return-non-zero-blocks-for-inline-data.patch
0008-xfs-skip-local-format-inode-for-reflinking.patch

 fs/xfs/libxfs/xfs_bmap.c      |  78 +++++++++++++++++++++++++++++++-----
 fs/xfs/libxfs/xfs_bmap.h      |   4 ++
 fs/xfs/libxfs/xfs_format.h    |  10 ++++-
 fs/xfs/libxfs/xfs_fs.h        |   1 +
 fs/xfs/libxfs/xfs_inode_buf.c |   6 ---
 fs/xfs/libxfs/xfs_sb.c        |   2 +
 fs/xfs/scrub/inode.c          |   2 +-
 fs/xfs/xfs_aops.c             |  60 ++++++++++++++++++++++++++++
 fs/xfs/xfs_bmap_util.c        |  11 +++++-
 fs/xfs/xfs_file.c             | 244 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 fs/xfs/xfs_inode.c            |  20 +++++++---
 fs/xfs/xfs_inode_item.c       |   5 ++-
 fs/xfs/xfs_ioctl.c            |   5 ++-
 fs/xfs/xfs_iomap.c            |   5 ++-
 fs/xfs/xfs_iops.c             |   8 +++-
 fs/xfs/xfs_log_recover.c      |   3 +-
 fs/xfs/xfs_reflink.c          |   6 +++
 17 files changed, 437 insertions(+), 33 deletions(-)

xfsprogs part:
0001-xfsprogs-add-inode-inline-data-support.patch 

 growfs/xfs_growfs.c | 14 +++++++++-----
 libxfs/xfs_format.h |  4 +++-
 libxfs/xfs_fs.h     |  1 +
 mkfs/xfs_mkfs.c     | 20 +++++++++++++++++---
 4 files changed, 30 insertions(+), 9 deletions(-)


Thanks
Shan Hai

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH RFC 1/8] xfs: introduce inline data superblock feature bit
  2018-07-06  3:12 [PATCH RFC 0/8] xfs: introduce inode data inline feature Shan Hai
@ 2018-07-06  3:12 ` Shan Hai
  2018-07-06  3:34   ` Darrick J. Wong
  2018-07-06  3:12 ` [PATCH RFC 2/8] xfs: introduce extents to local conversion helper Shan Hai
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 50+ messages in thread
From: Shan Hai @ 2018-07-06  3:12 UTC (permalink / raw)
  To: linux-xfs

The data inlining is an on-disk format change, so introduce a new
superblock feature bit to work with the mkfs tool to handle the
inline data, the feature bit is set at mkfs time.

Signed-off-by: Shan Hai <shan.hai@oracle.com>
---
 fs/xfs/libxfs/xfs_format.h | 10 +++++++++-
 fs/xfs/libxfs/xfs_fs.h     |  1 +
 fs/xfs/libxfs/xfs_sb.c     |  2 ++
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 059bc44c27e8..6066ba7fdaf7 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -469,10 +469,12 @@ xfs_sb_has_ro_compat_feature(
 #define XFS_SB_FEAT_INCOMPAT_FTYPE	(1 << 0)	/* filetype in dirent */
 #define XFS_SB_FEAT_INCOMPAT_SPINODES	(1 << 1)	/* sparse inode chunks */
 #define XFS_SB_FEAT_INCOMPAT_META_UUID	(1 << 2)	/* metadata UUID */
+#define XFS_SB_FEAT_INCOMPAT_INLINEDATA (1 << 3)	/* inline inode data */
 #define XFS_SB_FEAT_INCOMPAT_ALL \
 		(XFS_SB_FEAT_INCOMPAT_FTYPE|	\
 		 XFS_SB_FEAT_INCOMPAT_SPINODES|	\
-		 XFS_SB_FEAT_INCOMPAT_META_UUID)
+		 XFS_SB_FEAT_INCOMPAT_META_UUID|\
+		 XFS_SB_FEAT_INCOMPAT_INLINEDATA)
 
 #define XFS_SB_FEAT_INCOMPAT_UNKNOWN	~XFS_SB_FEAT_INCOMPAT_ALL
 static inline bool
@@ -550,6 +552,12 @@ static inline bool xfs_sb_version_hasreflink(struct xfs_sb *sbp)
 		(sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_REFLINK);
 }
 
+static inline bool xfs_sb_version_hasinlinedata(struct xfs_sb *sbp)
+{
+	return (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) &&
+		(sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_INLINEDATA);
+}
+
 /*
  * end of superblock version macros
  */
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index f3aa59302fef..ee1f7621c0e4 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -210,6 +210,7 @@ typedef struct xfs_fsop_resblks {
 #define XFS_FSOP_GEOM_FLAGS_SPINODES	0x40000	/* sparse inode chunks	*/
 #define XFS_FSOP_GEOM_FLAGS_RMAPBT	0x80000	/* reverse mapping btree */
 #define XFS_FSOP_GEOM_FLAGS_REFLINK	0x100000 /* files can share blocks */
+#define XFS_FSOP_GEOM_FLAGS_INLINEDATA	0x200000 /* inline data into inode */
 
 /*
  * Minimum and maximum sizes need for growth checks.
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index 350119eeaecb..4002ed105b6b 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -1068,6 +1068,8 @@ xfs_fs_geometry(
 		geo->logsectsize = sbp->sb_logsectsize;
 	else
 		geo->logsectsize = BBSIZE;
+	if (xfs_sb_version_hasinlinedata(sbp))
+		geo->flags |= XFS_FSOP_GEOM_FLAGS_INLINEDATA;
 	geo->rtsectsize = sbp->sb_blocksize;
 	geo->dirblocksize = xfs_dir2_dirblock_bytes(sbp);
 
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH RFC 2/8] xfs: introduce extents to local conversion helper
  2018-07-06  3:12 [PATCH RFC 0/8] xfs: introduce inode data inline feature Shan Hai
  2018-07-06  3:12 ` [PATCH RFC 1/8] xfs: introduce inline data superblock feature bit Shan Hai
@ 2018-07-06  3:12 ` Shan Hai
  2018-07-06  3:45   ` Darrick J. Wong
  2018-07-08 15:42   ` Christoph Hellwig
  2018-07-06  3:12 ` [PATCH RFC 3/8] xfs: convert inode from extents to local format Shan Hai
                   ` (8 subsequent siblings)
  10 siblings, 2 replies; 50+ messages in thread
From: Shan Hai @ 2018-07-06  3:12 UTC (permalink / raw)
  To: linux-xfs

Delete the extents from xfs inode, copy the data into the data fork,
convert the inode from extents to local format and specify log the
core and data fork of the inode.

Signed-off-by: Shan Hai <shan.hai@oracle.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++-
 fs/xfs/libxfs/xfs_bmap.h |  4 ++++
 2 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 7205268b30bc..bea6dc254a7d 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 /*
  * Copyright (c) 2000-2006 Silicon Graphics, Inc.
+ * Copyright (c) 2018 Oracle.
  * All Rights Reserved.
  */
 #include "xfs.h"
@@ -213,7 +214,7 @@ xfs_default_attroffset(
  * attribute fork from local to extent format - we reset it where
  * possible to make space available for inline data fork extents.
  */
-STATIC void
+void
 xfs_bmap_forkoff_reset(
 	xfs_inode_t	*ip,
 	int		whichfork)
@@ -5141,6 +5142,63 @@ xfs_bmap_del_extent_real(
 }
 
 /*
+ * Convert an inode from extents to the local format.
+ * Free all the extents of the inode and reset it to the local
+ * format. Copy the contents of the inode's blocks to the inode's
+ * literal area.
+ */
+int
+xfs_bmap_extents_to_local(
+	xfs_trans_t		*tp,
+	xfs_inode_t		*ip,
+	struct xfs_defer_ops	*dfops,
+	int			*logflagsp,
+	int			whichfork,
+	struct page		*page)
+{
+	xfs_ifork_t		*ifp = XFS_IFORK_PTR(ip, whichfork);
+	xfs_fileoff_t		isize = i_size_read(VFS_I(ip));
+	struct xfs_bmbt_irec	got, del;
+	struct xfs_iext_cursor	icur;
+	int			error = 0;
+	int			tmp_logflags;
+	char			*kaddr = NULL;
+
+	ASSERT(whichfork != XFS_COW_FORK);
+	ASSERT(XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_EXTENTS);
+
+	if (xfs_iext_count(ifp) == 0)
+		goto init_local_fork;
+
+	for_each_xfs_iext(ifp, &icur, &got) {
+		del = got;
+		if (isnullstartblock(got.br_startblock)) {
+			error = xfs_bmap_del_extent_delay(ip,
+					whichfork, &icur, &got, &del);
+			if (error)
+				goto out;
+		} else {
+			error = xfs_bmap_del_extent_real(ip, tp, &icur, dfops,
+					NULL, &del, &tmp_logflags, whichfork, 0);
+			if (error)
+				goto out;
+			*logflagsp |= tmp_logflags;
+		}
+	}
+init_local_fork:
+	kaddr = kmap_atomic(page);
+	xfs_init_local_fork(ip, whichfork, kaddr, isize);
+	kunmap_atomic(kaddr);
+	ip->i_d.di_format = XFS_DINODE_FMT_LOCAL;
+	XFS_IFORK_NEXT_SET(ip, whichfork, 0);
+	ip->i_d.di_size = isize;
+	*logflagsp |= (XFS_ILOG_DDATA | XFS_ILOG_CORE);
+out:
+	return error;
+}
+
+
+/*
  * Unmap (remove) blocks from a file.
  * If nexts is nonzero then the number of extents to remove is limited to
  * that value.  If not all extents in the block range can be removed then
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index 9b49ddf99c41..22cd2642f1cd 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -271,6 +271,10 @@ int	xfs_bmap_map_extent(struct xfs_mount *mp, struct xfs_defer_ops *dfops,
 		struct xfs_inode *ip, struct xfs_bmbt_irec *imap);
 int	xfs_bmap_unmap_extent(struct xfs_mount *mp, struct xfs_defer_ops *dfops,
 		struct xfs_inode *ip, struct xfs_bmbt_irec *imap);
+int	xfs_bmap_extents_to_local(struct xfs_trans *tp, struct xfs_inode *ip,
+		struct xfs_defer_ops *dfops, int *flags, int whichfork,
+		struct page *page);
+void	xfs_bmap_forkoff_reset(struct xfs_inode *ip, int whichfork);
 
 static inline int xfs_bmap_fork_to_state(int whichfork)
 {
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH RFC 3/8] xfs: convert inode from extents to local format
  2018-07-06  3:12 [PATCH RFC 0/8] xfs: introduce inode data inline feature Shan Hai
  2018-07-06  3:12 ` [PATCH RFC 1/8] xfs: introduce inline data superblock feature bit Shan Hai
  2018-07-06  3:12 ` [PATCH RFC 2/8] xfs: introduce extents to local conversion helper Shan Hai
@ 2018-07-06  3:12 ` Shan Hai
  2018-07-06  3:47   ` Darrick J. Wong
  2018-07-06  3:12 ` [PATCH RFC 4/8] xfs: implement inline data read write code Shan Hai
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 50+ messages in thread
From: Shan Hai @ 2018-07-06  3:12 UTC (permalink / raw)
  To: linux-xfs

Introduce a function to convert an inode from extents to local
format. The conversion happens at writeback time, this avoids
interfering with the delayed allocation and page cache vs
xfs_buf operations, the will be inlined data is directly got
from the page which was read and modified by the iomap write actor.

Signed-off-by: Shan Hai <shan.hai@oracle.com>
---
 fs/xfs/xfs_aops.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 8eb3ba3d4d00..ac5a7695f363 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -9,6 +9,7 @@
 #include "xfs_log_format.h"
 #include "xfs_trans_resv.h"
 #include "xfs_mount.h"
+#include "xfs_defer.h"
 #include "xfs_inode.h"
 #include "xfs_trans.h"
 #include "xfs_inode_item.h"
@@ -887,6 +888,55 @@ xfs_map_cow(
 	return 0;
 }
 
+static int
+xfs_inode_extents_to_local(
+	xfs_inode_t		*ip,
+	int			whichfork,
+	struct page		*page)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_trans	*tp;
+	struct xfs_defer_ops	dfops;
+	xfs_fsblock_t		first_block;
+	int			logflags;
+	int			error = 0;
+
+	if (i_size_read(VFS_I(ip)) > XFS_IFORK_DSIZE(ip))
+		return 0;
+
+	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, &tp);
+	if (error)
+		return error;
+
+	xfs_ilock(ip, XFS_ILOCK_EXCL);
+	xfs_trans_ijoin(tp, ip, 0);
+
+	xfs_defer_init(&dfops, &first_block);
+
+	logflags = 0;
+	error = xfs_bmap_extents_to_local(tp, ip, &dfops, &logflags,
+			XFS_DATA_FORK, page);
+	if (error)
+		goto trans_cancel;
+
+	xfs_trans_log_inode(tp, ip, logflags);
+	error = xfs_defer_finish(&tp, &dfops);
+	if (error)
+		goto trans_cancel;
+	error = xfs_trans_commit(tp);
+	if (error)
+		goto error0;
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+
+	return 0;
+
+trans_cancel:
+	xfs_defer_cancel(&dfops);
+	xfs_trans_cancel(tp);
+error0:
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	return error;
+}
 /*
  * We implement an immediate ioend submission policy here to avoid needing to
  * chain multiple ioends and hence nest mempool allocations which can violate
@@ -920,6 +970,8 @@ xfs_writepage_map(
 	int			count = 0;
 	int			uptodate = 1;
 	unsigned int		new_type;
+	xfs_inode_t		*ip = XFS_I(inode);
+	struct xfs_mount	*mp = ip->i_mount;
 
 	bh = head = page_buffers(page);
 	offset = page_offset(page);
@@ -1044,6 +1096,14 @@ xfs_writepage_map(
 		end_page_writeback(page);
 	}
 
+	if (!error) {
+		if (xfs_sb_version_hasinlinedata(&mp->m_sb) &&
+				i_size_read(inode) <= XFS_IFORK_DSIZE(ip)) {
+			error = xfs_inode_extents_to_local(ip, XFS_DATA_FORK,
+								page);
+		}
+	}
+
 	mapping_set_error(page->mapping, error);
 	return error;
 }
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH RFC 4/8] xfs: implement inline data read write code
  2018-07-06  3:12 [PATCH RFC 0/8] xfs: introduce inode data inline feature Shan Hai
                   ` (2 preceding siblings ...)
  2018-07-06  3:12 ` [PATCH RFC 3/8] xfs: convert inode from extents to local format Shan Hai
@ 2018-07-06  3:12 ` Shan Hai
  2018-07-06  3:33   ` Darrick J. Wong
  2018-07-08 15:45   ` Christoph Hellwig
  2018-07-06  3:12 ` [PATCH RFC 5/8] xfs: consider the local format inode in misc operations Shan Hai
                   ` (6 subsequent siblings)
  10 siblings, 2 replies; 50+ messages in thread
From: Shan Hai @ 2018-07-06  3:12 UTC (permalink / raw)
  To: linux-xfs

Implement read/write functions for inline data access and use them
for buffered/DIO/DAX operations.

Signed-off-by: Shan Hai <shan.hai@oracle.com>
---
 fs/xfs/xfs_file.c | 244 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 242 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index a3e7767a5715..55047928b720 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -27,6 +27,7 @@
 #include "xfs_pnfs.h"
 #include "xfs_iomap.h"
 #include "xfs_reflink.h"
+#include "xfs_defer.h"
 
 #include <linux/dcache.h>
 #include <linux/falloc.h>
@@ -175,6 +176,185 @@ xfs_file_fsync(
 	return error;
 }
 
+/*
+ * Read the inline data of a local format file.
+ */
+STATIC ssize_t
+xfs_file_inline_data_read(
+	struct xfs_inode	*ip,
+	int			whichfork,
+	struct kiocb		*iocb,
+	struct iov_iter		*to)
+{
+	xfs_ifork_t		*ifp = XFS_IFORK_PTR(ip, whichfork);
+	struct inode		*inode = VFS_I(ip);
+	loff_t			isize = i_size_read(inode);
+	loff_t			*ppos = &iocb->ki_pos;
+	size_t			count = iov_iter_count(to);
+	ssize_t			ret = 0;
+
+	ASSERT(XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_EXTENTS ||
+	       XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_LOCAL);
+
+	if (!count)
+		return 0;
+
+	if (unlikely(*ppos >= inode->i_sb->s_maxbytes))
+		return 0;
+
+	if (*ppos > isize)
+		return 0;
+
+	iov_iter_truncate(to, inode->i_sb->s_maxbytes);
+	count = (*ppos + count > isize) ? isize - *ppos : count;
+	ret = copy_to_iter(ifp->if_u1.if_data + *ppos, count, to);
+	*ppos += ret;
+
+	return ret;
+}
+
+/*
+ * Try to inline the data into the xfs inode data fork when it has enough
+ * space to hold it. The data inlining happens at below points:
+ *
+ * - writeback: for buffered writes
+ * - write_iter: for dax/dio writes
+ *
+ * The extents to local format conversion is done here for DAX/DIO write,
+ * while the same conversion is done in the writeback path for buffered write.
+ * This function also converts inode from local to extents when the data
+ * fork could not hold the data inline anymore.
+ */
+STATIC ssize_t
+xfs_file_inline_data_write(
+	struct xfs_inode	*ip,
+	int			whichfork,
+	struct kiocb		*iocb,
+	struct iov_iter		*from)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+	xfs_ifork_t		*ifp = XFS_IFORK_PTR(ip, whichfork);
+	struct inode		*inode = VFS_I(ip);
+	struct address_space	*mapping = inode->i_mapping;
+	loff_t			pos = iocb->ki_pos;
+	size_t			count = iov_iter_count(from);
+	struct xfs_trans	*tp;
+	struct xfs_defer_ops	dfops;
+	xfs_fsblock_t		first_block;
+	int			ret;
+	int			error = 0;
+	int			logflags = 0;
+	loff_t			written = 0;
+	xfs_fileoff_t		bno = 0;
+	struct xfs_bmbt_irec	map;
+	int			nmap;
+	struct page		*page;
+	char			*kaddr;
+
+	ASSERT(XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_EXTENTS ||
+	       XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_LOCAL);
+
+	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, 1, 0, 0, &tp);
+	if (error) {
+		ASSERT(error == -ENOSPC || XFS_FORCED_SHUTDOWN(mp));
+		return error;
+	}
+
+	xfs_ilock(ip, XFS_ILOCK_EXCL);
+	xfs_trans_ijoin(tp, ip, 0);
+	xfs_defer_init(&dfops, &first_block);
+
+	if (pos + count > XFS_IFORK_DSIZE(ip)) {
+		if (XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_EXTENTS) {
+			error = 0;
+			goto trans_cancel;
+		}
+
+		page = find_or_create_page(mapping, pos >> PAGE_SHIFT, GFP_NOFS);
+		if (!page) {
+			error = -ENOMEM;
+			goto trans_cancel;
+		}
+		kaddr = kmap_atomic(page);
+		memcpy(kaddr, ifp->if_u1.if_data, ifp->if_bytes);
+		kunmap_atomic(kaddr);
+		SetPageUptodate(page);
+		unlock_page(page);
+		put_page(page);
+
+		xfs_bmap_forkoff_reset(ip, whichfork);
+		ifp->if_flags &= ~XFS_IFINLINE;
+		ifp->if_flags |= XFS_IFEXTENTS;
+		ifp->if_u1.if_root = NULL;
+		ifp->if_height = 0;
+		ifp->if_bytes = 0;
+		XFS_IFORK_FMT_SET(ip, whichfork, XFS_DINODE_FMT_EXTENTS);
+
+		written = ifp->if_bytes;
+		error = xfs_bmap_first_unused(tp, ip, 1, &bno, whichfork);
+		if (error)
+			goto trans_cancel;
+		nmap = 1;
+		error = xfs_bmapi_write(tp, ip, bno, 1, XFS_BMAPI_ENTIRE,
+				&first_block, 1, &map, &nmap, &dfops);
+		if (error)
+			goto trans_cancel;
+
+		goto out_local_to_extents;
+	}
+
+	if (XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_LOCAL) {
+		xfs_idata_realloc(ip, (pos + count) - ifp->if_bytes, whichfork);
+		ret = copy_from_iter(ifp->if_u1.if_data + pos, count, from);
+		written = ret;
+		goto out;
+	}
+
+	if (IS_DAX(inode) || iocb->ki_flags & IOCB_DIRECT) {
+		struct page	*page = alloc_page(GFP_KERNEL | GFP_NOFS);
+		if (!page) {
+			error = -ENOMEM;
+			goto trans_cancel;
+		}
+
+		kaddr = kmap_atomic(page);
+		ret = copy_from_iter(kaddr + pos, count, from);
+		kunmap_atomic(kaddr);
+		written = ret;
+		error = xfs_bmap_extents_to_local(tp, ip, &dfops, &logflags,
+							XFS_DATA_FORK, page);
+		if (error) {
+			__free_page(page);
+			goto trans_cancel;
+		}
+
+		__free_page(page);
+		goto out;
+	}
+trans_cancel:
+	xfs_defer_cancel(&dfops);
+	xfs_trans_cancel(tp);
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	return error;
+out:
+	if (pos + written > i_size_read(inode))
+		i_size_write(inode, pos + written);
+	ip->i_d.di_size = pos + written;
+
+out_local_to_extents:
+	logflags |= (XFS_ILOG_DDATA | XFS_ILOG_CORE);
+	xfs_trans_log_inode(tp, ip, logflags);
+
+	error = xfs_defer_finish(&tp, &dfops);
+	if (error)
+		goto trans_cancel;
+	error = xfs_trans_commit(tp);
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+
+	return written > 0 ? written : error;
+}
+
+
 STATIC ssize_t
 xfs_file_dio_aio_read(
 	struct kiocb		*iocb,
@@ -192,7 +372,12 @@ xfs_file_dio_aio_read(
 	file_accessed(iocb->ki_filp);
 
 	xfs_ilock(ip, XFS_IOLOCK_SHARED);
-	ret = iomap_dio_rw(iocb, to, &xfs_iomap_ops, NULL);
+	if (xfs_sb_version_hasinlinedata(&ip->i_mount->m_sb) &&
+	    XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) == XFS_DINODE_FMT_LOCAL) {
+		ret = xfs_file_inline_data_read(ip, XFS_DATA_FORK, iocb, to);
+	} else {
+		ret = iomap_dio_rw(iocb, to, &xfs_iomap_ops, NULL);
+	}
 	xfs_iunlock(ip, XFS_IOLOCK_SHARED);
 
 	return ret;
@@ -219,6 +404,13 @@ xfs_file_dax_read(
 		xfs_ilock(ip, XFS_IOLOCK_SHARED);
 	}
 
+	if (xfs_sb_version_hasinlinedata(&ip->i_mount->m_sb) &&
+	    XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) == XFS_DINODE_FMT_LOCAL) {
+		ret = xfs_file_inline_data_read(ip, XFS_DATA_FORK, iocb, to);
+	} else {
+		ret = dax_iomap_rw(iocb, to, &xfs_iomap_ops);
+	}
+
 	ret = dax_iomap_rw(iocb, to, &xfs_iomap_ops);
 	xfs_iunlock(ip, XFS_IOLOCK_SHARED);
 
@@ -242,7 +434,12 @@ xfs_file_buffered_aio_read(
 	} else {
 		xfs_ilock(ip, XFS_IOLOCK_SHARED);
 	}
-	ret = generic_file_read_iter(iocb, to);
+
+	if (XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) == XFS_DINODE_FMT_LOCAL) {
+		ret = xfs_file_inline_data_read(ip, XFS_DATA_FORK, iocb, to);
+	} else {
+		ret = generic_file_read_iter(iocb, to);
+	}
 	xfs_iunlock(ip, XFS_IOLOCK_SHARED);
 
 	return ret;
@@ -487,6 +684,7 @@ xfs_file_dio_aio_write(
 	size_t			count = iov_iter_count(from);
 	struct xfs_buftarg      *target = XFS_IS_REALTIME_INODE(ip) ?
 					mp->m_rtdev_targp : mp->m_ddev_targp;
+	int			error;
 
 	/* DIO must be aligned to device logical sector size */
 	if ((iocb->ki_pos | count) & target->bt_logical_sectormask)
@@ -547,6 +745,22 @@ xfs_file_dio_aio_write(
 	}
 
 	trace_xfs_file_direct_write(ip, count, iocb->ki_pos);
+
+	if (xfs_sb_version_hasinlinedata(&mp->m_sb) &&
+		XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) != XFS_DINODE_FMT_BTREE &&
+		S_ISREG(inode->i_mode)) {
+		error = xfs_file_inline_data_write(ip, XFS_DATA_FORK, iocb,
+							from);
+		if (error) {
+			ret = error;
+			goto out;
+		}
+		if (ip->i_d.di_format == XFS_DINODE_FMT_LOCAL) {
+			ret = count;
+			goto out;
+		}
+	}
+
 	ret = iomap_dio_rw(iocb, from, &xfs_iomap_ops, xfs_dio_write_end_io);
 out:
 	xfs_iunlock(ip, iolock);
@@ -586,6 +800,20 @@ xfs_file_dax_write(
 	count = iov_iter_count(from);
 
 	trace_xfs_file_dax_write(ip, count, pos);
+
+	if (xfs_sb_version_hasinlinedata(&ip->i_mount->m_sb) &&
+		ip->i_d.di_format != XFS_DINODE_FMT_BTREE &&
+		S_ISREG(inode->i_mode)) {
+		error = xfs_file_inline_data_write(ip, XFS_DATA_FORK,
+							iocb, from);
+		if (error)
+			goto out;
+		if (ip->i_d.di_format == XFS_DINODE_FMT_LOCAL) {
+			ret = count;
+			goto out;
+		}
+	}
+
 	ret = dax_iomap_rw(iocb, from, &xfs_iomap_ops);
 	if (ret > 0 && iocb->ki_pos > i_size_read(inode)) {
 		i_size_write(inode, iocb->ki_pos);
@@ -614,6 +842,7 @@ xfs_file_buffered_aio_write(
 	struct address_space	*mapping = file->f_mapping;
 	struct inode		*inode = mapping->host;
 	struct xfs_inode	*ip = XFS_I(inode);
+	struct xfs_mount	*mp = ip->i_mount;
 	ssize_t			ret;
 	int			enospc = 0;
 	int			iolock;
@@ -629,6 +858,17 @@ xfs_file_buffered_aio_write(
 	if (ret)
 		goto out;
 
+	if (xfs_sb_version_hasinlinedata(&mp->m_sb) &&
+		XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) != XFS_DINODE_FMT_BTREE) {
+		ret = xfs_file_inline_data_write(ip, XFS_DATA_FORK, iocb, from);
+		if (XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) ==
+							XFS_DINODE_FMT_LOCAL){
+			if (likely(ret >= 0))
+				iocb->ki_pos += ret;
+			goto out;
+		}
+	}
+
 	/* We can write back this queue in page reclaim */
 	current->backing_dev_info = inode_to_bdi(inode);
 
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH RFC 5/8] xfs: consider the local format inode in misc operations
  2018-07-06  3:12 [PATCH RFC 0/8] xfs: introduce inode data inline feature Shan Hai
                   ` (3 preceding siblings ...)
  2018-07-06  3:12 ` [PATCH RFC 4/8] xfs: implement inline data read write code Shan Hai
@ 2018-07-06  3:12 ` Shan Hai
  2018-07-06  3:40   ` Darrick J. Wong
  2018-07-08 15:51   ` Christoph Hellwig
  2018-07-06  3:12 ` [PATCH RFC 6/8] xfs: fix imbalanced locking Shan Hai
                   ` (5 subsequent siblings)
  10 siblings, 2 replies; 50+ messages in thread
From: Shan Hai @ 2018-07-06  3:12 UTC (permalink / raw)
  To: linux-xfs

The local format inode is a legal citizen from now on and consider
it in misc operations.

Signed-off-by: Shan Hai <shan.hai@oracle.com>
---
 fs/xfs/libxfs/xfs_bmap.c      | 18 ++++++++++--------
 fs/xfs/libxfs/xfs_inode_buf.c |  6 ------
 fs/xfs/scrub/inode.c          |  2 +-
 fs/xfs/xfs_bmap_util.c        | 11 +++++++++--
 fs/xfs/xfs_inode.c            | 20 ++++++++++++++------
 fs/xfs/xfs_inode_item.c       |  5 ++++-
 fs/xfs/xfs_iomap.c            |  5 +++--
 fs/xfs/xfs_log_recover.c      |  3 ++-
 8 files changed, 43 insertions(+), 27 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index bea6dc254a7d..6b151bd15da7 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -830,11 +830,6 @@ xfs_bmap_local_to_extents(
 	struct xfs_bmbt_irec rec;
 	struct xfs_iext_cursor icur;
 
-	/*
-	 * We don't want to deal with the case of keeping inode data inline yet.
-	 * So sending the data fork of a regular inode is invalid.
-	 */
-	ASSERT(!(S_ISREG(VFS_I(ip)->i_mode) && whichfork == XFS_DATA_FORK));
 	ifp = XFS_IFORK_PTR(ip, whichfork);
 	ASSERT(XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_LOCAL);
 
@@ -3840,7 +3835,8 @@ xfs_bmapi_read(
 
 	if (unlikely(XFS_TEST_ERROR(
 	    (XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_EXTENTS &&
-	     XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_BTREE),
+	     XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_BTREE &&
+	     XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_LOCAL),
 	     mp, XFS_ERRTAG_BMAPIFORMAT))) {
 		XFS_ERROR_REPORT("xfs_bmapi_read", XFS_ERRLEVEL_LOW, mp);
 		return -EFSCORRUPTED;
@@ -3863,7 +3859,7 @@ xfs_bmapi_read(
 		return 0;
 	}
 
-	if (!(ifp->if_flags & XFS_IFEXTENTS)) {
+	if (!(ifp->if_flags & (XFS_IFINLINE | XFS_IFEXTENTS))) {
 		error = xfs_iread_extents(NULL, ip, whichfork);
 		if (error)
 			return error;
@@ -4285,7 +4281,6 @@ xfs_bmapi_write(
 	       (flags & (XFS_BMAPI_CONVERT | XFS_BMAPI_COWFORK)) ==
 			(XFS_BMAPI_CONVERT | XFS_BMAPI_COWFORK));
 	ASSERT(len > 0);
-	ASSERT(XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_LOCAL);
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
 	ASSERT(!(flags & XFS_BMAPI_REMAP));
 
@@ -5244,11 +5239,18 @@ __xfs_bunmapi(
 	ifp = XFS_IFORK_PTR(ip, whichfork);
 	if (unlikely(
 	    XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_EXTENTS &&
+	    XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_LOCAL &&
 	    XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_BTREE)) {
 		XFS_ERROR_REPORT("xfs_bunmapi", XFS_ERRLEVEL_LOW,
 				 ip->i_mount);
 		return -EFSCORRUPTED;
 	}
+
+	if (XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_LOCAL) {
+		*rlen = 0;
+		return 0;
+	}
+
 	mp = ip->i_mount;
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index 33dc34655ac3..cb3c4b308137 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -157,7 +157,6 @@ const struct xfs_buf_ops xfs_inode_buf_ra_ops = {
 	.verify_write = xfs_inode_buf_write_verify,
 };
 
-
 /*
  * This routine is called to map an inode to the buffer containing the on-disk
  * version of the inode.  It returns a pointer to the buffer containing the
@@ -384,12 +383,7 @@ xfs_dinode_verify_fork(
 
 	switch (XFS_DFORK_FORMAT(dip, whichfork)) {
 	case XFS_DINODE_FMT_LOCAL:
-		/*
-		 * no local regular files yet
-		 */
 		if (whichfork == XFS_DATA_FORK) {
-			if (S_ISREG(be16_to_cpu(dip->di_mode)))
-				return __this_address;
 			if (be64_to_cpu(dip->di_size) >
 					XFS_DFORK_SIZE(dip, mp, whichfork))
 				return __this_address;
diff --git a/fs/xfs/scrub/inode.c b/fs/xfs/scrub/inode.c
index 7a6208505980..1501983dd7aa 100644
--- a/fs/xfs/scrub/inode.c
+++ b/fs/xfs/scrub/inode.c
@@ -283,7 +283,7 @@ xfs_scrub_dinode(
 			xfs_scrub_ino_set_corrupt(sc, ino);
 		break;
 	case XFS_DINODE_FMT_LOCAL:
-		if (!S_ISDIR(mode) && !S_ISLNK(mode))
+		if (!S_ISREG(mode) && !S_ISDIR(mode) && !S_ISLNK(mode))
 			xfs_scrub_ino_set_corrupt(sc, ino);
 		break;
 	case XFS_DINODE_FMT_EXTENTS:
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 83b1e8c6c18f..46f718177fd7 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -760,7 +760,8 @@ xfs_can_free_eofblocks(struct xfs_inode *ip, bool force)
 		return false;
 
 	/* If we haven't read in the extent list, then don't do it now. */
-	if (!(ip->i_df.if_flags & XFS_IFEXTENTS))
+	if (ip->i_d.di_format != XFS_DINODE_FMT_LOCAL &&
+		!(ip->i_df.if_flags & XFS_IFEXTENTS))
 		return false;
 
 	/*
@@ -768,7 +769,8 @@ xfs_can_free_eofblocks(struct xfs_inode *ip, bool force)
 	 * has delalloc blocks and we are forced to remove them.
 	 */
 	if (ip->i_d.di_flags & (XFS_DIFLAG_PREALLOC | XFS_DIFLAG_APPEND))
-		if (!force || ip->i_delayed_blks == 0)
+		if ((ip->i_d.di_format != XFS_DINODE_FMT_LOCAL) &&
+			(!force || ip->i_delayed_blks == 0))
 			return false;
 
 	return true;
@@ -792,6 +794,11 @@ xfs_free_eofblocks(
 	struct xfs_bmbt_irec	imap;
 	struct xfs_mount	*mp = ip->i_mount;
 
+	if (xfs_sb_version_hasinlinedata(&mp->m_sb) &&
+		ip->i_d.di_format == XFS_DINODE_FMT_LOCAL) {
+		ip->i_delayed_blks = 0;
+		return 0;
+	}
 	/*
 	 * Figure out if there are any blocks beyond the end
 	 * of the file.  If not, then there is nothing to do.
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 5df4de666cc1..78b9790a7cd4 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1563,7 +1563,6 @@ xfs_itruncate_extents_flags(
 	ASSERT(!XFS_NOT_DQATTACHED(mp, ip));
 
 	trace_xfs_itruncate_extents_start(ip, new_size);
-
 	flags |= xfs_bmapi_aflag(whichfork);
 
 	/*
@@ -1745,9 +1744,15 @@ xfs_inactive_truncate(
 	ip->i_d.di_size = 0;
 	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
 
-	error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK, 0);
-	if (error)
-		goto error_trans_cancel;
+	if (xfs_sb_version_hasinlinedata(&mp->m_sb) &&
+		ip->i_d.di_format == XFS_DINODE_FMT_LOCAL) {
+		ASSERT(ip->i_d.di_nextents == 0);
+		ip->i_delayed_blks = 0;
+	} else {
+		error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK, 0);
+		if (error)
+			goto error_trans_cancel;
+	}
 
 	ASSERT(ip->i_d.di_nextents == 0);
 
@@ -1879,6 +1884,7 @@ xfs_inactive(
 	if (VFS_I(ip)->i_mode == 0) {
 		ASSERT(ip->i_df.if_real_bytes == 0);
 		ASSERT(ip->i_df.if_broot_bytes == 0);
+		ASSERT(ip->i_delayed_blks == 0);
 		return;
 	}
 
@@ -1911,8 +1917,9 @@ xfs_inactive(
 
 	if (S_ISREG(VFS_I(ip)->i_mode) &&
 	    (ip->i_d.di_size != 0 || XFS_ISIZE(ip) != 0 ||
-	     ip->i_d.di_nextents > 0 || ip->i_delayed_blks > 0))
+	     ip->i_d.di_nextents >= 0 || ip->i_delayed_blks >= 0)) {
 		truncate = 1;
+	}
 
 	error = xfs_qm_dqattach(ip);
 	if (error)
@@ -3554,7 +3561,8 @@ xfs_iflush_int(
 	if (S_ISREG(VFS_I(ip)->i_mode)) {
 		if (XFS_TEST_ERROR(
 		    (ip->i_d.di_format != XFS_DINODE_FMT_EXTENTS) &&
-		    (ip->i_d.di_format != XFS_DINODE_FMT_BTREE),
+		    (ip->i_d.di_format != XFS_DINODE_FMT_BTREE) &&
+		    (ip->i_d.di_format != XFS_DINODE_FMT_LOCAL),
 		    mp, XFS_ERRTAG_IFLUSH_3)) {
 			xfs_alert_tag(mp, XFS_PTAG_IFLUSH,
 				"%s: Bad regular inode %Lu, ptr "PTR_FMT,
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index 2389c34c172d..52b297987a75 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -139,6 +139,7 @@ xfs_inode_item_format_data_fork(
 	struct xfs_log_iovec	**vecp)
 {
 	struct xfs_inode	*ip = iip->ili_inode;
+	struct xfs_mount	*mp = ip->i_mount;
 	size_t			data_bytes;
 
 	switch (ip->i_d.di_format) {
@@ -197,7 +198,9 @@ xfs_inode_item_format_data_fork(
 			ASSERT(ip->i_df.if_real_bytes == 0 ||
 			       ip->i_df.if_real_bytes >= data_bytes);
 			ASSERT(ip->i_df.if_u1.if_data != NULL);
-			ASSERT(ip->i_d.di_size > 0);
+			if (!xfs_sb_version_hasinlinedata(&mp->m_sb)) {
+				ASSERT(ip->i_d.di_size > 0);
+			}
 			xlog_copy_iovec(lv, vecp, XLOG_REG_TYPE_ILOCAL,
 					ip->i_df.if_u1.if_data, data_bytes);
 			ilf->ilf_dsize = (unsigned)data_bytes;
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 55876dd02f0c..aa72966081c5 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -532,7 +532,8 @@ xfs_file_iomap_begin_delay(
 
 	if (unlikely(XFS_TEST_ERROR(
 	    (XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) != XFS_DINODE_FMT_EXTENTS &&
-	     XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) != XFS_DINODE_FMT_BTREE),
+	     XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) != XFS_DINODE_FMT_BTREE &&
+	     XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) != XFS_DINODE_FMT_LOCAL),
 	     mp, XFS_ERRTAG_BMAPIFORMAT))) {
 		XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, mp);
 		error = -EFSCORRUPTED;
@@ -541,7 +542,7 @@ xfs_file_iomap_begin_delay(
 
 	XFS_STATS_INC(mp, xs_blk_mapw);
 
-	if (!(ifp->if_flags & XFS_IFEXTENTS)) {
+	if (!(ifp->if_flags & (XFS_IFINLINE |XFS_IFEXTENTS))) {
 		error = xfs_iread_extents(NULL, ip, XFS_DATA_FORK);
 		if (error)
 			goto out_unlock;
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index b181b5f57a19..d8828f275d9a 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -3121,7 +3121,8 @@ xlog_recover_inode_pass2(
 
 	if (unlikely(S_ISREG(ldip->di_mode))) {
 		if ((ldip->di_format != XFS_DINODE_FMT_EXTENTS) &&
-		    (ldip->di_format != XFS_DINODE_FMT_BTREE)) {
+		    (ldip->di_format != XFS_DINODE_FMT_BTREE) &&
+		    (ldip->di_format != XFS_DINODE_FMT_LOCAL)) {
 			XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(3)",
 					 XFS_ERRLEVEL_LOW, mp, ldip,
 					 sizeof(*ldip));
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH RFC 6/8] xfs: fix imbalanced locking
  2018-07-06  3:12 [PATCH RFC 0/8] xfs: introduce inode data inline feature Shan Hai
                   ` (4 preceding siblings ...)
  2018-07-06  3:12 ` [PATCH RFC 5/8] xfs: consider the local format inode in misc operations Shan Hai
@ 2018-07-06  3:12 ` Shan Hai
  2018-07-08 15:53   ` Christoph Hellwig
  2018-07-06  3:12 ` [PATCH RFC 7/8] xfs: return non-zero blocks for inline data Shan Hai
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 50+ messages in thread
From: Shan Hai @ 2018-07-06  3:12 UTC (permalink / raw)
  To: linux-xfs

The XFS_ILOCK_SHARED type lock is held in the xfs_ioc_fsgetxattr
but does not released when the inode is in local format, fix it by
eleasing the lock on the local format inode.

Signed-off-by: Shan Hai <shan.hai@oracle.com>
---
 fs/xfs/xfs_ioctl.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 0ef5ece5634c..246562615ccd 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -907,7 +907,10 @@ xfs_ioc_fsgetxattr(
 	} else {
 		if (ip->i_df.if_flags & XFS_IFEXTENTS)
 			fa.fsx_nextents = xfs_iext_count(&ip->i_df);
-		else
+		else if (ip->i_df.if_flags & XFS_IFINLINE) {
+			xfs_iunlock(ip, XFS_ILOCK_SHARED);
+			return 0;
+		} else
 			fa.fsx_nextents = ip->i_d.di_nextents;
 	}
 	xfs_iunlock(ip, XFS_ILOCK_SHARED);
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH RFC 7/8] xfs: return non-zero blocks for inline data
  2018-07-06  3:12 [PATCH RFC 0/8] xfs: introduce inode data inline feature Shan Hai
                   ` (5 preceding siblings ...)
  2018-07-06  3:12 ` [PATCH RFC 6/8] xfs: fix imbalanced locking Shan Hai
@ 2018-07-06  3:12 ` Shan Hai
  2018-07-08 15:54   ` Christoph Hellwig
  2018-07-11 13:08   ` Carlos Maiolino
  2018-07-06  3:12 ` [PATCH RFC 8/8] xfs: skip local format inode for reflinking Shan Hai
                   ` (3 subsequent siblings)
  10 siblings, 2 replies; 50+ messages in thread
From: Shan Hai @ 2018-07-06  3:12 UTC (permalink / raw)
  To: linux-xfs

Return non-zero blocks for inline data even though the inode has
no external blocks, otherwise the "ls -ls" would show zero blocks
occupied by the file.

Signed-off-by: Shan Hai <shan.hai@oracle.com>
---
 fs/xfs/xfs_iops.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 0fa29f39d658..63be1355a473 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -500,8 +500,14 @@ xfs_vn_getattr(
 	stat->atime = inode->i_atime;
 	stat->mtime = inode->i_mtime;
 	stat->ctime = inode->i_ctime;
-	stat->blocks =
+
+	if (xfs_sb_version_hasinlinedata(&mp->m_sb) &&
+		XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) == XFS_DINODE_FMT_LOCAL) {
+			stat->blocks = BTOBB(XFS_ISIZE(ip));
+	} else {
+		stat->blocks =
 		XFS_FSB_TO_BB(mp, ip->i_d.di_nblocks + ip->i_delayed_blks);
+	}
 
 	if (ip->i_d.di_version == 3) {
 		if (request_mask & STATX_BTIME) {
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH RFC 8/8] xfs: skip local format inode for reflinking
  2018-07-06  3:12 [PATCH RFC 0/8] xfs: introduce inode data inline feature Shan Hai
                   ` (6 preceding siblings ...)
  2018-07-06  3:12 ` [PATCH RFC 7/8] xfs: return non-zero blocks for inline data Shan Hai
@ 2018-07-06  3:12 ` Shan Hai
  2018-07-06  3:26   ` Darrick J. Wong
  2018-07-06  3:12 ` [PATCH RFC 1/1] xfsprogs: add inode inline data support Shan Hai
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 50+ messages in thread
From: Shan Hai @ 2018-07-06  3:12 UTC (permalink / raw)
  To: linux-xfs

The local format inode holds the data inline in its data fork and
has no extents at all, so skip the inode for reflinking.

Signed-off-by: Shan Hai <shan.hai@oracle.com>
---
 fs/xfs/xfs_reflink.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 592fb2071a03..dfb3a85ec0e6 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1507,6 +1507,12 @@ xfs_reflink_inode_has_shared_extents(
 	int				error;
 
 	ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
+
+	if (ifp->if_flags & XFS_IFINLINE) {
+		*has_shared = false;
+		return 0;
+	}
+
 	if (!(ifp->if_flags & XFS_IFEXTENTS)) {
 		error = xfs_iread_extents(tp, ip, XFS_DATA_FORK);
 		if (error)
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH RFC 1/1] xfsprogs: add inode inline data support
  2018-07-06  3:12 [PATCH RFC 0/8] xfs: introduce inode data inline feature Shan Hai
                   ` (7 preceding siblings ...)
  2018-07-06  3:12 ` [PATCH RFC 8/8] xfs: skip local format inode for reflinking Shan Hai
@ 2018-07-06  3:12 ` Shan Hai
  2018-07-06  3:35   ` Darrick J. Wong
  2018-07-06  3:51 ` [PATCH RFC 0/8] xfs: introduce inode data inline feature Darrick J. Wong
  2018-07-06  5:42 ` Dave Chinner
  10 siblings, 1 reply; 50+ messages in thread
From: Shan Hai @ 2018-07-06  3:12 UTC (permalink / raw)
  To: linux-xfs

Add a new mkfs command line option to enable the inode inline data
feature. The mkfs set a bit in the superblock to notify the kernel
that the data inlining is enabled, there is no extra options for
mount.

Signed-off-by: Shan Hai <shan.hai@oracle.com>
---
 growfs/xfs_growfs.c | 14 +++++++++-----
 libxfs/xfs_format.h |  4 +++-
 libxfs/xfs_fs.h     |  1 +
 mkfs/xfs_mkfs.c     | 20 +++++++++++++++++---
 4 files changed, 30 insertions(+), 9 deletions(-)

diff --git a/growfs/xfs_growfs.c b/growfs/xfs_growfs.c
index 366176b7..a370bf9d 100644
--- a/growfs/xfs_growfs.c
+++ b/growfs/xfs_growfs.c
@@ -60,13 +60,14 @@ report_info(
 	int		finobt_enabled,
 	int		spinodes,
 	int		rmapbt_enabled,
-	int		reflink_enabled)
+	int		reflink_enabled,
+	int		inlinedata_enabled)
 {
 	printf(_(
 	    "meta-data=%-22s isize=%-6u agcount=%u, agsize=%u blks\n"
 	    "         =%-22s sectsz=%-5u attr=%u, projid32bit=%u\n"
 	    "         =%-22s crc=%-8u finobt=%u spinodes=%u rmapbt=%u\n"
-	    "         =%-22s reflink=%u\n"
+	    "         =%-22s reflink=%u inline=%u\n"
 	    "data     =%-22s bsize=%-6u blocks=%llu, imaxpct=%u\n"
 	    "         =%-22s sunit=%-6u swidth=%u blks\n"
 	    "naming   =version %-14u bsize=%-6u ascii-ci=%d ftype=%d\n"
@@ -77,7 +78,7 @@ report_info(
 		mntpoint, geo.inodesize, geo.agcount, geo.agblocks,
 		"", geo.sectsize, attrversion, projid32bit,
 		"", crcs_enabled, finobt_enabled, spinodes, rmapbt_enabled,
-		"", reflink_enabled,
+		"", reflink_enabled, inlinedata_enabled,
 		"", geo.blocksize, (unsigned long long)geo.datablocks,
 			geo.imaxpct,
 		"", geo.sunit, geo.swidth,
@@ -133,6 +134,7 @@ main(int argc, char **argv)
 	int			spinodes;
 	int			rmapbt_enabled;
 	int			reflink_enabled;
+	int			inlinedata_enabled;
 	char			rpath[PATH_MAX];
 
 	progname = basename(argv[0]);
@@ -266,12 +268,14 @@ main(int argc, char **argv)
 	spinodes = geo.flags & XFS_FSOP_GEOM_FLAGS_SPINODES ? 1 : 0;
 	rmapbt_enabled = geo.flags & XFS_FSOP_GEOM_FLAGS_RMAPBT ? 1 : 0;
 	reflink_enabled = geo.flags & XFS_FSOP_GEOM_FLAGS_REFLINK ? 1 : 0;
+	inlinedata_enabled = geo.flags & XFS_FSOP_GEOM_FLAGS_INLINE ? 1 : 0;
 	if (nflag) {
 		report_info(geo, datadev, isint, logdev, rtdev,
 				lazycount, dirversion, logversion,
 				attrversion, projid32bit, crcs_enabled, ci,
 				ftype_enabled, finobt_enabled, spinodes,
-				rmapbt_enabled, reflink_enabled);
+				rmapbt_enabled, reflink_enabled,
+				inlinedata_enabled);
 		exit(0);
 	}
 
@@ -310,7 +314,7 @@ main(int argc, char **argv)
 			lazycount, dirversion, logversion,
 			attrversion, projid32bit, crcs_enabled, ci, ftype_enabled,
 			finobt_enabled, spinodes, rmapbt_enabled,
-			reflink_enabled);
+			reflink_enabled, inlinedata_enabled);
 
 	ddsize = xi.dsize;
 	dlsize = ( xi.logBBsize? xi.logBBsize :
diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h
index 48548933..2a765d12 100644
--- a/libxfs/xfs_format.h
+++ b/libxfs/xfs_format.h
@@ -478,10 +478,12 @@ xfs_sb_has_ro_compat_feature(
 #define XFS_SB_FEAT_INCOMPAT_FTYPE	(1 << 0)	/* filetype in dirent */
 #define XFS_SB_FEAT_INCOMPAT_SPINODES	(1 << 1)	/* sparse inode chunks */
 #define XFS_SB_FEAT_INCOMPAT_META_UUID	(1 << 2)	/* metadata UUID */
+#define XFS_SB_FEAT_INCOMPAT_INLINEDATA	(1 << 3)	/* inline inode data */
 #define XFS_SB_FEAT_INCOMPAT_ALL \
 		(XFS_SB_FEAT_INCOMPAT_FTYPE|	\
 		 XFS_SB_FEAT_INCOMPAT_SPINODES|	\
-		 XFS_SB_FEAT_INCOMPAT_META_UUID)
+		 XFS_SB_FEAT_INCOMPAT_META_UUID| \
+		 XFS_SB_FEAT_INCOMPAT_INLINEDATA)
 
 #define XFS_SB_FEAT_INCOMPAT_UNKNOWN	~XFS_SB_FEAT_INCOMPAT_ALL
 static inline bool
diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h
index 86a379f6..40737fb9 100644
--- a/libxfs/xfs_fs.h
+++ b/libxfs/xfs_fs.h
@@ -243,6 +243,7 @@ typedef struct xfs_fsop_resblks {
 #define XFS_FSOP_GEOM_FLAGS_SPINODES	0x40000	/* sparse inode chunks	*/
 #define XFS_FSOP_GEOM_FLAGS_RMAPBT	0x80000	/* reverse mapping btree */
 #define XFS_FSOP_GEOM_FLAGS_REFLINK	0x100000 /* files can share blocks */
+#define XFS_FSOP_GEOM_FLAGS_INLINE	0x200000 /* inline data into inode */
 
 /*
  * Minimum and maximum sizes need for growth checks.
diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index 78d0ce5d..b321761b 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -82,6 +82,7 @@ enum {
 	I_ATTR,
 	I_PROJID32BIT,
 	I_SPINODES,
+	I_INLINE,
 	I_MAX_OPTS,
 };
 
@@ -394,6 +395,7 @@ struct opt_params iopts = {
 		[I_ATTR] = "attr",
 		[I_PROJID32BIT] = "projid32bit",
 		[I_SPINODES] = "sparse",
+		[I_INLINE] = "inline",
 	},
 	.subopt_params = {
 		{ .index = I_ALIGN,
@@ -442,6 +444,12 @@ struct opt_params iopts = {
 		  .maxval = 1,
 		  .defaultval = 1,
 		},
+		{ .index = I_INLINE,
+		  .conflicts = { { NULL, LAST_CONFLICT } },
+		  .minval = 0,
+		  .maxval = 1,
+		  .defaultval = 0,
+		},
 	},
 };
 
@@ -743,6 +751,7 @@ struct sb_feat_args {
 	bool	spinodes;		/* XFS_SB_FEAT_INCOMPAT_SPINODES */
 	bool	rmapbt;			/* XFS_SB_FEAT_RO_COMPAT_RMAPBT */
 	bool	reflink;		/* XFS_SB_FEAT_RO_COMPAT_REFLINK */
+	bool	inlinedata;		/* XFS_SB_FEAT_INCOMPAT_INLINEDATA */
 	bool	nodalign;
 	bool	nortalign;
 };
@@ -871,7 +880,7 @@ usage( void )
 			    sectsize=num\n\
 /* force overwrite */	[-f]\n\
 /* inode size */	[-i log=n|perblock=n|size=num,maxpct=n,attr=0|1|2,\n\
-			    projid32bit=0|1,sparse=0|1]\n\
+			    projid32bit=0|1,sparse=0|1,inline=0|1]\n\
 /* no discard */	[-K]\n\
 /* log subvol */	[-l agnum=n,internal,size=num,logdev=xxx,version=n\n\
 			    sunit=value|su=num,sectsize=num,lazy-count=0|1]\n\
@@ -1506,6 +1515,9 @@ inode_opts_parser(
 	case I_SPINODES:
 		cli->sb_feat.spinodes = getnum(value, opts, subopt);
 		break;
+	case I_INLINE:
+		cli->sb_feat.inlinedata = getnum(value, opts, subopt);
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -2903,6 +2915,8 @@ sb_set_features(
 		sbp->sb_features_incompat |= XFS_SB_FEAT_INCOMPAT_SPINODES;
 	}
 
+	if (fp->inlinedata)
+		sbp->sb_features_incompat |= XFS_SB_FEAT_INCOMPAT_INLINEDATA;
 }
 
 /*
@@ -3184,7 +3198,7 @@ print_mkfs_cfg(
 	printf(_(
 "meta-data=%-22s isize=%-6d agcount=%lld, agsize=%lld blks\n"
 "         =%-22s sectsz=%-5u attr=%u, projid32bit=%u\n"
-"         =%-22s crc=%-8u finobt=%u, sparse=%u, rmapbt=%u, reflink=%u\n"
+"         =%-22s crc=%-8u finobt=%u, sparse=%u, rmapbt=%u, reflink=%u, inline=%u\n"
 "data     =%-22s bsize=%-6u blocks=%llu, imaxpct=%u\n"
 "         =%-22s sunit=%-6u swidth=%u blks\n"
 "naming   =version %-14u bsize=%-6u ascii-ci=%d ftype=%d\n"
@@ -3195,7 +3209,7 @@ print_mkfs_cfg(
 			(long long)cfg->agsize,
 		"", cfg->sectorsize, fp->attr_version, fp->projid32bit,
 		"", fp->crcs_enabled, fp->finobt, fp->spinodes, fp->rmapbt,
-			fp->reflink,
+			fp->reflink, fp->inlinedata,
 		"", cfg->blocksize, (long long)cfg->dblocks, cfg->imaxpct,
 		"", cfg->dsunit, cfg->dswidth,
 		fp->dir_version, cfg->dirblocksize, fp->nci, fp->dirftype,
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 8/8] xfs: skip local format inode for reflinking
  2018-07-06  3:12 ` [PATCH RFC 8/8] xfs: skip local format inode for reflinking Shan Hai
@ 2018-07-06  3:26   ` Darrick J. Wong
  2018-07-06  3:54     ` Shan Hai
  2018-07-08 16:00     ` Christoph Hellwig
  0 siblings, 2 replies; 50+ messages in thread
From: Darrick J. Wong @ 2018-07-06  3:26 UTC (permalink / raw)
  To: Shan Hai; +Cc: linux-xfs

On Fri, Jul 06, 2018 at 11:12:29AM +0800, Shan Hai wrote:
> The local format inode holds the data inline in its data fork and
> has no extents at all, so skip the inode for reflinking.
> 
> Signed-off-by: Shan Hai <shan.hai@oracle.com>
> ---
>  fs/xfs/xfs_reflink.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 592fb2071a03..dfb3a85ec0e6 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -1507,6 +1507,12 @@ xfs_reflink_inode_has_shared_extents(
>  	int				error;
>  
>  	ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
> +
> +	if (ifp->if_flags & XFS_IFINLINE) {
> +		*has_shared = false;
> +		return 0;
> +	}

What happens if someone tries to reflink and either src or dest are an
inline file?

--D

> +
>  	if (!(ifp->if_flags & XFS_IFEXTENTS)) {
>  		error = xfs_iread_extents(tp, ip, XFS_DATA_FORK);
>  		if (error)
> -- 
> 2.11.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 4/8] xfs: implement inline data read write code
  2018-07-06  3:12 ` [PATCH RFC 4/8] xfs: implement inline data read write code Shan Hai
@ 2018-07-06  3:33   ` Darrick J. Wong
  2018-07-06  4:05     ` Shan Hai
  2018-07-08 15:45   ` Christoph Hellwig
  1 sibling, 1 reply; 50+ messages in thread
From: Darrick J. Wong @ 2018-07-06  3:33 UTC (permalink / raw)
  To: Shan Hai; +Cc: linux-xfs

On Fri, Jul 06, 2018 at 11:12:25AM +0800, Shan Hai wrote:
> Implement read/write functions for inline data access and use them
> for buffered/DIO/DAX operations.
> 
> Signed-off-by: Shan Hai <shan.hai@oracle.com>
> ---
>  fs/xfs/xfs_file.c | 244 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 242 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index a3e7767a5715..55047928b720 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -27,6 +27,7 @@
>  #include "xfs_pnfs.h"
>  #include "xfs_iomap.h"
>  #include "xfs_reflink.h"
> +#include "xfs_defer.h"
>  
>  #include <linux/dcache.h>
>  #include <linux/falloc.h>
> @@ -175,6 +176,185 @@ xfs_file_fsync(
>  	return error;
>  }
>  
> +/*
> + * Read the inline data of a local format file.
> + */
> +STATIC ssize_t
> +xfs_file_inline_data_read(
> +	struct xfs_inode	*ip,
> +	int			whichfork,
> +	struct kiocb		*iocb,
> +	struct iov_iter		*to)
> +{
> +	xfs_ifork_t		*ifp = XFS_IFORK_PTR(ip, whichfork);
> +	struct inode		*inode = VFS_I(ip);
> +	loff_t			isize = i_size_read(inode);
> +	loff_t			*ppos = &iocb->ki_pos;
> +	size_t			count = iov_iter_count(to);
> +	ssize_t			ret = 0;
> +
> +	ASSERT(XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_EXTENTS ||
> +	       XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_LOCAL);
> +
> +	if (!count)
> +		return 0;
> +
> +	if (unlikely(*ppos >= inode->i_sb->s_maxbytes))
> +		return 0;
> +
> +	if (*ppos > isize)
> +		return 0;
> +
> +	iov_iter_truncate(to, inode->i_sb->s_maxbytes);
> +	count = (*ppos + count > isize) ? isize - *ppos : count;
> +	ret = copy_to_iter(ifp->if_u1.if_data + *ppos, count, to);
> +	*ppos += ret;
> +
> +	return ret;
> +}
> +
> +/*
> + * Try to inline the data into the xfs inode data fork when it has enough
> + * space to hold it. The data inlining happens at below points:
> + *
> + * - writeback: for buffered writes
> + * - write_iter: for dax/dio writes
> + *
> + * The extents to local format conversion is done here for DAX/DIO write,
> + * while the same conversion is done in the writeback path for buffered write.
> + * This function also converts inode from local to extents when the data
> + * fork could not hold the data inline anymore.
> + */
> +STATIC ssize_t
> +xfs_file_inline_data_write(
> +	struct xfs_inode	*ip,
> +	int			whichfork,
> +	struct kiocb		*iocb,
> +	struct iov_iter		*from)
> +{
> +	struct xfs_mount	*mp = ip->i_mount;
> +	xfs_ifork_t		*ifp = XFS_IFORK_PTR(ip, whichfork);
> +	struct inode		*inode = VFS_I(ip);
> +	struct address_space	*mapping = inode->i_mapping;
> +	loff_t			pos = iocb->ki_pos;
> +	size_t			count = iov_iter_count(from);
> +	struct xfs_trans	*tp;
> +	struct xfs_defer_ops	dfops;
> +	xfs_fsblock_t		first_block;
> +	int			ret;
> +	int			error = 0;
> +	int			logflags = 0;
> +	loff_t			written = 0;
> +	xfs_fileoff_t		bno = 0;
> +	struct xfs_bmbt_irec	map;
> +	int			nmap;
> +	struct page		*page;
> +	char			*kaddr;
> +
> +	ASSERT(XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_EXTENTS ||
> +	       XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_LOCAL);
> +
> +	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, 1, 0, 0, &tp);
> +	if (error) {
> +		ASSERT(error == -ENOSPC || XFS_FORCED_SHUTDOWN(mp));
> +		return error;
> +	}
> +
> +	xfs_ilock(ip, XFS_ILOCK_EXCL);
> +	xfs_trans_ijoin(tp, ip, 0);
> +	xfs_defer_init(&dfops, &first_block);
> +
> +	if (pos + count > XFS_IFORK_DSIZE(ip)) {
> +		if (XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_EXTENTS) {
> +			error = 0;
> +			goto trans_cancel;
> +		}
> +
> +		page = find_or_create_page(mapping, pos >> PAGE_SHIFT, GFP_NOFS);
> +		if (!page) {
> +			error = -ENOMEM;
> +			goto trans_cancel;
> +		}
> +		kaddr = kmap_atomic(page);
> +		memcpy(kaddr, ifp->if_u1.if_data, ifp->if_bytes);
> +		kunmap_atomic(kaddr);
> +		SetPageUptodate(page);
> +		unlock_page(page);
> +		put_page(page);
> +
> +		xfs_bmap_forkoff_reset(ip, whichfork);
> +		ifp->if_flags &= ~XFS_IFINLINE;
> +		ifp->if_flags |= XFS_IFEXTENTS;
> +		ifp->if_u1.if_root = NULL;
> +		ifp->if_height = 0;
> +		ifp->if_bytes = 0;
> +		XFS_IFORK_FMT_SET(ip, whichfork, XFS_DINODE_FMT_EXTENTS);
> +
> +		written = ifp->if_bytes;
> +		error = xfs_bmap_first_unused(tp, ip, 1, &bno, whichfork);
> +		if (error)
> +			goto trans_cancel;
> +		nmap = 1;
> +		error = xfs_bmapi_write(tp, ip, bno, 1, XFS_BMAPI_ENTIRE,
> +				&first_block, 1, &map, &nmap, &dfops);
> +		if (error)
> +			goto trans_cancel;
> +
> +		goto out_local_to_extents;
> +	}
> +
> +	if (XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_LOCAL) {
> +		xfs_idata_realloc(ip, (pos + count) - ifp->if_bytes, whichfork);
> +		ret = copy_from_iter(ifp->if_u1.if_data + pos, count, from);
> +		written = ret;
> +		goto out;
> +	}
> +
> +	if (IS_DAX(inode) || iocb->ki_flags & IOCB_DIRECT) {
> +		struct page	*page = alloc_page(GFP_KERNEL | GFP_NOFS);
> +		if (!page) {
> +			error = -ENOMEM;
> +			goto trans_cancel;
> +		}
> +
> +		kaddr = kmap_atomic(page);
> +		ret = copy_from_iter(kaddr + pos, count, from);
> +		kunmap_atomic(kaddr);
> +		written = ret;
> +		error = xfs_bmap_extents_to_local(tp, ip, &dfops, &logflags,
> +							XFS_DATA_FORK, page);
> +		if (error) {
> +			__free_page(page);
> +			goto trans_cancel;
> +		}
> +
> +		__free_page(page);
> +		goto out;
> +	}
> +trans_cancel:
> +	xfs_defer_cancel(&dfops);
> +	xfs_trans_cancel(tp);
> +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> +	return error;
> +out:
> +	if (pos + written > i_size_read(inode))
> +		i_size_write(inode, pos + written);
> +	ip->i_d.di_size = pos + written;
> +
> +out_local_to_extents:
> +	logflags |= (XFS_ILOG_DDATA | XFS_ILOG_CORE);
> +	xfs_trans_log_inode(tp, ip, logflags);
> +
> +	error = xfs_defer_finish(&tp, &dfops);
> +	if (error)
> +		goto trans_cancel;
> +	error = xfs_trans_commit(tp);
> +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> +
> +	return written > 0 ? written : error;
> +}
> +
> +
>  STATIC ssize_t
>  xfs_file_dio_aio_read(
>  	struct kiocb		*iocb,
> @@ -192,7 +372,12 @@ xfs_file_dio_aio_read(
>  	file_accessed(iocb->ki_filp);
>  
>  	xfs_ilock(ip, XFS_IOLOCK_SHARED);
> -	ret = iomap_dio_rw(iocb, to, &xfs_iomap_ops, NULL);
> +	if (xfs_sb_version_hasinlinedata(&ip->i_mount->m_sb) &&
> +	    XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) == XFS_DINODE_FMT_LOCAL) {
> +		ret = xfs_file_inline_data_read(ip, XFS_DATA_FORK, iocb, to);

I thought iomap was growing the ability to handle inline data?  At least
for gfs2?

Hmm, you might ask Christoph about how to support that... also these
read/write paths are about to get a lot of rip/smash this cycle.

> +	} else {
> +		ret = iomap_dio_rw(iocb, to, &xfs_iomap_ops, NULL);
> +	}
>  	xfs_iunlock(ip, XFS_IOLOCK_SHARED);
>  
>  	return ret;
> @@ -219,6 +404,13 @@ xfs_file_dax_read(
>  		xfs_ilock(ip, XFS_IOLOCK_SHARED);
>  	}
>  
> +	if (xfs_sb_version_hasinlinedata(&ip->i_mount->m_sb) &&
> +	    XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) == XFS_DINODE_FMT_LOCAL) {
> +		ret = xfs_file_inline_data_read(ip, XFS_DATA_FORK, iocb, to);
> +	} else {
> +		ret = dax_iomap_rw(iocb, to, &xfs_iomap_ops);
> +	}
> +
>  	ret = dax_iomap_rw(iocb, to, &xfs_iomap_ops);
>  	xfs_iunlock(ip, XFS_IOLOCK_SHARED);
>  
> @@ -242,7 +434,12 @@ xfs_file_buffered_aio_read(
>  	} else {
>  		xfs_ilock(ip, XFS_IOLOCK_SHARED);
>  	}
> -	ret = generic_file_read_iter(iocb, to);
> +
> +	if (XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) == XFS_DINODE_FMT_LOCAL) {
> +		ret = xfs_file_inline_data_read(ip, XFS_DATA_FORK, iocb, to);
> +	} else {
> +		ret = generic_file_read_iter(iocb, to);
> +	}
>  	xfs_iunlock(ip, XFS_IOLOCK_SHARED);
>  
>  	return ret;
> @@ -487,6 +684,7 @@ xfs_file_dio_aio_write(
>  	size_t			count = iov_iter_count(from);
>  	struct xfs_buftarg      *target = XFS_IS_REALTIME_INODE(ip) ?
>  					mp->m_rtdev_targp : mp->m_ddev_targp;
> +	int			error;
>  
>  	/* DIO must be aligned to device logical sector size */
>  	if ((iocb->ki_pos | count) & target->bt_logical_sectormask)
> @@ -547,6 +745,22 @@ xfs_file_dio_aio_write(
>  	}
>  
>  	trace_xfs_file_direct_write(ip, count, iocb->ki_pos);
> +
> +	if (xfs_sb_version_hasinlinedata(&mp->m_sb) &&
> +		XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) != XFS_DINODE_FMT_BTREE &&

!= BTREE? Why not == INLINE?

> +		S_ISREG(inode->i_mode)) {
> +		error = xfs_file_inline_data_write(ip, XFS_DATA_FORK, iocb,
> +							from);

int != ssize_t... also, if we decided we needed extents format but the
file turned out to be extents format already then did we lose the write
here?

> +		if (error) {
> +			ret = error;
> +			goto out;
> +		}
> +		if (ip->i_d.di_format == XFS_DINODE_FMT_LOCAL) {
> +			ret = count;
> +			goto out;
> +		}
> +	}
> +
>  	ret = iomap_dio_rw(iocb, from, &xfs_iomap_ops, xfs_dio_write_end_io);
>  out:
>  	xfs_iunlock(ip, iolock);
> @@ -586,6 +800,20 @@ xfs_file_dax_write(
>  	count = iov_iter_count(from);
>  
>  	trace_xfs_file_dax_write(ip, count, pos);
> +
> +	if (xfs_sb_version_hasinlinedata(&ip->i_mount->m_sb) &&
> +		ip->i_d.di_format != XFS_DINODE_FMT_BTREE &&
> +		S_ISREG(inode->i_mode)) {
> +		error = xfs_file_inline_data_write(ip, XFS_DATA_FORK,
> +							iocb, from);
> +		if (error)
> +			goto out;
> +		if (ip->i_d.di_format == XFS_DINODE_FMT_LOCAL) {
> +			ret = count;
> +			goto out;
> +		}
> +	}
> +
>  	ret = dax_iomap_rw(iocb, from, &xfs_iomap_ops);
>  	if (ret > 0 && iocb->ki_pos > i_size_read(inode)) {
>  		i_size_write(inode, iocb->ki_pos);
> @@ -614,6 +842,7 @@ xfs_file_buffered_aio_write(
>  	struct address_space	*mapping = file->f_mapping;
>  	struct inode		*inode = mapping->host;
>  	struct xfs_inode	*ip = XFS_I(inode);
> +	struct xfs_mount	*mp = ip->i_mount;
>  	ssize_t			ret;
>  	int			enospc = 0;
>  	int			iolock;
> @@ -629,6 +858,17 @@ xfs_file_buffered_aio_write(
>  	if (ret)
>  		goto out;
>  
> +	if (xfs_sb_version_hasinlinedata(&mp->m_sb) &&
> +		XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) != XFS_DINODE_FMT_BTREE) {
> +		ret = xfs_file_inline_data_write(ip, XFS_DATA_FORK, iocb, from);
> +		if (XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) ==
> +							XFS_DINODE_FMT_LOCAL){

What's going on with the indenting here?

--D

> +			if (likely(ret >= 0))
> +				iocb->ki_pos += ret;
> +			goto out;
> +		}
> +	}
> +
>  	/* We can write back this queue in page reclaim */
>  	current->backing_dev_info = inode_to_bdi(inode);
>  
> -- 
> 2.11.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 1/8] xfs: introduce inline data superblock feature bit
  2018-07-06  3:12 ` [PATCH RFC 1/8] xfs: introduce inline data superblock feature bit Shan Hai
@ 2018-07-06  3:34   ` Darrick J. Wong
  2018-07-06  4:06     ` Shan Hai
  0 siblings, 1 reply; 50+ messages in thread
From: Darrick J. Wong @ 2018-07-06  3:34 UTC (permalink / raw)
  To: Shan Hai; +Cc: linux-xfs

On Fri, Jul 06, 2018 at 11:12:22AM +0800, Shan Hai wrote:
> The data inlining is an on-disk format change, so introduce a new
> superblock feature bit to work with the mkfs tool to handle the
> inline data, the feature bit is set at mkfs time.
> 
> Signed-off-by: Shan Hai <shan.hai@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_format.h | 10 +++++++++-
>  fs/xfs/libxfs/xfs_fs.h     |  1 +
>  fs/xfs/libxfs/xfs_sb.c     |  2 ++
>  3 files changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> index 059bc44c27e8..6066ba7fdaf7 100644
> --- a/fs/xfs/libxfs/xfs_format.h
> +++ b/fs/xfs/libxfs/xfs_format.h
> @@ -469,10 +469,12 @@ xfs_sb_has_ro_compat_feature(
>  #define XFS_SB_FEAT_INCOMPAT_FTYPE	(1 << 0)	/* filetype in dirent */
>  #define XFS_SB_FEAT_INCOMPAT_SPINODES	(1 << 1)	/* sparse inode chunks */
>  #define XFS_SB_FEAT_INCOMPAT_META_UUID	(1 << 2)	/* metadata UUID */
> +#define XFS_SB_FEAT_INCOMPAT_INLINEDATA (1 << 3)	/* inline inode data */
>  #define XFS_SB_FEAT_INCOMPAT_ALL \
>  		(XFS_SB_FEAT_INCOMPAT_FTYPE|	\
>  		 XFS_SB_FEAT_INCOMPAT_SPINODES|	\
> -		 XFS_SB_FEAT_INCOMPAT_META_UUID)
> +		 XFS_SB_FEAT_INCOMPAT_META_UUID|\
> +		 XFS_SB_FEAT_INCOMPAT_INLINEDATA)

Uh... don't go enabling this feature in the _ALL flags until you've
finished putting all the code in...

>  
>  #define XFS_SB_FEAT_INCOMPAT_UNKNOWN	~XFS_SB_FEAT_INCOMPAT_ALL
>  static inline bool
> @@ -550,6 +552,12 @@ static inline bool xfs_sb_version_hasreflink(struct xfs_sb *sbp)
>  		(sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_REFLINK);
>  }
>  
> +static inline bool xfs_sb_version_hasinlinedata(struct xfs_sb *sbp)
> +{
> +	return (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) &&
> +		(sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_INLINEDATA);
> +}
> +
>  /*
>   * end of superblock version macros
>   */
> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> index f3aa59302fef..ee1f7621c0e4 100644
> --- a/fs/xfs/libxfs/xfs_fs.h
> +++ b/fs/xfs/libxfs/xfs_fs.h
> @@ -210,6 +210,7 @@ typedef struct xfs_fsop_resblks {
>  #define XFS_FSOP_GEOM_FLAGS_SPINODES	0x40000	/* sparse inode chunks	*/
>  #define XFS_FSOP_GEOM_FLAGS_RMAPBT	0x80000	/* reverse mapping btree */
>  #define XFS_FSOP_GEOM_FLAGS_REFLINK	0x100000 /* files can share blocks */
> +#define XFS_FSOP_GEOM_FLAGS_INLINEDATA	0x200000 /* inline data into inode */
>  
>  /*
>   * Minimum and maximum sizes need for growth checks.
> diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
> index 350119eeaecb..4002ed105b6b 100644
> --- a/fs/xfs/libxfs/xfs_sb.c
> +++ b/fs/xfs/libxfs/xfs_sb.c
> @@ -1068,6 +1068,8 @@ xfs_fs_geometry(
>  		geo->logsectsize = sbp->sb_logsectsize;
>  	else
>  		geo->logsectsize = BBSIZE;
> +	if (xfs_sb_version_hasinlinedata(sbp))
> +		geo->flags |= XFS_FSOP_GEOM_FLAGS_INLINEDATA;
>  	geo->rtsectsize = sbp->sb_blocksize;
>  	geo->dirblocksize = xfs_dir2_dirblock_bytes(sbp);
>  
> -- 
> 2.11.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 1/1] xfsprogs: add inode inline data support
  2018-07-06  3:12 ` [PATCH RFC 1/1] xfsprogs: add inode inline data support Shan Hai
@ 2018-07-06  3:35   ` Darrick J. Wong
  2018-07-06 19:14     ` Eric Sandeen
  0 siblings, 1 reply; 50+ messages in thread
From: Darrick J. Wong @ 2018-07-06  3:35 UTC (permalink / raw)
  To: Shan Hai; +Cc: linux-xfs

On Fri, Jul 06, 2018 at 11:12:30AM +0800, Shan Hai wrote:
> Add a new mkfs command line option to enable the inode inline data
> feature. The mkfs set a bit in the superblock to notify the kernel
> that the data inlining is enabled, there is no extra options for
> mount.
> 
> Signed-off-by: Shan Hai <shan.hai@oracle.com>
> ---
>  growfs/xfs_growfs.c | 14 +++++++++-----
>  libxfs/xfs_format.h |  4 +++-
>  libxfs/xfs_fs.h     |  1 +
>  mkfs/xfs_mkfs.c     | 20 +++++++++++++++++---
>  4 files changed, 30 insertions(+), 9 deletions(-)
> 
> diff --git a/growfs/xfs_growfs.c b/growfs/xfs_growfs.c
> index 366176b7..a370bf9d 100644
> --- a/growfs/xfs_growfs.c
> +++ b/growfs/xfs_growfs.c
> @@ -60,13 +60,14 @@ report_info(
>  	int		finobt_enabled,
>  	int		spinodes,
>  	int		rmapbt_enabled,
> -	int		reflink_enabled)
> +	int		reflink_enabled,
> +	int		inlinedata_enabled)
>  {
>  	printf(_(
>  	    "meta-data=%-22s isize=%-6u agcount=%u, agsize=%u blks\n"
>  	    "         =%-22s sectsz=%-5u attr=%u, projid32bit=%u\n"
>  	    "         =%-22s crc=%-8u finobt=%u spinodes=%u rmapbt=%u\n"
> -	    "         =%-22s reflink=%u\n"
> +	    "         =%-22s reflink=%u inline=%u\n"

I coulda sworn we refactored this into a library function...

Do xfs_db or xfs_repair require any changes?

Did xfstests have anything exciting to say about this feature?

--D

>  	    "data     =%-22s bsize=%-6u blocks=%llu, imaxpct=%u\n"
>  	    "         =%-22s sunit=%-6u swidth=%u blks\n"
>  	    "naming   =version %-14u bsize=%-6u ascii-ci=%d ftype=%d\n"
> @@ -77,7 +78,7 @@ report_info(
>  		mntpoint, geo.inodesize, geo.agcount, geo.agblocks,
>  		"", geo.sectsize, attrversion, projid32bit,
>  		"", crcs_enabled, finobt_enabled, spinodes, rmapbt_enabled,
> -		"", reflink_enabled,
> +		"", reflink_enabled, inlinedata_enabled,
>  		"", geo.blocksize, (unsigned long long)geo.datablocks,
>  			geo.imaxpct,
>  		"", geo.sunit, geo.swidth,
> @@ -133,6 +134,7 @@ main(int argc, char **argv)
>  	int			spinodes;
>  	int			rmapbt_enabled;
>  	int			reflink_enabled;
> +	int			inlinedata_enabled;
>  	char			rpath[PATH_MAX];
>  
>  	progname = basename(argv[0]);
> @@ -266,12 +268,14 @@ main(int argc, char **argv)
>  	spinodes = geo.flags & XFS_FSOP_GEOM_FLAGS_SPINODES ? 1 : 0;
>  	rmapbt_enabled = geo.flags & XFS_FSOP_GEOM_FLAGS_RMAPBT ? 1 : 0;
>  	reflink_enabled = geo.flags & XFS_FSOP_GEOM_FLAGS_REFLINK ? 1 : 0;
> +	inlinedata_enabled = geo.flags & XFS_FSOP_GEOM_FLAGS_INLINE ? 1 : 0;
>  	if (nflag) {
>  		report_info(geo, datadev, isint, logdev, rtdev,
>  				lazycount, dirversion, logversion,
>  				attrversion, projid32bit, crcs_enabled, ci,
>  				ftype_enabled, finobt_enabled, spinodes,
> -				rmapbt_enabled, reflink_enabled);
> +				rmapbt_enabled, reflink_enabled,
> +				inlinedata_enabled);
>  		exit(0);
>  	}
>  
> @@ -310,7 +314,7 @@ main(int argc, char **argv)
>  			lazycount, dirversion, logversion,
>  			attrversion, projid32bit, crcs_enabled, ci, ftype_enabled,
>  			finobt_enabled, spinodes, rmapbt_enabled,
> -			reflink_enabled);
> +			reflink_enabled, inlinedata_enabled);
>  
>  	ddsize = xi.dsize;
>  	dlsize = ( xi.logBBsize? xi.logBBsize :
> diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h
> index 48548933..2a765d12 100644
> --- a/libxfs/xfs_format.h
> +++ b/libxfs/xfs_format.h
> @@ -478,10 +478,12 @@ xfs_sb_has_ro_compat_feature(
>  #define XFS_SB_FEAT_INCOMPAT_FTYPE	(1 << 0)	/* filetype in dirent */
>  #define XFS_SB_FEAT_INCOMPAT_SPINODES	(1 << 1)	/* sparse inode chunks */
>  #define XFS_SB_FEAT_INCOMPAT_META_UUID	(1 << 2)	/* metadata UUID */
> +#define XFS_SB_FEAT_INCOMPAT_INLINEDATA	(1 << 3)	/* inline inode data */
>  #define XFS_SB_FEAT_INCOMPAT_ALL \
>  		(XFS_SB_FEAT_INCOMPAT_FTYPE|	\
>  		 XFS_SB_FEAT_INCOMPAT_SPINODES|	\
> -		 XFS_SB_FEAT_INCOMPAT_META_UUID)
> +		 XFS_SB_FEAT_INCOMPAT_META_UUID| \
> +		 XFS_SB_FEAT_INCOMPAT_INLINEDATA)
>  
>  #define XFS_SB_FEAT_INCOMPAT_UNKNOWN	~XFS_SB_FEAT_INCOMPAT_ALL
>  static inline bool
> diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h
> index 86a379f6..40737fb9 100644
> --- a/libxfs/xfs_fs.h
> +++ b/libxfs/xfs_fs.h
> @@ -243,6 +243,7 @@ typedef struct xfs_fsop_resblks {
>  #define XFS_FSOP_GEOM_FLAGS_SPINODES	0x40000	/* sparse inode chunks	*/
>  #define XFS_FSOP_GEOM_FLAGS_RMAPBT	0x80000	/* reverse mapping btree */
>  #define XFS_FSOP_GEOM_FLAGS_REFLINK	0x100000 /* files can share blocks */
> +#define XFS_FSOP_GEOM_FLAGS_INLINE	0x200000 /* inline data into inode */
>  
>  /*
>   * Minimum and maximum sizes need for growth checks.
> diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
> index 78d0ce5d..b321761b 100644
> --- a/mkfs/xfs_mkfs.c
> +++ b/mkfs/xfs_mkfs.c
> @@ -82,6 +82,7 @@ enum {
>  	I_ATTR,
>  	I_PROJID32BIT,
>  	I_SPINODES,
> +	I_INLINE,
>  	I_MAX_OPTS,
>  };
>  
> @@ -394,6 +395,7 @@ struct opt_params iopts = {
>  		[I_ATTR] = "attr",
>  		[I_PROJID32BIT] = "projid32bit",
>  		[I_SPINODES] = "sparse",
> +		[I_INLINE] = "inline",
>  	},
>  	.subopt_params = {
>  		{ .index = I_ALIGN,
> @@ -442,6 +444,12 @@ struct opt_params iopts = {
>  		  .maxval = 1,
>  		  .defaultval = 1,
>  		},
> +		{ .index = I_INLINE,
> +		  .conflicts = { { NULL, LAST_CONFLICT } },
> +		  .minval = 0,
> +		  .maxval = 1,
> +		  .defaultval = 0,
> +		},
>  	},
>  };
>  
> @@ -743,6 +751,7 @@ struct sb_feat_args {
>  	bool	spinodes;		/* XFS_SB_FEAT_INCOMPAT_SPINODES */
>  	bool	rmapbt;			/* XFS_SB_FEAT_RO_COMPAT_RMAPBT */
>  	bool	reflink;		/* XFS_SB_FEAT_RO_COMPAT_REFLINK */
> +	bool	inlinedata;		/* XFS_SB_FEAT_INCOMPAT_INLINEDATA */
>  	bool	nodalign;
>  	bool	nortalign;
>  };
> @@ -871,7 +880,7 @@ usage( void )
>  			    sectsize=num\n\
>  /* force overwrite */	[-f]\n\
>  /* inode size */	[-i log=n|perblock=n|size=num,maxpct=n,attr=0|1|2,\n\
> -			    projid32bit=0|1,sparse=0|1]\n\
> +			    projid32bit=0|1,sparse=0|1,inline=0|1]\n\
>  /* no discard */	[-K]\n\
>  /* log subvol */	[-l agnum=n,internal,size=num,logdev=xxx,version=n\n\
>  			    sunit=value|su=num,sectsize=num,lazy-count=0|1]\n\
> @@ -1506,6 +1515,9 @@ inode_opts_parser(
>  	case I_SPINODES:
>  		cli->sb_feat.spinodes = getnum(value, opts, subopt);
>  		break;
> +	case I_INLINE:
> +		cli->sb_feat.inlinedata = getnum(value, opts, subopt);
> +		break;
>  	default:
>  		return -EINVAL;
>  	}
> @@ -2903,6 +2915,8 @@ sb_set_features(
>  		sbp->sb_features_incompat |= XFS_SB_FEAT_INCOMPAT_SPINODES;
>  	}
>  
> +	if (fp->inlinedata)
> +		sbp->sb_features_incompat |= XFS_SB_FEAT_INCOMPAT_INLINEDATA;
>  }
>  
>  /*
> @@ -3184,7 +3198,7 @@ print_mkfs_cfg(
>  	printf(_(
>  "meta-data=%-22s isize=%-6d agcount=%lld, agsize=%lld blks\n"
>  "         =%-22s sectsz=%-5u attr=%u, projid32bit=%u\n"
> -"         =%-22s crc=%-8u finobt=%u, sparse=%u, rmapbt=%u, reflink=%u\n"
> +"         =%-22s crc=%-8u finobt=%u, sparse=%u, rmapbt=%u, reflink=%u, inline=%u\n"
>  "data     =%-22s bsize=%-6u blocks=%llu, imaxpct=%u\n"
>  "         =%-22s sunit=%-6u swidth=%u blks\n"
>  "naming   =version %-14u bsize=%-6u ascii-ci=%d ftype=%d\n"
> @@ -3195,7 +3209,7 @@ print_mkfs_cfg(
>  			(long long)cfg->agsize,
>  		"", cfg->sectorsize, fp->attr_version, fp->projid32bit,
>  		"", fp->crcs_enabled, fp->finobt, fp->spinodes, fp->rmapbt,
> -			fp->reflink,
> +			fp->reflink, fp->inlinedata,
>  		"", cfg->blocksize, (long long)cfg->dblocks, cfg->imaxpct,
>  		"", cfg->dsunit, cfg->dswidth,
>  		fp->dir_version, cfg->dirblocksize, fp->nci, fp->dirftype,
> -- 
> 2.11.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 5/8] xfs: consider the local format inode in misc operations
  2018-07-06  3:12 ` [PATCH RFC 5/8] xfs: consider the local format inode in misc operations Shan Hai
@ 2018-07-06  3:40   ` Darrick J. Wong
  2018-07-06  4:40     ` Shan Hai
  2018-07-08 15:51   ` Christoph Hellwig
  1 sibling, 1 reply; 50+ messages in thread
From: Darrick J. Wong @ 2018-07-06  3:40 UTC (permalink / raw)
  To: Shan Hai; +Cc: linux-xfs

On Fri, Jul 06, 2018 at 11:12:26AM +0800, Shan Hai wrote:
> The local format inode is a legal citizen from now on and consider
> it in misc operations.
> 
> Signed-off-by: Shan Hai <shan.hai@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_bmap.c      | 18 ++++++++++--------
>  fs/xfs/libxfs/xfs_inode_buf.c |  6 ------
>  fs/xfs/scrub/inode.c          |  2 +-
>  fs/xfs/xfs_bmap_util.c        | 11 +++++++++--
>  fs/xfs/xfs_inode.c            | 20 ++++++++++++++------
>  fs/xfs/xfs_inode_item.c       |  5 ++++-
>  fs/xfs/xfs_iomap.c            |  5 +++--
>  fs/xfs/xfs_log_recover.c      |  3 ++-
>  8 files changed, 43 insertions(+), 27 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index bea6dc254a7d..6b151bd15da7 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -830,11 +830,6 @@ xfs_bmap_local_to_extents(
>  	struct xfs_bmbt_irec rec;
>  	struct xfs_iext_cursor icur;
>  
> -	/*
> -	 * We don't want to deal with the case of keeping inode data inline yet.
> -	 * So sending the data fork of a regular inode is invalid.
> -	 */
> -	ASSERT(!(S_ISREG(VFS_I(ip)->i_mode) && whichfork == XFS_DATA_FORK));
>  	ifp = XFS_IFORK_PTR(ip, whichfork);
>  	ASSERT(XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_LOCAL);
>  
> @@ -3840,7 +3835,8 @@ xfs_bmapi_read(
>  
>  	if (unlikely(XFS_TEST_ERROR(
>  	    (XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_EXTENTS &&
> -	     XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_BTREE),
> +	     XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_BTREE &&
> +	     XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_LOCAL),
>  	     mp, XFS_ERRTAG_BMAPIFORMAT))) {
>  		XFS_ERROR_REPORT("xfs_bmapi_read", XFS_ERRLEVEL_LOW, mp);
>  		return -EFSCORRUPTED;
> @@ -3863,7 +3859,7 @@ xfs_bmapi_read(
>  		return 0;
>  	}
>  
> -	if (!(ifp->if_flags & XFS_IFEXTENTS)) {
> +	if (!(ifp->if_flags & (XFS_IFINLINE | XFS_IFEXTENTS))) {
>  		error = xfs_iread_extents(NULL, ip, whichfork);
>  		if (error)
>  			return error;
> @@ -4285,7 +4281,6 @@ xfs_bmapi_write(
>  	       (flags & (XFS_BMAPI_CONVERT | XFS_BMAPI_COWFORK)) ==
>  			(XFS_BMAPI_CONVERT | XFS_BMAPI_COWFORK));
>  	ASSERT(len > 0);
> -	ASSERT(XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_LOCAL);
>  	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
>  	ASSERT(!(flags & XFS_BMAPI_REMAP));
>  
> @@ -5244,11 +5239,18 @@ __xfs_bunmapi(
>  	ifp = XFS_IFORK_PTR(ip, whichfork);
>  	if (unlikely(
>  	    XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_EXTENTS &&
> +	    XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_LOCAL &&
>  	    XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_BTREE)) {
>  		XFS_ERROR_REPORT("xfs_bunmapi", XFS_ERRLEVEL_LOW,
>  				 ip->i_mount);
>  		return -EFSCORRUPTED;
>  	}
> +
> +	if (XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_LOCAL) {
> +		*rlen = 0;
> +		return 0;
> +	}
> +
>  	mp = ip->i_mount;
>  	if (XFS_FORCED_SHUTDOWN(mp))
>  		return -EIO;
> diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
> index 33dc34655ac3..cb3c4b308137 100644
> --- a/fs/xfs/libxfs/xfs_inode_buf.c
> +++ b/fs/xfs/libxfs/xfs_inode_buf.c
> @@ -157,7 +157,6 @@ const struct xfs_buf_ops xfs_inode_buf_ra_ops = {
>  	.verify_write = xfs_inode_buf_write_verify,
>  };
>  
> -
>  /*
>   * This routine is called to map an inode to the buffer containing the on-disk
>   * version of the inode.  It returns a pointer to the buffer containing the
> @@ -384,12 +383,7 @@ xfs_dinode_verify_fork(
>  
>  	switch (XFS_DFORK_FORMAT(dip, whichfork)) {
>  	case XFS_DINODE_FMT_LOCAL:
> -		/*
> -		 * no local regular files yet
> -		 */
>  		if (whichfork == XFS_DATA_FORK) {
> -			if (S_ISREG(be16_to_cpu(dip->di_mode)))
> -				return __this_address;
>  			if (be64_to_cpu(dip->di_size) >
>  					XFS_DFORK_SIZE(dip, mp, whichfork))
>  				return __this_address;
> diff --git a/fs/xfs/scrub/inode.c b/fs/xfs/scrub/inode.c
> index 7a6208505980..1501983dd7aa 100644
> --- a/fs/xfs/scrub/inode.c
> +++ b/fs/xfs/scrub/inode.c
> @@ -283,7 +283,7 @@ xfs_scrub_dinode(
>  			xfs_scrub_ino_set_corrupt(sc, ino);
>  		break;
>  	case XFS_DINODE_FMT_LOCAL:
> -		if (!S_ISDIR(mode) && !S_ISLNK(mode))
> +		if (!S_ISREG(mode) && !S_ISDIR(mode) && !S_ISLNK(mode))
>  			xfs_scrub_ino_set_corrupt(sc, ino);
>  		break;
>  	case XFS_DINODE_FMT_EXTENTS:
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index 83b1e8c6c18f..46f718177fd7 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -760,7 +760,8 @@ xfs_can_free_eofblocks(struct xfs_inode *ip, bool force)
>  		return false;
>  
>  	/* If we haven't read in the extent list, then don't do it now. */
> -	if (!(ip->i_df.if_flags & XFS_IFEXTENTS))
> +	if (ip->i_d.di_format != XFS_DINODE_FMT_LOCAL &&
> +		!(ip->i_df.if_flags & XFS_IFEXTENTS))
>  		return false;
>  
>  	/*
> @@ -768,7 +769,8 @@ xfs_can_free_eofblocks(struct xfs_inode *ip, bool force)
>  	 * has delalloc blocks and we are forced to remove them.
>  	 */
>  	if (ip->i_d.di_flags & (XFS_DIFLAG_PREALLOC | XFS_DIFLAG_APPEND))
> -		if (!force || ip->i_delayed_blks == 0)
> +		if ((ip->i_d.di_format != XFS_DINODE_FMT_LOCAL) &&
> +			(!force || ip->i_delayed_blks == 0))
>  			return false;
>  
>  	return true;
> @@ -792,6 +794,11 @@ xfs_free_eofblocks(
>  	struct xfs_bmbt_irec	imap;
>  	struct xfs_mount	*mp = ip->i_mount;
>  
> +	if (xfs_sb_version_hasinlinedata(&mp->m_sb) &&
> +		ip->i_d.di_format == XFS_DINODE_FMT_LOCAL) {

Bleh, this indentation is confusing me.

if (foobarblahblabhlbahlabhlabh... &&
    bazcow...)
	dostuff();

or:

if (foobarblahblabhlabhalbhba.. &&
		bazcow...)
	dostuff();

> +		ip->i_delayed_blks = 0;
> +		return 0;
> +	}
>  	/*
>  	 * Figure out if there are any blocks beyond the end
>  	 * of the file.  If not, then there is nothing to do.
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 5df4de666cc1..78b9790a7cd4 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -1563,7 +1563,6 @@ xfs_itruncate_extents_flags(
>  	ASSERT(!XFS_NOT_DQATTACHED(mp, ip));
>  
>  	trace_xfs_itruncate_extents_start(ip, new_size);
> -
>  	flags |= xfs_bmapi_aflag(whichfork);
>  
>  	/*
> @@ -1745,9 +1744,15 @@ xfs_inactive_truncate(
>  	ip->i_d.di_size = 0;
>  	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
>  
> -	error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK, 0);
> -	if (error)
> -		goto error_trans_cancel;
> +	if (xfs_sb_version_hasinlinedata(&mp->m_sb) &&
> +		ip->i_d.di_format == XFS_DINODE_FMT_LOCAL) {
> +		ASSERT(ip->i_d.di_nextents == 0);
> +		ip->i_delayed_blks = 0;
> +	} else {
> +		error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK, 0);
> +		if (error)
> +			goto error_trans_cancel;
> +	}
>  
>  	ASSERT(ip->i_d.di_nextents == 0);
>  
> @@ -1879,6 +1884,7 @@ xfs_inactive(
>  	if (VFS_I(ip)->i_mode == 0) {
>  		ASSERT(ip->i_df.if_real_bytes == 0);
>  		ASSERT(ip->i_df.if_broot_bytes == 0);
> +		ASSERT(ip->i_delayed_blks == 0);

Why?

>  		return;
>  	}
>  
> @@ -1911,8 +1917,9 @@ xfs_inactive(
>  
>  	if (S_ISREG(VFS_I(ip)->i_mode) &&
>  	    (ip->i_d.di_size != 0 || XFS_ISIZE(ip) != 0 ||
> -	     ip->i_d.di_nextents > 0 || ip->i_delayed_blks > 0))
> +	     ip->i_d.di_nextents >= 0 || ip->i_delayed_blks >= 0)) {

Why does the test change from > to >= ?

>  		truncate = 1;
> +	}
>  
>  	error = xfs_qm_dqattach(ip);
>  	if (error)
> @@ -3554,7 +3561,8 @@ xfs_iflush_int(
>  	if (S_ISREG(VFS_I(ip)->i_mode)) {
>  		if (XFS_TEST_ERROR(
>  		    (ip->i_d.di_format != XFS_DINODE_FMT_EXTENTS) &&
> -		    (ip->i_d.di_format != XFS_DINODE_FMT_BTREE),
> +		    (ip->i_d.di_format != XFS_DINODE_FMT_BTREE) &&
> +		    (ip->i_d.di_format != XFS_DINODE_FMT_LOCAL),
>  		    mp, XFS_ERRTAG_IFLUSH_3)) {
>  			xfs_alert_tag(mp, XFS_PTAG_IFLUSH,
>  				"%s: Bad regular inode %Lu, ptr "PTR_FMT,
> diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
> index 2389c34c172d..52b297987a75 100644
> --- a/fs/xfs/xfs_inode_item.c
> +++ b/fs/xfs/xfs_inode_item.c
> @@ -139,6 +139,7 @@ xfs_inode_item_format_data_fork(
>  	struct xfs_log_iovec	**vecp)
>  {
>  	struct xfs_inode	*ip = iip->ili_inode;
> +	struct xfs_mount	*mp = ip->i_mount;
>  	size_t			data_bytes;
>  
>  	switch (ip->i_d.di_format) {
> @@ -197,7 +198,9 @@ xfs_inode_item_format_data_fork(
>  			ASSERT(ip->i_df.if_real_bytes == 0 ||
>  			       ip->i_df.if_real_bytes >= data_bytes);
>  			ASSERT(ip->i_df.if_u1.if_data != NULL);
> -			ASSERT(ip->i_d.di_size > 0);
> +			if (!xfs_sb_version_hasinlinedata(&mp->m_sb)) {
> +				ASSERT(ip->i_d.di_size > 0);

ASSERT(di_size > 0 || hasinlinedata); ??

> +			}
>  			xlog_copy_iovec(lv, vecp, XLOG_REG_TYPE_ILOCAL,
>  					ip->i_df.if_u1.if_data, data_bytes);
>  			ilf->ilf_dsize = (unsigned)data_bytes;
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 55876dd02f0c..aa72966081c5 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -532,7 +532,8 @@ xfs_file_iomap_begin_delay(
>  
>  	if (unlikely(XFS_TEST_ERROR(
>  	    (XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) != XFS_DINODE_FMT_EXTENTS &&
> -	     XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) != XFS_DINODE_FMT_BTREE),
> +	     XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) != XFS_DINODE_FMT_BTREE &&
> +	     XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) != XFS_DINODE_FMT_LOCAL),
>  	     mp, XFS_ERRTAG_BMAPIFORMAT))) {
>  		XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, mp);
>  		error = -EFSCORRUPTED;
> @@ -541,7 +542,7 @@ xfs_file_iomap_begin_delay(
>  
>  	XFS_STATS_INC(mp, xs_blk_mapw);
>  
> -	if (!(ifp->if_flags & XFS_IFEXTENTS)) {
> +	if (!(ifp->if_flags & (XFS_IFINLINE |XFS_IFEXTENTS))) {
>  		error = xfs_iread_extents(NULL, ip, XFS_DATA_FORK);
>  		if (error)
>  			goto out_unlock;
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index b181b5f57a19..d8828f275d9a 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -3121,7 +3121,8 @@ xlog_recover_inode_pass2(
>  
>  	if (unlikely(S_ISREG(ldip->di_mode))) {
>  		if ((ldip->di_format != XFS_DINODE_FMT_EXTENTS) &&
> -		    (ldip->di_format != XFS_DINODE_FMT_BTREE)) {
> +		    (ldip->di_format != XFS_DINODE_FMT_BTREE) &&
> +		    (ldip->di_format != XFS_DINODE_FMT_LOCAL)) {
>  			XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(3)",
>  					 XFS_ERRLEVEL_LOW, mp, ldip,
>  					 sizeof(*ldip));
> -- 
> 2.11.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 2/8] xfs: introduce extents to local conversion helper
  2018-07-06  3:12 ` [PATCH RFC 2/8] xfs: introduce extents to local conversion helper Shan Hai
@ 2018-07-06  3:45   ` Darrick J. Wong
  2018-07-06  4:15     ` Shan Hai
  2018-07-08 15:42   ` Christoph Hellwig
  1 sibling, 1 reply; 50+ messages in thread
From: Darrick J. Wong @ 2018-07-06  3:45 UTC (permalink / raw)
  To: Shan Hai; +Cc: linux-xfs

On Fri, Jul 06, 2018 at 11:12:23AM +0800, Shan Hai wrote:
> Delete the extents from xfs inode, copy the data into the data fork,
> convert the inode from extents to local format and specify log the
> core and data fork of the inode.
> 
> Signed-off-by: Shan Hai <shan.hai@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_bmap.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++-
>  fs/xfs/libxfs/xfs_bmap.h |  4 ++++
>  2 files changed, 63 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 7205268b30bc..bea6dc254a7d 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -1,6 +1,7 @@
>  // SPDX-License-Identifier: GPL-2.0
>  /*
>   * Copyright (c) 2000-2006 Silicon Graphics, Inc.
> + * Copyright (c) 2018 Oracle.
>   * All Rights Reserved.
>   */
>  #include "xfs.h"
> @@ -213,7 +214,7 @@ xfs_default_attroffset(
>   * attribute fork from local to extent format - we reset it where
>   * possible to make space available for inline data fork extents.
>   */
> -STATIC void
> +void
>  xfs_bmap_forkoff_reset(

Unrelated change in this patch?

Also, is it safe to reset forkoff when converting the data fork to
extents?  Does doing so mes up the attr fork?

>  	xfs_inode_t	*ip,
>  	int		whichfork)
> @@ -5141,6 +5142,63 @@ xfs_bmap_del_extent_real(
>  }
>  
>  /*
> + * Convert an inode from extents to the local format.
> + * Free all the extents of the inode and reset it to the local
> + * format. Copy the contents of the inode's blocks to the inode's
> + * literal area.
> + */
> +int
> +xfs_bmap_extents_to_local(
> +	xfs_trans_t		*tp,
> +	xfs_inode_t		*ip,

No typedefs, please.

	struct xfs_inode	*ip,

> +	struct xfs_defer_ops	*dfops,
> +	int			*logflagsp,
> +	int			whichfork,
> +	struct page		*page)
> +{
> +	xfs_ifork_t		*ifp = XFS_IFORK_PTR(ip, whichfork);
> +	xfs_fileoff_t		isize = i_size_read(VFS_I(ip));
> +	struct xfs_bmbt_irec	got, del;
> +	struct xfs_iext_cursor	icur;
> +	int			error = 0;
> +	int			tmp_logflags;
> +	char			*kaddr = NULL;
> +
> +	ASSERT(whichfork != XFS_COW_FORK);
> +	ASSERT(XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_EXTENTS);
> +
> +	if (xfs_iext_count(ifp) == 0)
> +		goto init_local_fork;
> +
> +	for_each_xfs_iext(ifp, &icur, &got) {
> +		del = got;
> +		if (isnullstartblock(got.br_startblock)) {
> +			error = xfs_bmap_del_extent_delay(ip,
> +					whichfork, &icur, &got, &del);
> +			if (error)
> +				goto out;
> +		} else {
> +			error = xfs_bmap_del_extent_real(ip, tp, &icur, dfops,
> +					NULL, &del, &tmp_logflags, whichfork, 0);
> +			if (error)
> +				goto out;
> +			*logflagsp |= tmp_logflags;
> +		}
> +	}
> +init_local_fork:
> +	kaddr = kmap_atomic(page);
> +	xfs_init_local_fork(ip, whichfork, kaddr, isize);
> +	kunmap_atomic(kaddr);
> +	ip->i_d.di_format = XFS_DINODE_FMT_LOCAL;
> +	XFS_IFORK_NEXT_SET(ip, whichfork, 0);
> +	ip->i_d.di_size = isize;
> +	*logflagsp |= (XFS_ILOG_DDATA | XFS_ILOG_CORE);
> +out:
> +	return error;
> +}
> +
> +
> +/*
>   * Unmap (remove) blocks from a file.
>   * If nexts is nonzero then the number of extents to remove is limited to
>   * that value.  If not all extents in the block range can be removed then
> diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
> index 9b49ddf99c41..22cd2642f1cd 100644
> --- a/fs/xfs/libxfs/xfs_bmap.h
> +++ b/fs/xfs/libxfs/xfs_bmap.h
> @@ -271,6 +271,10 @@ int	xfs_bmap_map_extent(struct xfs_mount *mp, struct xfs_defer_ops *dfops,
>  		struct xfs_inode *ip, struct xfs_bmbt_irec *imap);
>  int	xfs_bmap_unmap_extent(struct xfs_mount *mp, struct xfs_defer_ops *dfops,
>  		struct xfs_inode *ip, struct xfs_bmbt_irec *imap);
> +int	xfs_bmap_extents_to_local(struct xfs_trans *tp, struct xfs_inode *ip,
> +		struct xfs_defer_ops *dfops, int *flags, int whichfork,
> +		struct page *page);
> +void	xfs_bmap_forkoff_reset(struct xfs_inode *ip, int whichfork);
>  
>  static inline int xfs_bmap_fork_to_state(int whichfork)
>  {
> -- 
> 2.11.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 3/8] xfs: convert inode from extents to local format
  2018-07-06  3:12 ` [PATCH RFC 3/8] xfs: convert inode from extents to local format Shan Hai
@ 2018-07-06  3:47   ` Darrick J. Wong
  2018-07-06  4:24     ` Shan Hai
  0 siblings, 1 reply; 50+ messages in thread
From: Darrick J. Wong @ 2018-07-06  3:47 UTC (permalink / raw)
  To: Shan Hai; +Cc: linux-xfs

On Fri, Jul 06, 2018 at 11:12:24AM +0800, Shan Hai wrote:
> Introduce a function to convert an inode from extents to local
> format. The conversion happens at writeback time, this avoids
> interfering with the delayed allocation and page cache vs
> xfs_buf operations, the will be inlined data is directly got
> from the page which was read and modified by the iomap write actor.
> 
> Signed-off-by: Shan Hai <shan.hai@oracle.com>
> ---
>  fs/xfs/xfs_aops.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 60 insertions(+)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 8eb3ba3d4d00..ac5a7695f363 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -9,6 +9,7 @@
>  #include "xfs_log_format.h"
>  #include "xfs_trans_resv.h"
>  #include "xfs_mount.h"
> +#include "xfs_defer.h"
>  #include "xfs_inode.h"
>  #include "xfs_trans.h"
>  #include "xfs_inode_item.h"
> @@ -887,6 +888,55 @@ xfs_map_cow(
>  	return 0;
>  }
>  
> +static int
> +xfs_inode_extents_to_local(
> +	xfs_inode_t		*ip,
> +	int			whichfork,
> +	struct page		*page)
> +{
> +	struct xfs_mount	*mp = ip->i_mount;
> +	struct xfs_trans	*tp;
> +	struct xfs_defer_ops	dfops;
> +	xfs_fsblock_t		first_block;
> +	int			logflags;
> +	int			error = 0;
> +
> +	if (i_size_read(VFS_I(ip)) > XFS_IFORK_DSIZE(ip))
> +		return 0;

If it's too big to fit in local format, then isn't this an error?

--D

> +	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, &tp);
> +	if (error)
> +		return error;
> +
> +	xfs_ilock(ip, XFS_ILOCK_EXCL);
> +	xfs_trans_ijoin(tp, ip, 0);
> +
> +	xfs_defer_init(&dfops, &first_block);
> +
> +	logflags = 0;
> +	error = xfs_bmap_extents_to_local(tp, ip, &dfops, &logflags,
> +			XFS_DATA_FORK, page);
> +	if (error)
> +		goto trans_cancel;
> +
> +	xfs_trans_log_inode(tp, ip, logflags);
> +	error = xfs_defer_finish(&tp, &dfops);
> +	if (error)
> +		goto trans_cancel;
> +	error = xfs_trans_commit(tp);
> +	if (error)
> +		goto error0;
> +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> +
> +	return 0;
> +
> +trans_cancel:
> +	xfs_defer_cancel(&dfops);
> +	xfs_trans_cancel(tp);
> +error0:
> +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> +	return error;
> +}
>  /*
>   * We implement an immediate ioend submission policy here to avoid needing to
>   * chain multiple ioends and hence nest mempool allocations which can violate
> @@ -920,6 +970,8 @@ xfs_writepage_map(
>  	int			count = 0;
>  	int			uptodate = 1;
>  	unsigned int		new_type;
> +	xfs_inode_t		*ip = XFS_I(inode);
> +	struct xfs_mount	*mp = ip->i_mount;
>  
>  	bh = head = page_buffers(page);
>  	offset = page_offset(page);
> @@ -1044,6 +1096,14 @@ xfs_writepage_map(
>  		end_page_writeback(page);
>  	}
>  
> +	if (!error) {
> +		if (xfs_sb_version_hasinlinedata(&mp->m_sb) &&
> +				i_size_read(inode) <= XFS_IFORK_DSIZE(ip)) {
> +			error = xfs_inode_extents_to_local(ip, XFS_DATA_FORK,
> +								page);
> +		}
> +	}
> +
>  	mapping_set_error(page->mapping, error);
>  	return error;
>  }
> -- 
> 2.11.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 0/8] xfs: introduce inode data inline feature
  2018-07-06  3:12 [PATCH RFC 0/8] xfs: introduce inode data inline feature Shan Hai
                   ` (8 preceding siblings ...)
  2018-07-06  3:12 ` [PATCH RFC 1/1] xfsprogs: add inode inline data support Shan Hai
@ 2018-07-06  3:51 ` Darrick J. Wong
  2018-07-06  4:09   ` Shan Hai
  2018-07-06  5:42 ` Dave Chinner
  10 siblings, 1 reply; 50+ messages in thread
From: Darrick J. Wong @ 2018-07-06  3:51 UTC (permalink / raw)
  To: Shan Hai; +Cc: linux-xfs

On Fri, Jul 06, 2018 at 11:12:21AM +0800, Shan Hai wrote:
> 
> This series implements xfs inode data inlining feature.
> 
> Refered below link during development:
> https://marc.info/?l=linux-xfs&m=120493585731509&w=2
> 
> 
> How it works:
> - the data inlining happens at:
>   write_iter: for DIO/DAX write
>   writeback: for buffered write
> - extents to local format conversion is done in writeback but not in write_iter
> - local to extents format conversion is done in the write_iter and writeback
> - log the whole inode (core/data) on extents to local conversion
> - add a new mkfs.xfs option to enable data inline feature
> 
> How to test:
> mkfs.xfs -f -i size=2048,inline=1 -m crc=1,finobt=1,rmapbt=1,reflink=1 /dev/sdX
> mount /dev/sdX mnt_pnt
> 
> Test results:
> mkfs.xfs -f -i size=2048,inline=1 -m crc=1,finobt=1,rmapbt=1,reflink=1 /dev/sdb
> mkfs.xfs -f -i size=2048          -m crc=1,finobt=1,rmapbt=1,reflink=1 /dev/sdd
> mount /dev/sdb /mnt/sdb
> mount /dev/sdd /mnt/sdd
> 
> Copy the linux v4.18-rc3 source code to /mnt/sdb and /mnt/sdd respectively:
> Filesystem                        Size  Used Avail Use% Mounted on
> /dev/sdb                           10G  2.4G  7.7G  24% /mnt/sdb
> /dev/sdd                           10G  3.1G  6.9G  31% /mnt/sdd
> 
> mkfs.xfs -f -i size=2048,inline=1 -m crc=1,finobt=1,rmapbt=1,reflink=1 /dev/sdc
> mkfs.xfs -f -i size=2048          -m crc=1,finobt=1,rmapbt=1,reflink=1 /dev/sde
> mount /dev/sdc /mnt/sdc
> mount /dev/sde /mnt/sde
> 
> Build the v4.18-rc3 kernel on /dev/sdc and /dev/sde respectively:
> make x86_64_defconfig
> 
> Filesystem                        Size  Used Avail Use% Mounted on
> /dev/sdc                           10G  689M  9.4G   7% /mnt/sdc
> /dev/sde                           10G  694M  9.4G   7% /mnt/sde

Not much of a savings... but I'll give this a preliminary review anyway.

Also, I am probably going to merge hch's bufferhead killing series for
4.19, which will rip right through a lot of this. :)

--D

> 
> Kernel part:
> 0001-xfs-introduce-inline-data-superblock-feature-bit.patch
> 0002-xfs-introduce-extents-to-local-conversion-helper.patch
> 0003-xfs-convert-inode-from-extents-to-local-format.patch
> 0004-xfs-implement-inline-data-read-write-code.patch
> 0005-xfs-consider-the-local-format-inode-in-misc-operatio.patch
> 0006-xfs-fix-imbalanced-locking.patch
> 0007-xfs-return-non-zero-blocks-for-inline-data.patch
> 0008-xfs-skip-local-format-inode-for-reflinking.patch
> 
>  fs/xfs/libxfs/xfs_bmap.c      |  78 +++++++++++++++++++++++++++++++-----
>  fs/xfs/libxfs/xfs_bmap.h      |   4 ++
>  fs/xfs/libxfs/xfs_format.h    |  10 ++++-
>  fs/xfs/libxfs/xfs_fs.h        |   1 +
>  fs/xfs/libxfs/xfs_inode_buf.c |   6 ---
>  fs/xfs/libxfs/xfs_sb.c        |   2 +
>  fs/xfs/scrub/inode.c          |   2 +-
>  fs/xfs/xfs_aops.c             |  60 ++++++++++++++++++++++++++++
>  fs/xfs/xfs_bmap_util.c        |  11 +++++-
>  fs/xfs/xfs_file.c             | 244 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  fs/xfs/xfs_inode.c            |  20 +++++++---
>  fs/xfs/xfs_inode_item.c       |   5 ++-
>  fs/xfs/xfs_ioctl.c            |   5 ++-
>  fs/xfs/xfs_iomap.c            |   5 ++-
>  fs/xfs/xfs_iops.c             |   8 +++-
>  fs/xfs/xfs_log_recover.c      |   3 +-
>  fs/xfs/xfs_reflink.c          |   6 +++
>  17 files changed, 437 insertions(+), 33 deletions(-)
> 
> xfsprogs part:
> 0001-xfsprogs-add-inode-inline-data-support.patch 
> 
>  growfs/xfs_growfs.c | 14 +++++++++-----
>  libxfs/xfs_format.h |  4 +++-
>  libxfs/xfs_fs.h     |  1 +
>  mkfs/xfs_mkfs.c     | 20 +++++++++++++++++---
>  4 files changed, 30 insertions(+), 9 deletions(-)
> 
> 
> Thanks
> Shan Hai
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 8/8] xfs: skip local format inode for reflinking
  2018-07-06  3:26   ` Darrick J. Wong
@ 2018-07-06  3:54     ` Shan Hai
  2018-07-08 16:00     ` Christoph Hellwig
  1 sibling, 0 replies; 50+ messages in thread
From: Shan Hai @ 2018-07-06  3:54 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs



On 2018年07月06日 11:26, Darrick J. Wong wrote:
> On Fri, Jul 06, 2018 at 11:12:29AM +0800, Shan Hai wrote:
>> The local format inode holds the data inline in its data fork and
>> has no extents at all, so skip the inode for reflinking.
>>
>> Signed-off-by: Shan Hai <shan.hai@oracle.com>
>> ---
>>   fs/xfs/xfs_reflink.c | 6 ++++++
>>   1 file changed, 6 insertions(+)
>>
>> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
>> index 592fb2071a03..dfb3a85ec0e6 100644
>> --- a/fs/xfs/xfs_reflink.c
>> +++ b/fs/xfs/xfs_reflink.c
>> @@ -1507,6 +1507,12 @@ xfs_reflink_inode_has_shared_extents(
>>   	int				error;
>>   
>>   	ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
>> +
>> +	if (ifp->if_flags & XFS_IFINLINE) {
>> +		*has_shared = false;
>> +		return 0;
>> +	}
> What happens if someone tries to reflink and either src or dest are an
> inline file?

I have to re-check this, thanks for the suggestion.

Thanks
Shan Hai
> --D
>
>> +
>>   	if (!(ifp->if_flags & XFS_IFEXTENTS)) {
>>   		error = xfs_iread_extents(tp, ip, XFS_DATA_FORK);
>>   		if (error)
>> -- 
>> 2.11.0
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 4/8] xfs: implement inline data read write code
  2018-07-06  3:33   ` Darrick J. Wong
@ 2018-07-06  4:05     ` Shan Hai
  0 siblings, 0 replies; 50+ messages in thread
From: Shan Hai @ 2018-07-06  4:05 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs



On 2018年07月06日 11:33, Darrick J. Wong wrote:
> On Fri, Jul 06, 2018 at 11:12:25AM +0800, Shan Hai wrote:
>> Implement read/write functions for inline data access and use them
>> for buffered/DIO/DAX operations.
>>
>> Signed-off-by: Shan Hai <shan.hai@oracle.com>
>> ---
>>   fs/xfs/xfs_file.c | 244 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>   1 file changed, 242 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
>> index a3e7767a5715..55047928b720 100644
>> --- a/fs/xfs/xfs_file.c
>> +++ b/fs/xfs/xfs_file.c
>> @@ -27,6 +27,7 @@
>>   #include "xfs_pnfs.h"
>>   #include "xfs_iomap.h"
>>   #include "xfs_reflink.h"
>> +#include "xfs_defer.h"
>>   
>>   #include <linux/dcache.h>
>>   #include <linux/falloc.h>
>> @@ -175,6 +176,185 @@ xfs_file_fsync(
>>   	return error;
>>   }
>>   
>> +/*
>> + * Read the inline data of a local format file.
>> + */
>> +STATIC ssize_t
>> +xfs_file_inline_data_read(
>> +	struct xfs_inode	*ip,
>> +	int			whichfork,
>> +	struct kiocb		*iocb,
>> +	struct iov_iter		*to)
>> +{
>> +	xfs_ifork_t		*ifp = XFS_IFORK_PTR(ip, whichfork);
>> +	struct inode		*inode = VFS_I(ip);
>> +	loff_t			isize = i_size_read(inode);
>> +	loff_t			*ppos = &iocb->ki_pos;
>> +	size_t			count = iov_iter_count(to);
>> +	ssize_t			ret = 0;
>> +
>> +	ASSERT(XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_EXTENTS ||
>> +	       XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_LOCAL);
>> +
>> +	if (!count)
>> +		return 0;
>> +
>> +	if (unlikely(*ppos >= inode->i_sb->s_maxbytes))
>> +		return 0;
>> +
>> +	if (*ppos > isize)
>> +		return 0;
>> +
>> +	iov_iter_truncate(to, inode->i_sb->s_maxbytes);
>> +	count = (*ppos + count > isize) ? isize - *ppos : count;
>> +	ret = copy_to_iter(ifp->if_u1.if_data + *ppos, count, to);
>> +	*ppos += ret;
>> +
>> +	return ret;
>> +}
>> +
>> +/*
>> + * Try to inline the data into the xfs inode data fork when it has enough
>> + * space to hold it. The data inlining happens at below points:
>> + *
>> + * - writeback: for buffered writes
>> + * - write_iter: for dax/dio writes
>> + *
>> + * The extents to local format conversion is done here for DAX/DIO write,
>> + * while the same conversion is done in the writeback path for buffered write.
>> + * This function also converts inode from local to extents when the data
>> + * fork could not hold the data inline anymore.
>> + */
>> +STATIC ssize_t
>> +xfs_file_inline_data_write(
>> +	struct xfs_inode	*ip,
>> +	int			whichfork,
>> +	struct kiocb		*iocb,
>> +	struct iov_iter		*from)
>> +{
>> +	struct xfs_mount	*mp = ip->i_mount;
>> +	xfs_ifork_t		*ifp = XFS_IFORK_PTR(ip, whichfork);
>> +	struct inode		*inode = VFS_I(ip);
>> +	struct address_space	*mapping = inode->i_mapping;
>> +	loff_t			pos = iocb->ki_pos;
>> +	size_t			count = iov_iter_count(from);
>> +	struct xfs_trans	*tp;
>> +	struct xfs_defer_ops	dfops;
>> +	xfs_fsblock_t		first_block;
>> +	int			ret;
>> +	int			error = 0;
>> +	int			logflags = 0;
>> +	loff_t			written = 0;
>> +	xfs_fileoff_t		bno = 0;
>> +	struct xfs_bmbt_irec	map;
>> +	int			nmap;
>> +	struct page		*page;
>> +	char			*kaddr;
>> +
>> +	ASSERT(XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_EXTENTS ||
>> +	       XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_LOCAL);
>> +
>> +	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, 1, 0, 0, &tp);
>> +	if (error) {
>> +		ASSERT(error == -ENOSPC || XFS_FORCED_SHUTDOWN(mp));
>> +		return error;
>> +	}
>> +
>> +	xfs_ilock(ip, XFS_ILOCK_EXCL);
>> +	xfs_trans_ijoin(tp, ip, 0);
>> +	xfs_defer_init(&dfops, &first_block);
>> +
>> +	if (pos + count > XFS_IFORK_DSIZE(ip)) {
>> +		if (XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_EXTENTS) {
>> +			error = 0;
>> +			goto trans_cancel;
>> +		}
>> +
>> +		page = find_or_create_page(mapping, pos >> PAGE_SHIFT, GFP_NOFS);
>> +		if (!page) {
>> +			error = -ENOMEM;
>> +			goto trans_cancel;
>> +		}
>> +		kaddr = kmap_atomic(page);
>> +		memcpy(kaddr, ifp->if_u1.if_data, ifp->if_bytes);
>> +		kunmap_atomic(kaddr);
>> +		SetPageUptodate(page);
>> +		unlock_page(page);
>> +		put_page(page);
>> +
>> +		xfs_bmap_forkoff_reset(ip, whichfork);
>> +		ifp->if_flags &= ~XFS_IFINLINE;
>> +		ifp->if_flags |= XFS_IFEXTENTS;
>> +		ifp->if_u1.if_root = NULL;
>> +		ifp->if_height = 0;
>> +		ifp->if_bytes = 0;
>> +		XFS_IFORK_FMT_SET(ip, whichfork, XFS_DINODE_FMT_EXTENTS);
>> +
>> +		written = ifp->if_bytes;
>> +		error = xfs_bmap_first_unused(tp, ip, 1, &bno, whichfork);
>> +		if (error)
>> +			goto trans_cancel;
>> +		nmap = 1;
>> +		error = xfs_bmapi_write(tp, ip, bno, 1, XFS_BMAPI_ENTIRE,
>> +				&first_block, 1, &map, &nmap, &dfops);
>> +		if (error)
>> +			goto trans_cancel;
>> +
>> +		goto out_local_to_extents;
>> +	}
>> +
>> +	if (XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_LOCAL) {
>> +		xfs_idata_realloc(ip, (pos + count) - ifp->if_bytes, whichfork);
>> +		ret = copy_from_iter(ifp->if_u1.if_data + pos, count, from);
>> +		written = ret;
>> +		goto out;
>> +	}
>> +
>> +	if (IS_DAX(inode) || iocb->ki_flags & IOCB_DIRECT) {
>> +		struct page	*page = alloc_page(GFP_KERNEL | GFP_NOFS);
>> +		if (!page) {
>> +			error = -ENOMEM;
>> +			goto trans_cancel;
>> +		}
>> +
>> +		kaddr = kmap_atomic(page);
>> +		ret = copy_from_iter(kaddr + pos, count, from);
>> +		kunmap_atomic(kaddr);
>> +		written = ret;
>> +		error = xfs_bmap_extents_to_local(tp, ip, &dfops, &logflags,
>> +							XFS_DATA_FORK, page);
>> +		if (error) {
>> +			__free_page(page);
>> +			goto trans_cancel;
>> +		}
>> +
>> +		__free_page(page);
>> +		goto out;
>> +	}
>> +trans_cancel:
>> +	xfs_defer_cancel(&dfops);
>> +	xfs_trans_cancel(tp);
>> +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
>> +	return error;
>> +out:
>> +	if (pos + written > i_size_read(inode))
>> +		i_size_write(inode, pos + written);
>> +	ip->i_d.di_size = pos + written;
>> +
>> +out_local_to_extents:
>> +	logflags |= (XFS_ILOG_DDATA | XFS_ILOG_CORE);
>> +	xfs_trans_log_inode(tp, ip, logflags);
>> +
>> +	error = xfs_defer_finish(&tp, &dfops);
>> +	if (error)
>> +		goto trans_cancel;
>> +	error = xfs_trans_commit(tp);
>> +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
>> +
>> +	return written > 0 ? written : error;
>> +}
>> +
>> +
>>   STATIC ssize_t
>>   xfs_file_dio_aio_read(
>>   	struct kiocb		*iocb,
>> @@ -192,7 +372,12 @@ xfs_file_dio_aio_read(
>>   	file_accessed(iocb->ki_filp);
>>   
>>   	xfs_ilock(ip, XFS_IOLOCK_SHARED);
>> -	ret = iomap_dio_rw(iocb, to, &xfs_iomap_ops, NULL);
>> +	if (xfs_sb_version_hasinlinedata(&ip->i_mount->m_sb) &&
>> +	    XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) == XFS_DINODE_FMT_LOCAL) {
>> +		ret = xfs_file_inline_data_read(ip, XFS_DATA_FORK, iocb, to);
> I thought iomap was growing the ability to handle inline data?  At least
> for gfs2?
>
> Hmm, you might ask Christoph about how to support that... also these
> read/write paths are about to get a lot of rip/smash this cycle.
>
>> +	} else {
>> +		ret = iomap_dio_rw(iocb, to, &xfs_iomap_ops, NULL);
>> +	}
>>   	xfs_iunlock(ip, XFS_IOLOCK_SHARED);
>>   
>>   	return ret;
>> @@ -219,6 +404,13 @@ xfs_file_dax_read(
>>   		xfs_ilock(ip, XFS_IOLOCK_SHARED);
>>   	}
>>   
>> +	if (xfs_sb_version_hasinlinedata(&ip->i_mount->m_sb) &&
>> +	    XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) == XFS_DINODE_FMT_LOCAL) {
>> +		ret = xfs_file_inline_data_read(ip, XFS_DATA_FORK, iocb, to);
>> +	} else {
>> +		ret = dax_iomap_rw(iocb, to, &xfs_iomap_ops);
>> +	}
>> +
>>   	ret = dax_iomap_rw(iocb, to, &xfs_iomap_ops);
>>   	xfs_iunlock(ip, XFS_IOLOCK_SHARED);
>>   
>> @@ -242,7 +434,12 @@ xfs_file_buffered_aio_read(
>>   	} else {
>>   		xfs_ilock(ip, XFS_IOLOCK_SHARED);
>>   	}
>> -	ret = generic_file_read_iter(iocb, to);
>> +
>> +	if (XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) == XFS_DINODE_FMT_LOCAL) {
>> +		ret = xfs_file_inline_data_read(ip, XFS_DATA_FORK, iocb, to);
>> +	} else {
>> +		ret = generic_file_read_iter(iocb, to);
>> +	}
>>   	xfs_iunlock(ip, XFS_IOLOCK_SHARED);
>>   
>>   	return ret;
>> @@ -487,6 +684,7 @@ xfs_file_dio_aio_write(
>>   	size_t			count = iov_iter_count(from);
>>   	struct xfs_buftarg      *target = XFS_IS_REALTIME_INODE(ip) ?
>>   					mp->m_rtdev_targp : mp->m_ddev_targp;
>> +	int			error;
>>   
>>   	/* DIO must be aligned to device logical sector size */
>>   	if ((iocb->ki_pos | count) & target->bt_logical_sectormask)
>> @@ -547,6 +745,22 @@ xfs_file_dio_aio_write(
>>   	}
>>   
>>   	trace_xfs_file_direct_write(ip, count, iocb->ki_pos);
>> +
>> +	if (xfs_sb_version_hasinlinedata(&mp->m_sb) &&
>> +		XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) != XFS_DINODE_FMT_BTREE &&
> != BTREE? Why not == INLINE?

The xfs inode is born with EXTENTS format and the very first write to a 
file will
encounter the EXTENTS since it's not converted to INLINE yet, the format
conversion happens in the following xfs_file_inline_data_write.

>> +		S_ISREG(inode->i_mode)) {
>> +		error = xfs_file_inline_data_write(ip, XFS_DATA_FORK, iocb,
>> +							from);
> int != ssize_t... also, if we decided we needed extents format but the
> file turned out to be extents format already then did we lose the write
> here?

If the file is already in the extents format then the above 
xfs_file_inline_data_write
just returns without format conversion and the control flow falls 
through to the normal
extents handling because of the following if statement found that the 
format is not LOCAL.

>> +		if (error) {
>> +			ret = error;
>> +			goto out;
>> +		}
>> +		if (ip->i_d.di_format == XFS_DINODE_FMT_LOCAL) {
>> +			ret = count;
>> +			goto out;
>> +		}
>> +	}
>> +
>>   	ret = iomap_dio_rw(iocb, from, &xfs_iomap_ops, xfs_dio_write_end_io);
>>   out:
>>   	xfs_iunlock(ip, iolock);
>> @@ -586,6 +800,20 @@ xfs_file_dax_write(
>>   	count = iov_iter_count(from);
>>   
>>   	trace_xfs_file_dax_write(ip, count, pos);
>> +
>> +	if (xfs_sb_version_hasinlinedata(&ip->i_mount->m_sb) &&
>> +		ip->i_d.di_format != XFS_DINODE_FMT_BTREE &&
>> +		S_ISREG(inode->i_mode)) {
>> +		error = xfs_file_inline_data_write(ip, XFS_DATA_FORK,
>> +							iocb, from);
>> +		if (error)
>> +			goto out;
>> +		if (ip->i_d.di_format == XFS_DINODE_FMT_LOCAL) {
>> +			ret = count;
>> +			goto out;
>> +		}
>> +	}
>> +
>>   	ret = dax_iomap_rw(iocb, from, &xfs_iomap_ops);
>>   	if (ret > 0 && iocb->ki_pos > i_size_read(inode)) {
>>   		i_size_write(inode, iocb->ki_pos);
>> @@ -614,6 +842,7 @@ xfs_file_buffered_aio_write(
>>   	struct address_space	*mapping = file->f_mapping;
>>   	struct inode		*inode = mapping->host;
>>   	struct xfs_inode	*ip = XFS_I(inode);
>> +	struct xfs_mount	*mp = ip->i_mount;
>>   	ssize_t			ret;
>>   	int			enospc = 0;
>>   	int			iolock;
>> @@ -629,6 +858,17 @@ xfs_file_buffered_aio_write(
>>   	if (ret)
>>   		goto out;
>>   
>> +	if (xfs_sb_version_hasinlinedata(&mp->m_sb) &&
>> +		XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) != XFS_DINODE_FMT_BTREE) {
>> +		ret = xfs_file_inline_data_write(ip, XFS_DATA_FORK, iocb, from);
>> +		if (XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) ==
>> +							XFS_DINODE_FMT_LOCAL){
> What's going on with the indenting here?

Oops, sorry, I will fix it.

Thanks
Shan Hai
>
> --D
>
>> +			if (likely(ret >= 0))
>> +				iocb->ki_pos += ret;
>> +			goto out;
>> +		}
>> +	}
>> +
>>   	/* We can write back this queue in page reclaim */
>>   	current->backing_dev_info = inode_to_bdi(inode);
>>   
>> -- 
>> 2.11.0
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 1/8] xfs: introduce inline data superblock feature bit
  2018-07-06  3:34   ` Darrick J. Wong
@ 2018-07-06  4:06     ` Shan Hai
  0 siblings, 0 replies; 50+ messages in thread
From: Shan Hai @ 2018-07-06  4:06 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs



On 2018年07月06日 11:34, Darrick J. Wong wrote:
> On Fri, Jul 06, 2018 at 11:12:22AM +0800, Shan Hai wrote:
>> The data inlining is an on-disk format change, so introduce a new
>> superblock feature bit to work with the mkfs tool to handle the
>> inline data, the feature bit is set at mkfs time.
>>
>> Signed-off-by: Shan Hai <shan.hai@oracle.com>
>> ---
>>   fs/xfs/libxfs/xfs_format.h | 10 +++++++++-
>>   fs/xfs/libxfs/xfs_fs.h     |  1 +
>>   fs/xfs/libxfs/xfs_sb.c     |  2 ++
>>   3 files changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
>> index 059bc44c27e8..6066ba7fdaf7 100644
>> --- a/fs/xfs/libxfs/xfs_format.h
>> +++ b/fs/xfs/libxfs/xfs_format.h
>> @@ -469,10 +469,12 @@ xfs_sb_has_ro_compat_feature(
>>   #define XFS_SB_FEAT_INCOMPAT_FTYPE	(1 << 0)	/* filetype in dirent */
>>   #define XFS_SB_FEAT_INCOMPAT_SPINODES	(1 << 1)	/* sparse inode chunks */
>>   #define XFS_SB_FEAT_INCOMPAT_META_UUID	(1 << 2)	/* metadata UUID */
>> +#define XFS_SB_FEAT_INCOMPAT_INLINEDATA (1 << 3)	/* inline inode data */
>>   #define XFS_SB_FEAT_INCOMPAT_ALL \
>>   		(XFS_SB_FEAT_INCOMPAT_FTYPE|	\
>>   		 XFS_SB_FEAT_INCOMPAT_SPINODES|	\
>> -		 XFS_SB_FEAT_INCOMPAT_META_UUID)
>> +		 XFS_SB_FEAT_INCOMPAT_META_UUID|\
>> +		 XFS_SB_FEAT_INCOMPAT_INLINEDATA)
> Uh... don't go enabling this feature in the _ALL flags until you've
> finished putting all the code in...
>

Agreed, will fix it, thanks for the suggestion.

Thanks
Shan Hai
>>   
>>   #define XFS_SB_FEAT_INCOMPAT_UNKNOWN	~XFS_SB_FEAT_INCOMPAT_ALL
>>   static inline bool
>> @@ -550,6 +552,12 @@ static inline bool xfs_sb_version_hasreflink(struct xfs_sb *sbp)
>>   		(sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_REFLINK);
>>   }
>>   
>> +static inline bool xfs_sb_version_hasinlinedata(struct xfs_sb *sbp)
>> +{
>> +	return (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) &&
>> +		(sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_INLINEDATA);
>> +}
>> +
>>   /*
>>    * end of superblock version macros
>>    */
>> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
>> index f3aa59302fef..ee1f7621c0e4 100644
>> --- a/fs/xfs/libxfs/xfs_fs.h
>> +++ b/fs/xfs/libxfs/xfs_fs.h
>> @@ -210,6 +210,7 @@ typedef struct xfs_fsop_resblks {
>>   #define XFS_FSOP_GEOM_FLAGS_SPINODES	0x40000	/* sparse inode chunks	*/
>>   #define XFS_FSOP_GEOM_FLAGS_RMAPBT	0x80000	/* reverse mapping btree */
>>   #define XFS_FSOP_GEOM_FLAGS_REFLINK	0x100000 /* files can share blocks */
>> +#define XFS_FSOP_GEOM_FLAGS_INLINEDATA	0x200000 /* inline data into inode */
>>   
>>   /*
>>    * Minimum and maximum sizes need for growth checks.
>> diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
>> index 350119eeaecb..4002ed105b6b 100644
>> --- a/fs/xfs/libxfs/xfs_sb.c
>> +++ b/fs/xfs/libxfs/xfs_sb.c
>> @@ -1068,6 +1068,8 @@ xfs_fs_geometry(
>>   		geo->logsectsize = sbp->sb_logsectsize;
>>   	else
>>   		geo->logsectsize = BBSIZE;
>> +	if (xfs_sb_version_hasinlinedata(sbp))
>> +		geo->flags |= XFS_FSOP_GEOM_FLAGS_INLINEDATA;
>>   	geo->rtsectsize = sbp->sb_blocksize;
>>   	geo->dirblocksize = xfs_dir2_dirblock_bytes(sbp);
>>   
>> -- 
>> 2.11.0
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 0/8] xfs: introduce inode data inline feature
  2018-07-06  3:51 ` [PATCH RFC 0/8] xfs: introduce inode data inline feature Darrick J. Wong
@ 2018-07-06  4:09   ` Shan Hai
  0 siblings, 0 replies; 50+ messages in thread
From: Shan Hai @ 2018-07-06  4:09 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs



On 2018年07月06日 11:51, Darrick J. Wong wrote:
> On Fri, Jul 06, 2018 at 11:12:21AM +0800, Shan Hai wrote:
>> This series implements xfs inode data inlining feature.
>>
>> Refered below link during development:
>> https://marc.info/?l=linux-xfs&m=120493585731509&w=2
>>
>>
>> How it works:
>> - the data inlining happens at:
>>    write_iter: for DIO/DAX write
>>    writeback: for buffered write
>> - extents to local format conversion is done in writeback but not in write_iter
>> - local to extents format conversion is done in the write_iter and writeback
>> - log the whole inode (core/data) on extents to local conversion
>> - add a new mkfs.xfs option to enable data inline feature
>>
>> How to test:
>> mkfs.xfs -f -i size=2048,inline=1 -m crc=1,finobt=1,rmapbt=1,reflink=1 /dev/sdX
>> mount /dev/sdX mnt_pnt
>>
>> Test results:
>> mkfs.xfs -f -i size=2048,inline=1 -m crc=1,finobt=1,rmapbt=1,reflink=1 /dev/sdb
>> mkfs.xfs -f -i size=2048          -m crc=1,finobt=1,rmapbt=1,reflink=1 /dev/sdd
>> mount /dev/sdb /mnt/sdb
>> mount /dev/sdd /mnt/sdd
>>
>> Copy the linux v4.18-rc3 source code to /mnt/sdb and /mnt/sdd respectively:
>> Filesystem                        Size  Used Avail Use% Mounted on
>> /dev/sdb                           10G  2.4G  7.7G  24% /mnt/sdb
>> /dev/sdd                           10G  3.1G  6.9G  31% /mnt/sdd

An observable savings here.

>> mkfs.xfs -f -i size=2048,inline=1 -m crc=1,finobt=1,rmapbt=1,reflink=1 /dev/sdc
>> mkfs.xfs -f -i size=2048          -m crc=1,finobt=1,rmapbt=1,reflink=1 /dev/sde
>> mount /dev/sdc /mnt/sdc
>> mount /dev/sde /mnt/sde
>>
>> Build the v4.18-rc3 kernel on /dev/sdc and /dev/sde respectively:
>> make x86_64_defconfig
>>
>> Filesystem                        Size  Used Avail Use% Mounted on
>> /dev/sdc                           10G  689M  9.4G   7% /mnt/sdc
>> /dev/sde                           10G  694M  9.4G   7% /mnt/sde
> Not much of a savings... but I'll give this a preliminary review anyway.
>
> Also, I am probably going to merge hch's bufferhead killing series for
> 4.19, which will rip right through a lot of this. :)

That would be fine, thanks in advance for the review anyway :)

Thanks
Shan Hai
> --D
>
>> Kernel part:
>> 0001-xfs-introduce-inline-data-superblock-feature-bit.patch
>> 0002-xfs-introduce-extents-to-local-conversion-helper.patch
>> 0003-xfs-convert-inode-from-extents-to-local-format.patch
>> 0004-xfs-implement-inline-data-read-write-code.patch
>> 0005-xfs-consider-the-local-format-inode-in-misc-operatio.patch
>> 0006-xfs-fix-imbalanced-locking.patch
>> 0007-xfs-return-non-zero-blocks-for-inline-data.patch
>> 0008-xfs-skip-local-format-inode-for-reflinking.patch
>>
>>   fs/xfs/libxfs/xfs_bmap.c      |  78 +++++++++++++++++++++++++++++++-----
>>   fs/xfs/libxfs/xfs_bmap.h      |   4 ++
>>   fs/xfs/libxfs/xfs_format.h    |  10 ++++-
>>   fs/xfs/libxfs/xfs_fs.h        |   1 +
>>   fs/xfs/libxfs/xfs_inode_buf.c |   6 ---
>>   fs/xfs/libxfs/xfs_sb.c        |   2 +
>>   fs/xfs/scrub/inode.c          |   2 +-
>>   fs/xfs/xfs_aops.c             |  60 ++++++++++++++++++++++++++++
>>   fs/xfs/xfs_bmap_util.c        |  11 +++++-
>>   fs/xfs/xfs_file.c             | 244 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>   fs/xfs/xfs_inode.c            |  20 +++++++---
>>   fs/xfs/xfs_inode_item.c       |   5 ++-
>>   fs/xfs/xfs_ioctl.c            |   5 ++-
>>   fs/xfs/xfs_iomap.c            |   5 ++-
>>   fs/xfs/xfs_iops.c             |   8 +++-
>>   fs/xfs/xfs_log_recover.c      |   3 +-
>>   fs/xfs/xfs_reflink.c          |   6 +++
>>   17 files changed, 437 insertions(+), 33 deletions(-)
>>
>> xfsprogs part:
>> 0001-xfsprogs-add-inode-inline-data-support.patch
>>
>>   growfs/xfs_growfs.c | 14 +++++++++-----
>>   libxfs/xfs_format.h |  4 +++-
>>   libxfs/xfs_fs.h     |  1 +
>>   mkfs/xfs_mkfs.c     | 20 +++++++++++++++++---
>>   4 files changed, 30 insertions(+), 9 deletions(-)
>>
>>
>> Thanks
>> Shan Hai
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 2/8] xfs: introduce extents to local conversion helper
  2018-07-06  3:45   ` Darrick J. Wong
@ 2018-07-06  4:15     ` Shan Hai
  0 siblings, 0 replies; 50+ messages in thread
From: Shan Hai @ 2018-07-06  4:15 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs



On 2018年07月06日 11:45, Darrick J. Wong wrote:
> On Fri, Jul 06, 2018 at 11:12:23AM +0800, Shan Hai wrote:
>> Delete the extents from xfs inode, copy the data into the data fork,
>> convert the inode from extents to local format and specify log the
>> core and data fork of the inode.
>>
>> Signed-off-by: Shan Hai <shan.hai@oracle.com>
>> ---
>>   fs/xfs/libxfs/xfs_bmap.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++-
>>   fs/xfs/libxfs/xfs_bmap.h |  4 ++++
>>   2 files changed, 63 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
>> index 7205268b30bc..bea6dc254a7d 100644
>> --- a/fs/xfs/libxfs/xfs_bmap.c
>> +++ b/fs/xfs/libxfs/xfs_bmap.c
>> @@ -1,6 +1,7 @@
>>   // SPDX-License-Identifier: GPL-2.0
>>   /*
>>    * Copyright (c) 2000-2006 Silicon Graphics, Inc.
>> + * Copyright (c) 2018 Oracle.
>>    * All Rights Reserved.
>>    */
>>   #include "xfs.h"
>> @@ -213,7 +214,7 @@ xfs_default_attroffset(
>>    * attribute fork from local to extent format - we reset it where
>>    * possible to make space available for inline data fork extents.
>>    */
>> -STATIC void
>> +void
>>   xfs_bmap_forkoff_reset(
> Unrelated change in this patch?

Need this in the 3rd patch to convert the inode from extents to local, 
so export it,
because I thought that the conversion function should reside in the 
xfs_file.c instead
of xfs_bmap.c.

> Also, is it safe to reset forkoff when converting the data fork to
> extents?  Does doing so mes up the attr fork?
>

The attr fork should be aware the forkoff changes, I will double check 
this anyway.

Thanks
Shan Hai
>>   	xfs_inode_t	*ip,
>>   	int		whichfork)
>> @@ -5141,6 +5142,63 @@ xfs_bmap_del_extent_real(
>>   }
>>   
>>   /*
>> + * Convert an inode from extents to the local format.
>> + * Free all the extents of the inode and reset it to the local
>> + * format. Copy the contents of the inode's blocks to the inode's
>> + * literal area.
>> + */
>> +int
>> +xfs_bmap_extents_to_local(
>> +	xfs_trans_t		*tp,
>> +	xfs_inode_t		*ip,
> No typedefs, please.
>
> 	struct xfs_inode	*ip,

OK, I will fix it.

Thanks
Shan Hai
>> +	struct xfs_defer_ops	*dfops,
>> +	int			*logflagsp,
>> +	int			whichfork,
>> +	struct page		*page)
>> +{
>> +	xfs_ifork_t		*ifp = XFS_IFORK_PTR(ip, whichfork);
>> +	xfs_fileoff_t		isize = i_size_read(VFS_I(ip));
>> +	struct xfs_bmbt_irec	got, del;
>> +	struct xfs_iext_cursor	icur;
>> +	int			error = 0;
>> +	int			tmp_logflags;
>> +	char			*kaddr = NULL;
>> +
>> +	ASSERT(whichfork != XFS_COW_FORK);
>> +	ASSERT(XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_EXTENTS);
>> +
>> +	if (xfs_iext_count(ifp) == 0)
>> +		goto init_local_fork;
>> +
>> +	for_each_xfs_iext(ifp, &icur, &got) {
>> +		del = got;
>> +		if (isnullstartblock(got.br_startblock)) {
>> +			error = xfs_bmap_del_extent_delay(ip,
>> +					whichfork, &icur, &got, &del);
>> +			if (error)
>> +				goto out;
>> +		} else {
>> +			error = xfs_bmap_del_extent_real(ip, tp, &icur, dfops,
>> +					NULL, &del, &tmp_logflags, whichfork, 0);
>> +			if (error)
>> +				goto out;
>> +			*logflagsp |= tmp_logflags;
>> +		}
>> +	}
>> +init_local_fork:
>> +	kaddr = kmap_atomic(page);
>> +	xfs_init_local_fork(ip, whichfork, kaddr, isize);
>> +	kunmap_atomic(kaddr);
>> +	ip->i_d.di_format = XFS_DINODE_FMT_LOCAL;
>> +	XFS_IFORK_NEXT_SET(ip, whichfork, 0);
>> +	ip->i_d.di_size = isize;
>> +	*logflagsp |= (XFS_ILOG_DDATA | XFS_ILOG_CORE);
>> +out:
>> +	return error;
>> +}
>> +
>> +
>> +/*
>>    * Unmap (remove) blocks from a file.
>>    * If nexts is nonzero then the number of extents to remove is limited to
>>    * that value.  If not all extents in the block range can be removed then
>> diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
>> index 9b49ddf99c41..22cd2642f1cd 100644
>> --- a/fs/xfs/libxfs/xfs_bmap.h
>> +++ b/fs/xfs/libxfs/xfs_bmap.h
>> @@ -271,6 +271,10 @@ int	xfs_bmap_map_extent(struct xfs_mount *mp, struct xfs_defer_ops *dfops,
>>   		struct xfs_inode *ip, struct xfs_bmbt_irec *imap);
>>   int	xfs_bmap_unmap_extent(struct xfs_mount *mp, struct xfs_defer_ops *dfops,
>>   		struct xfs_inode *ip, struct xfs_bmbt_irec *imap);
>> +int	xfs_bmap_extents_to_local(struct xfs_trans *tp, struct xfs_inode *ip,
>> +		struct xfs_defer_ops *dfops, int *flags, int whichfork,
>> +		struct page *page);
>> +void	xfs_bmap_forkoff_reset(struct xfs_inode *ip, int whichfork);
>>   
>>   static inline int xfs_bmap_fork_to_state(int whichfork)
>>   {
>> -- 
>> 2.11.0
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 3/8] xfs: convert inode from extents to local format
  2018-07-06  3:47   ` Darrick J. Wong
@ 2018-07-06  4:24     ` Shan Hai
  0 siblings, 0 replies; 50+ messages in thread
From: Shan Hai @ 2018-07-06  4:24 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs



On 2018年07月06日 11:47, Darrick J. Wong wrote:
> On Fri, Jul 06, 2018 at 11:12:24AM +0800, Shan Hai wrote:
>> Introduce a function to convert an inode from extents to local
>> format. The conversion happens at writeback time, this avoids
>> interfering with the delayed allocation and page cache vs
>> xfs_buf operations, the will be inlined data is directly got
>> from the page which was read and modified by the iomap write actor.
>>
>> Signed-off-by: Shan Hai <shan.hai@oracle.com>
>> ---
>>   fs/xfs/xfs_aops.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 60 insertions(+)
>>
>> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
>> index 8eb3ba3d4d00..ac5a7695f363 100644
>> --- a/fs/xfs/xfs_aops.c
>> +++ b/fs/xfs/xfs_aops.c
>> @@ -9,6 +9,7 @@
>>   #include "xfs_log_format.h"
>>   #include "xfs_trans_resv.h"
>>   #include "xfs_mount.h"
>> +#include "xfs_defer.h"
>>   #include "xfs_inode.h"
>>   #include "xfs_trans.h"
>>   #include "xfs_inode_item.h"
>> @@ -887,6 +888,55 @@ xfs_map_cow(
>>   	return 0;
>>   }
>>   
>> +static int
>> +xfs_inode_extents_to_local(
>> +	xfs_inode_t		*ip,
>> +	int			whichfork,
>> +	struct page		*page)
>> +{
>> +	struct xfs_mount	*mp = ip->i_mount;
>> +	struct xfs_trans	*tp;
>> +	struct xfs_defer_ops	dfops;
>> +	xfs_fsblock_t		first_block;
>> +	int			logflags;
>> +	int			error = 0;
>> +
>> +	if (i_size_read(VFS_I(ip)) > XFS_IFORK_DSIZE(ip))
>> +		return 0;
> If it's too big to fit in local format, then isn't this an error?

No, it's not, because the written data travels to writeback means that 
the inode is
still in extents format since the higher level write_iter ends the write 
process by
copying the data to the data fork when the inode is in local format, it 
never let the
local format inode to go down the writeback path.

The xfs_writepage_map will continue as if nothing happens when 
xfs_inode_extents_to_local
returns 0.

Thanks
Shan Hai
> --D
>
>> +	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, &tp);
>> +	if (error)
>> +		return error;
>> +
>> +	xfs_ilock(ip, XFS_ILOCK_EXCL);
>> +	xfs_trans_ijoin(tp, ip, 0);
>> +
>> +	xfs_defer_init(&dfops, &first_block);
>> +
>> +	logflags = 0;
>> +	error = xfs_bmap_extents_to_local(tp, ip, &dfops, &logflags,
>> +			XFS_DATA_FORK, page);
>> +	if (error)
>> +		goto trans_cancel;
>> +
>> +	xfs_trans_log_inode(tp, ip, logflags);
>> +	error = xfs_defer_finish(&tp, &dfops);
>> +	if (error)
>> +		goto trans_cancel;
>> +	error = xfs_trans_commit(tp);
>> +	if (error)
>> +		goto error0;
>> +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
>> +
>> +	return 0;
>> +
>> +trans_cancel:
>> +	xfs_defer_cancel(&dfops);
>> +	xfs_trans_cancel(tp);
>> +error0:
>> +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
>> +	return error;
>> +}
>>   /*
>>    * We implement an immediate ioend submission policy here to avoid needing to
>>    * chain multiple ioends and hence nest mempool allocations which can violate
>> @@ -920,6 +970,8 @@ xfs_writepage_map(
>>   	int			count = 0;
>>   	int			uptodate = 1;
>>   	unsigned int		new_type;
>> +	xfs_inode_t		*ip = XFS_I(inode);
>> +	struct xfs_mount	*mp = ip->i_mount;
>>   
>>   	bh = head = page_buffers(page);
>>   	offset = page_offset(page);
>> @@ -1044,6 +1096,14 @@ xfs_writepage_map(
>>   		end_page_writeback(page);
>>   	}
>>   
>> +	if (!error) {
>> +		if (xfs_sb_version_hasinlinedata(&mp->m_sb) &&
>> +				i_size_read(inode) <= XFS_IFORK_DSIZE(ip)) {
>> +			error = xfs_inode_extents_to_local(ip, XFS_DATA_FORK,
>> +								page);
>> +		}
>> +	}
>> +
>>   	mapping_set_error(page->mapping, error);
>>   	return error;
>>   }
>> -- 
>> 2.11.0
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 5/8] xfs: consider the local format inode in misc operations
  2018-07-06  3:40   ` Darrick J. Wong
@ 2018-07-06  4:40     ` Shan Hai
  0 siblings, 0 replies; 50+ messages in thread
From: Shan Hai @ 2018-07-06  4:40 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs



On 2018年07月06日 11:40, Darrick J. Wong wrote:
> On Fri, Jul 06, 2018 at 11:12:26AM +0800, Shan Hai wrote:
>> The local format inode is a legal citizen from now on and consider
>> it in misc operations.
>>
>> Signed-off-by: Shan Hai <shan.hai@oracle.com>
>> ---
>>   fs/xfs/libxfs/xfs_bmap.c      | 18 ++++++++++--------
>>   fs/xfs/libxfs/xfs_inode_buf.c |  6 ------
>>   fs/xfs/scrub/inode.c          |  2 +-
>>   fs/xfs/xfs_bmap_util.c        | 11 +++++++++--
>>   fs/xfs/xfs_inode.c            | 20 ++++++++++++++------
>>   fs/xfs/xfs_inode_item.c       |  5 ++++-
>>   fs/xfs/xfs_iomap.c            |  5 +++--
>>   fs/xfs/xfs_log_recover.c      |  3 ++-
>>   8 files changed, 43 insertions(+), 27 deletions(-)
>>
>> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
>> index bea6dc254a7d..6b151bd15da7 100644
>> --- a/fs/xfs/libxfs/xfs_bmap.c
>> +++ b/fs/xfs/libxfs/xfs_bmap.c
>> @@ -830,11 +830,6 @@ xfs_bmap_local_to_extents(
>>   	struct xfs_bmbt_irec rec;
>>   	struct xfs_iext_cursor icur;
>>   
>> -	/*
>> -	 * We don't want to deal with the case of keeping inode data inline yet.
>> -	 * So sending the data fork of a regular inode is invalid.
>> -	 */
>> -	ASSERT(!(S_ISREG(VFS_I(ip)->i_mode) && whichfork == XFS_DATA_FORK));
>>   	ifp = XFS_IFORK_PTR(ip, whichfork);
>>   	ASSERT(XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_LOCAL);
>>   
>> @@ -3840,7 +3835,8 @@ xfs_bmapi_read(
>>   
>>   	if (unlikely(XFS_TEST_ERROR(
>>   	    (XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_EXTENTS &&
>> -	     XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_BTREE),
>> +	     XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_BTREE &&
>> +	     XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_LOCAL),
>>   	     mp, XFS_ERRTAG_BMAPIFORMAT))) {
>>   		XFS_ERROR_REPORT("xfs_bmapi_read", XFS_ERRLEVEL_LOW, mp);
>>   		return -EFSCORRUPTED;
>> @@ -3863,7 +3859,7 @@ xfs_bmapi_read(
>>   		return 0;
>>   	}
>>   
>> -	if (!(ifp->if_flags & XFS_IFEXTENTS)) {
>> +	if (!(ifp->if_flags & (XFS_IFINLINE | XFS_IFEXTENTS))) {
>>   		error = xfs_iread_extents(NULL, ip, whichfork);
>>   		if (error)
>>   			return error;
>> @@ -4285,7 +4281,6 @@ xfs_bmapi_write(
>>   	       (flags & (XFS_BMAPI_CONVERT | XFS_BMAPI_COWFORK)) ==
>>   			(XFS_BMAPI_CONVERT | XFS_BMAPI_COWFORK));
>>   	ASSERT(len > 0);
>> -	ASSERT(XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_LOCAL);
>>   	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
>>   	ASSERT(!(flags & XFS_BMAPI_REMAP));
>>   
>> @@ -5244,11 +5239,18 @@ __xfs_bunmapi(
>>   	ifp = XFS_IFORK_PTR(ip, whichfork);
>>   	if (unlikely(
>>   	    XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_EXTENTS &&
>> +	    XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_LOCAL &&
>>   	    XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_BTREE)) {
>>   		XFS_ERROR_REPORT("xfs_bunmapi", XFS_ERRLEVEL_LOW,
>>   				 ip->i_mount);
>>   		return -EFSCORRUPTED;
>>   	}
>> +
>> +	if (XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_LOCAL) {
>> +		*rlen = 0;
>> +		return 0;
>> +	}
>> +
>>   	mp = ip->i_mount;
>>   	if (XFS_FORCED_SHUTDOWN(mp))
>>   		return -EIO;
>> diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
>> index 33dc34655ac3..cb3c4b308137 100644
>> --- a/fs/xfs/libxfs/xfs_inode_buf.c
>> +++ b/fs/xfs/libxfs/xfs_inode_buf.c
>> @@ -157,7 +157,6 @@ const struct xfs_buf_ops xfs_inode_buf_ra_ops = {
>>   	.verify_write = xfs_inode_buf_write_verify,
>>   };
>>   
>> -
>>   /*
>>    * This routine is called to map an inode to the buffer containing the on-disk
>>    * version of the inode.  It returns a pointer to the buffer containing the
>> @@ -384,12 +383,7 @@ xfs_dinode_verify_fork(
>>   
>>   	switch (XFS_DFORK_FORMAT(dip, whichfork)) {
>>   	case XFS_DINODE_FMT_LOCAL:
>> -		/*
>> -		 * no local regular files yet
>> -		 */
>>   		if (whichfork == XFS_DATA_FORK) {
>> -			if (S_ISREG(be16_to_cpu(dip->di_mode)))
>> -				return __this_address;
>>   			if (be64_to_cpu(dip->di_size) >
>>   					XFS_DFORK_SIZE(dip, mp, whichfork))
>>   				return __this_address;
>> diff --git a/fs/xfs/scrub/inode.c b/fs/xfs/scrub/inode.c
>> index 7a6208505980..1501983dd7aa 100644
>> --- a/fs/xfs/scrub/inode.c
>> +++ b/fs/xfs/scrub/inode.c
>> @@ -283,7 +283,7 @@ xfs_scrub_dinode(
>>   			xfs_scrub_ino_set_corrupt(sc, ino);
>>   		break;
>>   	case XFS_DINODE_FMT_LOCAL:
>> -		if (!S_ISDIR(mode) && !S_ISLNK(mode))
>> +		if (!S_ISREG(mode) && !S_ISDIR(mode) && !S_ISLNK(mode))
>>   			xfs_scrub_ino_set_corrupt(sc, ino);
>>   		break;
>>   	case XFS_DINODE_FMT_EXTENTS:
>> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
>> index 83b1e8c6c18f..46f718177fd7 100644
>> --- a/fs/xfs/xfs_bmap_util.c
>> +++ b/fs/xfs/xfs_bmap_util.c
>> @@ -760,7 +760,8 @@ xfs_can_free_eofblocks(struct xfs_inode *ip, bool force)
>>   		return false;
>>   
>>   	/* If we haven't read in the extent list, then don't do it now. */
>> -	if (!(ip->i_df.if_flags & XFS_IFEXTENTS))
>> +	if (ip->i_d.di_format != XFS_DINODE_FMT_LOCAL &&
>> +		!(ip->i_df.if_flags & XFS_IFEXTENTS))
>>   		return false;
>>   
>>   	/*
>> @@ -768,7 +769,8 @@ xfs_can_free_eofblocks(struct xfs_inode *ip, bool force)
>>   	 * has delalloc blocks and we are forced to remove them.
>>   	 */
>>   	if (ip->i_d.di_flags & (XFS_DIFLAG_PREALLOC | XFS_DIFLAG_APPEND))
>> -		if (!force || ip->i_delayed_blks == 0)
>> +		if ((ip->i_d.di_format != XFS_DINODE_FMT_LOCAL) &&
>> +			(!force || ip->i_delayed_blks == 0))
>>   			return false;
>>   
>>   	return true;
>> @@ -792,6 +794,11 @@ xfs_free_eofblocks(
>>   	struct xfs_bmbt_irec	imap;
>>   	struct xfs_mount	*mp = ip->i_mount;
>>   
>> +	if (xfs_sb_version_hasinlinedata(&mp->m_sb) &&
>> +		ip->i_d.di_format == XFS_DINODE_FMT_LOCAL) {
> Bleh, this indentation is confusing me.
>
> if (foobarblahblabhlbahlabhlabh... &&
>      bazcow...)
> 	dostuff();
>
> or:
>
> if (foobarblahblabhlabhalbhba.. &&
> 		bazcow...)
> 	dostuff();
>
>> +		ip->i_delayed_blks = 0;
>> +		return 0;
>> +	}
>>   	/*
>>   	 * Figure out if there are any blocks beyond the end
>>   	 * of the file.  If not, then there is nothing to do.
>> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
>> index 5df4de666cc1..78b9790a7cd4 100644
>> --- a/fs/xfs/xfs_inode.c
>> +++ b/fs/xfs/xfs_inode.c
>> @@ -1563,7 +1563,6 @@ xfs_itruncate_extents_flags(
>>   	ASSERT(!XFS_NOT_DQATTACHED(mp, ip));
>>   
>>   	trace_xfs_itruncate_extents_start(ip, new_size);
>> -
>>   	flags |= xfs_bmapi_aflag(whichfork);
>>   
>>   	/*
>> @@ -1745,9 +1744,15 @@ xfs_inactive_truncate(
>>   	ip->i_d.di_size = 0;
>>   	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
>>   
>> -	error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK, 0);
>> -	if (error)
>> -		goto error_trans_cancel;
>> +	if (xfs_sb_version_hasinlinedata(&mp->m_sb) &&
>> +		ip->i_d.di_format == XFS_DINODE_FMT_LOCAL) {
>> +		ASSERT(ip->i_d.di_nextents == 0);
>> +		ip->i_delayed_blks = 0;
>> +	} else {
>> +		error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK, 0);
>> +		if (error)
>> +			goto error_trans_cancel;
>> +	}
>>   
>>   	ASSERT(ip->i_d.di_nextents == 0);
>>   
>> @@ -1879,6 +1884,7 @@ xfs_inactive(
>>   	if (VFS_I(ip)->i_mode == 0) {
>>   		ASSERT(ip->i_df.if_real_bytes == 0);
>>   		ASSERT(ip->i_df.if_broot_bytes == 0);
>> +		ASSERT(ip->i_delayed_blks == 0);
> Why?

There is no delayed allocation for the existing local format inode so no 
delayed blocks at all,
like below:

1, a local format inode resides on disk
2, open and write data to it, the write_iter copies the data to the data 
fork and returns if
     the data can be inlined, the block allocation is not necessary, so 
there is no delayed
     allocation and the ip->i_delayed_blks == 0
3, unlink the inode after the inline data write, the ip->i_delayed_blks 
should equals 0

>>   		return;
>>   	}
>>   
>> @@ -1911,8 +1917,9 @@ xfs_inactive(
>>   
>>   	if (S_ISREG(VFS_I(ip)->i_mode) &&
>>   	    (ip->i_d.di_size != 0 || XFS_ISIZE(ip) != 0 ||
>> -	     ip->i_d.di_nextents > 0 || ip->i_delayed_blks > 0))
>> +	     ip->i_d.di_nextents >= 0 || ip->i_delayed_blks >= 0)) {
> Why does the test change from > to >= ?

The same reasons as described above, unlinking a local format inode 
would encounter
ip->i_delayed_blks == 0.

But it should be put in the xfs_sb_version_hasinlinedata scope, that's 
totally my fault,
will fix it.

>>   		truncate = 1;
>> +	}
>>   
>>   	error = xfs_qm_dqattach(ip);
>>   	if (error)
>> @@ -3554,7 +3561,8 @@ xfs_iflush_int(
>>   	if (S_ISREG(VFS_I(ip)->i_mode)) {
>>   		if (XFS_TEST_ERROR(
>>   		    (ip->i_d.di_format != XFS_DINODE_FMT_EXTENTS) &&
>> -		    (ip->i_d.di_format != XFS_DINODE_FMT_BTREE),
>> +		    (ip->i_d.di_format != XFS_DINODE_FMT_BTREE) &&
>> +		    (ip->i_d.di_format != XFS_DINODE_FMT_LOCAL),
>>   		    mp, XFS_ERRTAG_IFLUSH_3)) {
>>   			xfs_alert_tag(mp, XFS_PTAG_IFLUSH,
>>   				"%s: Bad regular inode %Lu, ptr "PTR_FMT,
>> diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
>> index 2389c34c172d..52b297987a75 100644
>> --- a/fs/xfs/xfs_inode_item.c
>> +++ b/fs/xfs/xfs_inode_item.c
>> @@ -139,6 +139,7 @@ xfs_inode_item_format_data_fork(
>>   	struct xfs_log_iovec	**vecp)
>>   {
>>   	struct xfs_inode	*ip = iip->ili_inode;
>> +	struct xfs_mount	*mp = ip->i_mount;
>>   	size_t			data_bytes;
>>   
>>   	switch (ip->i_d.di_format) {
>> @@ -197,7 +198,9 @@ xfs_inode_item_format_data_fork(
>>   			ASSERT(ip->i_df.if_real_bytes == 0 ||
>>   			       ip->i_df.if_real_bytes >= data_bytes);
>>   			ASSERT(ip->i_df.if_u1.if_data != NULL);
>> -			ASSERT(ip->i_d.di_size > 0);
>> +			if (!xfs_sb_version_hasinlinedata(&mp->m_sb)) {
>> +				ASSERT(ip->i_d.di_size > 0);
> ASSERT(di_size > 0 || hasinlinedata); ??

Yes, agreed, it's much better.

Thanks
Shan Hai
>> +			}
>>   			xlog_copy_iovec(lv, vecp, XLOG_REG_TYPE_ILOCAL,
>>   					ip->i_df.if_u1.if_data, data_bytes);
>>   			ilf->ilf_dsize = (unsigned)data_bytes;
>> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
>> index 55876dd02f0c..aa72966081c5 100644
>> --- a/fs/xfs/xfs_iomap.c
>> +++ b/fs/xfs/xfs_iomap.c
>> @@ -532,7 +532,8 @@ xfs_file_iomap_begin_delay(
>>   
>>   	if (unlikely(XFS_TEST_ERROR(
>>   	    (XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) != XFS_DINODE_FMT_EXTENTS &&
>> -	     XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) != XFS_DINODE_FMT_BTREE),
>> +	     XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) != XFS_DINODE_FMT_BTREE &&
>> +	     XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) != XFS_DINODE_FMT_LOCAL),
>>   	     mp, XFS_ERRTAG_BMAPIFORMAT))) {
>>   		XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, mp);
>>   		error = -EFSCORRUPTED;
>> @@ -541,7 +542,7 @@ xfs_file_iomap_begin_delay(
>>   
>>   	XFS_STATS_INC(mp, xs_blk_mapw);
>>   
>> -	if (!(ifp->if_flags & XFS_IFEXTENTS)) {
>> +	if (!(ifp->if_flags & (XFS_IFINLINE |XFS_IFEXTENTS))) {
>>   		error = xfs_iread_extents(NULL, ip, XFS_DATA_FORK);
>>   		if (error)
>>   			goto out_unlock;
>> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
>> index b181b5f57a19..d8828f275d9a 100644
>> --- a/fs/xfs/xfs_log_recover.c
>> +++ b/fs/xfs/xfs_log_recover.c
>> @@ -3121,7 +3121,8 @@ xlog_recover_inode_pass2(
>>   
>>   	if (unlikely(S_ISREG(ldip->di_mode))) {
>>   		if ((ldip->di_format != XFS_DINODE_FMT_EXTENTS) &&
>> -		    (ldip->di_format != XFS_DINODE_FMT_BTREE)) {
>> +		    (ldip->di_format != XFS_DINODE_FMT_BTREE) &&
>> +		    (ldip->di_format != XFS_DINODE_FMT_LOCAL)) {
>>   			XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(3)",
>>   					 XFS_ERRLEVEL_LOW, mp, ldip,
>>   					 sizeof(*ldip));
>> -- 
>> 2.11.0
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 0/8] xfs: introduce inode data inline feature
  2018-07-06  3:12 [PATCH RFC 0/8] xfs: introduce inode data inline feature Shan Hai
                   ` (9 preceding siblings ...)
  2018-07-06  3:51 ` [PATCH RFC 0/8] xfs: introduce inode data inline feature Darrick J. Wong
@ 2018-07-06  5:42 ` Dave Chinner
  2018-07-06  6:39   ` Shan Hai
  2018-07-08 15:58   ` Christoph Hellwig
  10 siblings, 2 replies; 50+ messages in thread
From: Dave Chinner @ 2018-07-06  5:42 UTC (permalink / raw)
  To: Shan Hai; +Cc: linux-xfs

On Fri, Jul 06, 2018 at 11:12:21AM +0800, Shan Hai wrote:
> 
> This series implements xfs inode data inlining feature.

Cool. I was actually thinking about this earlier this week. :)

> Refered below link during development:
> https://marc.info/?l=linux-xfs&m=120493585731509&w=2
> 
> 
> How it works:
> - the data inlining happens at:
>   write_iter: for DIO/DAX write
>   writeback: for buffered write
> - extents to local format conversion is done in writeback but not in write_iter
> - local to extents format conversion is done in the write_iter and writeback

The problem with the way local to extents conversion is imlpemented
in this patchset is that we do not synchronise data writeback with
journal commits and log recovery.  The local to extents conversion
appears to be done like so:

----
writeback arrives in local format, data in extent cached page
convert inode to extent format
allocate new block
commit allocation and inode format conversion
submit data IO to newly allocated block.
----

This works when nothing goes wrong, but it's not failsafe. The
problem is that we've logged the inode data fork conversion before
we've completed writing the data to the new block. IOWs, we can end
up in the state where the active journal recoery window does not contain
the data in the inode and the data has not yet reached the new
allocated block on disk, either.

If we crash in this window, we lose the data that was in the inode -
log recovery finishes with the inode in extent form pointing to
uninitialised data blocks. i.e. not only is it a data loss event,
it's also a security issue because it exposes stale data.

This has been the unsolved problem from all previous attempts to
implement inline data in XFS. I actually outlined this problem and
ways to solve it in the mailing list post linked above, but it
doesn't appear that either mechanism was implemented in this
patchset.

However, unlike that post from 2008, we now have infrastructure that
can be used to solve to this problem: the data path copy-on-write
mechanism.  The COW mechanism is essentially a generic
implementation of the "Method 1: use an intent/done transaction
pair" solution I describe in the above post.

This mechanism solves the above problem by storing the newly
allocated block in the in-memory COW fork and doesn't modify the
data fork until after the data write IO completes. IOWs, we do
allocation before the write IO, and do the data fork and BMBT
manipulation after the IO completes. i.e. we do the local->extent
data fork modification at IO completion using the extent that was
stored in the in memory COW fork.

Hence I think the inline data write path needs to piggy back on the
iomap COW path that we use for writing to shared extents if the
write would cause a data fork format change.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 0/8] xfs: introduce inode data inline feature
  2018-07-06  5:42 ` Dave Chinner
@ 2018-07-06  6:39   ` Shan Hai
  2018-07-06  7:11     ` Shan Hai
  2018-07-08 15:58   ` Christoph Hellwig
  1 sibling, 1 reply; 50+ messages in thread
From: Shan Hai @ 2018-07-06  6:39 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs



On 2018年07月06日 13:42, Dave Chinner wrote:
> On Fri, Jul 06, 2018 at 11:12:21AM +0800, Shan Hai wrote:
>> This series implements xfs inode data inlining feature.
> Cool. I was actually thinking about this earlier this week. :)
>

Yes, this is a nice feature, and thanks for your previous suggestions in 
the below link,
which saved my time a lot :)

>> Refered below link during development:
>> https://marc.info/?l=linux-xfs&m=120493585731509&w=2
>>
>>
>> How it works:
>> - the data inlining happens at:
>>    write_iter: for DIO/DAX write
>>    writeback: for buffered write
>> - extents to local format conversion is done in writeback but not in write_iter
>> - local to extents format conversion is done in the write_iter and writeback
> The problem with the way local to extents conversion is imlpemented
> in this patchset is that we do not synchronise data writeback with
> journal commits and log recovery.  The local to extents conversion
> appears to be done like so:
>
> ----
> writeback arrives in local format, data in extent cached page
> convert inode to extent format
> allocate new block
> commit allocation and inode format conversion
> submit data IO to newly allocated block.
> ----
>
> This works when nothing goes wrong, but it's not failsafe. The
> problem is that we've logged the inode data fork conversion before
> we've completed writing the data to the new block. IOWs, we can end
> up in the state where the active journal recoery window does not contain
> the data in the inode and the data has not yet reached the new
> allocated block on disk, either.
>
> If we crash in this window, we lose the data that was in the inode -
> log recovery finishes with the inode in extent form pointing to
> uninitialised data blocks. i.e. not only is it a data loss event,
> it's also a security issue because it exposes stale data.
>
> This has been the unsolved problem from all previous attempts to
> implement inline data in XFS. I actually outlined this problem and
> ways to solve it in the mailing list post linked above, but it
> doesn't appear that either mechanism was implemented in this
> patchset.
>
> However, unlike that post from 2008, we now have infrastructure that
> can be used to solve to this problem: the data path copy-on-write
> mechanism.  The COW mechanism is essentially a generic
> implementation of the "Method 1: use an intent/done transaction
> pair" solution I describe in the above post.
>
> This mechanism solves the above problem by storing the newly
> allocated block in the in-memory COW fork and doesn't modify the
> data fork until after the data write IO completes. IOWs, we do
> allocation before the write IO, and do the data fork and BMBT
> manipulation after the IO completes. i.e. we do the local->extent
> data fork modification at IO completion using the extent that was
> stored in the in memory COW fork.

Yes, this can fix the race totally, I will try to implement it in the 
subsequent series.

Or how about below? It's similar to what we did in the xfs_setattr_size, 
but the sync
write in this situation is ugly I know :)

----
write arrives at local inode
allocate a page and insert it into page cache, copy the data from data 
fork to the page
filemap_write_and_wait_range
xfs_trans_roll
convert inode to extent format
allocate a new block
commit allocation and inode format conversion
----

Thanks
Shan Hai
> Hence I think the inline data write path needs to piggy back on the
> iomap COW path that we use for writing to shared extents if the
> write would cause a data fork format change.
>
> Cheers,
>
> Dave.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 0/8] xfs: introduce inode data inline feature
  2018-07-06  6:39   ` Shan Hai
@ 2018-07-06  7:11     ` Shan Hai
  0 siblings, 0 replies; 50+ messages in thread
From: Shan Hai @ 2018-07-06  7:11 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs



On 2018年07月06日 14:39, Shan Hai wrote:
>
>
> On 2018年07月06日 13:42, Dave Chinner wrote:
>> On Fri, Jul 06, 2018 at 11:12:21AM +0800, Shan Hai wrote:
>>> This series implements xfs inode data inlining feature.
>> Cool. I was actually thinking about this earlier this week. :)
>>
>
> Yes, this is a nice feature, and thanks for your previous suggestions 
> in the below link,
> which saved my time a lot :)
>
>>> Refered below link during development:
>>> https://marc.info/?l=linux-xfs&m=120493585731509&w=2
>>>
>>>
>>> How it works:
>>> - the data inlining happens at:
>>>    write_iter: for DIO/DAX write
>>>    writeback: for buffered write
>>> - extents to local format conversion is done in writeback but not in 
>>> write_iter
>>> - local to extents format conversion is done in the write_iter and 
>>> writeback
>> The problem with the way local to extents conversion is imlpemented
>> in this patchset is that we do not synchronise data writeback with
>> journal commits and log recovery.  The local to extents conversion
>> appears to be done like so:
>>
>> ----
>> writeback arrives in local format, data in extent cached page
>> convert inode to extent format
>> allocate new block
>> commit allocation and inode format conversion
>> submit data IO to newly allocated block.
>> ----
>>
>> This works when nothing goes wrong, but it's not failsafe. The
>> problem is that we've logged the inode data fork conversion before
>> we've completed writing the data to the new block. IOWs, we can end
>> up in the state where the active journal recoery window does not contain
>> the data in the inode and the data has not yet reached the new
>> allocated block on disk, either.
>>
>> If we crash in this window, we lose the data that was in the inode -
>> log recovery finishes with the inode in extent form pointing to
>> uninitialised data blocks. i.e. not only is it a data loss event,
>> it's also a security issue because it exposes stale data.
>>
>> This has been the unsolved problem from all previous attempts to
>> implement inline data in XFS. I actually outlined this problem and
>> ways to solve it in the mailing list post linked above, but it
>> doesn't appear that either mechanism was implemented in this
>> patchset.
>>
>> However, unlike that post from 2008, we now have infrastructure that
>> can be used to solve to this problem: the data path copy-on-write
>> mechanism.  The COW mechanism is essentially a generic
>> implementation of the "Method 1: use an intent/done transaction
>> pair" solution I describe in the above post.
>>
>> This mechanism solves the above problem by storing the newly
>> allocated block in the in-memory COW fork and doesn't modify the
>> data fork until after the data write IO completes. IOWs, we do
>> allocation before the write IO, and do the data fork and BMBT
>> manipulation after the IO completes. i.e. we do the local->extent
>> data fork modification at IO completion using the extent that was
>> stored in the in memory COW fork.
>
> Yes, this can fix the race totally, I will try to implement it in the 
> subsequent series.
>
> Or how about below? It's similar to what we did in the 
> xfs_setattr_size, but the sync
> write in this situation is ugly I know :)
>
> ----
> write arrives at local inode
> allocate a page and insert it into page cache, copy the data from data 
> fork to the page
> filemap_write_and_wait_range

Oh no, please ignore this, it would never can work, I will try the data 
path COW idea.

Thanks
Shan Hai
>
> xfs_trans_roll
> convert inode to extent format
> allocate a new block
> commit allocation and inode format conversion
> ----
>
> Thanks
> Shan Hai
>> Hence I think the inline data write path needs to piggy back on the
>> iomap COW path that we use for writing to shared extents if the
>> write would cause a data fork format change.
>>
>> Cheers,
>>
>> Dave.
>
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 1/1] xfsprogs: add inode inline data support
  2018-07-06  3:35   ` Darrick J. Wong
@ 2018-07-06 19:14     ` Eric Sandeen
  0 siblings, 0 replies; 50+ messages in thread
From: Eric Sandeen @ 2018-07-06 19:14 UTC (permalink / raw)
  To: Darrick J. Wong, Shan Hai; +Cc: linux-xfs

On 7/5/18 10:35 PM, Darrick J. Wong wrote:
> On Fri, Jul 06, 2018 at 11:12:30AM +0800, Shan Hai wrote:
>> Add a new mkfs command line option to enable the inode inline data
>> feature. The mkfs set a bit in the superblock to notify the kernel
>> that the data inlining is enabled, there is no extra options for
>> mount.
>>
>> Signed-off-by: Shan Hai <shan.hai@oracle.com>
>> ---
>>  growfs/xfs_growfs.c | 14 +++++++++-----
>>  libxfs/xfs_format.h |  4 +++-
>>  libxfs/xfs_fs.h     |  1 +
>>  mkfs/xfs_mkfs.c     | 20 +++++++++++++++++---
>>  4 files changed, 30 insertions(+), 9 deletions(-)
>>
>> diff --git a/growfs/xfs_growfs.c b/growfs/xfs_growfs.c
>> index 366176b7..a370bf9d 100644
>> --- a/growfs/xfs_growfs.c
>> +++ b/growfs/xfs_growfs.c
>> @@ -60,13 +60,14 @@ report_info(
>>  	int		finobt_enabled,
>>  	int		spinodes,
>>  	int		rmapbt_enabled,
>> -	int		reflink_enabled)
>> +	int		reflink_enabled,
>> +	int		inlinedata_enabled)
>>  {
>>  	printf(_(
>>  	    "meta-data=%-22s isize=%-6u agcount=%u, agsize=%u blks\n"
>>  	    "         =%-22s sectsz=%-5u attr=%u, projid32bit=%u\n"
>>  	    "         =%-22s crc=%-8u finobt=%u spinodes=%u rmapbt=%u\n"
>> -	    "         =%-22s reflink=%u\n"
>> +	    "         =%-22s reflink=%u inline=%u\n"
> I coulda sworn we refactored this into a library function...

We did, and it's in 4.17.0 now.

On the next version please rebase this to whatever's in the xfsprogs for-next
branch at the time, that's where all the action is.  :)

Thanks,
-Eric

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 2/8] xfs: introduce extents to local conversion helper
  2018-07-06  3:12 ` [PATCH RFC 2/8] xfs: introduce extents to local conversion helper Shan Hai
  2018-07-06  3:45   ` Darrick J. Wong
@ 2018-07-08 15:42   ` Christoph Hellwig
  2018-07-09  1:58     ` Shan Hai
  1 sibling, 1 reply; 50+ messages in thread
From: Christoph Hellwig @ 2018-07-08 15:42 UTC (permalink / raw)
  To: Shan Hai; +Cc: linux-xfs

>  /*
> + * Convert an inode from extents to the local format.
> + * Free all the extents of the inode and reset it to the local
> + * format. Copy the contents of the inode's blocks to the inode's
> + * literal area.
> + */

Please use up all 80 chars for your comments (and code)

> +int
> +xfs_bmap_extents_to_local(
> +	xfs_trans_t		*tp,
> +	xfs_inode_t		*ip,

Please use the struct types instead of the typedefs wherever possible.

> +	if (xfs_iext_count(ifp) == 0)
> +		goto init_local_fork;
> +
> +	for_each_xfs_iext(ifp, &icur, &got) {

for_each_xfs_iext should do the right thing for an empty fork.

> +out:
> +	return error;

Just return errors directly instead of going to a label that returns
the error.

> +}
> +
> +
> +/*

One blank line after a function body is enough.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 4/8] xfs: implement inline data read write code
  2018-07-06  3:12 ` [PATCH RFC 4/8] xfs: implement inline data read write code Shan Hai
  2018-07-06  3:33   ` Darrick J. Wong
@ 2018-07-08 15:45   ` Christoph Hellwig
  2018-07-09  2:08     ` Shan Hai
  1 sibling, 1 reply; 50+ messages in thread
From: Christoph Hellwig @ 2018-07-08 15:45 UTC (permalink / raw)
  To: Shan Hai; +Cc: linux-xfs

On Fri, Jul 06, 2018 at 11:12:25AM +0800, Shan Hai wrote:
> Implement read/write functions for inline data access and use them
> for buffered/DIO/DAX operations.

Please use the iomap inline handling in this branch instead of
implementing it in XFS:

https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git/log/?h=iomap-4.19-merge

It seems like your code doesn't use the page cache for inline data
at the moment, which means you'd want to always use the direct I/O
version of the code as-is.

Althought that makes me wonder how you handle memory mapped I/O, but
I guess I'll find out later.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 5/8] xfs: consider the local format inode in misc operations
  2018-07-06  3:12 ` [PATCH RFC 5/8] xfs: consider the local format inode in misc operations Shan Hai
  2018-07-06  3:40   ` Darrick J. Wong
@ 2018-07-08 15:51   ` Christoph Hellwig
  2018-07-09  3:06     ` Shan Hai
  1 sibling, 1 reply; 50+ messages in thread
From: Christoph Hellwig @ 2018-07-08 15:51 UTC (permalink / raw)
  To: Shan Hai; +Cc: linux-xfs

On Fri, Jul 06, 2018 at 11:12:26AM +0800, Shan Hai wrote:
> The local format inode is a legal citizen from now on and consider
> it in misc operations.

This probably wants to go towards the front of the series, and split
into multiple patches with better changelogs for each.

>  
>  	if (unlikely(XFS_TEST_ERROR(
>  	    (XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_EXTENTS &&
> -	     XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_BTREE),
> +	     XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_BTREE &&
> +	     XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_LOCAL),

This are all the three possible values.  It is kinda confusing to read
the checks this way as we alreayd checked for a valid fork type in
the inode read verifies.

Also this seems to be in xfs_bmapi_read, for which reading allowing
inline format forks seems wrong to me.

> -	if (!(ifp->if_flags & XFS_IFEXTENTS)) {
> +	if (!(ifp->if_flags & (XFS_IFINLINE | XFS_IFEXTENTS))) {
>  		error = xfs_iread_extents(NULL, ip, whichfork);
>  		if (error)
>  			return error;

I think we need to clean this up better.  We probably want an
xfs_inode_has_extents helper, or even hide the check inside
xfs_iread_extents itself.  Also I wonder if the above is read, the
XFS_IFEXTENTS flag has always meant that we have the current extent list
in memory, which should always be the case for a fork in inline format.

> @@ -5244,11 +5239,18 @@ __xfs_bunmapi(
>  	ifp = XFS_IFORK_PTR(ip, whichfork);
>  	if (unlikely(
>  	    XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_EXTENTS &&
> +	    XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_LOCAL &&
>  	    XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_BTREE)) {
>  		XFS_ERROR_REPORT("xfs_bunmapi", XFS_ERRLEVEL_LOW,
>  				 ip->i_mount);
>  		return -EFSCORRUPTED;
>  	}
> +
> +	if (XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_LOCAL) {
> +		*rlen = 0;
> +		return 0;
> +	}

I don't think we should ever call bunmapi for an inline format fork.

>  };
>  
> -
>  /*

Spurious whitespace change.

>   * This routine is called to map an inode to the buffer containing the on-disk
>   * version of the inode.  It returns a pointer to the buffer containing the
> @@ -384,12 +383,7 @@ xfs_dinode_verify_fork(
>  
>  	switch (XFS_DFORK_FORMAT(dip, whichfork)) {
>  	case XFS_DINODE_FMT_LOCAL:
> -		/*
> -		 * no local regular files yet
> -		 */
>  		if (whichfork == XFS_DATA_FORK) {
> -			if (S_ISREG(be16_to_cpu(dip->di_mode)))
> -				return __this_address;

I think you need to check the new feature bit here and only allow the
inline regular case if the feature bit it set.

> @@ -283,7 +283,7 @@ xfs_scrub_dinode(
>  			xfs_scrub_ino_set_corrupt(sc, ino);
>  		break;
>  	case XFS_DINODE_FMT_LOCAL:
> -		if (!S_ISDIR(mode) && !S_ISLNK(mode))
> +		if (!S_ISREG(mode) && !S_ISDIR(mode) && !S_ISLNK(mode))
>  			xfs_scrub_ino_set_corrupt(sc, ino);
>  		break;

Same here.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 6/8] xfs: fix imbalanced locking
  2018-07-06  3:12 ` [PATCH RFC 6/8] xfs: fix imbalanced locking Shan Hai
@ 2018-07-08 15:53   ` Christoph Hellwig
  2018-07-09  3:07     ` Shan Hai
  0 siblings, 1 reply; 50+ messages in thread
From: Christoph Hellwig @ 2018-07-08 15:53 UTC (permalink / raw)
  To: Shan Hai; +Cc: linux-xfs

On Fri, Jul 06, 2018 at 11:12:27AM +0800, Shan Hai wrote:
> The XFS_ILOCK_SHARED type lock is held in the xfs_ioc_fsgetxattr
> but does not released when the inode is in local format, fix it by
> eleasing the lock on the local format inode.
> 
> Signed-off-by: Shan Hai <shan.hai@oracle.com>
> ---
>  fs/xfs/xfs_ioctl.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index 0ef5ece5634c..246562615ccd 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -907,7 +907,10 @@ xfs_ioc_fsgetxattr(
>  	} else {
>  		if (ip->i_df.if_flags & XFS_IFEXTENTS)
>  			fa.fsx_nextents = xfs_iext_count(&ip->i_df);
> -		else
> +		else if (ip->i_df.if_flags & XFS_IFINLINE) {
> +			xfs_iunlock(ip, XFS_ILOCK_SHARED);
> +			return 0;
> +		} else
>  			fa.fsx_nextents = ip->i_d.di_nextents;
>  	}
>  	xfs_iunlock(ip, XFS_ILOCK_SHARED);

This looks odd, and the changelog confuses even more.

Please avoid the early return, which means we miss the copy_to_user
of the fa struct.  It seems to me like this should be something like:

		if (ip->i_df.if_flags & XFS_IFEXTENTS)
                        fa.fsx_nextents = xfs_iext_count(&ip->i_df);
		esle if (ip->i_df.if_flags & XFS_IFINLINE)
			fa.fsx_nextents = 0;
                else
			fa.fsx_nextents = ip->i_d.di_nextents;

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 7/8] xfs: return non-zero blocks for inline data
  2018-07-06  3:12 ` [PATCH RFC 7/8] xfs: return non-zero blocks for inline data Shan Hai
@ 2018-07-08 15:54   ` Christoph Hellwig
  2018-07-11 13:08   ` Carlos Maiolino
  1 sibling, 0 replies; 50+ messages in thread
From: Christoph Hellwig @ 2018-07-08 15:54 UTC (permalink / raw)
  To: Shan Hai; +Cc: linux-xfs

On Fri, Jul 06, 2018 at 11:12:28AM +0800, Shan Hai wrote:
> Return non-zero blocks for inline data even though the inode has
> no external blocks, otherwise the "ls -ls" would show zero blocks
> occupied by the file.

Does this match the behaviour of existing file systems with inline
data?  What do btrfs, ext4 or gfs2 report?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 0/8] xfs: introduce inode data inline feature
  2018-07-06  5:42 ` Dave Chinner
  2018-07-06  6:39   ` Shan Hai
@ 2018-07-08 15:58   ` Christoph Hellwig
  1 sibling, 0 replies; 50+ messages in thread
From: Christoph Hellwig @ 2018-07-08 15:58 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Shan Hai, linux-xfs

On Fri, Jul 06, 2018 at 03:42:50PM +1000, Dave Chinner wrote:
> This mechanism solves the above problem by storing the newly
> allocated block in the in-memory COW fork and doesn't modify the
> data fork until after the data write IO completes. IOWs, we do
> allocation before the write IO, and do the data fork and BMBT
> manipulation after the IO completes. i.e. we do the local->extent
> data fork modification at IO completion using the extent that was
> stored in the in memory COW fork.
> 
> Hence I think the inline data write path needs to piggy back on the
> iomap COW path that we use for writing to shared extents if the
> write would cause a data fork format change.

I'm not even sure we need the cow infrastructure for that, but yes,
we need the format conversion to be driven from the I/O completion
handler for buffer I/O writeback, and for direct I/O (including AIO)
as well.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 8/8] xfs: skip local format inode for reflinking
  2018-07-06  3:26   ` Darrick J. Wong
  2018-07-06  3:54     ` Shan Hai
@ 2018-07-08 16:00     ` Christoph Hellwig
  1 sibling, 0 replies; 50+ messages in thread
From: Christoph Hellwig @ 2018-07-08 16:00 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Shan Hai, linux-xfs

On Thu, Jul 05, 2018 at 08:26:53PM -0700, Darrick J. Wong wrote:
> What happens if someone tries to reflink and either src or dest are an
> inline file?

Very good question.  I'd also like to see how btrfs and gfs2
handle that case, as it would be good if all Linux file systems
behave the same way.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 2/8] xfs: introduce extents to local conversion helper
  2018-07-08 15:42   ` Christoph Hellwig
@ 2018-07-09  1:58     ` Shan Hai
  0 siblings, 0 replies; 50+ messages in thread
From: Shan Hai @ 2018-07-09  1:58 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs



On 2018年07月08日 23:42, Christoph Hellwig wrote:
>>   /*
>> + * Convert an inode from extents to the local format.
>> + * Free all the extents of the inode and reset it to the local
>> + * format. Copy the contents of the inode's blocks to the inode's
>> + * literal area.
>> + */
> Please use up all 80 chars for your comments (and code)
>
>> +int
>> +xfs_bmap_extents_to_local(
>> +	xfs_trans_t		*tp,
>> +	xfs_inode_t		*ip,
> Please use the struct types instead of the typedefs wherever possible.
>
>> +	if (xfs_iext_count(ifp) == 0)
>> +		goto init_local_fork;
>> +
>> +	for_each_xfs_iext(ifp, &icur, &got) {
> for_each_xfs_iext should do the right thing for an empty fork.
>
>> +out:
>> +	return error;
> Just return errors directly instead of going to a label that returns
> the error.
>
>> +}
>> +
>> +
>> +/*
> One blank line after a function body is enough.
>

OK, I will fix all of the above, thanks for the review.

Thanks
Shan Hai

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 4/8] xfs: implement inline data read write code
  2018-07-08 15:45   ` Christoph Hellwig
@ 2018-07-09  2:08     ` Shan Hai
  0 siblings, 0 replies; 50+ messages in thread
From: Shan Hai @ 2018-07-09  2:08 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs



On 2018年07月08日 23:45, Christoph Hellwig wrote:
> On Fri, Jul 06, 2018 at 11:12:25AM +0800, Shan Hai wrote:
>> Implement read/write functions for inline data access and use them
>> for buffered/DIO/DAX operations.
> Please use the iomap inline handling in this branch instead of
> implementing it in XFS:
>
> https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git/log/?h=iomap-4.19-merge

OK, I will base the patches on the iomap inline handling code.

> It seems like your code doesn't use the page cache for inline data
> at the moment, which means you'd want to always use the direct I/O
> version of the code as-is.

Yes, the read path uses direct I/O version while the write is not.

> Althought that makes me wonder how you handle memory mapped I/O, but
> I guess I'll find out later.

I will use page cache to handle the mmap case, thanks for the review.

Thanks
Shan Hai

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 5/8] xfs: consider the local format inode in misc operations
  2018-07-08 15:51   ` Christoph Hellwig
@ 2018-07-09  3:06     ` Shan Hai
  0 siblings, 0 replies; 50+ messages in thread
From: Shan Hai @ 2018-07-09  3:06 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs



On 2018年07月08日 23:51, Christoph Hellwig wrote:
> On Fri, Jul 06, 2018 at 11:12:26AM +0800, Shan Hai wrote:
>> The local format inode is a legal citizen from now on and consider
>> it in misc operations.
> This probably wants to go towards the front of the series, and split
> into multiple patches with better changelogs for each.
>
>>   
>>   	if (unlikely(XFS_TEST_ERROR(
>>   	    (XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_EXTENTS &&
>> -	     XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_BTREE),
>> +	     XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_BTREE &&
>> +	     XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_LOCAL),
> This are all the three possible values.  It is kinda confusing to read
> the checks this way as we alreayd checked for a valid fork type in
> the inode read verifies.
>
> Also this seems to be in xfs_bmapi_read, for which reading allowing
> inline format forks seems wrong to me.

Agreed.

>> -	if (!(ifp->if_flags & XFS_IFEXTENTS)) {
>> +	if (!(ifp->if_flags & (XFS_IFINLINE | XFS_IFEXTENTS))) {
>>   		error = xfs_iread_extents(NULL, ip, whichfork);
>>   		if (error)
>>   			return error;
> I think we need to clean this up better.  We probably want an
> xfs_inode_has_extents helper, or even hide the check inside
> xfs_iread_extents itself.  Also I wonder if the above is read, the
> XFS_IFEXTENTS flag has always meant that we have the current extent list
> in memory, which should always be the case for a fork in inline format.
>

Yes, you are right, seems the above 2 checks are left over from my code 
cleanup before
sending it out, my apology for wasting your time like this,  really sorry.

>> @@ -5244,11 +5239,18 @@ __xfs_bunmapi(
>>   	ifp = XFS_IFORK_PTR(ip, whichfork);
>>   	if (unlikely(
>>   	    XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_EXTENTS &&
>> +	    XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_LOCAL &&
>>   	    XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_BTREE)) {
>>   		XFS_ERROR_REPORT("xfs_bunmapi", XFS_ERRLEVEL_LOW,
>>   				 ip->i_mount);
>>   		return -EFSCORRUPTED;
>>   	}
>> +
>> +	if (XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_LOCAL) {
>> +		*rlen = 0;
>> +		return 0;
>> +	}
> I don't think we should ever call bunmapi for an inline format fork.
>
>>   };
>>   
>> -
>>   /*
> Spurious whitespace change.
>
>>    * This routine is called to map an inode to the buffer containing the on-disk
>>    * version of the inode.  It returns a pointer to the buffer containing the
>> @@ -384,12 +383,7 @@ xfs_dinode_verify_fork(
>>   
>>   	switch (XFS_DFORK_FORMAT(dip, whichfork)) {
>>   	case XFS_DINODE_FMT_LOCAL:
>> -		/*
>> -		 * no local regular files yet
>> -		 */
>>   		if (whichfork == XFS_DATA_FORK) {
>> -			if (S_ISREG(be16_to_cpu(dip->di_mode)))
>> -				return __this_address;
> I think you need to check the new feature bit here and only allow the
> inline regular case if the feature bit it set.

OK, I will do that.

>> @@ -283,7 +283,7 @@ xfs_scrub_dinode(
>>   			xfs_scrub_ino_set_corrupt(sc, ino);
>>   		break;
>>   	case XFS_DINODE_FMT_LOCAL:
>> -		if (!S_ISDIR(mode) && !S_ISLNK(mode))
>> +		if (!S_ISREG(mode) && !S_ISDIR(mode) && !S_ISLNK(mode))
>>   			xfs_scrub_ino_set_corrupt(sc, ino);
>>   		break;
> Same here.

Ditto.

Thanks
Shan Hai

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 6/8] xfs: fix imbalanced locking
  2018-07-08 15:53   ` Christoph Hellwig
@ 2018-07-09  3:07     ` Shan Hai
  0 siblings, 0 replies; 50+ messages in thread
From: Shan Hai @ 2018-07-09  3:07 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs



On 2018年07月08日 23:53, Christoph Hellwig wrote:
> On Fri, Jul 06, 2018 at 11:12:27AM +0800, Shan Hai wrote:
>> The XFS_ILOCK_SHARED type lock is held in the xfs_ioc_fsgetxattr
>> but does not released when the inode is in local format, fix it by
>> eleasing the lock on the local format inode.
>>
>> Signed-off-by: Shan Hai <shan.hai@oracle.com>
>> ---
>>   fs/xfs/xfs_ioctl.c | 5 ++++-
>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
>> index 0ef5ece5634c..246562615ccd 100644
>> --- a/fs/xfs/xfs_ioctl.c
>> +++ b/fs/xfs/xfs_ioctl.c
>> @@ -907,7 +907,10 @@ xfs_ioc_fsgetxattr(
>>   	} else {
>>   		if (ip->i_df.if_flags & XFS_IFEXTENTS)
>>   			fa.fsx_nextents = xfs_iext_count(&ip->i_df);
>> -		else
>> +		else if (ip->i_df.if_flags & XFS_IFINLINE) {
>> +			xfs_iunlock(ip, XFS_ILOCK_SHARED);
>> +			return 0;
>> +		} else
>>   			fa.fsx_nextents = ip->i_d.di_nextents;
>>   	}
>>   	xfs_iunlock(ip, XFS_ILOCK_SHARED);
> This looks odd, and the changelog confuses even more.
>
> Please avoid the early return, which means we miss the copy_to_user
> of the fa struct.  It seems to me like this should be something like:
>
> 		if (ip->i_df.if_flags & XFS_IFEXTENTS)
>                          fa.fsx_nextents = xfs_iext_count(&ip->i_df);
> 		esle if (ip->i_df.if_flags & XFS_IFINLINE)
> 			fa.fsx_nextents = 0;
>                  else
> 			fa.fsx_nextents = ip->i_d.di_nextents;


Agreed, I will double check this, thanks for the review.

Thanks
Shan Hai

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 7/8] xfs: return non-zero blocks for inline data
  2018-07-06  3:12 ` [PATCH RFC 7/8] xfs: return non-zero blocks for inline data Shan Hai
  2018-07-08 15:54   ` Christoph Hellwig
@ 2018-07-11 13:08   ` Carlos Maiolino
  2018-07-12  1:03     ` Shan Hai
  1 sibling, 1 reply; 50+ messages in thread
From: Carlos Maiolino @ 2018-07-11 13:08 UTC (permalink / raw)
  To: Shan Hai; +Cc: linux-xfs

On Fri, Jul 06, 2018 at 11:12:28AM +0800, Shan Hai wrote:
> Return non-zero blocks for inline data even though the inode has
> no external blocks, otherwise the "ls -ls" would show zero blocks
> occupied by the file.
> 

Is there any issue you ran into while leaving inodes with zero blocks allocated?
Inodes should actually report the real amount of allocated blocks, not fake it.
Inodes with inlined data should actually report 0 blocks, otherwise, many
applications which actually relies on the amount of allocated blocks for each
file will misbehave.

> Signed-off-by: Shan Hai <shan.hai@oracle.com>
> ---
>  fs/xfs/xfs_iops.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 0fa29f39d658..63be1355a473 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -500,8 +500,14 @@ xfs_vn_getattr(
>  	stat->atime = inode->i_atime;
>  	stat->mtime = inode->i_mtime;
>  	stat->ctime = inode->i_ctime;
> -	stat->blocks =
> +
> +	if (xfs_sb_version_hasinlinedata(&mp->m_sb) &&
> +		XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) == XFS_DINODE_FMT_LOCAL) {
> +			stat->blocks = BTOBB(XFS_ISIZE(ip));
> +	} else {
> +		stat->blocks =
>  		XFS_FSB_TO_BB(mp, ip->i_d.di_nblocks + ip->i_delayed_blks);
> +	}
>  
>  	if (ip->i_d.di_version == 3) {
>  		if (request_mask & STATX_BTIME) {
> -- 
> 2.11.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Carlos

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 7/8] xfs: return non-zero blocks for inline data
  2018-07-11 13:08   ` Carlos Maiolino
@ 2018-07-12  1:03     ` Shan Hai
  2018-07-12  1:13       ` Shan Hai
  0 siblings, 1 reply; 50+ messages in thread
From: Shan Hai @ 2018-07-12  1:03 UTC (permalink / raw)
  To: linux-xfs



On 2018年07月11日 21:08, Carlos Maiolino wrote:
> On Fri, Jul 06, 2018 at 11:12:28AM +0800, Shan Hai wrote:
>> Return non-zero blocks for inline data even though the inode has
>> no external blocks, otherwise the "ls -ls" would show zero blocks
>> occupied by the file.
>>
> Is there any issue you ran into while leaving inodes with zero blocks allocated?
> Inodes should actually report the real amount of allocated blocks, not fake it.
> Inodes with inlined data should actually report 0 blocks, otherwise, many
> applications which actually relies on the amount of allocated blocks for each
> file will misbehave.
>

Man ls(1) reads:

-s, --size
     print the allocated size of each file, in blocks

So the 'ls -ls' would report 0 blocks when the data is inlined, a file 
holds data
but it consumes 0 blocks, how is it possible :), this patch fixes this 
problem.

Thanks
Shan Hai
>> Signed-off-by: Shan Hai <shan.hai@oracle.com>
>> ---
>>   fs/xfs/xfs_iops.c | 8 +++++++-
>>   1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
>> index 0fa29f39d658..63be1355a473 100644
>> --- a/fs/xfs/xfs_iops.c
>> +++ b/fs/xfs/xfs_iops.c
>> @@ -500,8 +500,14 @@ xfs_vn_getattr(
>>   	stat->atime = inode->i_atime;
>>   	stat->mtime = inode->i_mtime;
>>   	stat->ctime = inode->i_ctime;
>> -	stat->blocks =
>> +
>> +	if (xfs_sb_version_hasinlinedata(&mp->m_sb) &&
>> +		XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) == XFS_DINODE_FMT_LOCAL) {
>> +			stat->blocks = BTOBB(XFS_ISIZE(ip));
>> +	} else {
>> +		stat->blocks =
>>   		XFS_FSB_TO_BB(mp, ip->i_d.di_nblocks + ip->i_delayed_blks);
>> +	}
>>   
>>   	if (ip->i_d.di_version == 3) {
>>   		if (request_mask & STATX_BTIME) {
>> -- 
>> 2.11.0
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 7/8] xfs: return non-zero blocks for inline data
  2018-07-12  1:03     ` Shan Hai
@ 2018-07-12  1:13       ` Shan Hai
  2018-07-12  1:31         ` Darrick J. Wong
  0 siblings, 1 reply; 50+ messages in thread
From: Shan Hai @ 2018-07-12  1:13 UTC (permalink / raw)
  To: linux-xfs



On 2018年07月12日 09:03, Shan Hai wrote:
>
>
> On 2018年07月11日 21:08, Carlos Maiolino wrote:
>> On Fri, Jul 06, 2018 at 11:12:28AM +0800, Shan Hai wrote:
>>> Return non-zero blocks for inline data even though the inode has
>>> no external blocks, otherwise the "ls -ls" would show zero blocks
>>> occupied by the file.
>>>
>> Is there any issue you ran into while leaving inodes with zero blocks 
>> allocated?
>> Inodes should actually report the real amount of allocated blocks, 
>> not fake it.
>> Inodes with inlined data should actually report 0 blocks, otherwise, 
>> many
>> applications which actually relies on the amount of allocated blocks 
>> for each
>> file will misbehave.
>>
>
> Man ls(1) reads:
>
> -s, --size
>     print the allocated size of each file, in blocks
>
> So the 'ls -ls' would report 0 blocks when the data is inlined, a file 
> holds data
> but it consumes 0 blocks, how is it possible :), this patch fixes this 
> problem.
>

This patch is inspired by the
upstream commit 9206c561554c9 (ext4: return non-zero st_blocks for 
inline data),
please refer it for details.

Thanks
Shan Hai

> Thanks
> Shan Hai
>>> Signed-off-by: Shan Hai <shan.hai@oracle.com>
>>> ---
>>>   fs/xfs/xfs_iops.c | 8 +++++++-
>>>   1 file changed, 7 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
>>> index 0fa29f39d658..63be1355a473 100644
>>> --- a/fs/xfs/xfs_iops.c
>>> +++ b/fs/xfs/xfs_iops.c
>>> @@ -500,8 +500,14 @@ xfs_vn_getattr(
>>>       stat->atime = inode->i_atime;
>>>       stat->mtime = inode->i_mtime;
>>>       stat->ctime = inode->i_ctime;
>>> -    stat->blocks =
>>> +
>>> +    if (xfs_sb_version_hasinlinedata(&mp->m_sb) &&
>>> +        XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) == XFS_DINODE_FMT_LOCAL) {
>>> +            stat->blocks = BTOBB(XFS_ISIZE(ip));
>>> +    } else {
>>> +        stat->blocks =
>>>           XFS_FSB_TO_BB(mp, ip->i_d.di_nblocks + ip->i_delayed_blks);
>>> +    }
>>>         if (ip->i_d.di_version == 3) {
>>>           if (request_mask & STATX_BTIME) {
>>> -- 
>>> 2.11.0
>>>
>>> -- 
>>> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 7/8] xfs: return non-zero blocks for inline data
  2018-07-12  1:13       ` Shan Hai
@ 2018-07-12  1:31         ` Darrick J. Wong
  2018-07-12  1:46           ` Shan Hai
  0 siblings, 1 reply; 50+ messages in thread
From: Darrick J. Wong @ 2018-07-12  1:31 UTC (permalink / raw)
  To: Shan Hai; +Cc: linux-xfs

On Thu, Jul 12, 2018 at 09:13:46AM +0800, Shan Hai wrote:
> 
> 
> On 2018年07月12日 09:03, Shan Hai wrote:
> > 
> > 
> > On 2018年07月11日 21:08, Carlos Maiolino wrote:
> > > On Fri, Jul 06, 2018 at 11:12:28AM +0800, Shan Hai wrote:
> > > > Return non-zero blocks for inline data even though the inode has
> > > > no external blocks, otherwise the "ls -ls" would show zero blocks
> > > > occupied by the file.
> > > > 
> > > Is there any issue you ran into while leaving inodes with zero
> > > blocks allocated?
> > > Inodes should actually report the real amount of allocated blocks,
> > > not fake it.
> > > Inodes with inlined data should actually report 0 blocks, otherwise,
> > > many
> > > applications which actually relies on the amount of allocated blocks
> > > for each
> > > file will misbehave.
> > > 
> > 
> > Man ls(1) reads:
> > 
> > -s, --size
> >     print the allocated size of each file, in blocks
> > 
> > So the 'ls -ls' would report 0 blocks when the data is inlined, a file
> > holds data
> > but it consumes 0 blocks, how is it possible :), this patch fixes this
> > problem.
> > 
> 
> This patch is inspired by the
> upstream commit 9206c561554c9 (ext4: return non-zero st_blocks for inline
> data),
> please refer it for details.

The fact that we're following precedent set by ext4 is worth mentioning
in the commit message.

--D

> Thanks
> Shan Hai
> 
> > Thanks
> > Shan Hai
> > > > Signed-off-by: Shan Hai <shan.hai@oracle.com>
> > > > ---
> > > >   fs/xfs/xfs_iops.c | 8 +++++++-
> > > >   1 file changed, 7 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> > > > index 0fa29f39d658..63be1355a473 100644
> > > > --- a/fs/xfs/xfs_iops.c
> > > > +++ b/fs/xfs/xfs_iops.c
> > > > @@ -500,8 +500,14 @@ xfs_vn_getattr(
> > > >       stat->atime = inode->i_atime;
> > > >       stat->mtime = inode->i_mtime;
> > > >       stat->ctime = inode->i_ctime;
> > > > -    stat->blocks =
> > > > +
> > > > +    if (xfs_sb_version_hasinlinedata(&mp->m_sb) &&
> > > > +        XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) == XFS_DINODE_FMT_LOCAL) {
> > > > +            stat->blocks = BTOBB(XFS_ISIZE(ip));
> > > > +    } else {
> > > > +        stat->blocks =
> > > >           XFS_FSB_TO_BB(mp, ip->i_d.di_nblocks + ip->i_delayed_blks);
> > > > +    }
> > > >         if (ip->i_d.di_version == 3) {
> > > >           if (request_mask & STATX_BTIME) {
> > > > -- 
> > > > 2.11.0
> > > > 
> > > > -- 
> > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > 
> > -- 
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 7/8] xfs: return non-zero blocks for inline data
  2018-07-12  1:31         ` Darrick J. Wong
@ 2018-07-12  1:46           ` Shan Hai
  2018-07-12  9:08             ` Carlos Maiolino
  0 siblings, 1 reply; 50+ messages in thread
From: Shan Hai @ 2018-07-12  1:46 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs



On 2018年07月12日 09:31, Darrick J. Wong wrote:
> On Thu, Jul 12, 2018 at 09:13:46AM +0800, Shan Hai wrote:
>>
>> On 2018年07月12日 09:03, Shan Hai wrote:
>>>
>>> On 2018年07月11日 21:08, Carlos Maiolino wrote:
>>>> On Fri, Jul 06, 2018 at 11:12:28AM +0800, Shan Hai wrote:
>>>>> Return non-zero blocks for inline data even though the inode has
>>>>> no external blocks, otherwise the "ls -ls" would show zero blocks
>>>>> occupied by the file.
>>>>>
>>>> Is there any issue you ran into while leaving inodes with zero
>>>> blocks allocated?
>>>> Inodes should actually report the real amount of allocated blocks,
>>>> not fake it.
>>>> Inodes with inlined data should actually report 0 blocks, otherwise,
>>>> many
>>>> applications which actually relies on the amount of allocated blocks
>>>> for each
>>>> file will misbehave.
>>>>
>>> Man ls(1) reads:
>>>
>>> -s, --size
>>>      print the allocated size of each file, in blocks
>>>
>>> So the 'ls -ls' would report 0 blocks when the data is inlined, a file
>>> holds data
>>> but it consumes 0 blocks, how is it possible :), this patch fixes this
>>> problem.
>>>
>> This patch is inspired by the
>> upstream commit 9206c561554c9 (ext4: return non-zero st_blocks for inline
>> data),
>> please refer it for details.
> The fact that we're following precedent set by ext4 is worth mentioning
> in the commit message.

OK, I will include it in the next version of the patch.

Thanks
Shan Hai
> --D
>
>> Thanks
>> Shan Hai
>>
>>> Thanks
>>> Shan Hai
>>>>> Signed-off-by: Shan Hai <shan.hai@oracle.com>
>>>>> ---
>>>>>    fs/xfs/xfs_iops.c | 8 +++++++-
>>>>>    1 file changed, 7 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
>>>>> index 0fa29f39d658..63be1355a473 100644
>>>>> --- a/fs/xfs/xfs_iops.c
>>>>> +++ b/fs/xfs/xfs_iops.c
>>>>> @@ -500,8 +500,14 @@ xfs_vn_getattr(
>>>>>        stat->atime = inode->i_atime;
>>>>>        stat->mtime = inode->i_mtime;
>>>>>        stat->ctime = inode->i_ctime;
>>>>> -    stat->blocks =
>>>>> +
>>>>> +    if (xfs_sb_version_hasinlinedata(&mp->m_sb) &&
>>>>> +        XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) == XFS_DINODE_FMT_LOCAL) {
>>>>> +            stat->blocks = BTOBB(XFS_ISIZE(ip));
>>>>> +    } else {
>>>>> +        stat->blocks =
>>>>>            XFS_FSB_TO_BB(mp, ip->i_d.di_nblocks + ip->i_delayed_blks);
>>>>> +    }
>>>>>          if (ip->i_d.di_version == 3) {
>>>>>            if (request_mask & STATX_BTIME) {
>>>>> -- 
>>>>> 2.11.0
>>>>>
>>>>> -- 
>>>>> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
>>>>> the body of a message to majordomo@vger.kernel.org
>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>> -- 
>>> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 7/8] xfs: return non-zero blocks for inline data
  2018-07-12  1:46           ` Shan Hai
@ 2018-07-12  9:08             ` Carlos Maiolino
  2018-07-12 10:48               ` Shan Hai
  0 siblings, 1 reply; 50+ messages in thread
From: Carlos Maiolino @ 2018-07-12  9:08 UTC (permalink / raw)
  To: Shan Hai; +Cc: Darrick J. Wong, linux-xfs

On Thu, Jul 12, 2018 at 09:46:20AM +0800, Shan Hai wrote:
> 
> 
> On 2018年07月12日 09:31, Darrick J. Wong wrote:
> > On Thu, Jul 12, 2018 at 09:13:46AM +0800, Shan Hai wrote:
> > > 
> > > On 2018年07月12日 09:03, Shan Hai wrote:
> > > > 
> > > > On 2018年07月11日 21:08, Carlos Maiolino wrote:
> > > > > On Fri, Jul 06, 2018 at 11:12:28AM +0800, Shan Hai wrote:
> > > > > > Return non-zero blocks for inline data even though the inode has
> > > > > > no external blocks, otherwise the "ls -ls" would show zero blocks
> > > > > > occupied by the file.
> > > > > > 
> > > > > Is there any issue you ran into while leaving inodes with zero
> > > > > blocks allocated?
> > > > > Inodes should actually report the real amount of allocated blocks,
> > > > > not fake it.
> > > > > Inodes with inlined data should actually report 0 blocks, otherwise,
> > > > > many
> > > > > applications which actually relies on the amount of allocated blocks
> > > > > for each
> > > > > file will misbehave.
> > > > > 
> > > > Man ls(1) reads:
> > > > 
> > > > -s, --size
> > > >      print the allocated size of each file, in blocks
> > > > 
> > > > So the 'ls -ls' would report 0 blocks when the data is inlined, a file
> > > > holds data
> > > > but it consumes 0 blocks, how is it possible :),

It is possible because the file doesn't consume data blocks at all.

> > > This patch is inspired by the
> > > upstream commit 9206c561554c9 (ext4: return non-zero st_blocks for inline
> > > data),
> > > please refer it for details.
> > The fact that we're following precedent set by ext4 is worth mentioning
> > in the commit message.

The fact another filesystem use this trick, doesn't necessarily means it's
correct. Ext4 added it to workaround a issue with tar, which actually skip zero
blocks files. I honestly think it is wrong, we are working around a user space
problem, which is wrongly assuming a 0-block file is empty.

A quick search led me to this thread from tar project:

http://www.mail-archive.com/bug-tar@gnu.org/msg04209.html

which well, from that time, they were already aware that always assuming a
zero-block file is empty, was not safe.



Reporting a used block in a file that is storing data inlined in the inode is
prone for space usage accounting error IMO.

Suppose you have a 4K block-size filesystem, you create 100k files into it with
inlined data.

So now you have 100k files reporting to be using a single FSB, wich converts
into a bit less than 400MB.

How will you make sure the accounting is correct? You can't really mark any
blocks used in the filesystem, so, tools like `df` (or any other tool based on
statfs() ), will report these 400MB as free.

At the same time, tools like `ls`, `du`, etc which relies on per-file stat, will
report these 400MB used.


I do see advantages on reporting a single block use, but IMO, it will cause more
confusion than good.

Anyway, as I said, my opinion only, I do really think reporting a single block
used by inlined files is wrong, and if we are going to do that, we should at
least properly document this is being done to workaround user space issues,
while, in the meantime, it might create others.

Cheers.

-- 
Carlos

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 7/8] xfs: return non-zero blocks for inline data
  2018-07-12  9:08             ` Carlos Maiolino
@ 2018-07-12 10:48               ` Shan Hai
  2018-07-13 12:39                 ` Carlos Maiolino
  0 siblings, 1 reply; 50+ messages in thread
From: Shan Hai @ 2018-07-12 10:48 UTC (permalink / raw)
  To: Darrick J. Wong, linux-xfs



On 2018年07月12日 17:08, Carlos Maiolino wrote:
> On Thu, Jul 12, 2018 at 09:46:20AM +0800, Shan Hai wrote:
>>
>> On 2018年07月12日 09:31, Darrick J. Wong wrote:
>>> On Thu, Jul 12, 2018 at 09:13:46AM +0800, Shan Hai wrote:
>>>> On 2018年07月12日 09:03, Shan Hai wrote:
>>>>> On 2018年07月11日 21:08, Carlos Maiolino wrote:
>>>>>> On Fri, Jul 06, 2018 at 11:12:28AM +0800, Shan Hai wrote:
>>>>>>> Return non-zero blocks for inline data even though the inode has
>>>>>>> no external blocks, otherwise the "ls -ls" would show zero blocks
>>>>>>> occupied by the file.
>>>>>>>
>>>>>> Is there any issue you ran into while leaving inodes with zero
>>>>>> blocks allocated?
>>>>>> Inodes should actually report the real amount of allocated blocks,
>>>>>> not fake it.
>>>>>> Inodes with inlined data should actually report 0 blocks, otherwise,
>>>>>> many
>>>>>> applications which actually relies on the amount of allocated blocks
>>>>>> for each
>>>>>> file will misbehave.
>>>>>>
>>>>> Man ls(1) reads:
>>>>>
>>>>> -s, --size
>>>>>       print the allocated size of each file, in blocks
>>>>>
>>>>> So the 'ls -ls' would report 0 blocks when the data is inlined, a file
>>>>> holds data
>>>>> but it consumes 0 blocks, how is it possible :),
> It is possible because the file doesn't consume data blocks at all.
>
>>>> This patch is inspired by the
>>>> upstream commit 9206c561554c9 (ext4: return non-zero st_blocks for inline
>>>> data),
>>>> please refer it for details.
>>> The fact that we're following precedent set by ext4 is worth mentioning
>>> in the commit message.
> The fact another filesystem use this trick, doesn't necessarily means it's
> correct. Ext4 added it to workaround a issue with tar, which actually skip zero
> blocks files. I honestly think it is wrong, we are working around a user space
> problem, which is wrongly assuming a 0-block file is empty.

The assumption which deemed 0 block files as empty holds true until the 
kernel broke the
rules and introduced inline data feature, so the user space should not 
be blamed in my opinion.

> A quick search led me to this thread from tar project:
>
> http://www.mail-archive.com/bug-tar@gnu.org/msg04209.html
>
> which well, from that time, they were already aware that always assuming a
> zero-block file is empty, was not safe.
>
>
>
> Reporting a used block in a file that is storing data inlined in the inode is
> prone for space usage accounting error IMO.
>
> Suppose you have a 4K block-size filesystem, you create 100k files into it with
> inlined data.
>
> So now you have 100k files reporting to be using a single FSB, wich converts
> into a bit less than 400MB.
>
> How will you make sure the accounting is correct? You can't really mark any
> blocks used in the filesystem, so, tools like `df` (or any other tool based on
> statfs() ), will report these 400MB as free.
>
> At the same time, tools like `ls`, `du`, etc which relies on per-file stat, will
> report these 400MB used.
>
>
> I do see advantages on reporting a single block use, but IMO, it will cause more
> confusion than good.
>
> Anyway, as I said, my opinion only, I do really think reporting a single block
> used by inlined files is wrong, and if we are going to do that, we should at
> least properly document this is being done to workaround user space issues,
> while, in the meantime, it might create others.

Faking block usage count is not quite correct one but the sad fact is 
that there are applications
out there which are designed and implemented on the assumption that 
non-zero length files
consume at least a block, so I don't think breaking the user space 
suddenly by reporting 0 blocks
for inline data is not a correct solution either.

I will put a big fat note in inline comment to emphasize this issue.
Any other better ideas?

Thanks
Shan Hai
> Cheers.
>


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 7/8] xfs: return non-zero blocks for inline data
  2018-07-12 10:48               ` Shan Hai
@ 2018-07-13 12:39                 ` Carlos Maiolino
  2018-07-17 13:57                   ` Christoph Hellwig
  0 siblings, 1 reply; 50+ messages in thread
From: Carlos Maiolino @ 2018-07-13 12:39 UTC (permalink / raw)
  To: Shan Hai; +Cc: Darrick J. Wong, linux-xfs

> > > > > > > Inodes with inlined data should actually report 0 blocks, otherwise,
> > > > > > > many
> > > > > > > applications which actually relies on the amount of allocated blocks
> > > > > > > for each
> > > > > > > file will misbehave.
> > > > > > > 
> > > > > > Man ls(1) reads:
> > > > > > 
> > > > > > -s, --size
> > > > > >       print the allocated size of each file, in blocks
> > > > > > 
> > > > > > So the 'ls -ls' would report 0 blocks when the data is inlined, a file
> > > > > > holds data
> > > > > > but it consumes 0 blocks, how is it possible :),
> > It is possible because the file doesn't consume data blocks at all.
> > 
> > > > > This patch is inspired by the
> > > > > upstream commit 9206c561554c9 (ext4: return non-zero st_blocks for inline
> > > > > data),
> > > > > please refer it for details.
> > > > The fact that we're following precedent set by ext4 is worth mentioning
> > > > in the commit message.
> > The fact another filesystem use this trick, doesn't necessarily means it's
> > correct. Ext4 added it to workaround a issue with tar, which actually skip zero
> > blocks files. I honestly think it is wrong, we are working around a user space
> > problem, which is wrongly assuming a 0-block file is empty.
> 
> The assumption which deemed 0 block files as empty holds true until the
> kernel broke the
> rules and introduced inline data feature, so the user space should not be
> blamed in my opinion.

Did we? I couldn't find any interface, design, or whatever saying that userspace
can safely assume a zero block file don't have data there and can be ignored,
in fact, the link I posted previously already shows how tar people were not sure
if they should assume a zero block file could be ignored. So, in fact, we didn't
break anything, because as far as I know, nobody and no standard said an
existing zero block file could be safely ignored, assuming it contains no data
at all. Such thing might exist, but I'm not aware of.

> 
> > A quick search led me to this thread from tar project:
> > 
> > http://www.mail-archive.com/bug-tar@gnu.org/msg04209.html
> > 
> > 
> > Anyway, as I said, my opinion only, I do really think reporting a single block
> > used by inlined files is wrong, and if we are going to do that, we should at
> > least properly document this is being done to workaround user space issues,
> > while, in the meantime, it might create others.
> 
> Faking block usage count is not quite correct one but the sad fact is that
> there are applications
> out there which are designed and implemented on the assumption that non-zero
> length files
> consume at least a block, so I don't think breaking the user space suddenly
> by reporting 0 blocks
> for inline data is not a correct solution either.
> 

I understand, and yeah, we fell into some kind of chicken-egg problem, I still
do think though there are not that much applications out there assuming
zero block files are empty, and working around bad coded application which make
assumptions they are not supposed to make, is as bad as breaking applications
for not following standards, which will end up punishing applications which do
not make assumptions they should.

The biggest drawback I can see by reporting 0 blocks allocated for inline files,
might be while estimating space needed for a copy, or a backup for example,
where, if the target filesystem does not support inlined data, 0 blocks files
will occupy at least a block on the target filesystem, but the same holds true
when source and target filesystems have different block sizes.

Anyway, that's all the arguments I have against faking block usage for inlined
files, but the final decision is not mine :)

Cheers

-- 
Carlos

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 7/8] xfs: return non-zero blocks for inline data
  2018-07-13 12:39                 ` Carlos Maiolino
@ 2018-07-17 13:57                   ` Christoph Hellwig
  2018-07-18 15:03                     ` Carlos Maiolino
  0 siblings, 1 reply; 50+ messages in thread
From: Christoph Hellwig @ 2018-07-17 13:57 UTC (permalink / raw)
  To: Shan Hai, Darrick J. Wong, linux-xfs

On Fri, Jul 13, 2018 at 02:39:55PM +0200, Carlos Maiolino wrote:
> Did we? I couldn't find any interface, design, or whatever saying that userspace
> can safely assume a zero block file don't have data there and can be ignored,
> in fact, the link I posted previously already shows how tar people were not sure
> if they should assume a zero block file could be ignored. So, in fact, we didn't
> break anything, because as far as I know, nobody and no standard said an
> existing zero block file could be safely ignored, assuming it contains no data
> at all. Such thing might exist, but I'm not aware of.

You are right that there is no interface.  But breaking existing
userspace simply isn't an option either, especially when it could
lead to data loss and corruption.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH RFC 7/8] xfs: return non-zero blocks for inline data
  2018-07-17 13:57                   ` Christoph Hellwig
@ 2018-07-18 15:03                     ` Carlos Maiolino
  0 siblings, 0 replies; 50+ messages in thread
From: Carlos Maiolino @ 2018-07-18 15:03 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Shan Hai, Darrick J. Wong, linux-xfs

On Tue, Jul 17, 2018 at 06:57:41AM -0700, Christoph Hellwig wrote:
> On Fri, Jul 13, 2018 at 02:39:55PM +0200, Carlos Maiolino wrote:
> > Did we? I couldn't find any interface, design, or whatever saying that userspace
> > can safely assume a zero block file don't have data there and can be ignored,
> > in fact, the link I posted previously already shows how tar people were not sure
> > if they should assume a zero block file could be ignored. So, in fact, we didn't
> > break anything, because as far as I know, nobody and no standard said an
> > existing zero block file could be safely ignored, assuming it contains no data
> > at all. Such thing might exist, but I'm not aware of.
> 
> You are right that there is no interface.  But breaking existing
> userspace simply isn't an option either, especially when it could
> lead to data loss and corruption.

Fair enough.

-- 
Carlos

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2018-07-18 15:42 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-06  3:12 [PATCH RFC 0/8] xfs: introduce inode data inline feature Shan Hai
2018-07-06  3:12 ` [PATCH RFC 1/8] xfs: introduce inline data superblock feature bit Shan Hai
2018-07-06  3:34   ` Darrick J. Wong
2018-07-06  4:06     ` Shan Hai
2018-07-06  3:12 ` [PATCH RFC 2/8] xfs: introduce extents to local conversion helper Shan Hai
2018-07-06  3:45   ` Darrick J. Wong
2018-07-06  4:15     ` Shan Hai
2018-07-08 15:42   ` Christoph Hellwig
2018-07-09  1:58     ` Shan Hai
2018-07-06  3:12 ` [PATCH RFC 3/8] xfs: convert inode from extents to local format Shan Hai
2018-07-06  3:47   ` Darrick J. Wong
2018-07-06  4:24     ` Shan Hai
2018-07-06  3:12 ` [PATCH RFC 4/8] xfs: implement inline data read write code Shan Hai
2018-07-06  3:33   ` Darrick J. Wong
2018-07-06  4:05     ` Shan Hai
2018-07-08 15:45   ` Christoph Hellwig
2018-07-09  2:08     ` Shan Hai
2018-07-06  3:12 ` [PATCH RFC 5/8] xfs: consider the local format inode in misc operations Shan Hai
2018-07-06  3:40   ` Darrick J. Wong
2018-07-06  4:40     ` Shan Hai
2018-07-08 15:51   ` Christoph Hellwig
2018-07-09  3:06     ` Shan Hai
2018-07-06  3:12 ` [PATCH RFC 6/8] xfs: fix imbalanced locking Shan Hai
2018-07-08 15:53   ` Christoph Hellwig
2018-07-09  3:07     ` Shan Hai
2018-07-06  3:12 ` [PATCH RFC 7/8] xfs: return non-zero blocks for inline data Shan Hai
2018-07-08 15:54   ` Christoph Hellwig
2018-07-11 13:08   ` Carlos Maiolino
2018-07-12  1:03     ` Shan Hai
2018-07-12  1:13       ` Shan Hai
2018-07-12  1:31         ` Darrick J. Wong
2018-07-12  1:46           ` Shan Hai
2018-07-12  9:08             ` Carlos Maiolino
2018-07-12 10:48               ` Shan Hai
2018-07-13 12:39                 ` Carlos Maiolino
2018-07-17 13:57                   ` Christoph Hellwig
2018-07-18 15:03                     ` Carlos Maiolino
2018-07-06  3:12 ` [PATCH RFC 8/8] xfs: skip local format inode for reflinking Shan Hai
2018-07-06  3:26   ` Darrick J. Wong
2018-07-06  3:54     ` Shan Hai
2018-07-08 16:00     ` Christoph Hellwig
2018-07-06  3:12 ` [PATCH RFC 1/1] xfsprogs: add inode inline data support Shan Hai
2018-07-06  3:35   ` Darrick J. Wong
2018-07-06 19:14     ` Eric Sandeen
2018-07-06  3:51 ` [PATCH RFC 0/8] xfs: introduce inode data inline feature Darrick J. Wong
2018-07-06  4:09   ` Shan Hai
2018-07-06  5:42 ` Dave Chinner
2018-07-06  6:39   ` Shan Hai
2018-07-06  7:11     ` Shan Hai
2018-07-08 15:58   ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.