* [PATCH 0/3] ocfs2: add nowait aio support
@ 2017-11-27  9:46 ` Gang He
  0 siblings, 0 replies; 62+ messages in thread
From: Gang He @ 2017-11-27  9:46 UTC (permalink / raw)
  To: mfasheh, jlbec, rgoldwyn, hch; +Cc: Gang He, linux-kernel, ocfs2-devel, akpm

As you know, the VFS layer has introduced the non-blocking AIO
flag IOCB_NOWAIT, which tells the kernel to bail out
if an AIO request would block for reasons such as file
allocation, a triggered writeback, or waiting
for request allocation while performing direct I/O.
Subsequently, pwritev2/preadv2 can also make use of this
part of the kernel code.
So far ext4, xfs and btrfs support this feature;
I'd like to add the related code for the ocfs2 file system.

Gang He (3):
  ocfs2: add ocfs2_try_rw_lock and ocfs2_try_inode_lock
  ocfs2: add ocfs2_overwrite_io function
  ocfs2: nowait aio support

 fs/ocfs2/dir.c         |  2 +-
 fs/ocfs2/dlmglue.c     | 42 ++++++++++++++++++++++++----
 fs/ocfs2/dlmglue.h     |  6 +++-
 fs/ocfs2/extent_map.c  | 67 +++++++++++++++++++++++++++++++++++++++++++++
 fs/ocfs2/extent_map.h  |  3 ++
 fs/ocfs2/file.c        | 74 +++++++++++++++++++++++++++++++++++++-------------
 fs/ocfs2/mmap.c        |  2 +-
 fs/ocfs2/ocfs2_trace.h | 10 ++++---
 8 files changed, 175 insertions(+), 31 deletions(-)

-- 
1.8.5.6

* [PATCH 1/3] ocfs2: add ocfs2_try_rw_lock and ocfs2_try_inode_lock
  2017-11-27  9:46 ` [Ocfs2-devel] " Gang He
@ 2017-11-27  9:46   ` Gang He
  -1 siblings, 0 replies; 62+ messages in thread
From: Gang He @ 2017-11-27  9:46 UTC (permalink / raw)
  To: mfasheh, jlbec, rgoldwyn, hch; +Cc: Gang He, linux-kernel, ocfs2-devel, akpm

Add the ocfs2_try_rw_lock and ocfs2_try_inode_lock functions, which
will be used in non-blocking I/O scenarios.
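
The difference between a blocking lock and a try lock can be shown
outside the kernel. What follows is a user-space sketch only, not the
ocfs2 code: struct toy_lock, toy_lock_init, toy_try_lock and
toy_unlock are hypothetical names, and a C11 atomic_bool stands in
for the cluster lock that the patch requests with DLM_LKF_NOQUEUE.
The point is the return convention: a busy lock is reported as
-EAGAIN instead of being waited for.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a lock resource; not an ocfs2 type. */
struct toy_lock {
	atomic_bool held;
};

/* Put the lock into the released state. */
void toy_lock_init(struct toy_lock *l)
{
	assert(l != NULL);
	atomic_init(&l->held, false);
}

/*
 * Try to take the lock without waiting.
 * Returns 0 on success, -EAGAIN if somebody else already holds it.
 */
int toy_try_lock(struct toy_lock *l)
{
	assert(l != NULL);
	/* atomic_exchange returns the previous value: true means busy. */
	if (atomic_exchange(&l->held, true))
		return -EAGAIN;
	return 0;
}

/* Release the lock so that a later try can succeed. */
void toy_unlock(struct toy_lock *l)
{
	assert(l != NULL);
	atomic_store(&l->held, false);
}
```

A caller that gets -EAGAIN back is expected to hand the error up
unchanged and let the submitter retry later, rather than loop on the
try function.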

Signed-off-by: Gang He <ghe@suse.com>
---
 fs/ocfs2/dlmglue.c | 22 ++++++++++++++++++++++
 fs/ocfs2/dlmglue.h |  4 ++++
 2 files changed, 26 insertions(+)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 4689940..5cfbd04 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -1742,6 +1742,28 @@ int ocfs2_rw_lock(struct inode *inode, int write)
 	return status;
 }
 
+int ocfs2_try_rw_lock(struct inode *inode, int write)
+{
+	int status, level;
+	struct ocfs2_lock_res *lockres;
+	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
+
+	mlog(0, "inode %llu try to take %s RW lock\n",
+	     (unsigned long long)OCFS2_I(inode)->ip_blkno,
+	     write ? "EXMODE" : "PRMODE");
+
+	if (ocfs2_mount_local(osb))
+		return 0;
+
+	lockres = &OCFS2_I(inode)->ip_rw_lockres;
+
+	level = write ? DLM_LOCK_EX : DLM_LOCK_PR;
+
+	status = ocfs2_cluster_lock(OCFS2_SB(inode->i_sb), lockres, level,
+				    DLM_LKF_NOQUEUE, 0);
+	return status;
+}
+
 void ocfs2_rw_unlock(struct inode *inode, int write)
 {
 	int level = write ? DLM_LOCK_EX : DLM_LOCK_PR;
diff --git a/fs/ocfs2/dlmglue.h b/fs/ocfs2/dlmglue.h
index a7fc18b..05910fc 100644
--- a/fs/ocfs2/dlmglue.h
+++ b/fs/ocfs2/dlmglue.h
@@ -116,6 +116,7 @@ void ocfs2_refcount_lock_res_init(struct ocfs2_lock_res *lockres,
 int ocfs2_create_new_inode_locks(struct inode *inode);
 int ocfs2_drop_inode_locks(struct inode *inode);
 int ocfs2_rw_lock(struct inode *inode, int write);
+int ocfs2_try_rw_lock(struct inode *inode, int write);
 void ocfs2_rw_unlock(struct inode *inode, int write);
 int ocfs2_open_lock(struct inode *inode);
 int ocfs2_try_open_lock(struct inode *inode, int write);
@@ -140,6 +141,9 @@ int ocfs2_inode_lock_with_page(struct inode *inode,
 /* 99% of the time we don't want to supply any additional flags --
  * those are for very specific cases only. */
 #define ocfs2_inode_lock(i, b, e) ocfs2_inode_lock_full_nested(i, b, e, 0, OI_LS_NORMAL)
+#define ocfs2_try_inode_lock(i, b, e)\
+		ocfs2_inode_lock_full_nested(i, b, e, OCFS2_META_LOCK_NOQUEUE,\
+		OI_LS_NORMAL)
 void ocfs2_inode_unlock(struct inode *inode,
 		       int ex);
 int ocfs2_super_lock(struct ocfs2_super *osb,
-- 
1.8.5.6

* [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function
  2017-11-27  9:46 ` [Ocfs2-devel] " Gang He
@ 2017-11-27  9:46   ` Gang He
  -1 siblings, 0 replies; 62+ messages in thread
From: Gang He @ 2017-11-27  9:46 UTC (permalink / raw)
  To: mfasheh, jlbec, rgoldwyn, hch; +Cc: Gang He, linux-kernel, ocfs2-devel, akpm

Add the ocfs2_overwrite_io function, which checks whether a write
only overwrites already allocated blocks; if it does not, the write
brings extra block allocation overhead.
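
The idea of the check can be sketched in plain C. This is a
simplified model under stated assumptions, not the function added
below: struct toy_extent and toy_overwrite_io are hypothetical, the
extents live in a sorted, non-overlapping in-memory array instead of
being looked up on disk, and no locking is modelled. As in the patch,
a hole (block number 0) or a refcounted extent means the write is not
a pure overwrite.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory extent; not the on-disk ocfs2_extent_rec. */
struct toy_extent {
	uint32_t cpos;       /* first logical cluster covered */
	uint32_t clusters;   /* length in clusters */
	uint64_t blkno;      /* 0 means a hole, nothing allocated */
	int      refcounted; /* shared extent, a write would need CoW */
};

/*
 * Return 1 if clusters [start, end) are fully backed by allocated,
 * non-refcounted extents, 0 otherwise. Extents are assumed to be
 * sorted by cpos and non-overlapping.
 */
int toy_overwrite_io(const struct toy_extent *ext, size_t n,
		     uint32_t start, uint32_t end)
{
	uint32_t cpos = start;
	size_t i;

	assert(ext != NULL || n == 0);

	for (i = 0; i < n && cpos < end; i++) {
		uint32_t ext_end = ext[i].cpos + ext[i].clusters;

		if (ext_end <= cpos)
			continue;	/* extent lies before the range */
		if (ext[i].cpos > cpos)
			break;		/* gap in the mapping: a hole */
		if (ext[i].blkno == 0 || ext[i].refcounted)
			break;		/* would allocate or copy on write */
		cpos = ext_end;		/* covered so far, keep walking */
	}

	/* Stopping short of the end means something must be allocated. */
	return cpos >= end;
}
```

With this convention a nowait caller treats a return value of 0 as
"would block" and reports -EAGAIN.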

Signed-off-by: Gang He <ghe@suse.com>
---
 fs/ocfs2/extent_map.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/ocfs2/extent_map.h |  3 +++
 2 files changed, 70 insertions(+)

diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
index e4719e0..98bf325 100644
--- a/fs/ocfs2/extent_map.c
+++ b/fs/ocfs2/extent_map.c
@@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 	return ret;
 }
 
+/* Is IO overwriting allocated blocks? */
+int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
+		       int wait)
+{
+	int ret = 0, is_last;
+	u32 mapping_end, cpos;
+	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
+	struct buffer_head *di_bh = NULL;
+	struct ocfs2_extent_rec rec;
+
+	if (wait)
+		ret = ocfs2_inode_lock(inode, &di_bh, 0);
+	else
+		ret = ocfs2_try_inode_lock(inode, &di_bh, 0);
+	if (ret)
+		goto out;
+
+	if (wait)
+		down_read(&OCFS2_I(inode)->ip_alloc_sem);
+	else {
+		if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) {
+			ret = -EAGAIN;
+			goto out_unlock1;
+		}
+	}
+
+	if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
+	   ((map_start + map_len) <= i_size_read(inode)))
+		goto out_unlock2;
+
+	cpos = map_start >> osb->s_clustersize_bits;
+	mapping_end = ocfs2_clusters_for_bytes(inode->i_sb,
+					       map_start + map_len);
+	is_last = 0;
+	while (cpos < mapping_end && !is_last) {
+		ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos,
+						 NULL, &rec, &is_last);
+		if (ret) {
+			mlog_errno(ret);
+			goto out_unlock2;
+		}
+
+		if (rec.e_blkno == 0ULL)
+			break;
+
+		if (rec.e_flags & OCFS2_EXT_REFCOUNTED)
+			break;
+
+		cpos = le32_to_cpu(rec.e_cpos) +
+			le16_to_cpu(rec.e_leaf_clusters);
+	}
+
+	if (cpos < mapping_end)
+		ret = 1;
+
+out_unlock2:
+	brelse(di_bh);
+
+	up_read(&OCFS2_I(inode)->ip_alloc_sem);
+
+out_unlock1:
+	ocfs2_inode_unlock(inode, 0);
+
+out:
+	return (ret ? 0 : 1);
+}
+
 int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int whence)
 {
 	struct inode *inode = file->f_mapping->host;
diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h
index 67ea57d..fd9e86a 100644
--- a/fs/ocfs2/extent_map.h
+++ b/fs/ocfs2/extent_map.h
@@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 v_blkno, u64 *p_blkno,
 int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 		 u64 map_start, u64 map_len);
 
+int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
+		       int wait);
+
 int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int origin);
 
 int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster,
-- 
1.8.5.6

* [PATCH 3/3] ocfs2: nowait aio support
  2017-11-27  9:46 ` [Ocfs2-devel] " Gang He
@ 2017-11-27  9:46   ` Gang He
  -1 siblings, 0 replies; 62+ messages in thread
From: Gang He @ 2017-11-27  9:46 UTC (permalink / raw)
  To: mfasheh, jlbec, rgoldwyn, hch; +Cc: Gang He, linux-kernel, ocfs2-devel, akpm

Return EAGAIN if any of the following checks fails for direct I/O:
- The related locks cannot be taken immediately.
- Blocks are not allocated at the write location, so the write
  would trigger block allocation and block IO operations.
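
The checks above, together with the -EOPNOTSUPP case for buffered
writes that the diff below adds, can be summarised as a small
decision function. This is an illustrative model only: struct
toy_write_state and toy_nowait_write_check are hypothetical, and the
two "state" fields collapse what the real code finds out step by step
through the try-lock calls and ocfs2_overwrite_io.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical snapshot of the conditions the write path checks. */
struct toy_write_state {
	int direct_io;     /* request carries IOCB_DIRECT */
	int nowait;        /* request carries IOCB_NOWAIT */
	int locks_free;    /* every needed lock can be taken at once */
	int blocks_mapped; /* target range is already allocated */
};

/*
 * Return 0 if the write may proceed, or the negative errno the
 * submitter would see. Without nowait nothing is rejected here,
 * because the blocking path simply waits.
 */
int toy_nowait_write_check(const struct toy_write_state *s)
{
	assert(s != NULL);

	if (!s->nowait)
		return 0;
	if (!s->direct_io)
		return -EOPNOTSUPP;	/* buffered nowait is refused */
	if (!s->locks_free)
		return -EAGAIN;		/* a lock would have to be waited for */
	if (!s->blocks_mapped)
		return -EAGAIN;		/* the write would allocate blocks */
	return 0;
}
```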

Signed-off-by: Gang He <ghe@suse.com>
---
 fs/ocfs2/dir.c         |  2 +-
 fs/ocfs2/dlmglue.c     | 20 ++++++++++----
 fs/ocfs2/dlmglue.h     |  2 +-
 fs/ocfs2/file.c        | 74 +++++++++++++++++++++++++++++++++++++-------------
 fs/ocfs2/mmap.c        |  2 +-
 fs/ocfs2/ocfs2_trace.h | 10 ++++---
 6 files changed, 79 insertions(+), 31 deletions(-)

diff --git a/fs/ocfs2/dir.c b/fs/ocfs2/dir.c
index febe631..ea50901 100644
--- a/fs/ocfs2/dir.c
+++ b/fs/ocfs2/dir.c
@@ -1957,7 +1957,7 @@ int ocfs2_readdir(struct file *file, struct dir_context *ctx)
 
 	trace_ocfs2_readdir((unsigned long long)OCFS2_I(inode)->ip_blkno);
 
-	error = ocfs2_inode_lock_atime(inode, file->f_path.mnt, &lock_level);
+	error = ocfs2_inode_lock_atime(inode, file->f_path.mnt, &lock_level, 1);
 	if (lock_level && error >= 0) {
 		/* We release EX lock which used to update atime
 		 * and get PR lock again to reduce contention
diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 5cfbd04..feb8dbe 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -2516,13 +2516,18 @@ int ocfs2_inode_lock_with_page(struct inode *inode,
 
 int ocfs2_inode_lock_atime(struct inode *inode,
 			  struct vfsmount *vfsmnt,
-			  int *level)
+			  int *level, int wait)
 {
 	int ret;
 
-	ret = ocfs2_inode_lock(inode, NULL, 0);
+	if (wait)
+		ret = ocfs2_inode_lock(inode, NULL, 0);
+	else
+		ret = ocfs2_try_inode_lock(inode, NULL, 0);
+
 	if (ret < 0) {
-		mlog_errno(ret);
+		if (ret != -EAGAIN)
+			mlog_errno(ret);
 		return ret;
 	}
 
@@ -2534,9 +2539,14 @@ int ocfs2_inode_lock_atime(struct inode *inode,
 		struct buffer_head *bh = NULL;
 
 		ocfs2_inode_unlock(inode, 0);
-		ret = ocfs2_inode_lock(inode, &bh, 1);
+		if (wait)
+			ret = ocfs2_inode_lock(inode, &bh, 1);
+		else
+			ret = ocfs2_try_inode_lock(inode, &bh, 1);
+
 		if (ret < 0) {
-			mlog_errno(ret);
+			if (ret != -EAGAIN)
+				mlog_errno(ret);
 			return ret;
 		}
 		*level = 1;
diff --git a/fs/ocfs2/dlmglue.h b/fs/ocfs2/dlmglue.h
index 05910fc..c83dbb5 100644
--- a/fs/ocfs2/dlmglue.h
+++ b/fs/ocfs2/dlmglue.h
@@ -123,7 +123,7 @@ void ocfs2_refcount_lock_res_init(struct ocfs2_lock_res *lockres,
 void ocfs2_open_unlock(struct inode *inode);
 int ocfs2_inode_lock_atime(struct inode *inode,
 			  struct vfsmount *vfsmnt,
-			  int *level);
+			  int *level, int wait);
 int ocfs2_inode_lock_full_nested(struct inode *inode,
 			 struct buffer_head **ret_bh,
 			 int ex,
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index dc455d4..900f04e 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -140,6 +140,8 @@ static int ocfs2_file_open(struct inode *inode, struct file *file)
 		spin_unlock(&oi->ip_lock);
 	}
 
+	file->f_mode |= FMODE_NOWAIT;
+
 leave:
 	return status;
 }
@@ -2132,8 +2134,7 @@ static int ocfs2_prepare_inode_for_refcount(struct inode *inode,
 }
 
 static int ocfs2_prepare_inode_for_write(struct file *file,
-					 loff_t pos,
-					 size_t count)
+					 loff_t pos, size_t count, int wait)
 {
 	int ret = 0, meta_level = 0;
 	struct dentry *dentry = file->f_path.dentry;
@@ -2145,10 +2146,14 @@ static int ocfs2_prepare_inode_for_write(struct file *file,
 	 * if we need to make modifications here.
 	 */
 	for(;;) {
-		ret = ocfs2_inode_lock(inode, NULL, meta_level);
+		if (wait)
+			ret = ocfs2_inode_lock(inode, NULL, meta_level);
+		else
+			ret = ocfs2_try_inode_lock(inode, NULL, meta_level);
 		if (ret < 0) {
 			meta_level = -1;
-			mlog_errno(ret);
+			if (ret != -EAGAIN)
+				mlog_errno(ret);
 			goto out;
 		}
 
@@ -2199,7 +2204,7 @@ static int ocfs2_prepare_inode_for_write(struct file *file,
 
 out_unlock:
 	trace_ocfs2_prepare_inode_for_write(OCFS2_I(inode)->ip_blkno,
-					    pos, count);
+					    pos, count, wait);
 
 	if (meta_level >= 0)
 		ocfs2_inode_unlock(inode, meta_level);
@@ -2211,7 +2216,7 @@ static int ocfs2_prepare_inode_for_write(struct file *file,
 static ssize_t ocfs2_file_write_iter(struct kiocb *iocb,
 				    struct iov_iter *from)
 {
-	int direct_io, rw_level;
+	int rw_level;
 	ssize_t written = 0;
 	ssize_t ret;
 	size_t count = iov_iter_count(from);
@@ -2223,6 +2228,8 @@ static ssize_t ocfs2_file_write_iter(struct kiocb *iocb,
 	void *saved_ki_complete = NULL;
 	int append_write = ((iocb->ki_pos + count) >=
 			i_size_read(inode) ? 1 : 0);
+	int direct_io = iocb->ki_flags & IOCB_DIRECT ? 1 : 0;
+	int nowait = iocb->ki_flags & IOCB_NOWAIT ? 1 : 0;
 
 	trace_ocfs2_file_aio_write(inode, file, file->f_path.dentry,
 		(unsigned long long)OCFS2_I(inode)->ip_blkno,
@@ -2230,12 +2237,17 @@ static ssize_t ocfs2_file_write_iter(struct kiocb *iocb,
 		file->f_path.dentry->d_name.name,
 		(unsigned int)from->nr_segs);	/* GRRRRR */
 
+	if (!direct_io && nowait)
+		return -EOPNOTSUPP;
+
 	if (count == 0)
 		return 0;
 
-	direct_io = iocb->ki_flags & IOCB_DIRECT ? 1 : 0;
-
-	inode_lock(inode);
+	if (direct_io && nowait) {
+		if (!inode_trylock(inode))
+			return -EAGAIN;
+	} else
+		inode_lock(inode);
 
 	/*
 	 * Concurrent O_DIRECT writes are allowed with
@@ -2244,9 +2256,13 @@ static ssize_t ocfs2_file_write_iter(struct kiocb *iocb,
 	 */
 	rw_level = (!direct_io || full_coherency || append_write);
 
-	ret = ocfs2_rw_lock(inode, rw_level);
+	if (direct_io && nowait)
+		ret = ocfs2_try_rw_lock(inode, rw_level);
+	else
+		ret = ocfs2_rw_lock(inode, rw_level);
 	if (ret < 0) {
-		mlog_errno(ret);
+		if (ret != -EAGAIN)
+			mlog_errno(ret);
 		goto out_mutex;
 	}
 
@@ -2260,9 +2276,13 @@ static ssize_t ocfs2_file_write_iter(struct kiocb *iocb,
 		 * other nodes to drop their caches.  Buffered I/O
 		 * already does this in write_begin().
 		 */
-		ret = ocfs2_inode_lock(inode, NULL, 1);
+		if (nowait)
+			ret = ocfs2_try_inode_lock(inode, NULL, 1);
+		else
+			ret = ocfs2_inode_lock(inode, NULL, 1);
 		if (ret < 0) {
-			mlog_errno(ret);
+			if (ret != -EAGAIN)
+				mlog_errno(ret);
 			goto out;
 		}
 
@@ -2277,9 +2297,17 @@ static ssize_t ocfs2_file_write_iter(struct kiocb *iocb,
 	}
 	count = ret;
 
-	ret = ocfs2_prepare_inode_for_write(file, iocb->ki_pos, count);
+	if (direct_io && nowait) {
+		if (!ocfs2_overwrite_io(inode, iocb->ki_pos, count, 0)) {
+			ret = -EAGAIN;
+			goto out;
+		}
+	}
+
+	ret = ocfs2_prepare_inode_for_write(file, iocb->ki_pos, count, !nowait);
 	if (ret < 0) {
-		mlog_errno(ret);
+		if (ret != -EAGAIN)
+			mlog_errno(ret);
 		goto out;
 	}
 
@@ -2355,6 +2383,7 @@ static ssize_t ocfs2_file_read_iter(struct kiocb *iocb,
 	int ret = 0, rw_level = -1, lock_level = 0;
 	struct file *filp = iocb->ki_filp;
 	struct inode *inode = file_inode(filp);
+	int nowait = iocb->ki_flags & IOCB_NOWAIT ? 1 : 0;
 
 	trace_ocfs2_file_aio_read(inode, filp, filp->f_path.dentry,
 			(unsigned long long)OCFS2_I(inode)->ip_blkno,
@@ -2374,9 +2403,14 @@ static ssize_t ocfs2_file_read_iter(struct kiocb *iocb,
 	 * need locks to protect pending reads from racing with truncate.
 	 */
 	if (iocb->ki_flags & IOCB_DIRECT) {
-		ret = ocfs2_rw_lock(inode, 0);
+		if (nowait)
+			ret = ocfs2_try_rw_lock(inode, 0);
+		else
+			ret = ocfs2_rw_lock(inode, 0);
+
 		if (ret < 0) {
-			mlog_errno(ret);
+			if (ret != -EAGAIN)
+				mlog_errno(ret);
 			goto bail;
 		}
 		rw_level = 0;
@@ -2393,9 +2427,11 @@ static ssize_t ocfs2_file_read_iter(struct kiocb *iocb,
 	 * like i_size. This allows the checks down below
 	 * generic_file_aio_read() a chance of actually working.
 	 */
-	ret = ocfs2_inode_lock_atime(inode, filp->f_path.mnt, &lock_level);
+	ret = ocfs2_inode_lock_atime(inode, filp->f_path.mnt, &lock_level,
+				     !nowait);
 	if (ret < 0) {
-		mlog_errno(ret);
+		if (ret != -EAGAIN)
+			mlog_errno(ret);
 		goto bail;
 	}
 	ocfs2_inode_unlock(inode, lock_level);
diff --git a/fs/ocfs2/mmap.c b/fs/ocfs2/mmap.c
index 098f5c7..fb9a20e 100644
--- a/fs/ocfs2/mmap.c
+++ b/fs/ocfs2/mmap.c
@@ -184,7 +184,7 @@ int ocfs2_mmap(struct file *file, struct vm_area_struct *vma)
 	int ret = 0, lock_level = 0;
 
 	ret = ocfs2_inode_lock_atime(file_inode(file),
-				    file->f_path.mnt, &lock_level);
+				    file->f_path.mnt, &lock_level, 1);
 	if (ret < 0) {
 		mlog_errno(ret);
 		goto out;
diff --git a/fs/ocfs2/ocfs2_trace.h b/fs/ocfs2/ocfs2_trace.h
index a0b5d00..e2a11aa 100644
--- a/fs/ocfs2/ocfs2_trace.h
+++ b/fs/ocfs2/ocfs2_trace.h
@@ -1449,20 +1449,22 @@
 
 TRACE_EVENT(ocfs2_prepare_inode_for_write,
 	TP_PROTO(unsigned long long ino, unsigned long long saved_pos,
-		 unsigned long count),
-	TP_ARGS(ino, saved_pos, count),
+		 unsigned long count, int wait),
+	TP_ARGS(ino, saved_pos, count, wait),
 	TP_STRUCT__entry(
 		__field(unsigned long long, ino)
 		__field(unsigned long long, saved_pos)
 		__field(unsigned long, count)
+		__field(int, wait)
 	),
 	TP_fast_assign(
 		__entry->ino = ino;
 		__entry->saved_pos = saved_pos;
 		__entry->count = count;
+		__entry->wait = wait;
 	),
-	TP_printk("%llu %llu %lu", __entry->ino,
-		  __entry->saved_pos, __entry->count)
+	TP_printk("%llu %llu %lu %d", __entry->ino,
+		  __entry->saved_pos, __entry->count, __entry->wait)
 );
 
 DEFINE_OCFS2_INT_EVENT(generic_file_aio_read_ret);
-- 
1.8.5.6

* Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function
  2017-11-27  9:46   ` [Ocfs2-devel] " Gang He
@ 2017-11-28  1:13     ` Joseph Qi
  -1 siblings, 0 replies; 62+ messages in thread
From: Joseph Qi @ 2017-11-28  1:13 UTC (permalink / raw)
  To: Gang He, mfasheh, jlbec, rgoldwyn, hch; +Cc: linux-kernel, ocfs2-devel

Hi Gang,

On 17/11/27 17:46, Gang He wrote:
> Add ocfs2_overwrite_io function, which is used to judge if
> overwrite allocated blocks, otherwise, the write will bring extra
> block allocation overhead.
> 
> Signed-off-by: Gang He <ghe@suse.com>
> ---
>  fs/ocfs2/extent_map.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/ocfs2/extent_map.h |  3 +++
>  2 files changed, 70 insertions(+)
> 
> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
> index e4719e0..98bf325 100644
> --- a/fs/ocfs2/extent_map.c
> +++ b/fs/ocfs2/extent_map.c
> @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>  	return ret;
>  }
>  
> +/* Is IO overwriting allocated blocks? */
> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
> +		       int wait)
> +{
> +	int ret = 0, is_last;
> +	u32 mapping_end, cpos;
> +	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
> +	struct buffer_head *di_bh = NULL;
> +	struct ocfs2_extent_rec rec;
> +
> +	if (wait)
> +		ret = ocfs2_inode_lock(inode, &di_bh, 0);
> +	else
> +		ret = ocfs2_try_inode_lock(inode, &di_bh, 0);
> +	if (ret)
> +		goto out;
> +
> +	if (wait)
> +		down_read(&OCFS2_I(inode)->ip_alloc_sem);
> +	else {
> +		if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) {
> +			ret = -EAGAIN;
> +			goto out_unlock1;
> +		}
> +	}
> +
> +	if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
> +	   ((map_start + map_len) <= i_size_read(inode)))
> +		goto out_unlock2;
> +
> +	cpos = map_start >> osb->s_clustersize_bits;
> +	mapping_end = ocfs2_clusters_for_bytes(inode->i_sb,
> +					       map_start + map_len);
> +	is_last = 0;
> +	while (cpos < mapping_end && !is_last) {
> +		ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos,
> +						 NULL, &rec, &is_last);
> +		if (ret) {
> +			mlog_errno(ret);
> +			goto out_unlock2;
> +		}
> +
> +		if (rec.e_blkno == 0ULL)
> +			break;
> +
> +		if (rec.e_flags & OCFS2_EXT_REFCOUNTED)
> +			break;
> +
> +		cpos = le32_to_cpu(rec.e_cpos) +
> +			le16_to_cpu(rec.e_leaf_clusters);
> +	}
> +
> +	if (cpos < mapping_end)
> +		ret = 1;
> +
> +out_unlock2:
> +	brelse(di_bh);
> +
> +	up_read(&OCFS2_I(inode)->ip_alloc_sem);
> +
> +out_unlock1:
Should brelse(di_bh) be here?

> +	ocfs2_inode_unlock(inode, 0);
> +
> +out:
> +	return (ret ? 0 : 1);
I don't think -EAGAIN and other error codes should be handled the same
way. We have to distinguish them.
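
A minimal user-space sketch of how the two cases could be kept apart,
assuming a tri-state return convention (negative errno on failure, 0 when
the write would need allocation, 1 for a pure overwrite). The names
overwrite_check() and nowait_write_status() are hypothetical and do not
appear in the posted patch; lock_ret stands in for the result of the
trylock calls.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical tri-state result:
 *   < 0  errno from locking or extent lookup (passed through unchanged)
 *     0  the range is not fully allocated, so the write must allocate
 *     1  the range only overwrites already-allocated blocks
 */
static int overwrite_check(int lock_ret, int needs_alloc)
{
	if (lock_ret < 0)
		return lock_ret;	/* keep -EAGAIN, -EIO, ... distinct */
	return needs_alloc ? 0 : 1;
}

/* Caller side: translate the tri-state result into a write status. */
static int nowait_write_status(int lock_ret, int needs_alloc)
{
	int ret = overwrite_check(lock_ret, needs_alloc);

	if (ret < 0)
		return ret;		/* real error or lock contention */
	if (ret == 0)
		return -EAGAIN;		/* allocation could block */
	return 0;			/* safe to continue without waiting */
}
```

With this shape a caller can still log every failure except -EAGAIN, which
is how the rest of the series treats lock contention.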

Thanks,
Joseph

> +}
> +
>  int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int whence)
>  {
>  	struct inode *inode = file->f_mapping->host;
> diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h
> index 67ea57d..fd9e86a 100644
> --- a/fs/ocfs2/extent_map.h
> +++ b/fs/ocfs2/extent_map.h
> @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 v_blkno, u64 *p_blkno,
>  int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>  		 u64 map_start, u64 map_len);
>  
> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
> +		       int wait);
> +
>  int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int origin);
>  
>  int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster,
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Ocfs2-devel] [PATCH 1/3] ocfs2: add ocfs2_try_rw_lock and ocfs2_try_inode_lock
  2017-11-27  9:46   ` [Ocfs2-devel] " Gang He
@ 2017-11-28  1:32     ` piaojun
  -1 siblings, 0 replies; 62+ messages in thread
From: piaojun @ 2017-11-28  1:32 UTC (permalink / raw)
  To: Gang He, mfasheh, jlbec, rgoldwyn, hch; +Cc: linux-kernel, ocfs2-devel

Hi Gang,

On 2017/11/27 17:46, Gang He wrote:
> Add ocfs2_try_rw_lock and ocfs2_try_inode_lock functions, which
> will be used in non-block IO scenarios.
> 
> Signed-off-by: Gang He <ghe@suse.com>
> ---
>  fs/ocfs2/dlmglue.c | 22 ++++++++++++++++++++++
>  fs/ocfs2/dlmglue.h |  4 ++++
>  2 files changed, 26 insertions(+)
> 
> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
> index 4689940..5cfbd04 100644
> --- a/fs/ocfs2/dlmglue.c
> +++ b/fs/ocfs2/dlmglue.c
> @@ -1742,6 +1742,28 @@ int ocfs2_rw_lock(struct inode *inode, int write)
>  	return status;
>  }
>  
> +int ocfs2_try_rw_lock(struct inode *inode, int write)
> +{
> +	int status, level;
> +	struct ocfs2_lock_res *lockres;
> +	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
> +
> +	mlog(0, "inode %llu try to take %s RW lock\n",
> +	     (unsigned long long)OCFS2_I(inode)->ip_blkno,
> +	     write ? "EXMODE" : "PRMODE");
> +
> +	if (ocfs2_mount_local(osb))
> +		return 0;
> +
> +	lockres = &OCFS2_I(inode)->ip_rw_lockres;
> +
> +	level = write ? DLM_LOCK_EX : DLM_LOCK_PR;
> +
> +	status = ocfs2_cluster_lock(OCFS2_SB(inode->i_sb), lockres, level,
> +				    DLM_LKF_NOQUEUE, 0);

We'd better use 'osb' here instead of 'OCFS2_SB(inode->i_sb)', since it
is already initialized above.

> +	return status;
> +}
> +
>  void ocfs2_rw_unlock(struct inode *inode, int write)
>  {
>  	int level = write ? DLM_LOCK_EX : DLM_LOCK_PR;
> diff --git a/fs/ocfs2/dlmglue.h b/fs/ocfs2/dlmglue.h
> index a7fc18b..05910fc 100644
> --- a/fs/ocfs2/dlmglue.h
> +++ b/fs/ocfs2/dlmglue.h
> @@ -116,6 +116,7 @@ void ocfs2_refcount_lock_res_init(struct ocfs2_lock_res *lockres,
>  int ocfs2_create_new_inode_locks(struct inode *inode);
>  int ocfs2_drop_inode_locks(struct inode *inode);
>  int ocfs2_rw_lock(struct inode *inode, int write);
> +int ocfs2_try_rw_lock(struct inode *inode, int write);
>  void ocfs2_rw_unlock(struct inode *inode, int write);
>  int ocfs2_open_lock(struct inode *inode);
>  int ocfs2_try_open_lock(struct inode *inode, int write);
> @@ -140,6 +141,9 @@ int ocfs2_inode_lock_with_page(struct inode *inode,
>  /* 99% of the time we don't want to supply any additional flags --
>   * those are for very specific cases only. */
>  #define ocfs2_inode_lock(i, b, e) ocfs2_inode_lock_full_nested(i, b, e, 0, OI_LS_NORMAL)
> +#define ocfs2_try_inode_lock(i, b, e)\
> +		ocfs2_inode_lock_full_nested(i, b, e, OCFS2_META_LOCK_NOQUEUE,\
> +		OI_LS_NORMAL)
>  void ocfs2_inode_unlock(struct inode *inode,
>  		       int ex);
>  int ocfs2_super_lock(struct ocfs2_super *osb,
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function
  2017-11-27  9:46   ` [Ocfs2-devel] " Gang He
@ 2017-11-28  1:50     ` piaojun
  -1 siblings, 0 replies; 62+ messages in thread
From: piaojun @ 2017-11-28  1:50 UTC (permalink / raw)
  To: Gang He, mfasheh, jlbec, rgoldwyn, hch; +Cc: linux-kernel, ocfs2-devel

Hi Gang,

If ocfs2_overwrite_io is only called in 'nowait' scenarios, I wonder if
we can discard 'int wait' just as ext4 does:

static bool ext4_overwrite_io(struct inode *inode, loff_t pos, loff_t len);
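
For illustration only, here is a user-space model of the overwrite check
with a bool return and no 'wait' argument. It assumes the extents are
given as a sorted, non-overlapping array; struct extent, EXT_REFCOUNTED
and overwrite_io() are stand-ins invented for this sketch, not ocfs2
definitions, and all locking is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EXT_REFCOUNTED 0x1	/* stand-in for OCFS2_EXT_REFCOUNTED */

struct extent {
	uint32_t cpos;		/* first logical cluster covered */
	uint32_t clusters;	/* length in clusters */
	uint64_t blkno;		/* 0 means a hole (nothing allocated) */
	unsigned int flags;
};

/*
 * Return true only if every cluster touched by [pos, pos + len) is backed
 * by an allocated, non-refcounted extent. ext[] must be sorted by cpos.
 */
static bool overwrite_io(const struct extent *ext, size_t n,
			 uint64_t pos, uint64_t len, unsigned int cluster_bits)
{
	uint64_t csize = 1ULL << cluster_bits;
	uint64_t cpos = pos >> cluster_bits;
	uint64_t end = (pos + len + csize - 1) >> cluster_bits;
	size_t i;

	for (i = 0; i < n && cpos < end; i++) {
		uint64_t ext_end = (uint64_t)ext[i].cpos + ext[i].clusters;

		if (ext_end <= cpos)
			continue;	/* extent ends before the range */
		if (ext[i].cpos > cpos)
			return false;	/* unmapped gap in the range */
		if (ext[i].blkno == 0 || (ext[i].flags & EXT_REFCOUNTED))
			return false;	/* hole, or CoW would allocate */
		cpos = ext_end;
	}
	return cpos >= end;		/* false if we ran out of extents */
}
```

A caller in the nowait path would then turn a false result into -EAGAIN,
much as the write path in patch 3/3 does.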

Thanks,
Jun

On 2017/11/27 17:46, Gang He wrote:
> Add ocfs2_overwrite_io function, which is used to judge if
> overwrite allocated blocks, otherwise, the write will bring extra
> block allocation overhead.
> 
> Signed-off-by: Gang He <ghe@suse.com>
> ---
>  fs/ocfs2/extent_map.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/ocfs2/extent_map.h |  3 +++
>  2 files changed, 70 insertions(+)
> 
> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
> index e4719e0..98bf325 100644
> --- a/fs/ocfs2/extent_map.c
> +++ b/fs/ocfs2/extent_map.c
> @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>  	return ret;
>  }
>  
> +/* Is IO overwriting allocated blocks? */
> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
> +		       int wait)
> +{
> +	int ret = 0, is_last;
> +	u32 mapping_end, cpos;
> +	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
> +	struct buffer_head *di_bh = NULL;
> +	struct ocfs2_extent_rec rec;
> +
> +	if (wait)
> +		ret = ocfs2_inode_lock(inode, &di_bh, 0);
> +	else
> +		ret = ocfs2_try_inode_lock(inode, &di_bh, 0);
> +	if (ret)
> +		goto out;
> +
> +	if (wait)
> +		down_read(&OCFS2_I(inode)->ip_alloc_sem);
> +	else {
> +		if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) {
> +			ret = -EAGAIN;
> +			goto out_unlock1;
> +		}
> +	}
> +
> +	if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
> +	   ((map_start + map_len) <= i_size_read(inode)))
> +		goto out_unlock2;
> +
> +	cpos = map_start >> osb->s_clustersize_bits;
> +	mapping_end = ocfs2_clusters_for_bytes(inode->i_sb,
> +					       map_start + map_len);
> +	is_last = 0;
> +	while (cpos < mapping_end && !is_last) {
> +		ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos,
> +						 NULL, &rec, &is_last);
> +		if (ret) {
> +			mlog_errno(ret);
> +			goto out_unlock2;
> +		}
> +
> +		if (rec.e_blkno == 0ULL)
> +			break;
> +
> +		if (rec.e_flags & OCFS2_EXT_REFCOUNTED)
> +			break;
> +
> +		cpos = le32_to_cpu(rec.e_cpos) +
> +			le16_to_cpu(rec.e_leaf_clusters);
> +	}
> +
> +	if (cpos < mapping_end)
> +		ret = 1;
> +
> +out_unlock2:
> +	brelse(di_bh);
> +
> +	up_read(&OCFS2_I(inode)->ip_alloc_sem);
> +
> +out_unlock1:
> +	ocfs2_inode_unlock(inode, 0);
> +
> +out:
> +	return (ret ? 0 : 1);
> +}
> +
>  int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int whence)
>  {
>  	struct inode *inode = file->f_mapping->host;
> diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h
> index 67ea57d..fd9e86a 100644
> --- a/fs/ocfs2/extent_map.h
> +++ b/fs/ocfs2/extent_map.h
> @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 v_blkno, u64 *p_blkno,
>  int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>  		 u64 map_start, u64 map_len);
>  
> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
> +		       int wait);
> +
>  int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int origin);
>  
>  int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster,
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Ocfs2-devel] [PATCH 1/3] ocfs2: add ocfs2_try_rw_lock and ocfs2_try_inode_lock
  2017-11-27  9:46   ` [Ocfs2-devel] " Gang He
@ 2017-11-28  1:52     ` Changwei Ge
  -1 siblings, 0 replies; 62+ messages in thread
From: Changwei Ge @ 2017-11-28  1:52 UTC (permalink / raw)
  To: Gang He, mfasheh, jlbec, rgoldwyn, hch; +Cc: linux-kernel, ocfs2-devel

Hi Gang,

On 2017/11/27 17:48, Gang He wrote:
> Add ocfs2_try_rw_lock and ocfs2_try_inode_lock functions, which
> will be used in non-block IO scenarios.
> 
> Signed-off-by: Gang He <ghe@suse.com>
> ---
>   fs/ocfs2/dlmglue.c | 22 ++++++++++++++++++++++
>   fs/ocfs2/dlmglue.h |  4 ++++
>   2 files changed, 26 insertions(+)
> 
> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
> index 4689940..5cfbd04 100644
> --- a/fs/ocfs2/dlmglue.c
> +++ b/fs/ocfs2/dlmglue.c
> @@ -1742,6 +1742,28 @@ int ocfs2_rw_lock(struct inode *inode, int write)
>   	return status;
>   }
>   
> +int ocfs2_try_rw_lock(struct inode *inode, int write)
> +{
> +	int status, level;
> +	struct ocfs2_lock_res *lockres;
> +	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
> +
> +	mlog(0, "inode %llu try to take %s RW lock\n",
> +	     (unsigned long long)OCFS2_I(inode)->ip_blkno,
> +	     write ? "EXMODE" : "PRMODE");
> +
> +	if (ocfs2_mount_local(osb))
> +		return 0;
> +
> +	lockres = &OCFS2_I(inode)->ip_rw_lockres;
> +
> +	level = write ? DLM_LOCK_EX : DLM_LOCK_PR;
> +
> +	status = ocfs2_cluster_lock(OCFS2_SB(inode->i_sb), lockres, level,
> +				    DLM_LKF_NOQUEUE, 0);
> +	return status;
> +}

The newly added function ocfs2_try_rw_lock has almost the same logic
as ocfs2_rw_lock. Is it possible to combine them into a single function?
That would be more elegant.

Moreover, can you elaborate on why we need a *NOQUEUE* lock to
support non-blocking aio?

Why can't we wait a while for the lock request to be granted? Is this necessary?

Thanks,
Changwei

> +
>   void ocfs2_rw_unlock(struct inode *inode, int write)
>   {
>   	int level = write ? DLM_LOCK_EX : DLM_LOCK_PR;
> diff --git a/fs/ocfs2/dlmglue.h b/fs/ocfs2/dlmglue.h
> index a7fc18b..05910fc 100644
> --- a/fs/ocfs2/dlmglue.h
> +++ b/fs/ocfs2/dlmglue.h
> @@ -116,6 +116,7 @@ void ocfs2_refcount_lock_res_init(struct ocfs2_lock_res *lockres,
>   int ocfs2_create_new_inode_locks(struct inode *inode);
>   int ocfs2_drop_inode_locks(struct inode *inode);
>   int ocfs2_rw_lock(struct inode *inode, int write);
> +int ocfs2_try_rw_lock(struct inode *inode, int write);
>   void ocfs2_rw_unlock(struct inode *inode, int write);
>   int ocfs2_open_lock(struct inode *inode);
>   int ocfs2_try_open_lock(struct inode *inode, int write);
> @@ -140,6 +141,9 @@ int ocfs2_inode_lock_with_page(struct inode *inode,
>   /* 99% of the time we don't want to supply any additional flags --
>    * those are for very specific cases only. */
>   #define ocfs2_inode_lock(i, b, e) ocfs2_inode_lock_full_nested(i, b, e, 0, OI_LS_NORMAL)
> +#define ocfs2_try_inode_lock(i, b, e)\
> +		ocfs2_inode_lock_full_nested(i, b, e, OCFS2_META_LOCK_NOQUEUE,\
> +		OI_LS_NORMAL)
>   void ocfs2_inode_unlock(struct inode *inode,
>   		       int ex);
>   int ocfs2_super_lock(struct ocfs2_super *osb,
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function
  2017-11-28  1:50     ` piaojun
@ 2017-11-28  2:10       ` Changwei Ge
  -1 siblings, 0 replies; 62+ messages in thread
From: Changwei Ge @ 2017-11-28  2:10 UTC (permalink / raw)
  To: piaojun, Gang He, mfasheh, jlbec, rgoldwyn, hch; +Cc: linux-kernel, ocfs2-devel

On 2017/11/28 9:52, piaojun wrote:
> Hi Gang,
> 
> If ocfs2_overwrite_io is only called in 'nowait' scenarios, I wonder if
> we can discard 'int wait' just as ext4 does:
> 
> static bool ext4_overwrite_io(struct inode *inode, loff_t pos, loff_t len);

Yes, Jun has a point.
It seems that ocfs2_overwrite_io is only used for non-blocking aio,
and no other call site passes wait=1 to it.

> 
> Thanks,
> Jun
> 
> On 2017/11/27 17:46, Gang He wrote:
>> Add ocfs2_overwrite_io function, which is used to judge if
>> overwrite allocated blocks, otherwise, the write will bring extra
>> block allocation overhead.
>>
>> Signed-off-by: Gang He <ghe@suse.com>
>> ---
>>   fs/ocfs2/extent_map.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>   fs/ocfs2/extent_map.h |  3 +++
>>   2 files changed, 70 insertions(+)
>>
>> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
>> index e4719e0..98bf325 100644
>> --- a/fs/ocfs2/extent_map.c
>> +++ b/fs/ocfs2/extent_map.c
>> @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>>   	return ret;
>>   }
>>   
>> +/* Is IO overwriting allocated blocks? */
>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
>> +		       int wait)
>> +{
>> +	int ret = 0, is_last;
>> +	u32 mapping_end, cpos;
>> +	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>> +	struct buffer_head *di_bh = NULL;
>> +	struct ocfs2_extent_rec rec;
>> +
>> +	if (wait)
>> +		ret = ocfs2_inode_lock(inode, &di_bh, 0);
>> +	else
>> +		ret = ocfs2_try_inode_lock(inode, &di_bh, 0);
>> +	if (ret)
>> +		goto out;
>> +
>> +	if (wait)
>> +		down_read(&OCFS2_I(inode)->ip_alloc_sem);
>> +	else {
>> +		if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) {
>> +			ret = -EAGAIN;
This is a little strange: it seems you don't care much about how
this function fails. Why set _ret_ to -EAGAIN here only to ignore it
later?

Thanks,
Changwei

>> +			goto out_unlock1;
>> +		}
>> +	}
>> +
>> +	if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
>> +	   ((map_start + map_len) <= i_size_read(inode)))
>> +		goto out_unlock2;
>> +
>> +	cpos = map_start >> osb->s_clustersize_bits;
>> +	mapping_end = ocfs2_clusters_for_bytes(inode->i_sb,
>> +					       map_start + map_len);
>> +	is_last = 0;
>> +	while (cpos < mapping_end && !is_last) {
>> +		ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos,
>> +						 NULL, &rec, &is_last);
>> +		if (ret) {
>> +			mlog_errno(ret);
>> +			goto out_unlock2;
>> +		}
>> +
>> +		if (rec.e_blkno == 0ULL)
>> +			break;
>> +
>> +		if (rec.e_flags & OCFS2_EXT_REFCOUNTED)
>> +			break;
>> +
>> +		cpos = le32_to_cpu(rec.e_cpos) +
>> +			le16_to_cpu(rec.e_leaf_clusters);
>> +	}
>> +
>> +	if (cpos < mapping_end)
>> +		ret = 1;
>> +
>> +out_unlock2:
>> +	brelse(di_bh);
>> +
>> +	up_read(&OCFS2_I(inode)->ip_alloc_sem);
>> +
>> +out_unlock1:
>> +	ocfs2_inode_unlock(inode, 0);
>> +
>> +out:
>> +	return (ret ? 0 : 1);
>> +}
>> +
>>   int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int whence)
>>   {
>>   	struct inode *inode = file->f_mapping->host;
>> diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h
>> index 67ea57d..fd9e86a 100644
>> --- a/fs/ocfs2/extent_map.h
>> +++ b/fs/ocfs2/extent_map.h
>> @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 v_blkno, u64 *p_blkno,
>>   int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>>   		 u64 map_start, u64 map_len);
>>   
>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
>> +		       int wait);
>> +
>>   int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int origin);
>>   
>>   int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster,
>>
> 
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function
  2017-11-27  9:46   ` [Ocfs2-devel] " Gang He
@ 2017-11-28  2:19     ` alex chen
  -1 siblings, 0 replies; 62+ messages in thread
From: alex chen @ 2017-11-28  2:19 UTC (permalink / raw)
  To: Gang He; +Cc: mfasheh, jlbec, rgoldwyn, hch, linux-kernel, ocfs2-devel

Hi Gang,

On 2017/11/27 17:46, Gang He wrote:
> Add ocfs2_overwrite_io function, which is used to judge if
> overwrite allocated blocks, otherwise, the write will bring extra
> block allocation overhead.
> 
> Signed-off-by: Gang He <ghe@suse.com>
> ---
>  fs/ocfs2/extent_map.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/ocfs2/extent_map.h |  3 +++
>  2 files changed, 70 insertions(+)
> 
> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
> index e4719e0..98bf325 100644
> --- a/fs/ocfs2/extent_map.c
> +++ b/fs/ocfs2/extent_map.c
> @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>  	return ret;
>  }
>  
> +/* Is IO overwriting allocated blocks? */
> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
> +		       int wait)
> +{
> +	int ret = 0, is_last;
> +	u32 mapping_end, cpos;
> +	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
> +	struct buffer_head *di_bh = NULL;
> +	struct ocfs2_extent_rec rec;
> +
> +	if (wait)
> +		ret = ocfs2_inode_lock(inode, &di_bh, 0);
> +	else
> +		ret = ocfs2_try_inode_lock(inode, &di_bh, 0);
> +	if (ret)
> +		goto out;
> +
> +	if (wait)
> +		down_read(&OCFS2_I(inode)->ip_alloc_sem);
> +	else {
> +		if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) {
> +			ret = -EAGAIN;
> +			goto out_unlock1;
> +		}
> +	}
> +
> +	if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
> +	   ((map_start + map_len) <= i_size_read(inode)))
> +		goto out_unlock2;
> +
> +	cpos = map_start >> osb->s_clustersize_bits;
> +	mapping_end = ocfs2_clusters_for_bytes(inode->i_sb,
> +					       map_start + map_len);
> +	is_last = 0;
> +	while (cpos < mapping_end && !is_last) {
> +		ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos,
> +						 NULL, &rec, &is_last);
> +		if (ret) {
> +			mlog_errno(ret);
> +			goto out_unlock2;
> +		}
> +
> +		if (rec.e_blkno == 0ULL)
> +			break;
I think this is not an overwrite here: a hole has been found, so the
blocks still need to be allocated.
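
For reference, a simplified user-space model of the walk (not the
ocfs2 code). The struct and helpers are hypothetical stand-ins for
ocfs2_extent_rec and ocfs2_get_clusters_nocache(), with blkno == 0
marking a hole, and it assumes the range counts as an overwrite only
if every cluster is allocated and not refcounted.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for an extent record. */
struct extent {
	uint32_t cpos;		/* first cluster covered */
	uint32_t clusters;	/* length in clusters */
	uint64_t blkno;		/* 0 means unallocated (hole) */
	int refcounted;		/* shared extent, a write would need CoW */
};

/* Return the extent covering cpos, or NULL if none does. */
static const struct extent *lookup(const struct extent *map, size_t n,
				   uint32_t cpos)
{
	for (size_t i = 0; i < n; i++)
		if (cpos >= map[i].cpos &&
		    cpos - map[i].cpos < map[i].clusters)
			return &map[i];
	return NULL;
}

/* 1 if clusters [start, end) are all allocated and unshared, else 0. */
static int range_is_overwrite(const struct extent *map, size_t n,
			      uint32_t start, uint32_t end)
{
	uint32_t cpos = start;

	while (cpos < end) {
		const struct extent *rec = lookup(map, n, cpos);

		if (!rec || rec->blkno == 0 || rec->refcounted)
			break;	/* hole or shared: allocation needed */
		cpos = rec->cpos + rec->clusters;
	}
	return cpos >= end;
}
```

In this model, breaking out on a hole leaves cpos short of the end of
the range, so the function reports "not an overwrite".
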
> +
> +		if (rec.e_flags & OCFS2_EXT_REFCOUNTED)
> +			break;
> +
> +		cpos = le32_to_cpu(rec.e_cpos) +
> +			le16_to_cpu(rec.e_leaf_clusters);
> +	}
> +
> +	if (cpos < mapping_end)
> +		ret = 1;
> +
> +out_unlock2:

I think 'out_up_read' is more readable than 'out_unlock2'.

> +	brelse(di_bh);
> +
> +	up_read(&OCFS2_I(inode)->ip_alloc_sem);
> +
> +out_unlock1:

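We should release the buffer head here as well: this path skips the
brelse(di_bh) above, although ocfs2_try_inode_lock() has already
filled in di_bh.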
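
A toy user-space sketch of the two unwind orders, assuming the inode
lock call hands back a referenced di_bh on success. The struct and
function names are invented for illustration; the counters only stand
in for the cluster lock, the buffer head reference and ip_alloc_sem.

```c
#include <errno.h>

/* Toy counters standing in for the cluster lock, di_bh and the rwsem. */
struct res { int inode_locked; int bh_refs; int sem_held; };

/* Unwind order as posted; sem_busy simulates a failed trylock. */
static int check_as_posted(struct res *r, int sem_busy)
{
	int ret = 0;

	r->inode_locked = 1;	/* inode lock taken, di_bh referenced */
	r->bh_refs++;

	if (sem_busy) {
		ret = -EAGAIN;
		goto out_unlock1;
	}
	r->sem_held = 1;

	/* ... extent walk would go here ... */

	r->bh_refs--;		/* brelse(di_bh) */
	r->sem_held = 0;	/* up_read() */
out_unlock1:
	r->inode_locked = 0;	/* ocfs2_inode_unlock() */
	return ret;
}

/* Suggested order: drop the buffer head on every path that locked. */
static int check_with_fix(struct res *r, int sem_busy)
{
	int ret = 0;

	r->inode_locked = 1;
	r->bh_refs++;

	if (sem_busy) {
		ret = -EAGAIN;
		goto out_unlock;
	}
	r->sem_held = 1;

	/* ... extent walk would go here ... */

	r->sem_held = 0;	/* an out_up_read label would sit here */
out_unlock:
	r->bh_refs--;		/* brelse(di_bh) */
	r->inode_locked = 0;	/* ocfs2_inode_unlock() */
	return ret;
}
```

With sem_busy set, the first variant returns with bh_refs still at 1,
the second with 0.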

> +	ocfs2_inode_unlock(inode, 0);
> +
> +out:
> +	return (ret ? 0 : 1);
> +}
> +
>  int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int whence)
>  {
>  	struct inode *inode = file->f_mapping->host;
> diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h
> index 67ea57d..fd9e86a 100644
> --- a/fs/ocfs2/extent_map.h
> +++ b/fs/ocfs2/extent_map.h
> @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 v_blkno, u64 *p_blkno,
>  int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>  		 u64 map_start, u64 map_len);
>  
> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
> +		       int wait);
> +
>  int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int origin);
>  
>  int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster,
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function
  2017-11-27  9:46   ` [Ocfs2-devel] " Gang He
@ 2017-11-28  2:48     ` Changwei Ge
  -1 siblings, 0 replies; 62+ messages in thread
From: Changwei Ge @ 2017-11-28  2:48 UTC (permalink / raw)
  To: Gang He, mfasheh, jlbec, rgoldwyn, hch; +Cc: linux-kernel, ocfs2-devel

Hi,
Gang

On 2017/11/27 17:48, Gang He wrote:
> Add ocfs2_overwrite_io function, which is used to judge if
> overwrite allocated blocks, otherwise, the write will bring extra
> block allocation overhead.
> 

Can you elaborate on how this overhead is introduced?
Forgive me, I can't figure it out.

Thanks,
Changwei

> Signed-off-by: Gang He <ghe@suse.com>
> ---
>   fs/ocfs2/extent_map.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++++++
>   fs/ocfs2/extent_map.h |  3 +++
>   2 files changed, 70 insertions(+)
> 
> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
> index e4719e0..98bf325 100644
> --- a/fs/ocfs2/extent_map.c
> +++ b/fs/ocfs2/extent_map.c
> @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>   	return ret;
>   }
>   
> +/* Is IO overwriting allocated blocks? */
> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
> +		       int wait)
> +{
> +	int ret = 0, is_last;
> +	u32 mapping_end, cpos;
> +	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
> +	struct buffer_head *di_bh = NULL;
> +	struct ocfs2_extent_rec rec;
> +
> +	if (wait)
> +		ret = ocfs2_inode_lock(inode, &di_bh, 0);
> +	else
> +		ret = ocfs2_try_inode_lock(inode, &di_bh, 0);
> +	if (ret)
> +		goto out;
> +
> +	if (wait)
> +		down_read(&OCFS2_I(inode)->ip_alloc_sem);
> +	else {
> +		if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) {
> +			ret = -EAGAIN;
> +			goto out_unlock1;
> +		}
> +	}
> +
> +	if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
> +	   ((map_start + map_len) <= i_size_read(inode)))
> +		goto out_unlock2;
> +
> +	cpos = map_start >> osb->s_clustersize_bits;
> +	mapping_end = ocfs2_clusters_for_bytes(inode->i_sb,
> +					       map_start + map_len);
> +	is_last = 0;
> +	while (cpos < mapping_end && !is_last) {
> +		ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos,
> +						 NULL, &rec, &is_last);
> +		if (ret) {
> +			mlog_errno(ret);
> +			goto out_unlock2;
> +		}
> +
> +		if (rec.e_blkno == 0ULL)
> +			break;
> +
> +		if (rec.e_flags & OCFS2_EXT_REFCOUNTED)
> +			break;
> +
> +		cpos = le32_to_cpu(rec.e_cpos) +
> +			le16_to_cpu(rec.e_leaf_clusters);
> +	}
> +
> +	if (cpos < mapping_end)
> +		ret = 1;
> +
> +out_unlock2:
> +	brelse(di_bh);
> +
> +	up_read(&OCFS2_I(inode)->ip_alloc_sem);
> +
> +out_unlock1:
> +	ocfs2_inode_unlock(inode, 0);
> +
> +out:
> +	return (ret ? 0 : 1);
> +}
> +
>   int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int whence)
>   {
>   	struct inode *inode = file->f_mapping->host;
> diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h
> index 67ea57d..fd9e86a 100644
> --- a/fs/ocfs2/extent_map.h
> +++ b/fs/ocfs2/extent_map.h
> @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 v_blkno, u64 *p_blkno,
>   int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>   		 u64 map_start, u64 map_len);
>   
> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
> +		       int wait);
> +
>   int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int origin);
>   
>   int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster,
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Ocfs2-devel] [PATCH 3/3] ocfs2: nowait aio support
  2017-11-27  9:46   ` [Ocfs2-devel] " Gang He
@ 2017-11-28  2:51     ` alex chen
  -1 siblings, 0 replies; 62+ messages in thread
From: alex chen @ 2017-11-28  2:51 UTC (permalink / raw)
  To: Gang He; +Cc: mfasheh, jlbec, rgoldwyn, hch, linux-kernel, ocfs2-devel

Hi Gang,

On 2017/11/27 17:46, Gang He wrote:
> Return EAGAIN if any of the following checks fail for direct I/O:
> Can not get the related locks immediately,
> Blocks are not allocated at the write location, it will trigger
> block allocation and block IO operations.
> 
> Signed-off-by: Gang He <ghe@suse.com>
> ---
>  fs/ocfs2/dir.c         |  2 +-
>  fs/ocfs2/dlmglue.c     | 20 ++++++++++----
>  fs/ocfs2/dlmglue.h     |  2 +-
>  fs/ocfs2/file.c        | 74 +++++++++++++++++++++++++++++++++++++-------------
>  fs/ocfs2/mmap.c        |  2 +-
>  fs/ocfs2/ocfs2_trace.h | 10 ++++---
>  6 files changed, 79 insertions(+), 31 deletions(-)
> 
> diff --git a/fs/ocfs2/dir.c b/fs/ocfs2/dir.c
> index febe631..ea50901 100644
> --- a/fs/ocfs2/dir.c
> +++ b/fs/ocfs2/dir.c
> @@ -1957,7 +1957,7 @@ int ocfs2_readdir(struct file *file, struct dir_context *ctx)
>  
>  	trace_ocfs2_readdir((unsigned long long)OCFS2_I(inode)->ip_blkno);
>  
> -	error = ocfs2_inode_lock_atime(inode, file->f_path.mnt, &lock_level);
> +	error = ocfs2_inode_lock_atime(inode, file->f_path.mnt, &lock_level, 1);
>  	if (lock_level && error >= 0) {
>  		/* We release EX lock which used to update atime
>  		 * and get PR lock again to reduce contention
> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
> index 5cfbd04..feb8dbe 100644
> --- a/fs/ocfs2/dlmglue.c
> +++ b/fs/ocfs2/dlmglue.c
> @@ -2516,13 +2516,18 @@ int ocfs2_inode_lock_with_page(struct inode *inode,
>  
>  int ocfs2_inode_lock_atime(struct inode *inode,
>  			  struct vfsmount *vfsmnt,
> -			  int *level)
> +			  int *level, int wait)
>  {
>  	int ret;
>  
> -	ret = ocfs2_inode_lock(inode, NULL, 0);
> +	if (wait)
> +		ret = ocfs2_inode_lock(inode, NULL, 0);
> +	else
> +		ret = ocfs2_try_inode_lock(inode, NULL, 0);
> +
>  	if (ret < 0) {
> -		mlog_errno(ret);
> +		if (ret != -EAGAIN)
> +			mlog_errno(ret);
>  		return ret;
>  	}
>  
> @@ -2534,9 +2539,14 @@ int ocfs2_inode_lock_atime(struct inode *inode,
>  		struct buffer_head *bh = NULL;
>  
>  		ocfs2_inode_unlock(inode, 0);
> -		ret = ocfs2_inode_lock(inode, &bh, 1);
> +		if (wait)
> +			ret = ocfs2_inode_lock(inode, &bh, 1);
> +		else
> +			ret = ocfs2_try_inode_lock(inode, &bh, 1);
> +
>  		if (ret < 0) {
> -			mlog_errno(ret);
> +			if (ret != -EAGAIN)
> +				mlog_errno(ret);
>  			return ret;
>  		}
>  		*level = 1;
> diff --git a/fs/ocfs2/dlmglue.h b/fs/ocfs2/dlmglue.h
> index 05910fc..c83dbb5 100644
> --- a/fs/ocfs2/dlmglue.h
> +++ b/fs/ocfs2/dlmglue.h
> @@ -123,7 +123,7 @@ void ocfs2_refcount_lock_res_init(struct ocfs2_lock_res *lockres,
>  void ocfs2_open_unlock(struct inode *inode);
>  int ocfs2_inode_lock_atime(struct inode *inode,
>  			  struct vfsmount *vfsmnt,
> -			  int *level);
> +			  int *level, int wait);
>  int ocfs2_inode_lock_full_nested(struct inode *inode,
>  			 struct buffer_head **ret_bh,
>  			 int ex,
> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> index dc455d4..900f04e 100644
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -140,6 +140,8 @@ static int ocfs2_file_open(struct inode *inode, struct file *file)
>  		spin_unlock(&oi->ip_lock);
>  	}
>  
> +	file->f_mode |= FMODE_NOWAIT;
> +
>  leave:
>  	return status;
>  }
> @@ -2132,8 +2134,7 @@ static int ocfs2_prepare_inode_for_refcount(struct inode *inode,
>  }
>  
>  static int ocfs2_prepare_inode_for_write(struct file *file,
> -					 loff_t pos,
> -					 size_t count)
> +					 loff_t pos, size_t count, int wait)
>  {
>  	int ret = 0, meta_level = 0;
>  	struct dentry *dentry = file->f_path.dentry;
> @@ -2145,10 +2146,14 @@ static int ocfs2_prepare_inode_for_write(struct file *file,
>  	 * if we need to make modifications here.
>  	 */
>  	for(;;) {
> -		ret = ocfs2_inode_lock(inode, NULL, meta_level);
> +		if (wait)
> +			ret = ocfs2_inode_lock(inode, NULL, meta_level);
> +		else
> +			ret = ocfs2_try_inode_lock(inode, NULL, meta_level);
>  		if (ret < 0) {
>  			meta_level = -1;
> -			mlog_errno(ret);
> +			if (ret != -EAGAIN)
> +				mlog_errno(ret);
>  			goto out;
>  		}
>

We lock the inode again in ocfs2_prepare_inode_for_write()->ocfs2_prepare_inode_for_refcount().
Should we check the 'nowait' flag there as well?
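
A rough user-space sketch of what I mean (a toy model, not ocfs2
code). The lock type and the forward_wait switch are hypothetical;
the model assumes the outer call takes the lock shared and the inner
refcount helper takes it exclusive, so only the inner one contends
with readers on other nodes.

```c
#include <errno.h>

struct toy_lock {
	int readers_elsewhere;	/* other nodes hold the lock shared */
	int sleeps;		/* times a caller would have blocked */
};

/* ex = exclusive request; wait = 0 models the try-lock variant. */
static int toy_lock_acquire(struct toy_lock *l, int ex, int wait)
{
	int contended = ex && l->readers_elsewhere;

	if (contended && !wait)
		return -EAGAIN;	/* bail out instead of sleeping */
	if (contended)
		l->sleeps++;	/* a blocking acquire would sleep here */
	return 0;
}

/* Inner helper that takes the lock again, this time exclusive. */
static int prepare_for_refcount(struct toy_lock *l, int wait)
{
	return toy_lock_acquire(l, 1, wait);
}

/* forward_wait = 0 models the inner call ignoring the caller's flag. */
static int prepare_for_write(struct toy_lock *l, int needs_cow,
			     int wait, int forward_wait)
{
	int ret = toy_lock_acquire(l, 0, wait);

	if (ret < 0)
		return ret;
	if (needs_cow)
		ret = prepare_for_refcount(l, forward_wait ? wait : 1);
	return ret;
}
```

In the model, a nowait caller still records one sleep unless the flag
is passed down to the inner helper.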

> @@ -2199,7 +2204,7 @@ static int ocfs2_prepare_inode_for_write(struct file *file,
>  
>  out_unlock:
>  	trace_ocfs2_prepare_inode_for_write(OCFS2_I(inode)->ip_blkno,
> -					    pos, count);
> +					    pos, count, wait);
>  
>  	if (meta_level >= 0)
>  		ocfs2_inode_unlock(inode, meta_level);
> @@ -2211,7 +2216,7 @@ static int ocfs2_prepare_inode_for_write(struct file *file,
>  static ssize_t ocfs2_file_write_iter(struct kiocb *iocb,
>  				    struct iov_iter *from)
>  {
> -	int direct_io, rw_level;
> +	int rw_level;
>  	ssize_t written = 0;
>  	ssize_t ret;
>  	size_t count = iov_iter_count(from);
> @@ -2223,6 +2228,8 @@ static ssize_t ocfs2_file_write_iter(struct kiocb *iocb,
>  	void *saved_ki_complete = NULL;
>  	int append_write = ((iocb->ki_pos + count) >=
>  			i_size_read(inode) ? 1 : 0);
> +	int direct_io = iocb->ki_flags & IOCB_DIRECT ? 1 : 0;
> +	int nowait = iocb->ki_flags & IOCB_NOWAIT ? 1 : 0;
>  
>  	trace_ocfs2_file_aio_write(inode, file, file->f_path.dentry,
>  		(unsigned long long)OCFS2_I(inode)->ip_blkno,
> @@ -2230,12 +2237,17 @@ static ssize_t ocfs2_file_write_iter(struct kiocb *iocb,
>  		file->f_path.dentry->d_name.name,
>  		(unsigned int)from->nr_segs);	/* GRRRRR */
>  
> +	if (!direct_io && nowait)
> +		return -EOPNOTSUPP;
> +
>  	if (count == 0)
>  		return 0;
>  
> -	direct_io = iocb->ki_flags & IOCB_DIRECT ? 1 : 0;
> -
> -	inode_lock(inode);
> +	if (direct_io && nowait) {
> +		if (!inode_trylock(inode))
> +			return -EAGAIN;
> +	} else
> +		inode_lock(inode);
>  
>  	/*
>  	 * Concurrent O_DIRECT writes are allowed with
> @@ -2244,9 +2256,13 @@ static ssize_t ocfs2_file_write_iter(struct kiocb *iocb,
>  	 */
>  	rw_level = (!direct_io || full_coherency || append_write);
>  
> -	ret = ocfs2_rw_lock(inode, rw_level);
> +	if (direct_io && nowait)
> +		ret = ocfs2_try_rw_lock(inode, rw_level);
> +	else
> +		ret = ocfs2_rw_lock(inode, rw_level);
>  	if (ret < 0) {
> -		mlog_errno(ret);
> +		if (ret != -EAGAIN)
> +			mlog_errno(ret);
>  		goto out_mutex;
>  	}
>  
> @@ -2260,9 +2276,13 @@ static ssize_t ocfs2_file_write_iter(struct kiocb *iocb,
>  		 * other nodes to drop their caches.  Buffered I/O
>  		 * already does this in write_begin().
>  		 */
> -		ret = ocfs2_inode_lock(inode, NULL, 1);
> +		if (nowait)
> +			ret = ocfs2_try_inode_lock(inode, NULL, 1);
> +		else
> +			ret = ocfs2_inode_lock(inode, NULL, 1);
>  		if (ret < 0) {
> -			mlog_errno(ret);
> +			if (ret != -EAGAIN)
> +				mlog_errno(ret);
>  			goto out;
>  		}
>  
> @@ -2277,9 +2297,17 @@ static ssize_t ocfs2_file_write_iter(struct kiocb *iocb,
>  	}
>  	count = ret;
>  
> -	ret = ocfs2_prepare_inode_for_write(file, iocb->ki_pos, count);
> +	if (direct_io && nowait) {
> +		if (!ocfs2_overwrite_io(inode, iocb->ki_pos, count, 0)) {
> +			ret = -EAGAIN;
> +			goto out;
> +		}
> +	}
> +
> +	ret = ocfs2_prepare_inode_for_write(file, iocb->ki_pos, count, !nowait);
>  	if (ret < 0) {
> -		mlog_errno(ret);
> +		if (ret != -EAGAIN)
> +			mlog_errno(ret);
>  		goto out;
>  	}
>  
> @@ -2355,6 +2383,7 @@ static ssize_t ocfs2_file_read_iter(struct kiocb *iocb,
>  	int ret = 0, rw_level = -1, lock_level = 0;
>  	struct file *filp = iocb->ki_filp;
>  	struct inode *inode = file_inode(filp);
> +	int nowait = iocb->ki_flags & IOCB_NOWAIT ? 1 : 0;
>  
>  	trace_ocfs2_file_aio_read(inode, filp, filp->f_path.dentry,
>  			(unsigned long long)OCFS2_I(inode)->ip_blkno,
> @@ -2374,9 +2403,14 @@ static ssize_t ocfs2_file_read_iter(struct kiocb *iocb,
>  	 * need locks to protect pending reads from racing with truncate.
>  	 */
>  	if (iocb->ki_flags & IOCB_DIRECT) {
> -		ret = ocfs2_rw_lock(inode, 0);
> +		if (nowait)
> +			ret = ocfs2_try_rw_lock(inode, 0);
> +		else
> +			ret = ocfs2_rw_lock(inode, 0);
> +
>  		if (ret < 0) {
> -			mlog_errno(ret);
> +			if (ret != -EAGAIN)
> +				mlog_errno(ret);
>  			goto bail;
>  		}
>  		rw_level = 0;
> @@ -2393,9 +2427,11 @@ static ssize_t ocfs2_file_read_iter(struct kiocb *iocb,
>  	 * like i_size. This allows the checks down below
>  	 * generic_file_aio_read() a chance of actually working.
>  	 */
> -	ret = ocfs2_inode_lock_atime(inode, filp->f_path.mnt, &lock_level);
> +	ret = ocfs2_inode_lock_atime(inode, filp->f_path.mnt, &lock_level,
> +				     !nowait);

Should we also check whether the flags include O_DIRECT?
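
Something along these lines, as a stand-alone sketch. The TOY_ flag
values and the helper name are placeholders, not the kernel's
definitions; the check itself copies what the write path in this
patch does for a nowait request without direct I/O.

```c
#include <errno.h>

/* Illustrative bit values only, not the kernel's IOCB_* constants. */
#define TOY_IOCB_DIRECT	(1 << 0)
#define TOY_IOCB_NOWAIT	(1 << 1)

/* Reject nowait requests that are not direct I/O, accept the rest. */
static int check_nowait_flags(int ki_flags)
{
	int direct_io = !!(ki_flags & TOY_IOCB_DIRECT);
	int nowait = !!(ki_flags & TOY_IOCB_NOWAIT);

	if (nowait && !direct_io)
		return -EOPNOTSUPP;
	return 0;
}
```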

>  	if (ret < 0) {
> -		mlog_errno(ret);
> +		if (ret != -EAGAIN)
> +			mlog_errno(ret);
>  		goto bail;
>  	}
>  	ocfs2_inode_unlock(inode, lock_level);
> diff --git a/fs/ocfs2/mmap.c b/fs/ocfs2/mmap.c
> index 098f5c7..fb9a20e 100644
> --- a/fs/ocfs2/mmap.c
> +++ b/fs/ocfs2/mmap.c
> @@ -184,7 +184,7 @@ int ocfs2_mmap(struct file *file, struct vm_area_struct *vma)
>  	int ret = 0, lock_level = 0;
>  
>  	ret = ocfs2_inode_lock_atime(file_inode(file),
> -				    file->f_path.mnt, &lock_level);
> +				    file->f_path.mnt, &lock_level, 1);
>  	if (ret < 0) {
>  		mlog_errno(ret);
>  		goto out;
> diff --git a/fs/ocfs2/ocfs2_trace.h b/fs/ocfs2/ocfs2_trace.h
> index a0b5d00..e2a11aa 100644
> --- a/fs/ocfs2/ocfs2_trace.h
> +++ b/fs/ocfs2/ocfs2_trace.h
> @@ -1449,20 +1449,22 @@
>  
>  TRACE_EVENT(ocfs2_prepare_inode_for_write,
>  	TP_PROTO(unsigned long long ino, unsigned long long saved_pos,
> -		 unsigned long count),
> -	TP_ARGS(ino, saved_pos, count),
> +		 unsigned long count, int wait),
> +	TP_ARGS(ino, saved_pos, count, wait),
>  	TP_STRUCT__entry(
>  		__field(unsigned long long, ino)
>  		__field(unsigned long long, saved_pos)
>  		__field(unsigned long, count)
> +		__field(int, wait)
>  	),
>  	TP_fast_assign(
>  		__entry->ino = ino;
>  		__entry->saved_pos = saved_pos;
>  		__entry->count = count;
> +		__entry->wait = wait;
>  	),
> -	TP_printk("%llu %llu %lu", __entry->ino,
> -		  __entry->saved_pos, __entry->count)
> +	TP_printk("%llu %llu %lu %d", __entry->ino,
> +		  __entry->saved_pos, __entry->count, __entry->wait)
>  );
>  
>  DEFINE_OCFS2_INT_EVENT(generic_file_aio_read_ret);
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [Ocfs2-devel] [PATCH 3/3] ocfs2: nowait aio support
@ 2017-11-28  2:51     ` alex chen
  0 siblings, 0 replies; 62+ messages in thread
From: alex chen @ 2017-11-28  2:51 UTC (permalink / raw)
  To: Gang He; +Cc: mfasheh, jlbec, rgoldwyn, hch, linux-kernel, ocfs2-devel

Hi Gang,

On 2017/11/27 17:46, Gang He wrote:
> Return EAGAIN if any of the following checks fail for direct I/O:
> Can not get the related locks immediately,
> Blocks are not allocated at the write location, it will trigger
> block allocation and block IO operations.
> 
> Signed-off-by: Gang He <ghe@suse.com>
> ---
>  fs/ocfs2/dir.c         |  2 +-
>  fs/ocfs2/dlmglue.c     | 20 ++++++++++----
>  fs/ocfs2/dlmglue.h     |  2 +-
>  fs/ocfs2/file.c        | 74 +++++++++++++++++++++++++++++++++++++-------------
>  fs/ocfs2/mmap.c        |  2 +-
>  fs/ocfs2/ocfs2_trace.h | 10 ++++---
>  6 files changed, 79 insertions(+), 31 deletions(-)
> 
> diff --git a/fs/ocfs2/dir.c b/fs/ocfs2/dir.c
> index febe631..ea50901 100644
> --- a/fs/ocfs2/dir.c
> +++ b/fs/ocfs2/dir.c
> @@ -1957,7 +1957,7 @@ int ocfs2_readdir(struct file *file, struct dir_context *ctx)
>  
>  	trace_ocfs2_readdir((unsigned long long)OCFS2_I(inode)->ip_blkno);
>  
> -	error = ocfs2_inode_lock_atime(inode, file->f_path.mnt, &lock_level);
> +	error = ocfs2_inode_lock_atime(inode, file->f_path.mnt, &lock_level, 1);
>  	if (lock_level && error >= 0) {
>  		/* We release EX lock which used to update atime
>  		 * and get PR lock again to reduce contention
> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
> index 5cfbd04..feb8dbe 100644
> --- a/fs/ocfs2/dlmglue.c
> +++ b/fs/ocfs2/dlmglue.c
> @@ -2516,13 +2516,18 @@ int ocfs2_inode_lock_with_page(struct inode *inode,
>  
>  int ocfs2_inode_lock_atime(struct inode *inode,
>  			  struct vfsmount *vfsmnt,
> -			  int *level)
> +			  int *level, int wait)
>  {
>  	int ret;
>  
> -	ret = ocfs2_inode_lock(inode, NULL, 0);
> +	if (wait)
> +		ret = ocfs2_inode_lock(inode, NULL, 0);
> +	else
> +		ret = ocfs2_try_inode_lock(inode, NULL, 0);
> +
>  	if (ret < 0) {
> -		mlog_errno(ret);
> +		if (ret != -EAGAIN)
> +			mlog_errno(ret);
>  		return ret;
>  	}
>  
> @@ -2534,9 +2539,14 @@ int ocfs2_inode_lock_atime(struct inode *inode,
>  		struct buffer_head *bh = NULL;
>  
>  		ocfs2_inode_unlock(inode, 0);
> -		ret = ocfs2_inode_lock(inode, &bh, 1);
> +		if (wait)
> +			ret = ocfs2_inode_lock(inode, &bh, 1);
> +		else
> +			ret = ocfs2_try_inode_lock(inode, &bh, 1);
> +
>  		if (ret < 0) {
> -			mlog_errno(ret);
> +			if (ret != -EAGAIN)
> +				mlog_errno(ret);
>  			return ret;
>  		}
>  		*level = 1;
> diff --git a/fs/ocfs2/dlmglue.h b/fs/ocfs2/dlmglue.h
> index 05910fc..c83dbb5 100644
> --- a/fs/ocfs2/dlmglue.h
> +++ b/fs/ocfs2/dlmglue.h
> @@ -123,7 +123,7 @@ void ocfs2_refcount_lock_res_init(struct ocfs2_lock_res *lockres,
>  void ocfs2_open_unlock(struct inode *inode);
>  int ocfs2_inode_lock_atime(struct inode *inode,
>  			  struct vfsmount *vfsmnt,
> -			  int *level);
> +			  int *level, int wait);
>  int ocfs2_inode_lock_full_nested(struct inode *inode,
>  			 struct buffer_head **ret_bh,
>  			 int ex,
> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> index dc455d4..900f04e 100644
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -140,6 +140,8 @@ static int ocfs2_file_open(struct inode *inode, struct file *file)
>  		spin_unlock(&oi->ip_lock);
>  	}
>  
> +	file->f_mode |= FMODE_NOWAIT;
> +
>  leave:
>  	return status;
>  }
> @@ -2132,8 +2134,7 @@ static int ocfs2_prepare_inode_for_refcount(struct inode *inode,
>  }
>  
>  static int ocfs2_prepare_inode_for_write(struct file *file,
> -					 loff_t pos,
> -					 size_t count)
> +					 loff_t pos, size_t count, int wait)
>  {
>  	int ret = 0, meta_level = 0;
>  	struct dentry *dentry = file->f_path.dentry;
> @@ -2145,10 +2146,14 @@ static int ocfs2_prepare_inode_for_write(struct file *file,
>  	 * if we need to make modifications here.
>  	 */
>  	for(;;) {
> -		ret = ocfs2_inode_lock(inode, NULL, meta_level);
> +		if (wait)
> +			ret = ocfs2_inode_lock(inode, NULL, meta_level);
> +		else
> +			ret = ocfs2_try_inode_lock(inode, NULL, meta_level);
>  		if (ret < 0) {
>  			meta_level = -1;
> -			mlog_errno(ret);
> +			if (ret != -EAGAIN)
> +				mlog_errno(ret);
>  			goto out;
>  		}
>

We lock the inode again in ocfs2_prepare_inode_for_write()->ocfs2_prepare_inode_for_refcount().
Should we check the 'nowait' flag there as well?

> @@ -2199,7 +2204,7 @@ static int ocfs2_prepare_inode_for_write(struct file *file,
>  
>  out_unlock:
>  	trace_ocfs2_prepare_inode_for_write(OCFS2_I(inode)->ip_blkno,
> -					    pos, count);
> +					    pos, count, wait);
>  
>  	if (meta_level >= 0)
>  		ocfs2_inode_unlock(inode, meta_level);
> @@ -2211,7 +2216,7 @@ static int ocfs2_prepare_inode_for_write(struct file *file,
>  static ssize_t ocfs2_file_write_iter(struct kiocb *iocb,
>  				    struct iov_iter *from)
>  {
> -	int direct_io, rw_level;
> +	int rw_level;
>  	ssize_t written = 0;
>  	ssize_t ret;
>  	size_t count = iov_iter_count(from);
> @@ -2223,6 +2228,8 @@ static ssize_t ocfs2_file_write_iter(struct kiocb *iocb,
>  	void *saved_ki_complete = NULL;
>  	int append_write = ((iocb->ki_pos + count) >=
>  			i_size_read(inode) ? 1 : 0);
> +	int direct_io = iocb->ki_flags & IOCB_DIRECT ? 1 : 0;
> +	int nowait = iocb->ki_flags & IOCB_NOWAIT ? 1 : 0;
>  
>  	trace_ocfs2_file_aio_write(inode, file, file->f_path.dentry,
>  		(unsigned long long)OCFS2_I(inode)->ip_blkno,
> @@ -2230,12 +2237,17 @@ static ssize_t ocfs2_file_write_iter(struct kiocb *iocb,
>  		file->f_path.dentry->d_name.name,
>  		(unsigned int)from->nr_segs);	/* GRRRRR */
>  
> +	if (!direct_io && nowait)
> +		return -EOPNOTSUPP;
> +
>  	if (count == 0)
>  		return 0;
>  
> -	direct_io = iocb->ki_flags & IOCB_DIRECT ? 1 : 0;
> -
> -	inode_lock(inode);
> +	if (direct_io && nowait) {
> +		if (!inode_trylock(inode))
> +			return -EAGAIN;
> +	} else
> +		inode_lock(inode);
>  
>  	/*
>  	 * Concurrent O_DIRECT writes are allowed with
> @@ -2244,9 +2256,13 @@ static ssize_t ocfs2_file_write_iter(struct kiocb *iocb,
>  	 */
>  	rw_level = (!direct_io || full_coherency || append_write);
>  
> -	ret = ocfs2_rw_lock(inode, rw_level);
> +	if (direct_io && nowait)
> +		ret = ocfs2_try_rw_lock(inode, rw_level);
> +	else
> +		ret = ocfs2_rw_lock(inode, rw_level);
>  	if (ret < 0) {
> -		mlog_errno(ret);
> +		if (ret != -EAGAIN)
> +			mlog_errno(ret);
>  		goto out_mutex;
>  	}
>  
> @@ -2260,9 +2276,13 @@ static ssize_t ocfs2_file_write_iter(struct kiocb *iocb,
>  		 * other nodes to drop their caches.  Buffered I/O
>  		 * already does this in write_begin().
>  		 */
> -		ret = ocfs2_inode_lock(inode, NULL, 1);
> +		if (nowait)
> +			ret = ocfs2_try_inode_lock(inode, NULL, 1);
> +		else
> +			ret = ocfs2_inode_lock(inode, NULL, 1);
>  		if (ret < 0) {
> -			mlog_errno(ret);
> +			if (ret != -EAGAIN)
> +				mlog_errno(ret);
>  			goto out;
>  		}
>  
> @@ -2277,9 +2297,17 @@ static ssize_t ocfs2_file_write_iter(struct kiocb *iocb,
>  	}
>  	count = ret;
>  
> -	ret = ocfs2_prepare_inode_for_write(file, iocb->ki_pos, count);
> +	if (direct_io && nowait) {
> +		if (!ocfs2_overwrite_io(inode, iocb->ki_pos, count, 0)) {
> +			ret = -EAGAIN;
> +			goto out;
> +		}
> +	}
> +
> +	ret = ocfs2_prepare_inode_for_write(file, iocb->ki_pos, count, !nowait);
>  	if (ret < 0) {
> -		mlog_errno(ret);
> +		if (ret != -EAGAIN)
> +			mlog_errno(ret);
>  		goto out;
>  	}
>  
> @@ -2355,6 +2383,7 @@ static ssize_t ocfs2_file_read_iter(struct kiocb *iocb,
>  	int ret = 0, rw_level = -1, lock_level = 0;
>  	struct file *filp = iocb->ki_filp;
>  	struct inode *inode = file_inode(filp);
> +	int nowait = iocb->ki_flags & IOCB_NOWAIT ? 1 : 0;
>  
>  	trace_ocfs2_file_aio_read(inode, filp, filp->f_path.dentry,
>  			(unsigned long long)OCFS2_I(inode)->ip_blkno,
> @@ -2374,9 +2403,14 @@ static ssize_t ocfs2_file_read_iter(struct kiocb *iocb,
>  	 * need locks to protect pending reads from racing with truncate.
>  	 */
>  	if (iocb->ki_flags & IOCB_DIRECT) {
> -		ret = ocfs2_rw_lock(inode, 0);
> +		if (nowait)
> +			ret = ocfs2_try_rw_lock(inode, 0);
> +		else
> +			ret = ocfs2_rw_lock(inode, 0);
> +
>  		if (ret < 0) {
> -			mlog_errno(ret);
> +			if (ret != -EAGAIN)
> +				mlog_errno(ret);
>  			goto bail;
>  		}
>  		rw_level = 0;
> @@ -2393,9 +2427,11 @@ static ssize_t ocfs2_file_read_iter(struct kiocb *iocb,
>  	 * like i_size. This allows the checks down below
>  	 * generic_file_aio_read() a chance of actually working.
>  	 */
> -	ret = ocfs2_inode_lock_atime(inode, filp->f_path.mnt, &lock_level);
> +	ret = ocfs2_inode_lock_atime(inode, filp->f_path.mnt, &lock_level,
> +				     !nowait);

Should we also check whether the flags include O_DIRECT?
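
One way to read this question: the write path in the patch only takes the try-lock route when `direct_io && nowait` holds, so the read path could derive its wait argument from both flags in the same way. The sketch below is a userspace model of that decision only. The flag values and the helper name model_wait_arg() are assumptions for illustration, not kernel definitions.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel's IOCB_DIRECT / IOCB_NOWAIT bits. */
#define MODEL_IOCB_DIRECT (1u << 0)
#define MODEL_IOCB_NOWAIT (1u << 1)

/*
 * Returns the "wait" argument for a lock helper: 0 (use the try-lock
 * variant) only when the request is both direct and nowait, 1 otherwise.
 */
int model_wait_arg(unsigned int ki_flags)
{
	int direct_io = (ki_flags & MODEL_IOCB_DIRECT) ? 1 : 0;
	int nowait = (ki_flags & MODEL_IOCB_NOWAIT) ? 1 : 0;

	return !(direct_io && nowait);
}
```

With such a helper, a call like ocfs2_inode_lock_atime(..., model_wait_arg(flags)) would keep blocking behaviour for buffered reads even if the nowait bit were set.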

>  	if (ret < 0) {
> -		mlog_errno(ret);
> +		if (ret != -EAGAIN)
> +			mlog_errno(ret);
>  		goto bail;
>  	}
>  	ocfs2_inode_unlock(inode, lock_level);
> diff --git a/fs/ocfs2/mmap.c b/fs/ocfs2/mmap.c
> index 098f5c7..fb9a20e 100644
> --- a/fs/ocfs2/mmap.c
> +++ b/fs/ocfs2/mmap.c
> @@ -184,7 +184,7 @@ int ocfs2_mmap(struct file *file, struct vm_area_struct *vma)
>  	int ret = 0, lock_level = 0;
>  
>  	ret = ocfs2_inode_lock_atime(file_inode(file),
> -				    file->f_path.mnt, &lock_level);
> +				    file->f_path.mnt, &lock_level, 1);
>  	if (ret < 0) {
>  		mlog_errno(ret);
>  		goto out;
> diff --git a/fs/ocfs2/ocfs2_trace.h b/fs/ocfs2/ocfs2_trace.h
> index a0b5d00..e2a11aa 100644
> --- a/fs/ocfs2/ocfs2_trace.h
> +++ b/fs/ocfs2/ocfs2_trace.h
> @@ -1449,20 +1449,22 @@
>  
>  TRACE_EVENT(ocfs2_prepare_inode_for_write,
>  	TP_PROTO(unsigned long long ino, unsigned long long saved_pos,
> -		 unsigned long count),
> -	TP_ARGS(ino, saved_pos, count),
> +		 unsigned long count, int wait),
> +	TP_ARGS(ino, saved_pos, count, wait),
>  	TP_STRUCT__entry(
>  		__field(unsigned long long, ino)
>  		__field(unsigned long long, saved_pos)
>  		__field(unsigned long, count)
> +		__field(int, wait)
>  	),
>  	TP_fast_assign(
>  		__entry->ino = ino;
>  		__entry->saved_pos = saved_pos;
>  		__entry->count = count;
> +		__entry->wait = wait;
>  	),
> -	TP_printk("%llu %llu %lu", __entry->ino,
> -		  __entry->saved_pos, __entry->count)
> +	TP_printk("%llu %llu %lu %d", __entry->ino,
> +		  __entry->saved_pos, __entry->count, __entry->wait)
>  );
>  
>  DEFINE_OCFS2_INT_EVENT(generic_file_aio_read_ret);
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function
  2017-11-28  1:13     ` Joseph Qi
@ 2017-11-28  3:35       ` Gang He
  -1 siblings, 0 replies; 62+ messages in thread
From: Gang He @ 2017-11-28  3:35 UTC (permalink / raw)
  To: jlbec, jiangqi903, hch, Goldwyn Rodrigues, mfasheh
  Cc: ocfs2-devel, linux-kernel

Hello Joseph,


>>> 
> Hi Gang,
> 
> On 17/11/27 17:46, Gang He wrote:
>> Add ocfs2_overwrite_io function, which checks whether a write only
>> overwrites allocated blocks; otherwise, the write incurs extra
>> block allocation overhead.
>> 
>> Signed-off-by: Gang He <ghe@suse.com>
>> ---
>>  fs/ocfs2/extent_map.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>  fs/ocfs2/extent_map.h |  3 +++
>>  2 files changed, 70 insertions(+)
>> 
>> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
>> index e4719e0..98bf325 100644
>> --- a/fs/ocfs2/extent_map.c
>> +++ b/fs/ocfs2/extent_map.c
>> @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>>  	return ret;
>>  }
>>  
>> +/* Is IO overwriting allocated blocks? */
>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
>> +		       int wait)
>> +{
>> +	int ret = 0, is_last;
>> +	u32 mapping_end, cpos;
>> +	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>> +	struct buffer_head *di_bh = NULL;
>> +	struct ocfs2_extent_rec rec;
>> +
>> +	if (wait)
>> +		ret = ocfs2_inode_lock(inode, &di_bh, 0);
>> +	else
>> +		ret = ocfs2_try_inode_lock(inode, &di_bh, 0);
>> +	if (ret)
>> +		goto out;
>> +
>> +	if (wait)
>> +		down_read(&OCFS2_I(inode)->ip_alloc_sem);
>> +	else {
>> +		if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) {
>> +			ret = -EAGAIN;
>> +			goto out_unlock1;
>> +		}
>> +	}
>> +
>> +	if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
>> +	   ((map_start + map_len) <= i_size_read(inode)))
>> +		goto out_unlock2;
>> +
>> +	cpos = map_start >> osb->s_clustersize_bits;
>> +	mapping_end = ocfs2_clusters_for_bytes(inode->i_sb,
>> +					       map_start + map_len);
>> +	is_last = 0;
>> +	while (cpos < mapping_end && !is_last) {
>> +		ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos,
>> +						 NULL, &rec, &is_last);
>> +		if (ret) {
>> +			mlog_errno(ret);
>> +			goto out_unlock2;
>> +		}
>> +
>> +		if (rec.e_blkno == 0ULL)
>> +			break;
>> +
>> +		if (rec.e_flags & OCFS2_EXT_REFCOUNTED)
>> +			break;
>> +
>> +		cpos = le32_to_cpu(rec.e_cpos) +
>> +			le16_to_cpu(rec.e_leaf_clusters);
>> +	}
>> +
>> +	if (cpos < mapping_end)
>> +		ret = 1;
>> +
>> +out_unlock2:
>> +	brelse(di_bh);
>> +
>> +	up_read(&OCFS2_I(inode)->ip_alloc_sem);
>> +
>> +out_unlock1:
> Should brelse(di_bh) be here?
If the code jumps to out_unlock1 directly, the di_bh pointer should be NULL, so there is nothing to release.
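
For reference, the kernel's brelse() ignores a NULL pointer, so putting the release on the label that every post-lock path reaches is safe whether or not the buffer was populated. The sketch below is a userspace model of that goto-unwind layout, using malloc()/free() in place of buffer heads. All names are hypothetical. In this model the buffer is already populated when the trylock fails; whether that matches what ocfs2_inode_lock() does with a non-NULL bh argument should be checked against the kernel source.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical model state: counts buffers that are still held. */
int model_live_buffers;

/* Stand-in for taking the inode lock and returning its buffer. */
int model_lock(char **bh_out)
{
	*bh_out = malloc(16);
	if (!*bh_out)
		return -ENOMEM;
	model_live_buffers++;
	return 0;
}

/* Stand-in for brelse(): like the kernel helper, it tolerates NULL. */
void model_release(char *bh)
{
	if (!bh)
		return;
	free(bh);
	model_live_buffers--;
}

/* sem_available == 0 models a failed down_read_trylock(). */
int model_overwrite_check(int sem_available)
{
	int ret;
	char *bh = NULL;

	ret = model_lock(&bh);
	if (ret)
		goto out;

	if (!sem_available) {
		ret = -EAGAIN;
		goto out_unlock;	/* bh is populated in this model */
	}

	/* ... the extent walk would go here ... */

out_unlock:
	model_release(bh);	/* reached by both the normal and early exit */
out:
	return ret;
}
```

Calling model_overwrite_check(0) returns -EAGAIN and leaves model_live_buffers at zero, because the release sits on the label shared by both exits.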

> 
>> +	ocfs2_inode_unlock(inode, 0);
>> +
>> +out:
>> +	return (ret ? 0 : 1);
> I don't think EAGAIN and other error code can be handled the same. We
> have to distinguish them.
OK, I think we can add a one-line log message to report the error when it is not EAGAIN.
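
Besides logging, another possible way to keep the two cases apart is to let the helper return a tri-state value and leave the mapping to -EAGAIN to the caller. The following is a minimal userspace sketch of that idea, not the patch's implementation. model_overwrite_io() and model_write_prepare() are hypothetical names, and the lock result and allocation state are passed in as plain integers.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical tri-state variant: > 0 means the range is a pure overwrite,
 * 0 means it is not, and a negative errno reports a failure.
 */
int model_overwrite_io(int lock_ret, int needs_alloc)
{
	if (lock_ret < 0)
		return lock_ret;	/* -EAGAIN or a real error, unchanged */
	return needs_alloc ? 0 : 1;
}

/* Caller: only "not an overwrite" is turned into -EAGAIN. */
int model_write_prepare(int lock_ret, int needs_alloc)
{
	int ret = model_overwrite_io(lock_ret, needs_alloc);

	if (ret < 0)
		return ret;
	if (ret == 0)
		return -EAGAIN;
	return 0;
}
```

With this shape, a failure such as -EIO reaches the caller unchanged instead of being folded into the same result as "would block".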

> 
> Thanks,
> Joseph
> 
>> +}
>> +
>>  int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int whence)
>>  {
>>  	struct inode *inode = file->f_mapping->host;
>> diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h
>> index 67ea57d..fd9e86a 100644
>> --- a/fs/ocfs2/extent_map.h
>> +++ b/fs/ocfs2/extent_map.h
>> @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 v_blkno, u64 *p_blkno,
>>  int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>>  		 u64 map_start, u64 map_len);
>>  
>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
>> +		       int wait);
>> +
>>  int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int origin);
>>  
>>  int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster,
>> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Ocfs2-devel] [PATCH 1/3] ocfs2: add ocfs2_try_rw_lock and ocfs2_try_inode_lock
  2017-11-28  1:32     ` piaojun
@ 2017-11-28  5:05       ` Gang He
  -1 siblings, 0 replies; 62+ messages in thread
From: Gang He @ 2017-11-28  5:05 UTC (permalink / raw)
  To: jlbec, piaojun, hch, Goldwyn Rodrigues, mfasheh; +Cc: ocfs2-devel, linux-kernel

Hello Jun,


>>> 
> Hi Gang,
> 
> On 2017/11/27 17:46, Gang He wrote:
>> Add ocfs2_try_rw_lock and ocfs2_try_inode_lock functions, which
>> will be used in non-blocking I/O scenarios.
>> 
>> Signed-off-by: Gang He <ghe@suse.com>
>> ---
>>  fs/ocfs2/dlmglue.c | 22 ++++++++++++++++++++++
>>  fs/ocfs2/dlmglue.h |  4 ++++
>>  2 files changed, 26 insertions(+)
>> 
>> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
>> index 4689940..5cfbd04 100644
>> --- a/fs/ocfs2/dlmglue.c
>> +++ b/fs/ocfs2/dlmglue.c
>> @@ -1742,6 +1742,28 @@ int ocfs2_rw_lock(struct inode *inode, int write)
>>  	return status;
>>  }
>>  
>> +int ocfs2_try_rw_lock(struct inode *inode, int write)
>> +{
>> +	int status, level;
>> +	struct ocfs2_lock_res *lockres;
>> +	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>> +
>> +	mlog(0, "inode %llu try to take %s RW lock\n",
>> +	     (unsigned long long)OCFS2_I(inode)->ip_blkno,
>> +	     write ? "EXMODE" : "PRMODE");
>> +
>> +	if (ocfs2_mount_local(osb))
>> +		return 0;
>> +
>> +	lockres = &OCFS2_I(inode)->ip_rw_lockres;
>> +
>> +	level = write ? DLM_LOCK_EX : DLM_LOCK_PR;
>> +
>> +	status = ocfs2_cluster_lock(OCFS2_SB(inode->i_sb), lockres, level,
>> +				    DLM_LKF_NOQUEUE, 0);
> 
> We'd better use 'osb' instead of 'OCFS2_SB(inode->i_sb)'.
OK, I will make this change in the next version.

> 
>> +	return status;
>> +}
>> +
>>  void ocfs2_rw_unlock(struct inode *inode, int write)
>>  {
>>  	int level = write ? DLM_LOCK_EX : DLM_LOCK_PR;
>> diff --git a/fs/ocfs2/dlmglue.h b/fs/ocfs2/dlmglue.h
>> index a7fc18b..05910fc 100644
>> --- a/fs/ocfs2/dlmglue.h
>> +++ b/fs/ocfs2/dlmglue.h
>> @@ -116,6 +116,7 @@ void ocfs2_refcount_lock_res_init(struct ocfs2_lock_res *lockres,
>>  int ocfs2_create_new_inode_locks(struct inode *inode);
>>  int ocfs2_drop_inode_locks(struct inode *inode);
>>  int ocfs2_rw_lock(struct inode *inode, int write);
>> +int ocfs2_try_rw_lock(struct inode *inode, int write);
>>  void ocfs2_rw_unlock(struct inode *inode, int write);
>>  int ocfs2_open_lock(struct inode *inode);
>>  int ocfs2_try_open_lock(struct inode *inode, int write);
>> @@ -140,6 +141,9 @@ int ocfs2_inode_lock_with_page(struct inode *inode,
>>  /* 99% of the time we don't want to supply any additional flags --
>>   * those are for very specific cases only. */
>>  #define ocfs2_inode_lock(i, b, e) ocfs2_inode_lock_full_nested(i, b, e, 0, OI_LS_NORMAL)
>> +#define ocfs2_try_inode_lock(i, b, e)\
>> +		ocfs2_inode_lock_full_nested(i, b, e, OCFS2_META_LOCK_NOQUEUE,\
>> +		OI_LS_NORMAL)
>>  void ocfs2_inode_unlock(struct inode *inode,
>>  		       int ex);
>>  int ocfs2_super_lock(struct ocfs2_super *osb,
>> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function
  2017-11-28  1:50     ` piaojun
@ 2017-11-28  5:07       ` Gang He
  -1 siblings, 0 replies; 62+ messages in thread
From: Gang He @ 2017-11-28  5:07 UTC (permalink / raw)
  To: jlbec, piaojun, hch, Goldwyn Rodrigues, mfasheh; +Cc: ocfs2-devel, linux-kernel

Hi Jun,


>>> 
> Hi Gang,
> 
> If ocfs2_overwrite_io is only called in 'nowait' scenarios, I wonder if
> we can discard 'int wait' just as ext4 does:
> 
> static bool ext4_overwrite_io(struct inode *inode, loff_t pos, loff_t len);
OK, it looks like most people prefer to get rid of the "wait" parameter.

Thanks
Gang

> 
> thanks,
> Jun
> 
> On 2017/11/27 17:46, Gang He wrote:
>> Add ocfs2_overwrite_io function, which checks whether a write only
>> overwrites allocated blocks; otherwise, the write incurs extra
>> block allocation overhead.
>> 
>> Signed-off-by: Gang He <ghe@suse.com>
>> ---
>>  fs/ocfs2/extent_map.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>  fs/ocfs2/extent_map.h |  3 +++
>>  2 files changed, 70 insertions(+)
>> 
>> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
>> index e4719e0..98bf325 100644
>> --- a/fs/ocfs2/extent_map.c
>> +++ b/fs/ocfs2/extent_map.c
>> @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>>  	return ret;
>>  }
>>  
>> +/* Is IO overwriting allocated blocks? */
>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
>> +		       int wait)
>> +{
>> +	int ret = 0, is_last;
>> +	u32 mapping_end, cpos;
>> +	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>> +	struct buffer_head *di_bh = NULL;
>> +	struct ocfs2_extent_rec rec;
>> +
>> +	if (wait)
>> +		ret = ocfs2_inode_lock(inode, &di_bh, 0);
>> +	else
>> +		ret = ocfs2_try_inode_lock(inode, &di_bh, 0);
>> +	if (ret)
>> +		goto out;
>> +
>> +	if (wait)
>> +		down_read(&OCFS2_I(inode)->ip_alloc_sem);
>> +	else {
>> +		if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) {
>> +			ret = -EAGAIN;
>> +			goto out_unlock1;
>> +		}
>> +	}
>> +
>> +	if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
>> +	   ((map_start + map_len) <= i_size_read(inode)))
>> +		goto out_unlock2;
>> +
>> +	cpos = map_start >> osb->s_clustersize_bits;
>> +	mapping_end = ocfs2_clusters_for_bytes(inode->i_sb,
>> +					       map_start + map_len);
>> +	is_last = 0;
>> +	while (cpos < mapping_end && !is_last) {
>> +		ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos,
>> +						 NULL, &rec, &is_last);
>> +		if (ret) {
>> +			mlog_errno(ret);
>> +			goto out_unlock2;
>> +		}
>> +
>> +		if (rec.e_blkno == 0ULL)
>> +			break;
>> +
>> +		if (rec.e_flags & OCFS2_EXT_REFCOUNTED)
>> +			break;
>> +
>> +		cpos = le32_to_cpu(rec.e_cpos) +
>> +			le16_to_cpu(rec.e_leaf_clusters);
>> +	}
>> +
>> +	if (cpos < mapping_end)
>> +		ret = 1;
>> +
>> +out_unlock2:
>> +	brelse(di_bh);
>> +
>> +	up_read(&OCFS2_I(inode)->ip_alloc_sem);
>> +
>> +out_unlock1:
>> +	ocfs2_inode_unlock(inode, 0);
>> +
>> +out:
>> +	return (ret ? 0 : 1);
>> +}
>> +
>>  int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int whence)
>>  {
>>  	struct inode *inode = file->f_mapping->host;
>> diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h
>> index 67ea57d..fd9e86a 100644
>> --- a/fs/ocfs2/extent_map.h
>> +++ b/fs/ocfs2/extent_map.h
>> @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 v_blkno, u64 *p_blkno,
>>  int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>>  		 u64 map_start, u64 map_len);
>>  
>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
>> +		       int wait);
>> +
>>  int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int origin);
>>  
>>  int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster,
>> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Ocfs2-devel] [PATCH 1/3] ocfs2: add ocfs2_try_rw_lock and ocfs2_try_inode_lock
  2017-11-28  1:52     ` Changwei Ge
@ 2017-11-28  5:26       ` Gang He
  -1 siblings, 0 replies; 62+ messages in thread
From: Gang He @ 2017-11-28  5:26 UTC (permalink / raw)
  To: jlbec, ge.changwei, hch, Goldwyn Rodrigues, mfasheh
  Cc: ocfs2-devel, linux-kernel

Hello Changwei,


>>> 
> Hi Gang,
> 
> On 2017/11/27 17:48, Gang He wrote:
>> Add ocfs2_try_rw_lock and ocfs2_try_inode_lock functions, which
>> will be used in non-blocking I/O scenarios.
>> 
>> Signed-off-by: Gang He <ghe@suse.com>
>> ---
>>   fs/ocfs2/dlmglue.c | 22 ++++++++++++++++++++++
>>   fs/ocfs2/dlmglue.h |  4 ++++
>>   2 files changed, 26 insertions(+)
>> 
>> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
>> index 4689940..5cfbd04 100644
>> --- a/fs/ocfs2/dlmglue.c
>> +++ b/fs/ocfs2/dlmglue.c
>> @@ -1742,6 +1742,28 @@ int ocfs2_rw_lock(struct inode *inode, int write)
>>   	return status;
>>   }
>>   
>> +int ocfs2_try_rw_lock(struct inode *inode, int write)
>> +{
>> +	int status, level;
>> +	struct ocfs2_lock_res *lockres;
>> +	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>> +
>> +	mlog(0, "inode %llu try to take %s RW lock\n",
>> +	     (unsigned long long)OCFS2_I(inode)->ip_blkno,
>> +	     write ? "EXMODE" : "PRMODE");
>> +
>> +	if (ocfs2_mount_local(osb))
>> +		return 0;
>> +
>> +	lockres = &OCFS2_I(inode)->ip_rw_lockres;
>> +
>> +	level = write ? DLM_LOCK_EX : DLM_LOCK_PR;
>> +
>> +	status = ocfs2_cluster_lock(OCFS2_SB(inode->i_sb), lockres, level,
>> +				    DLM_LKF_NOQUEUE, 0);
>> +	return status;
>> +}
> 
> The newly added function ocfs2_try_rw_lock has almost the same logic
> as ocfs2_rw_lock. Is it possible to combine them into a single function?
> That would be more elegant.
I prefer to keep ocfs2_try_rw_lock() separate, since similar functions already exist here (e.g. ocfs2_try_open_lock).
Second, adding a new ocfs2_try_rw_lock() function avoids any impact on the existing code.
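
The two positions are not far apart: the ocfs2_try_inode_lock() macro in this patch already works as a thin wrapper that passes OCFS2_META_LOCK_NOQUEUE to a shared helper. A userspace sketch of that layout, assuming a toy lock built on a C11 atomic_flag, is shown below. struct model_lock and the model_* functions are hypothetical names, and the spin loop stands in for a real blocking wait.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical lock: a single flag, set while the lock is held. */
struct model_lock {
	atomic_flag held;
};

/*
 * Shared helper. With nowait set it gives up immediately when the lock
 * is taken; otherwise it spins until the lock is free.
 */
int model_lock_common(struct model_lock *l, int nowait)
{
	while (atomic_flag_test_and_set(&l->held)) {
		if (nowait)
			return -EAGAIN;
	}
	return 0;
}

/* Thin wrappers, so existing blocking callers stay untouched. */
int model_lock_acquire(struct model_lock *l)
{
	return model_lock_common(l, 0);
}

int model_try_lock(struct model_lock *l)
{
	return model_lock_common(l, 1);
}

void model_unlock(struct model_lock *l)
{
	atomic_flag_clear(&l->held);
}
```

Keeping a separate try entry point means no existing call site changes its signature, while the locking logic itself still lives in one place.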

> 
> Moreover, can you elaborate on why we need a *NOQUEUE* lock to
> support non-blocking aio?
Non-blocking I/O means that the call should return -EAGAIN instead of blocking to wait for a resource (e.g. a lock, block allocation, etc.).

> 
> Why can't we wait for a while to grant a lock request? Is this necessary?
Non-blocking IO is a way for the application to submit IO: if the call would block, it fails with -EAGAIN instead.
The application can then resubmit the IO in the normal (blocking) way later, from another thread. This IO mode benefits some database applications.
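
The application-side pattern described above could look roughly like the following sketch, which uses the pwritev2() call and RWF_NOWAIT flag mentioned in the cover letter. The helper names `write_nowait_then_block()` and `demo_roundtrip()` are made up for illustration, the fallback here happens inline rather than in a separate thread, and treating EOPNOTSUPP, ENOSYS and EINVAL as "nowait not available here" is an assumption of this sketch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef RWF_NOWAIT
#define RWF_NOWAIT 0x00000008 /* value from the Linux UAPI headers */
#endif

/*
 * Try a non-blocking write first. If the kernel reports that it would
 * block (or that nowait is not usable on this file), retry in blocking
 * mode. A real application would hand the retry to a worker thread.
 */
static ssize_t write_nowait_then_block(int fd, const void *buf, size_t len,
				       off_t off, int *fell_back)
{
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
	ssize_t n;

	assert(fell_back != NULL);
	*fell_back = 0;

	n = pwritev2(fd, &iov, 1, off, RWF_NOWAIT);
	if (n >= 0)
		return n;
	if (errno != EAGAIN && errno != EOPNOTSUPP &&
	    errno != ENOSYS && errno != EINVAL)
		return -1;

	*fell_back = 1;
	return pwrite(fd, buf, len, off);
}

/* Self-check: write through the helper and read the bytes back. */
static int demo_roundtrip(void)
{
	char path[] = "/tmp/nowait-demo-XXXXXX";
	const char msg[] = "hello";
	char back[sizeof(msg)] = { 0 };
	int fell_back, ok;
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);
	ok = write_nowait_then_block(fd, msg, sizeof(msg), 0, &fell_back) ==
		     (ssize_t)sizeof(msg) &&
	     pread(fd, back, sizeof(back), 0) == (ssize_t)sizeof(back) &&
	     memcmp(msg, back, sizeof(msg)) == 0;
	close(fd);
	return ok ? 0 : -1;
}
```

Whether the first attempt succeeds or falls back depends on the kernel, the filesystem and how the file was opened, so the sketch reports that through `fell_back` rather than assuming either outcome.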

> 
> Thanks,
> Changwei
> 
>> +
>>   void ocfs2_rw_unlock(struct inode *inode, int write)
>>   {
>>   	int level = write ? DLM_LOCK_EX : DLM_LOCK_PR;
>> diff --git a/fs/ocfs2/dlmglue.h b/fs/ocfs2/dlmglue.h
>> index a7fc18b..05910fc 100644
>> --- a/fs/ocfs2/dlmglue.h
>> +++ b/fs/ocfs2/dlmglue.h
>> @@ -116,6 +116,7 @@ void ocfs2_refcount_lock_res_init(struct ocfs2_lock_res *lockres,
>>   int ocfs2_create_new_inode_locks(struct inode *inode);
>>   int ocfs2_drop_inode_locks(struct inode *inode);
>>   int ocfs2_rw_lock(struct inode *inode, int write);
>> +int ocfs2_try_rw_lock(struct inode *inode, int write);
>>   void ocfs2_rw_unlock(struct inode *inode, int write);
>>   int ocfs2_open_lock(struct inode *inode);
>>   int ocfs2_try_open_lock(struct inode *inode, int write);
>> @@ -140,6 +141,9 @@ int ocfs2_inode_lock_with_page(struct inode *inode,
>>   /* 99% of the time we don't want to supply any additional flags --
>>    * those are for very specific cases only. */
>>   #define ocfs2_inode_lock(i, b, e) ocfs2_inode_lock_full_nested(i, b, e, 0, OI_LS_NORMAL)
>> +#define ocfs2_try_inode_lock(i, b, e)\
>> +		ocfs2_inode_lock_full_nested(i, b, e, OCFS2_META_LOCK_NOQUEUE,\
>> +		OI_LS_NORMAL)
>>   void ocfs2_inode_unlock(struct inode *inode,
>>   		       int ex);
>>   int ocfs2_super_lock(struct ocfs2_super *osb,
>> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function
  2017-11-28  2:10       ` Changwei Ge
@ 2017-11-28  5:27         ` Gang He
  -1 siblings, 0 replies; 62+ messages in thread
From: Gang He @ 2017-11-28  5:27 UTC (permalink / raw)
  To: jlbec, ge.changwei, piaojun, hch, Goldwyn Rodrigues, mfasheh
  Cc: ocfs2-devel, linux-kernel




>>> 
> On 2017/11/28 9:52, piaojun wrote:
>> Hi Gang,
>> 
>> If ocfs2_overwrite_io is only called in 'nowait' scenarios, I wonder if
>> we can discard 'int wait' just as ext4 does:
>> 
>> static bool ext4_overwrite_io(struct inode *inode, loff_t pos, loff_t len);
> 
> Yes, Jun has a point.
> It seems that ocfs2_overwrite_io is only involved in non-blocking aio 
> and no other code spot is calling ocfs2_overwrite_io with wait=1 passed.
OK, I will make this change.

> 
>> 
>> thanks,
>> Jun
>> 
>> On 2017/11/27 17:46, Gang He wrote:
>>> Add ocfs2_overwrite_io function, which is used to judge if
>>> overwrite allocated blocks, otherwise, the write will bring extra
>>> block allocation overhead.
>>>
>>> Signed-off-by: Gang He <ghe@suse.com>
>>> ---
>>>   fs/ocfs2/extent_map.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>>   fs/ocfs2/extent_map.h |  3 +++
>>>   2 files changed, 70 insertions(+)
>>>
>>> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
>>> index e4719e0..98bf325 100644
>>> --- a/fs/ocfs2/extent_map.c
>>> +++ b/fs/ocfs2/extent_map.c
>>> @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>>>   	return ret;
>>>   }
>>>   
>>> +/* Is IO overwriting allocated blocks? */
>>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
>>> +		       int wait)
>>> +{
>>> +	int ret = 0, is_last;
>>> +	u32 mapping_end, cpos;
>>> +	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>>> +	struct buffer_head *di_bh = NULL;
>>> +	struct ocfs2_extent_rec rec;
>>> +
>>> +	if (wait)
>>> +		ret = ocfs2_inode_lock(inode, &di_bh, 0);
>>> +	else
>>> +		ret = ocfs2_try_inode_lock(inode, &di_bh, 0);
>>> +	if (ret)
>>> +		goto out;
>>> +
>>> +	if (wait)
>>> +		down_read(&OCFS2_I(inode)->ip_alloc_sem);
>>> +	else {
>>> +		if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) {
>>> +			ret = -EAGAIN;
> This is a little strange: it seems that you don't care much about how 
> this function fails. Why set _ret_ to -EAGAIN here and then ignore it 
> later?
> 
> Thanks,
> Changwei
> 
>>> +			goto out_unlock1;
>>> +		}
>>> +	}
>>> +
>>> +	if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
>>> +	   ((map_start + map_len) <= i_size_read(inode)))
>>> +		goto out_unlock2;
>>> +
>>> +	cpos = map_start >> osb->s_clustersize_bits;
>>> +	mapping_end = ocfs2_clusters_for_bytes(inode->i_sb,
>>> +					       map_start + map_len);
>>> +	is_last = 0;
>>> +	while (cpos < mapping_end && !is_last) {
>>> +		ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos,
>>> +						 NULL, &rec, &is_last);
>>> +		if (ret) {
>>> +			mlog_errno(ret);
>>> +			goto out_unlock2;
>>> +		}
>>> +
>>> +		if (rec.e_blkno == 0ULL)
>>> +			break;
>>> +
>>> +		if (rec.e_flags & OCFS2_EXT_REFCOUNTED)
>>> +			break;
>>> +
>>> +		cpos = le32_to_cpu(rec.e_cpos) +
>>> +			le16_to_cpu(rec.e_leaf_clusters);
>>> +	}
>>> +
>>> +	if (cpos < mapping_end)
>>> +		ret = 1;
>>> +
>>> +out_unlock2:
>>> +	brelse(di_bh);
>>> +
>>> +	up_read(&OCFS2_I(inode)->ip_alloc_sem);
>>> +
>>> +out_unlock1:
>>> +	ocfs2_inode_unlock(inode, 0);
>>> +
>>> +out:
>>> +	return (ret ? 0 : 1);
>>> +}
>>> +
>>>   int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int whence)
>>>   {
>>>   	struct inode *inode = file->f_mapping->host;
>>> diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h
>>> index 67ea57d..fd9e86a 100644
>>> --- a/fs/ocfs2/extent_map.h
>>> +++ b/fs/ocfs2/extent_map.h
>>> @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 v_blkno, u64 *p_blkno,
>>>   int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>>>   		 u64 map_start, u64 map_len);
>>>   
>>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
>>> +		       int wait);
>>> +
>>>   int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int origin);
>>>   
>>>   int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster,
>>>
>> 
>> _______________________________________________
>> Ocfs2-devel mailing list
>> Ocfs2-devel@oss.oracle.com 
>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel 
>> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function
  2017-11-28  2:19     ` alex chen
@ 2017-11-28  5:33       ` Gang He
  -1 siblings, 0 replies; 62+ messages in thread
From: Gang He @ 2017-11-28  5:33 UTC (permalink / raw)
  To: alex.chen
  Cc: jlbec, hch, ocfs2-devel, Goldwyn Rodrigues, mfasheh, linux-kernel

Hello Alex,


>>> 
> Hi Gang,
> 
> On 2017/11/27 17:46, Gang He wrote:
>> Add ocfs2_overwrite_io function, which is used to judge if
>> overwrite allocated blocks, otherwise, the write will bring extra
>> block allocation overhead.
>> 
>> Signed-off-by: Gang He <ghe@suse.com>
>> ---
>>  fs/ocfs2/extent_map.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>  fs/ocfs2/extent_map.h |  3 +++
>>  2 files changed, 70 insertions(+)
>> 
>> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
>> index e4719e0..98bf325 100644
>> --- a/fs/ocfs2/extent_map.c
>> +++ b/fs/ocfs2/extent_map.c
>> @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>>  	return ret;
>>  }
>>  
>> +/* Is IO overwriting allocated blocks? */
>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
>> +		       int wait)
>> +{
>> +	int ret = 0, is_last;
>> +	u32 mapping_end, cpos;
>> +	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>> +	struct buffer_head *di_bh = NULL;
>> +	struct ocfs2_extent_rec rec;
>> +
>> +	if (wait)
>> +		ret = ocfs2_inode_lock(inode, &di_bh, 0);
>> +	else
>> +		ret = ocfs2_try_inode_lock(inode, &di_bh, 0);
>> +	if (ret)
>> +		goto out;
>> +
>> +	if (wait)
>> +		down_read(&OCFS2_I(inode)->ip_alloc_sem);
>> +	else {
>> +		if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) {
>> +			ret = -EAGAIN;
>> +			goto out_unlock1;
>> +		}
>> +	}
>> +
>> +	if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
>> +	   ((map_start + map_len) <= i_size_read(inode)))
>> +		goto out_unlock2;
>> +
>> +	cpos = map_start >> osb->s_clustersize_bits;
>> +	mapping_end = ocfs2_clusters_for_bytes(inode->i_sb,
>> +					       map_start + map_len);
>> +	is_last = 0;
>> +	while (cpos < mapping_end && !is_last) {
>> +		ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos,
>> +						 NULL, &rec, &is_last);
>> +		if (ret) {
>> +			mlog_errno(ret);
>> +			goto out_unlock2;
>> +		}
>> +
>> +		if (rec.e_blkno == 0ULL)
>> +			break;
> I think the blocks are not being overwritten here, because a hole is found
> and the blocks should be allocated.
If rec.e_blkno == 0, there is a hole.
A file hole means that these blocks are not allocated; it is not the same as an unwritten block.
Unwritten blocks are allocated, but have not been written yet.

>> +
>> +		if (rec.e_flags & OCFS2_EXT_REFCOUNTED)
>> +			break;
>> +
>> +		cpos = le32_to_cpu(rec.e_cpos) +
>> +			le16_to_cpu(rec.e_leaf_clusters);
>> +	}
>> +
>> +	if (cpos < mapping_end)
>> +		ret = 1;
>> +
>> +out_unlock2:
> 
> I think the 'out_up_read' is more readable than the 'out_unlock2' .
OK, I will use a more readable label here.
> 
>> +	brelse(di_bh);
>> +
>> +	up_read(&OCFS2_I(inode)->ip_alloc_sem);
>> +
>> +out_unlock1:
> 
> We should release buffer head here.
> 
>> +	ocfs2_inode_unlock(inode, 0);
>> +
>> +out:
>> +	return (ret ? 0 : 1);
>> +}
>> +
>>  int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int whence)
>>  {
>>  	struct inode *inode = file->f_mapping->host;
>> diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h
>> index 67ea57d..fd9e86a 100644
>> --- a/fs/ocfs2/extent_map.h
>> +++ b/fs/ocfs2/extent_map.h
>> @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 v_blkno, u64 *p_blkno,
>>  int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>>  		 u64 map_start, u64 map_len);
>>  
>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
>> +		       int wait);
>> +
>>  int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int origin);
>>  
>>  int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster,
>> 
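
The hole and refcount checks discussed in this reply can be modelled outside the kernel with a small sketch. Everything here is a simplification under stated assumptions: `struct demo_extent`, `demo_lookup()` and `demo_is_overwrite()` are hypothetical names, the extent map is a flat array rather than the on-disk ocfs2 tree, and unwritten extents are treated as allocated, as described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified extent record; not the on-disk ocfs2 layout. */
struct demo_extent {
	uint32_t cpos;     /* first logical cluster covered */
	uint32_t clusters; /* number of clusters covered */
	uint64_t blkno;    /* 0 means a hole: nothing allocated */
	bool refcounted;   /* shared extent: a write needs copy-on-write */
};

/* Find the extent that contains cluster cpos, or NULL if none is recorded. */
static const struct demo_extent *demo_lookup(const struct demo_extent *map,
					     size_t n, uint32_t cpos)
{
	for (size_t i = 0; i < n; i++)
		if (cpos >= map[i].cpos && cpos - map[i].cpos < map[i].clusters)
			return &map[i];
	return NULL;
}

/* True only if every cluster in [start, end) is allocated and unshared. */
static bool demo_is_overwrite(const struct demo_extent *map, size_t n,
			      uint32_t start, uint32_t end)
{
	uint32_t cpos = start;

	assert(map != NULL || n == 0);
	while (cpos < end) {
		const struct demo_extent *rec = demo_lookup(map, n, cpos);

		/* A missing record, a hole, or a shared extent needs allocation. */
		if (!rec || rec->blkno == 0 || rec->refcounted)
			return false;
		/* Skip to the first cluster after this extent. */
		cpos = rec->cpos + rec->clusters;
	}
	return true;
}
```

Returning a plain boolean keeps the "is this a pure overwrite" answer separate from any locking error, which is one way to address the review comments about the return value.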

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function
  2017-11-28  2:48     ` Changwei Ge
@ 2017-11-28  5:40       ` Gang He
  -1 siblings, 0 replies; 62+ messages in thread
From: Gang He @ 2017-11-28  5:40 UTC (permalink / raw)
  To: jlbec, ge.changwei, hch, Goldwyn Rodrigues, mfasheh
  Cc: ocfs2-devel, linux-kernel

Hi Changwei,


>>> 
> Hi,
> Gang
> 
> On 2017/11/27 17:48, Gang He wrote:
>> Add ocfs2_overwrite_io function, which is used to judge if
>> overwrite allocated blocks, otherwise, the write will bring extra
>> block allocation overhead.
>> 
> 
> Can you elaborate how this overhead is introduced?
> Forgive me, I don't figure it.
If the blocks have already been allocated, we just write to them directly.
If they have not been allocated, we need to allocate them before writing,
and this allocation can block the IO call. If the application does not want to pay this overhead,
it can pass a nowait flag so that the call returns immediately with -EAGAIN instead.
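
The decision described above can be sketched as a toy model. This is not the ocfs2 write path: `struct demo_file`, `DEMO_CLUSTERS` and `demo_write()` are hypothetical, and setting a flag stands in for the real, possibly blocking, cluster allocation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_CLUSTERS 8

/* Hypothetical in-memory file: one flag per cluster, true once allocated. */
struct demo_file {
	bool allocated[DEMO_CLUSTERS];
	int allocations; /* how many times the slow allocation path ran */
};

/*
 * Write clusters [start, start + count). With nowait set, refuse with
 * -EAGAIN rather than allocate; otherwise allocate missing clusters first.
 * Returns the number of clusters written or a negative errno.
 */
static int demo_write(struct demo_file *f, size_t start, size_t count,
		      bool nowait)
{
	assert(f != NULL);
	if (start > DEMO_CLUSTERS || count > DEMO_CLUSTERS - start)
		return -EINVAL;

	for (size_t i = start; i < start + count; i++) {
		if (f->allocated[i])
			continue;
		if (nowait)
			return -EAGAIN; /* allocation could block: bail out */
		f->allocated[i] = true; /* stands in for the blocking allocation */
		f->allocations++;
	}
	return (int)count;
}
```

In this model a nowait write never changes the allocation state, so a caller that gets -EAGAIN can safely retry the same range in blocking mode.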

Thanks
Gang

> 
> Thanks,
> Changwei
> 
>> Signed-off-by: Gang He <ghe@suse.com>
>> ---
>>   fs/ocfs2/extent_map.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>   fs/ocfs2/extent_map.h |  3 +++
>>   2 files changed, 70 insertions(+)
>> 
>> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
>> index e4719e0..98bf325 100644
>> --- a/fs/ocfs2/extent_map.c
>> +++ b/fs/ocfs2/extent_map.c
>> @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>>   	return ret;
>>   }
>>   
>> +/* Is IO overwriting allocated blocks? */
>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
>> +		       int wait)
>> +{
>> +	int ret = 0, is_last;
>> +	u32 mapping_end, cpos;
>> +	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>> +	struct buffer_head *di_bh = NULL;
>> +	struct ocfs2_extent_rec rec;
>> +
>> +	if (wait)
>> +		ret = ocfs2_inode_lock(inode, &di_bh, 0);
>> +	else
>> +		ret = ocfs2_try_inode_lock(inode, &di_bh, 0);
>> +	if (ret)
>> +		goto out;
>> +
>> +	if (wait)
>> +		down_read(&OCFS2_I(inode)->ip_alloc_sem);
>> +	else {
>> +		if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) {
>> +			ret = -EAGAIN;
>> +			goto out_unlock1;
>> +		}
>> +	}
>> +
>> +	if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
>> +	   ((map_start + map_len) <= i_size_read(inode)))
>> +		goto out_unlock2;
>> +
>> +	cpos = map_start >> osb->s_clustersize_bits;
>> +	mapping_end = ocfs2_clusters_for_bytes(inode->i_sb,
>> +					       map_start + map_len);
>> +	is_last = 0;
>> +	while (cpos < mapping_end && !is_last) {
>> +		ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos,
>> +						 NULL, &rec, &is_last);
>> +		if (ret) {
>> +			mlog_errno(ret);
>> +			goto out_unlock2;
>> +		}
>> +
>> +		if (rec.e_blkno == 0ULL)
>> +			break;
>> +
>> +		if (rec.e_flags & OCFS2_EXT_REFCOUNTED)
>> +			break;
>> +
>> +		cpos = le32_to_cpu(rec.e_cpos) +
>> +			le16_to_cpu(rec.e_leaf_clusters);
>> +	}
>> +
>> +	if (cpos < mapping_end)
>> +		ret = 1;
>> +
>> +out_unlock2:
>> +	brelse(di_bh);
>> +
>> +	up_read(&OCFS2_I(inode)->ip_alloc_sem);
>> +
>> +out_unlock1:
>> +	ocfs2_inode_unlock(inode, 0);
>> +
>> +out:
>> +	return (ret ? 0 : 1);
>> +}
>> +
>>   int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int whence)
>>   {
>>   	struct inode *inode = file->f_mapping->host;
>> diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h
>> index 67ea57d..fd9e86a 100644
>> --- a/fs/ocfs2/extent_map.h
>> +++ b/fs/ocfs2/extent_map.h
>> @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 v_blkno, u64 *p_blkno,
>>   int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>>   		 u64 map_start, u64 map_len);
>>   
>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
>> +		       int wait);
>> +
>>   int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int origin);
>>   
>>   int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster,
>> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function
  2017-11-28  5:40       ` Gang He
@ 2017-11-28  5:48         ` Changwei Ge
  -1 siblings, 0 replies; 62+ messages in thread
From: Changwei Ge @ 2017-11-28  5:48 UTC (permalink / raw)
  To: Gang He, jlbec, hch, Goldwyn Rodrigues, mfasheh; +Cc: ocfs2-devel, linux-kernel

On 2017/11/28 13:44, Gang He wrote:
> Hi Changwei,
> 
> 
>>>>
>> Hi,
>> Gang
>>
>> On 2017/11/27 17:48, Gang He wrote:
>>> Add ocfs2_overwrite_io function, which is used to judge if
>>> overwrite allocated blocks, otherwise, the write will bring extra
>>> block allocation overhead.
>>>
>>
>> Can you elaborate how this overhead is introduced?
>> Forgive me, I don't figure it.
> If the blocks have already been allocated, we just write to them directly.
> If they have not been allocated, we need to allocate them before the write,
> and that allocation can block the I/O call. If the application does not want to pay this overhead,
> it can pass a nowait flag, and the call returns immediately with -EAGAIN instead of blocking.

Thanks for your answer and contribution.
This makes sense.
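
To make the check concrete, here is a minimal userspace sketch of the extent
walk in the quoted patch. It is a simplified model, not the kernel code:
struct model_extent, model_lookup() and model_is_overwrite() are hypothetical
names, and an in-memory array stands in for the on-disk extent tree. It
returns 1 only when every cluster in the range is already allocated and not
refcounted, which is the case where a nowait write can go ahead without
allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an on-disk extent record. */
struct model_extent {
	uint32_t cpos;      /* first logical cluster covered */
	uint32_t clusters;  /* number of clusters covered */
	uint64_t blkno;     /* 0 means a hole (nothing allocated) */
	int refcounted;     /* non-zero: shared extent, a write must copy first */
};

/* Find the extent that covers cluster cpos, or NULL if none does. */
static const struct model_extent *
model_lookup(const struct model_extent *map, size_t n, uint32_t cpos)
{
	for (size_t i = 0; i < n; i++)
		if (cpos >= map[i].cpos && cpos < map[i].cpos + map[i].clusters)
			return &map[i];
	return NULL;
}

/* Return 1 if clusters [start, end) are all allocated and unshared. */
static int model_is_overwrite(const struct model_extent *map, size_t n,
			      uint32_t start, uint32_t end)
{
	uint32_t cpos = start;

	assert(start <= end);

	while (cpos < end) {
		const struct model_extent *rec = model_lookup(map, n, cpos);

		/* A missing record, a hole or a shared extent needs allocation. */
		if (!rec || rec->blkno == 0 || rec->refcounted)
			return 0;

		/* Skip to the first cluster after this extent. */
		cpos = rec->cpos + rec->clusters;
	}
	return 1;
}
```

With a map that contains a hole, any range touching the hole yields 0, and
per Gang's explanation the nowait caller would then return -EAGAIN rather
than allocate.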

Changwei

> 
> Thanks
> Gang
> 
>>
>> Thanks,
>> Changwei
>>
>>> Signed-off-by: Gang He <ghe@suse.com>
>>> ---
>>>    fs/ocfs2/extent_map.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>>    fs/ocfs2/extent_map.h |  3 +++
>>>    2 files changed, 70 insertions(+)
>>>
>>> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
>>> index e4719e0..98bf325 100644
>>> --- a/fs/ocfs2/extent_map.c
>>> +++ b/fs/ocfs2/extent_map.c
>>> @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>>>    	return ret;
>>>    }
>>>    
>>> +/* Is IO overwriting allocated blocks? */
>>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
>>> +		       int wait)
>>> +{
>>> +	int ret = 0, is_last;
>>> +	u32 mapping_end, cpos;
>>> +	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>>> +	struct buffer_head *di_bh = NULL;
>>> +	struct ocfs2_extent_rec rec;
>>> +
>>> +	if (wait)
>>> +		ret = ocfs2_inode_lock(inode, &di_bh, 0);
>>> +	else
>>> +		ret = ocfs2_try_inode_lock(inode, &di_bh, 0);
>>> +	if (ret)
>>> +		goto out;
>>> +
>>> +	if (wait)
>>> +		down_read(&OCFS2_I(inode)->ip_alloc_sem);
>>> +	else {
>>> +		if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) {
>>> +			ret = -EAGAIN;
>>> +			goto out_unlock1;
>>> +		}
>>> +	}
>>> +
>>> +	if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
>>> +	   ((map_start + map_len) <= i_size_read(inode)))
>>> +		goto out_unlock2;
>>> +
>>> +	cpos = map_start >> osb->s_clustersize_bits;
>>> +	mapping_end = ocfs2_clusters_for_bytes(inode->i_sb,
>>> +					       map_start + map_len);
>>> +	is_last = 0;
>>> +	while (cpos < mapping_end && !is_last) {
>>> +		ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos,
>>> +						 NULL, &rec, &is_last);
>>> +		if (ret) {
>>> +			mlog_errno(ret);
>>> +			goto out_unlock2;
>>> +		}
>>> +
>>> +		if (rec.e_blkno == 0ULL)
>>> +			break;
>>> +
>>> +		if (rec.e_flags & OCFS2_EXT_REFCOUNTED)
>>> +			break;
>>> +
>>> +		cpos = le32_to_cpu(rec.e_cpos) +
>>> +			le16_to_cpu(rec.e_leaf_clusters);
>>> +	}
>>> +
>>> +	if (cpos < mapping_end)
>>> +		ret = 1;
>>> +
>>> +out_unlock2:
>>> +	brelse(di_bh);
>>> +
>>> +	up_read(&OCFS2_I(inode)->ip_alloc_sem);
>>> +
>>> +out_unlock1:
>>> +	ocfs2_inode_unlock(inode, 0);
>>> +
>>> +out:
>>> +	return (ret ? 0 : 1);
>>> +}
>>> +
>>>    int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int whence)
>>>    {
>>>    	struct inode *inode = file->f_mapping->host;
>>> diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h
>>> index 67ea57d..fd9e86a 100644
>>> --- a/fs/ocfs2/extent_map.h
>>> +++ b/fs/ocfs2/extent_map.h
>>> @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 v_blkno, u64 *p_blkno,
>>>    int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>>>    		 u64 map_start, u64 map_len);
>>>    
>>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
>>> +		       int wait);
>>> +
>>>    int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int origin);
>>>    
>>>    int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster,
>>>
> 
> 

* Re: [Ocfs2-devel] [PATCH 3/3] ocfs2: nowait aio support
  2017-11-28  2:51     ` alex chen
@ 2017-11-28  5:59       ` Gang He
  -1 siblings, 0 replies; 62+ messages in thread
From: Gang He @ 2017-11-28  5:59 UTC (permalink / raw)
  To: alex.chen
  Cc: jlbec, hch, ocfs2-devel, Goldwyn Rodrigues, mfasheh, linux-kernel

Hello Alex,


>>> 
> Hi Gang,
> 
> On 2017/11/27 17:46, Gang He wrote:
>> Return -EAGAIN if either of the following holds for direct I/O:
>> - the related locks cannot be acquired immediately;
>> - blocks are not allocated at the write location, so the write would
>>   trigger block allocation and block I/O operations.
>> 
>> Signed-off-by: Gang He <ghe@suse.com>
>> ---
>>  fs/ocfs2/dir.c         |  2 +-
>>  fs/ocfs2/dlmglue.c     | 20 ++++++++++----
>>  fs/ocfs2/dlmglue.h     |  2 +-
>>  fs/ocfs2/file.c        | 74 +++++++++++++++++++++++++++++++++++++-------------
>>  fs/ocfs2/mmap.c        |  2 +-
>>  fs/ocfs2/ocfs2_trace.h | 10 ++++---
>>  6 files changed, 79 insertions(+), 31 deletions(-)
>> 
>> diff --git a/fs/ocfs2/dir.c b/fs/ocfs2/dir.c
>> index febe631..ea50901 100644
>> --- a/fs/ocfs2/dir.c
>> +++ b/fs/ocfs2/dir.c
>> @@ -1957,7 +1957,7 @@ int ocfs2_readdir(struct file *file, struct dir_context *ctx)
>>  
>>  	trace_ocfs2_readdir((unsigned long long)OCFS2_I(inode)->ip_blkno);
>>  
>> -	error = ocfs2_inode_lock_atime(inode, file->f_path.mnt, &lock_level);
>> +	error = ocfs2_inode_lock_atime(inode, file->f_path.mnt, &lock_level, 1);
>>  	if (lock_level && error >= 0) {
>>  		/* We release EX lock which used to update atime
>>  		 * and get PR lock again to reduce contention
>> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
>> index 5cfbd04..feb8dbe 100644
>> --- a/fs/ocfs2/dlmglue.c
>> +++ b/fs/ocfs2/dlmglue.c
>> @@ -2516,13 +2516,18 @@ int ocfs2_inode_lock_with_page(struct inode *inode,
>>  
>>  int ocfs2_inode_lock_atime(struct inode *inode,
>>  			  struct vfsmount *vfsmnt,
>> -			  int *level)
>> +			  int *level, int wait)
>>  {
>>  	int ret;
>>  
>> -	ret = ocfs2_inode_lock(inode, NULL, 0);
>> +	if (wait)
>> +		ret = ocfs2_inode_lock(inode, NULL, 0);
>> +	else
>> +		ret = ocfs2_try_inode_lock(inode, NULL, 0);
>> +
>>  	if (ret < 0) {
>> -		mlog_errno(ret);
>> +		if (ret != -EAGAIN)
>> +			mlog_errno(ret);
>>  		return ret;
>>  	}
>>  
>> @@ -2534,9 +2539,14 @@ int ocfs2_inode_lock_atime(struct inode *inode,
>>  		struct buffer_head *bh = NULL;
>>  
>>  		ocfs2_inode_unlock(inode, 0);
>> -		ret = ocfs2_inode_lock(inode, &bh, 1);
>> +		if (wait)
>> +			ret = ocfs2_inode_lock(inode, &bh, 1);
>> +		else
>> +			ret = ocfs2_try_inode_lock(inode, &bh, 1);
>> +
>>  		if (ret < 0) {
>> -			mlog_errno(ret);
>> +			if (ret != -EAGAIN)
>> +				mlog_errno(ret);
>>  			return ret;
>>  		}
>>  		*level = 1;
>> diff --git a/fs/ocfs2/dlmglue.h b/fs/ocfs2/dlmglue.h
>> index 05910fc..c83dbb5 100644
>> --- a/fs/ocfs2/dlmglue.h
>> +++ b/fs/ocfs2/dlmglue.h
>> @@ -123,7 +123,7 @@ void ocfs2_refcount_lock_res_init(struct ocfs2_lock_res *lockres,
>>  void ocfs2_open_unlock(struct inode *inode);
>>  int ocfs2_inode_lock_atime(struct inode *inode,
>>  			  struct vfsmount *vfsmnt,
>> -			  int *level);
>> +			  int *level, int wait);
>>  int ocfs2_inode_lock_full_nested(struct inode *inode,
>>  			 struct buffer_head **ret_bh,
>>  			 int ex,
>> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
>> index dc455d4..900f04e 100644
>> --- a/fs/ocfs2/file.c
>> +++ b/fs/ocfs2/file.c
>> @@ -140,6 +140,8 @@ static int ocfs2_file_open(struct inode *inode, struct file *file)
>>  		spin_unlock(&oi->ip_lock);
>>  	}
>>  
>> +	file->f_mode |= FMODE_NOWAIT;
>> +
>>  leave:
>>  	return status;
>>  }
>> @@ -2132,8 +2134,7 @@ static int ocfs2_prepare_inode_for_refcount(struct inode *inode,
>>  }
>>  
>>  static int ocfs2_prepare_inode_for_write(struct file *file,
>> -					 loff_t pos,
>> -					 size_t count)
>> +					 loff_t pos, size_t count, int wait)
>>  {
>>  	int ret = 0, meta_level = 0;
>>  	struct dentry *dentry = file->f_path.dentry;
>> @@ -2145,10 +2146,14 @@ static int ocfs2_prepare_inode_for_write(struct file *file,
>>  	 * if we need to make modifications here.
>>  	 */
>>  	for(;;) {
>> -		ret = ocfs2_inode_lock(inode, NULL, meta_level);
>> +		if (wait)
>> +			ret = ocfs2_inode_lock(inode, NULL, meta_level);
>> +		else
>> +			ret = ocfs2_try_inode_lock(inode, NULL, meta_level);
>>  		if (ret < 0) {
>>  			meta_level = -1;
>> -			mlog_errno(ret);
>> +			if (ret != -EAGAIN)
>> +				mlog_errno(ret);
>>  			goto out;
>>  		}
>>
> 
> We lock the inode again in
> ocfs2_prepare_inode_for_write()->ocfs2_prepare_inode_for_refcount().
> Should we check the 'nowait' flag there too?
I think ocfs2_overwrite_io() already ensures that ocfs2_prepare_inode_for_refcount() is skipped,
but it looks like there is a race between ocfs2_overwrite_io() and ocfs2_prepare_inode_for_write(), since the inode lock is released in between.
I plan to move the ocfs2_overwrite_io() call into ocfs2_prepare_inode_for_write() to avoid this race.
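
A minimal userspace sketch of that idea, assuming a single lock and a single
"allocated" flag. struct model_inode and the model_* helpers are hypothetical,
and a C11 atomic_flag only stands in for the cluster inode lock. The point is
that the allocation check and the preparation step run under one lock hold,
and that every case where the write cannot proceed immediately is reported as
-EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical model of an inode: one lock plus an allocation flag. */
struct model_inode {
	atomic_flag lock;  /* stands in for the cluster inode lock */
	int allocated;     /* 1 if the target range is already allocated */
	int writes;        /* number of writes that were let through */
};

/* Try to take the lock without waiting; 0 on success, -EAGAIN if busy. */
static int model_trylock(struct model_inode *mi)
{
	return atomic_flag_test_and_set(&mi->lock) ? -EAGAIN : 0;
}

static void model_unlock(struct model_inode *mi)
{
	atomic_flag_clear(&mi->lock);
}

/*
 * Check and prepare under a single lock hold, so the allocation state
 * cannot change between the check and the preparation step.
 */
static int model_prepare_write_nowait(struct model_inode *mi)
{
	int ret;

	assert(mi);

	ret = model_trylock(mi);
	if (ret)
		return ret;      /* lock is contended: do not wait */

	if (!mi->allocated)
		ret = -EAGAIN;   /* would need allocation: bail out */
	else
		mi->writes++;    /* state was checked under the same lock */

	model_unlock(mi);
	return ret;
}
```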

Thanks
Gang


> 
>> @@ -2199,7 +2204,7 @@ static int ocfs2_prepare_inode_for_write(struct file *file,
>>  
>>  out_unlock:
>>  	trace_ocfs2_prepare_inode_for_write(OCFS2_I(inode)->ip_blkno,
>> -					    pos, count);
>> +					    pos, count, wait);
>>  
>>  	if (meta_level >= 0)
>>  		ocfs2_inode_unlock(inode, meta_level);
>> @@ -2211,7 +2216,7 @@ static int ocfs2_prepare_inode_for_write(struct file *file,
>>  static ssize_t ocfs2_file_write_iter(struct kiocb *iocb,
>>  				    struct iov_iter *from)
>>  {
>> -	int direct_io, rw_level;
>> +	int rw_level;
>>  	ssize_t written = 0;
>>  	ssize_t ret;
>>  	size_t count = iov_iter_count(from);
>> @@ -2223,6 +2228,8 @@ static ssize_t ocfs2_file_write_iter(struct kiocb *iocb,
>>  	void *saved_ki_complete = NULL;
>>  	int append_write = ((iocb->ki_pos + count) >=
>>  			i_size_read(inode) ? 1 : 0);
>> +	int direct_io = iocb->ki_flags & IOCB_DIRECT ? 1 : 0;
>> +	int nowait = iocb->ki_flags & IOCB_NOWAIT ? 1 : 0;
>>  
>>  	trace_ocfs2_file_aio_write(inode, file, file->f_path.dentry,
>>  		(unsigned long long)OCFS2_I(inode)->ip_blkno,
>> @@ -2230,12 +2237,17 @@ static ssize_t ocfs2_file_write_iter(struct kiocb *iocb,
>>  		file->f_path.dentry->d_name.name,
>>  		(unsigned int)from->nr_segs);	/* GRRRRR */
>>  
>> +	if (!direct_io && nowait)
>> +		return -EOPNOTSUPP;
>> +
>>  	if (count == 0)
>>  		return 0;
>>  
>> -	direct_io = iocb->ki_flags & IOCB_DIRECT ? 1 : 0;
>> -
>> -	inode_lock(inode);
>> +	if (direct_io && nowait) {
>> +		if (!inode_trylock(inode))
>> +			return -EAGAIN;
>> +	} else
>> +		inode_lock(inode);
>>  
>>  	/*
>>  	 * Concurrent O_DIRECT writes are allowed with
>> @@ -2244,9 +2256,13 @@ static ssize_t ocfs2_file_write_iter(struct kiocb *iocb,
>>  	 */
>>  	rw_level = (!direct_io || full_coherency || append_write);
>>  
>> -	ret = ocfs2_rw_lock(inode, rw_level);
>> +	if (direct_io && nowait)
>> +		ret = ocfs2_try_rw_lock(inode, rw_level);
>> +	else
>> +		ret = ocfs2_rw_lock(inode, rw_level);
>>  	if (ret < 0) {
>> -		mlog_errno(ret);
>> +		if (ret != -EAGAIN)
>> +			mlog_errno(ret);
>>  		goto out_mutex;
>>  	}
>>  
>> @@ -2260,9 +2276,13 @@ static ssize_t ocfs2_file_write_iter(struct kiocb *iocb,
>>  		 * other nodes to drop their caches.  Buffered I/O
>>  		 * already does this in write_begin().
>>  		 */
>> -		ret = ocfs2_inode_lock(inode, NULL, 1);
>> +		if (nowait)
>> +			ret = ocfs2_try_inode_lock(inode, NULL, 1);
>> +		else
>> +			ret = ocfs2_inode_lock(inode, NULL, 1);
>>  		if (ret < 0) {
>> -			mlog_errno(ret);
>> +			if (ret != -EAGAIN)
>> +				mlog_errno(ret);
>>  			goto out;
>>  		}
>>  
>> @@ -2277,9 +2297,17 @@ static ssize_t ocfs2_file_write_iter(struct kiocb *iocb,
>>  	}
>>  	count = ret;
>>  
>> -	ret = ocfs2_prepare_inode_for_write(file, iocb->ki_pos, count);
>> +	if (direct_io && nowait) {
>> +		if (!ocfs2_overwrite_io(inode, iocb->ki_pos, count, 0)) {
>> +			ret = -EAGAIN;
>> +			goto out;
>> +		}
>> +	}
>> +
>> +	ret = ocfs2_prepare_inode_for_write(file, iocb->ki_pos, count, !nowait);
>>  	if (ret < 0) {
>> -		mlog_errno(ret);
>> +		if (ret != -EAGAIN)
>> +			mlog_errno(ret);
>>  		goto out;
>>  	}
>>  
>> @@ -2355,6 +2383,7 @@ static ssize_t ocfs2_file_read_iter(struct kiocb *iocb,
>>  	int ret = 0, rw_level = -1, lock_level = 0;
>>  	struct file *filp = iocb->ki_filp;
>>  	struct inode *inode = file_inode(filp);
>> +	int nowait = iocb->ki_flags & IOCB_NOWAIT ? 1 : 0;
>>  
>>  	trace_ocfs2_file_aio_read(inode, filp, filp->f_path.dentry,
>>  			(unsigned long long)OCFS2_I(inode)->ip_blkno,
>> @@ -2374,9 +2403,14 @@ static ssize_t ocfs2_file_read_iter(struct kiocb *iocb,
>>  	 * need locks to protect pending reads from racing with truncate.
>>  	 */
>>  	if (iocb->ki_flags & IOCB_DIRECT) {
>> -		ret = ocfs2_rw_lock(inode, 0);
>> +		if (nowait)
>> +			ret = ocfs2_try_rw_lock(inode, 0);
>> +		else
>> +			ret = ocfs2_rw_lock(inode, 0);
>> +
>>  		if (ret < 0) {
>> -			mlog_errno(ret);
>> +			if (ret != -EAGAIN)
>> +				mlog_errno(ret);
>>  			goto bail;
>>  		}
>>  		rw_level = 0;
>> @@ -2393,9 +2427,11 @@ static ssize_t ocfs2_file_read_iter(struct kiocb *iocb,
>>  	 * like i_size. This allows the checks down below
>>  	 * generic_file_aio_read() a chance of actually working.
>>  	 */
>> -	ret = ocfs2_inode_lock_atime(inode, filp->f_path.mnt, &lock_level);
>> +	ret = ocfs2_inode_lock_atime(inode, filp->f_path.mnt, &lock_level,
>> +				     !nowait);
> 
> Should we also check whether the O_DIRECT flag is set here?
> 
>>  	if (ret < 0) {
>> -		mlog_errno(ret);
>> +		if (ret != -EAGAIN)
>> +			mlog_errno(ret);
>>  		goto bail;
>>  	}
>>  	ocfs2_inode_unlock(inode, lock_level);
>> diff --git a/fs/ocfs2/mmap.c b/fs/ocfs2/mmap.c
>> index 098f5c7..fb9a20e 100644
>> --- a/fs/ocfs2/mmap.c
>> +++ b/fs/ocfs2/mmap.c
>> @@ -184,7 +184,7 @@ int ocfs2_mmap(struct file *file, struct vm_area_struct *vma)
>>  	int ret = 0, lock_level = 0;
>>  
>>  	ret = ocfs2_inode_lock_atime(file_inode(file),
>> -				    file->f_path.mnt, &lock_level);
>> +				    file->f_path.mnt, &lock_level, 1);
>>  	if (ret < 0) {
>>  		mlog_errno(ret);
>>  		goto out;
>> diff --git a/fs/ocfs2/ocfs2_trace.h b/fs/ocfs2/ocfs2_trace.h
>> index a0b5d00..e2a11aa 100644
>> --- a/fs/ocfs2/ocfs2_trace.h
>> +++ b/fs/ocfs2/ocfs2_trace.h
>> @@ -1449,20 +1449,22 @@
>>  
>>  TRACE_EVENT(ocfs2_prepare_inode_for_write,
>>  	TP_PROTO(unsigned long long ino, unsigned long long saved_pos,
>> -		 unsigned long count),
>> -	TP_ARGS(ino, saved_pos, count),
>> +		 unsigned long count, int wait),
>> +	TP_ARGS(ino, saved_pos, count, wait),
>>  	TP_STRUCT__entry(
>>  		__field(unsigned long long, ino)
>>  		__field(unsigned long long, saved_pos)
>>  		__field(unsigned long, count)
>> +		__field(int, wait)
>>  	),
>>  	TP_fast_assign(
>>  		__entry->ino = ino;
>>  		__entry->saved_pos = saved_pos;
>>  		__entry->count = count;
>> +		__entry->wait = wait;
>>  	),
>> -	TP_printk("%llu %llu %lu", __entry->ino,
>> -		  __entry->saved_pos, __entry->count)
>> +	TP_printk("%llu %llu %lu %d", __entry->ino,
>> +		  __entry->saved_pos, __entry->count, __entry->wait)
>>  );
>>  
>>  DEFINE_OCFS2_INT_EVENT(generic_file_aio_read_ret);
>> 

* Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function
  2017-11-28  5:33       ` Gang He
@ 2017-11-28  6:19         ` alex chen
  -1 siblings, 0 replies; 62+ messages in thread
From: alex chen @ 2017-11-28  6:19 UTC (permalink / raw)
  To: Gang He; +Cc: jlbec, hch, ocfs2-devel, Goldwyn Rodrigues, mfasheh, linux-kernel

Hi Gang,

On 2017/11/28 13:33, Gang He wrote:
> Hello Alex,
> 
> 
>>>>
>> Hi Gang,
>>
>> On 2017/11/27 17:46, Gang He wrote:
>>> Add the ocfs2_overwrite_io function, which checks whether a write
>>> only overwrites already-allocated blocks; otherwise, the write incurs
>>> extra block allocation overhead.
>>>
>>> Signed-off-by: Gang He <ghe@suse.com>
>>> ---
>>>  fs/ocfs2/extent_map.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>>  fs/ocfs2/extent_map.h |  3 +++
>>>  2 files changed, 70 insertions(+)
>>>
>>> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
>>> index e4719e0..98bf325 100644
>>> --- a/fs/ocfs2/extent_map.c
>>> +++ b/fs/ocfs2/extent_map.c
>>> @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>>>  	return ret;
>>>  }
>>>  
>>> +/* Is IO overwriting allocated blocks? */
>>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
>>> +		       int wait)
>>> +{
>>> +	int ret = 0, is_last;
>>> +	u32 mapping_end, cpos;
>>> +	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>>> +	struct buffer_head *di_bh = NULL;
>>> +	struct ocfs2_extent_rec rec;
>>> +
>>> +	if (wait)
>>> +		ret = ocfs2_inode_lock(inode, &di_bh, 0);
>>> +	else
>>> +		ret = ocfs2_try_inode_lock(inode, &di_bh, 0);
>>> +	if (ret)
>>> +		goto out;
>>> +
>>> +	if (wait)
>>> +		down_read(&OCFS2_I(inode)->ip_alloc_sem);
>>> +	else {
>>> +		if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) {
>>> +			ret = -EAGAIN;
>>> +			goto out_unlock1;
>>> +		}
>>> +	}
>>> +
>>> +	if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
>>> +	   ((map_start + map_len) <= i_size_read(inode)))
>>> +		goto out_unlock2;
>>> +
>>> +	cpos = map_start >> osb->s_clustersize_bits;
>>> +	mapping_end = ocfs2_clusters_for_bytes(inode->i_sb,
>>> +					       map_start + map_len);
>>> +	is_last = 0;
>>> +	while (cpos < mapping_end && !is_last) {
>>> +		ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos,
>>> +						 NULL, &rec, &is_last);
>>> +		if (ret) {
>>> +			mlog_errno(ret);
>>> +			goto out_unlock2;
>>> +		}
>>> +
>>> +		if (rec.e_blkno == 0ULL)
>>> +			break;
>> I think the blocks here are not an overwrite, because a hole is found and the
>> blocks should be allocated.
> If rec.e_blkno == 0, this means there is a hole.
> A file hole means that these blocks are not allocated; it is not like an unwritten block.
> Unwritten blocks are allocated, but have not been written yet.
> 
If we break the loop when we find the hole, then outside this function we will allocate the blocks in
ocfs2_file_write_iter()->..->ocfs2_direct_IO->__blockdev_direct_IO->..->ocfs2_dio_wr_get_block()
->ocfs2_write_begin_nolock. Does this violate the semantics of 'IOCB_NOWAIT'?

BTW, should we consider the down_write() and ocfs2_inode_lock() in ocfs2_dio_wr_get_block() when
the flag 'IOCB_NOWAIT' is set?
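
A minimal userspace sketch of the scan under discussion may help here. It
assumes a simplified, gap-free extent table in CPU byte order; every model_*
name is hypothetical and none of this is ocfs2 code. It only illustrates that
breaking out of the loop on a hole (or on a refcounted extent) leaves the
cursor short of the end of the range, so the range is reported as "not an
overwrite".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an extent record. */
struct model_extent {
	uint32_t cpos;      /* first logical cluster covered */
	uint32_t clusters;  /* length in clusters */
	uint64_t blkno;     /* 0 means a hole */
	int refcounted;     /* non-zero: shared, a write would need CoW */
};

/* Find the record covering cpos, or NULL past the last record. */
static const struct model_extent *
model_lookup(const struct model_extent *tbl, size_t n, uint32_t cpos)
{
	for (size_t i = 0; i < n; i++)
		if (cpos >= tbl[i].cpos &&
		    cpos < tbl[i].cpos + tbl[i].clusters)
			return &tbl[i];
	return NULL;
}

/*
 * Return 1 if clusters [start, end) are fully backed by allocated,
 * unshared extents, 0 otherwise.
 */
static int model_is_overwrite(const struct model_extent *tbl, size_t n,
			      uint32_t start, uint32_t end)
{
	uint32_t cpos = start;

	assert(start <= end);
	while (cpos < end) {
		const struct model_extent *rec = model_lookup(tbl, n, cpos);

		if (!rec)             /* ran past the last record */
			break;
		if (rec->blkno == 0)  /* hole: allocation would be needed */
			break;
		if (rec->refcounted)  /* shared extent: CoW would be needed */
			break;
		cpos = rec->cpos + rec->clusters;
	}

	/* Stopping early leaves cpos < end, i.e. not a pure overwrite. */
	return cpos >= end;
}
```

With a table holding an allocated extent, a hole and a refcounted extent, a
range that touches the hole or the shared extent yields 0, so a nowait caller
would have to bail out before reaching the allocation path.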

>>> +
>>> +		if (rec.e_flags & OCFS2_EXT_REFCOUNTED)
>>> +			break;
>>> +
>>> +		cpos = le32_to_cpu(rec.e_cpos) +
>>> +			le16_to_cpu(rec.e_leaf_clusters);
>>> +	}
>>> +
>>> +	if (cpos < mapping_end)
>>> +		ret = 1;
>>> +
>>> +out_unlock2:
>>
>> I think 'out_up_read' is more readable than 'out_unlock2'.
> OK, I will use a more readable label here.
>>
>>> +	brelse(di_bh);
>>> +
>>> +	up_read(&OCFS2_I(inode)->ip_alloc_sem);
>>> +
>>> +out_unlock1:
>>
>> We should release buffer head here.
>>
>>> +	ocfs2_inode_unlock(inode, 0);
>>> +
>>> +out:
>>> +	return (ret ? 0 : 1);
>>> +}
>>> +
>>>  int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int whence)
>>>  {
>>>  	struct inode *inode = file->f_mapping->host;
>>> diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h
>>> index 67ea57d..fd9e86a 100644
>>> --- a/fs/ocfs2/extent_map.h
>>> +++ b/fs/ocfs2/extent_map.h
>>> @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 v_blkno, u64 *p_blkno,
>>>  int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>>>  		 u64 map_start, u64 map_len);
>>>  
>>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
>>> +		       int wait);
>>> +
>>>  int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int origin);
>>>  
>>>  int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster,
>>>
> 
> 
> .
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function
  2017-11-28  3:35       ` Gang He
@ 2017-11-28  6:51         ` Joseph Qi
  -1 siblings, 0 replies; 62+ messages in thread
From: Joseph Qi @ 2017-11-28  6:51 UTC (permalink / raw)
  To: Gang He, jlbec, hch, Goldwyn Rodrigues, mfasheh; +Cc: ocfs2-devel, linux-kernel



On 17/11/28 11:35, Gang He wrote:
> Hello Joseph,
> 
> 
>>>>
>> Hi Gang,
>>
>> On 17/11/27 17:46, Gang He wrote:
>>> Add the ocfs2_overwrite_io function, which checks whether a write only
>>> overwrites already-allocated blocks; if it does not, the write incurs
>>> extra block allocation overhead.
>>>
>>> Signed-off-by: Gang He <ghe@suse.com>
>>> ---
>>>  fs/ocfs2/extent_map.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>>  fs/ocfs2/extent_map.h |  3 +++
>>>  2 files changed, 70 insertions(+)
>>>
>>> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
>>> index e4719e0..98bf325 100644
>>> --- a/fs/ocfs2/extent_map.c
>>> +++ b/fs/ocfs2/extent_map.c
>>> @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>>>  	return ret;
>>>  }
>>>  
>>> +/* Is IO overwriting allocated blocks? */
>>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
>>> +		       int wait)
>>> +{
>>> +	int ret = 0, is_last;
>>> +	u32 mapping_end, cpos;
>>> +	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>>> +	struct buffer_head *di_bh = NULL;
>>> +	struct ocfs2_extent_rec rec;
>>> +
>>> +	if (wait)
>>> +		ret = ocfs2_inode_lock(inode, &di_bh, 0);
>>> +	else
>>> +		ret = ocfs2_try_inode_lock(inode, &di_bh, 0);
>>> +	if (ret)
>>> +		goto out;
>>> +
>>> +	if (wait)
>>> +		down_read(&OCFS2_I(inode)->ip_alloc_sem);
>>> +	else {
>>> +		if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) {
>>> +			ret = -EAGAIN;
>>> +			goto out_unlock1;
>>> +		}
>>> +	}
>>> +
>>> +	if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
>>> +	   ((map_start + map_len) <= i_size_read(inode)))
>>> +		goto out_unlock2;
>>> +
>>> +	cpos = map_start >> osb->s_clustersize_bits;
>>> +	mapping_end = ocfs2_clusters_for_bytes(inode->i_sb,
>>> +					       map_start + map_len);
>>> +	is_last = 0;
>>> +	while (cpos < mapping_end && !is_last) {
>>> +		ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos,
>>> +						 NULL, &rec, &is_last);
>>> +		if (ret) {
>>> +			mlog_errno(ret);
>>> +			goto out_unlock2;
>>> +		}
>>> +
>>> +		if (rec.e_blkno == 0ULL)
>>> +			break;
>>> +
>>> +		if (rec.e_flags & OCFS2_EXT_REFCOUNTED)
>>> +			break;
>>> +
>>> +		cpos = le32_to_cpu(rec.e_cpos) +
>>> +			le16_to_cpu(rec.e_leaf_clusters);
>>> +	}
>>> +
>>> +	if (cpos < mapping_end)
>>> +		ret = 1;
>>> +
>>> +out_unlock2:
>>> +	brelse(di_bh);
>>> +
>>> +	up_read(&OCFS2_I(inode)->ip_alloc_sem);
>>> +
>>> +out_unlock1:
>> Should brelse(di_bh) be here?
> If the code jumps to out_unlock1 directly, the di_bh pointer should be NULL, so it is not necessary to release it.
> 
Umm... No. Once we get here, we have already taken the inode lock, so
di_bh should be released.
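
To make the cleanup ordering concrete, here is a small userspace sketch under
stated assumptions: plain counters stand in for di_bh, the inode cluster lock
and ip_alloc_sem, and every model_* name is hypothetical. It is not the patch
itself; it only shows the buffer release placed at the label that both the
normal path and the failed try-lock path reach.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical counters standing in for the real resources. */
struct model_state {
	int bh_refs;         /* references held on the inode buffer */
	int inode_locked;    /* inode cluster lock held */
	int sem_readers;     /* readers holding the allocation semaphore */
	int sem_write_held;  /* non-zero makes the read try-lock fail */
};

/* Taking the inode lock also hands back a referenced buffer. */
static int model_inode_lock(struct model_state *s)
{
	s->inode_locked = 1;
	s->bh_refs++;
	return 0;
}

static int model_check(struct model_state *s, int wait)
{
	int ret;

	ret = model_inode_lock(s);
	if (ret)
		goto out;

	if (s->sem_write_held) {
		/* A blocking caller would sleep here; the model only
		 * exercises the try-lock failure. */
		assert(!wait);
		ret = -EAGAIN;
		goto out_unlock;
	}
	s->sem_readers++;

	/* ... the extent scan would run here ... */

	s->sem_readers--;     /* models up_read() */
out_unlock:
	s->bh_refs--;         /* models brelse(): reached on both paths */
	s->inode_locked = 0;  /* models ocfs2_inode_unlock() */
out:
	return ret;
}
```

With the release at this label, the failed try-lock path no longer leaves a
buffer reference behind.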

>>
>>> +	ocfs2_inode_unlock(inode, 0);
>>> +
>>> +out:
>>> +	return (ret ? 0 : 1);
>> I don't think EAGAIN and other error codes can be handled the same way. We
>> have to distinguish them.
> OK, I think we can add a one-line log message to report the error when it is not EAGAIN.
> 
My point is, there is no need to try again in several cases, e.g. EROFS
returned by ocfs2_get_clusters_nocache.
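
As an illustration of the concern, the sketch below contrasts the quoted
"return (ret ? 0 : 1)" with one possible alternative that keeps the errno.
It assumes 'ret' follows the convention of the quoted function body (0 = pure
overwrite, 1 = allocation needed, negative = errno); the model_* names are
hypothetical and the second function is only one way the distinction could be
preserved, not something proposed verbatim in this thread.

```c
#include <assert.h>
#include <errno.h>

/* The quoted final return: every non-zero value is folded into 0. */
static int model_fold_to_bool(int ret)
{
	return ret ? 0 : 1;
}

/*
 * Hypothetical alternative: pass a negative errno through unchanged,
 * otherwise answer 1 (overwrite) or 0 (allocation needed).
 */
static int model_keep_errno(int ret)
{
	assert(ret <= 1);
	if (ret < 0)
		return ret;
	return ret ? 0 : 1;
}
```

With the boolean form, -EROFS, -EAGAIN and "a hole was found" all look the
same to the caller; with the second form the caller can still tell a
retryable condition from a permanent failure.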

>>
>> Thanks,
>> Joseph
>>
>>> +}
>>> +
>>>  int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int whence)
>>>  {
>>>  	struct inode *inode = file->f_mapping->host;
>>> diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h
>>> index 67ea57d..fd9e86a 100644
>>> --- a/fs/ocfs2/extent_map.h
>>> +++ b/fs/ocfs2/extent_map.h
>>> @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 v_blkno, u64 *p_blkno,
>>>  int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>>>  		 u64 map_start, u64 map_len);
>>>  
>>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
>>> +		       int wait);
>>> +
>>>  int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int origin);
>>>  
>>>  int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster,
>>>
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function
  2017-11-28  6:51         ` Joseph Qi
@ 2017-11-28  7:24           ` Gang He
  -1 siblings, 0 replies; 62+ messages in thread
From: Gang He @ 2017-11-28  7:24 UTC (permalink / raw)
  To: jlbec, jiangqi903, hch, Goldwyn Rodrigues, mfasheh
  Cc: ocfs2-devel, linux-kernel

Hello Joseph,


>>> 

> 
> On 17/11/28 11:35, Gang He wrote:
>> Hello Joseph,
>> 
>> 
>>>>>
>>> Hi Gang,
>>>
>>> On 17/11/27 17:46, Gang He wrote:
>>>> Add the ocfs2_overwrite_io function, which checks whether a write only
>>>> overwrites already-allocated blocks; if it does not, the write incurs
>>>> extra block allocation overhead.
>>>>
>>>> Signed-off-by: Gang He <ghe@suse.com>
>>>> ---
>>>>  fs/ocfs2/extent_map.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>  fs/ocfs2/extent_map.h |  3 +++
>>>>  2 files changed, 70 insertions(+)
>>>>
>>>> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
>>>> index e4719e0..98bf325 100644
>>>> --- a/fs/ocfs2/extent_map.c
>>>> +++ b/fs/ocfs2/extent_map.c
>>>> @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>>>>  	return ret;
>>>>  }
>>>>  
>>>> +/* Is IO overwriting allocated blocks? */
>>>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
>>>> +		       int wait)
>>>> +{
>>>> +	int ret = 0, is_last;
>>>> +	u32 mapping_end, cpos;
>>>> +	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>>>> +	struct buffer_head *di_bh = NULL;
>>>> +	struct ocfs2_extent_rec rec;
>>>> +
>>>> +	if (wait)
>>>> +		ret = ocfs2_inode_lock(inode, &di_bh, 0);
>>>> +	else
>>>> +		ret = ocfs2_try_inode_lock(inode, &di_bh, 0);
>>>> +	if (ret)
>>>> +		goto out;
>>>> +
>>>> +	if (wait)
>>>> +		down_read(&OCFS2_I(inode)->ip_alloc_sem);
>>>> +	else {
>>>> +		if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) {
>>>> +			ret = -EAGAIN;
>>>> +			goto out_unlock1;
>>>> +		}
>>>> +	}
>>>> +
>>>> +	if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
>>>> +	   ((map_start + map_len) <= i_size_read(inode)))
>>>> +		goto out_unlock2;
>>>> +
>>>> +	cpos = map_start >> osb->s_clustersize_bits;
>>>> +	mapping_end = ocfs2_clusters_for_bytes(inode->i_sb,
>>>> +					       map_start + map_len);
>>>> +	is_last = 0;
>>>> +	while (cpos < mapping_end && !is_last) {
>>>> +		ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos,
>>>> +						 NULL, &rec, &is_last);
>>>> +		if (ret) {
>>>> +			mlog_errno(ret);
>>>> +			goto out_unlock2;
>>>> +		}
>>>> +
>>>> +		if (rec.e_blkno == 0ULL)
>>>> +			break;
>>>> +
>>>> +		if (rec.e_flags & OCFS2_EXT_REFCOUNTED)
>>>> +			break;
>>>> +
>>>> +		cpos = le32_to_cpu(rec.e_cpos) +
>>>> +			le16_to_cpu(rec.e_leaf_clusters);
>>>> +	}
>>>> +
>>>> +	if (cpos < mapping_end)
>>>> +		ret = 1;
>>>> +
>>>> +out_unlock2:
>>>> +	brelse(di_bh);
>>>> +
>>>> +	up_read(&OCFS2_I(inode)->ip_alloc_sem);
>>>> +
>>>> +out_unlock1:
>>> Should brelse(di_bh) be here?
>> If the code jumps to out_unlock1 directly, the di_bh pointer should be NULL, so it is not necessary to release it.
>> 
> Umm... No. Once we get here, we have already taken the inode lock, so
> di_bh should be released.
Sorry, you are right.

> 
>>>
>>>> +	ocfs2_inode_unlock(inode, 0);
>>>> +
>>>> +out:
>>>> +	return (ret ? 0 : 1);
>>> I don't think EAGAIN and other error codes can be handled the same way. We
>>> have to distinguish them.
>> OK, I think we can add a one-line log message to report the error when it is not EAGAIN.
>> 
> My point is, there is no need to try again in several cases, e.g. EROFS
> returned by ocfs2_get_clusters_nocache.
The function ocfs2_overwrite_io() can only return true (1) or false (0), so I think we can simply print an error before returning true/false.
It is not necessary to return another value, but we should let the user know about any error.

Thanks
Gang 

> 
>>>
>>> Thanks,
>>> Joseph
>>>
>>>> +}
>>>> +
>>>>  int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int whence)
>>>>  {
>>>>  	struct inode *inode = file->f_mapping->host;
>>>> diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h
>>>> index 67ea57d..fd9e86a 100644
>>>> --- a/fs/ocfs2/extent_map.h
>>>> +++ b/fs/ocfs2/extent_map.h
>>>> @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 v_blkno, u64 *p_blkno,
>>>>  int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>>>>  		 u64 map_start, u64 map_len);
>>>>  
>>>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
>>>> +		       int wait);
>>>> +
>>>>  int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int origin);
>>>>  
>>>>  int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster,
>>>>
>> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function
  2017-11-28  6:19         ` alex chen
@ 2017-11-28  7:38           ` Gang He
  -1 siblings, 0 replies; 62+ messages in thread
From: Gang He @ 2017-11-28  7:38 UTC (permalink / raw)
  To: alex.chen
  Cc: jlbec, hch, ocfs2-devel, Goldwyn Rodrigues, mfasheh, linux-kernel

Hi Alex,


>>> 
> Hi Gang,
> 
> On 2017/11/28 13:33, Gang He wrote:
>> Hello Alex,
>> 
>> 
>>>>>
>>> Hi Gang,
>>>
>>> On 2017/11/27 17:46, Gang He wrote:
>>>> Add the ocfs2_overwrite_io function, which checks whether a write only
>>>> overwrites already-allocated blocks; if it does not, the write incurs
>>>> extra block allocation overhead.
>>>>
>>>> Signed-off-by: Gang He <ghe@suse.com>
>>>> ---
>>>>  fs/ocfs2/extent_map.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>  fs/ocfs2/extent_map.h |  3 +++
>>>>  2 files changed, 70 insertions(+)
>>>>
>>>> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
>>>> index e4719e0..98bf325 100644
>>>> --- a/fs/ocfs2/extent_map.c
>>>> +++ b/fs/ocfs2/extent_map.c
>>>> @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>>>>  	return ret;
>>>>  }
>>>>  
>>>> +/* Is IO overwriting allocated blocks? */
>>>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
>>>> +		       int wait)
>>>> +{
>>>> +	int ret = 0, is_last;
>>>> +	u32 mapping_end, cpos;
>>>> +	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>>>> +	struct buffer_head *di_bh = NULL;
>>>> +	struct ocfs2_extent_rec rec;
>>>> +
>>>> +	if (wait)
>>>> +		ret = ocfs2_inode_lock(inode, &di_bh, 0);
>>>> +	else
>>>> +		ret = ocfs2_try_inode_lock(inode, &di_bh, 0);
>>>> +	if (ret)
>>>> +		goto out;
>>>> +
>>>> +	if (wait)
>>>> +		down_read(&OCFS2_I(inode)->ip_alloc_sem);
>>>> +	else {
>>>> +		if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) {
>>>> +			ret = -EAGAIN;
>>>> +			goto out_unlock1;
>>>> +		}
>>>> +	}
>>>> +
>>>> +	if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
>>>> +	   ((map_start + map_len) <= i_size_read(inode)))
>>>> +		goto out_unlock2;
>>>> +
>>>> +	cpos = map_start >> osb->s_clustersize_bits;
>>>> +	mapping_end = ocfs2_clusters_for_bytes(inode->i_sb,
>>>> +					       map_start + map_len);
>>>> +	is_last = 0;
>>>> +	while (cpos < mapping_end && !is_last) {
>>>> +		ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos,
>>>> +						 NULL, &rec, &is_last);
>>>> +		if (ret) {
>>>> +			mlog_errno(ret);
>>>> +			goto out_unlock2;
>>>> +		}
>>>> +
>>>> +		if (rec.e_blkno == 0ULL)
>>>> +			break;
>>> I think the blocks here are not an overwrite, because a hole is found and the
>>> blocks should be allocated.
>> If rec.e_blkno == 0, this means there is a hole.
>> A file hole means that these blocks are not allocated; it is not like an unwritten block.
>> Unwritten blocks are allocated, but have not been written yet.
>> 
> If we break the loop when we find the hole, then outside this function we will
> allocate the blocks in
> ocfs2_file_write_iter()->..->ocfs2_direct_IO->__blockdev_direct_IO->..->ocfs2_dio_wr_get_block()
> ->ocfs2_write_begin_nolock. Does this violate the semantics of 'IOCB_NOWAIT'?
Yes, so we need to check whether this is an overwrite before doing direct I/O.
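
A rough userspace sketch of that ordering, assuming the overwrite check has
already produced a yes/no answer: the nowait request is turned away before
anything that could allocate is reached. The structure and the model_* names
are hypothetical, and the allocation counter merely stands in for the
get_block path mentioned above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request descriptor; not a kernel structure. */
struct model_write_req {
	bool nowait;     /* caller asked not to block */
	bool overwrite;  /* result of the pre-write extent check */
};

/*
 * Returns 0 if the write went ahead or -EAGAIN if a nowait request
 * would have needed allocation. '*allocations' counts how often the
 * (modelled) allocating path was entered.
 */
static int model_direct_write(const struct model_write_req *req,
			      int *allocations)
{
	assert(req != NULL && allocations != NULL);

	/* Gate first: a nowait write must not reach the allocator. */
	if (req->nowait && !req->overwrite)
		return -EAGAIN;

	if (!req->overwrite)
		(*allocations)++;  /* blocking write: allocation is fine */

	return 0;
}
```

In this model a nowait write over a hole returns -EAGAIN with the allocation
counter untouched, while the same write without nowait proceeds and allocates.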

> 
> BTW, should we consider the down_write() and ocfs2_inode_lock() in
> ocfs2_dio_wr_get_block() when the flag 'IOCB_NOWAIT' is set?
I think we should not consider the locks in that layer; otherwise, the code change will become larger and more complex.
I also referred to the ext4 file system change for this feature (728fbc0e10b7f3ce2ee043b32e3453fd5201c055); it did not change anything in that layer.

Thanks
Gang

> 
>>>> +
>>>> +		if (rec.e_flags & OCFS2_EXT_REFCOUNTED)
>>>> +			break;
>>>> +
>>>> +		cpos = le32_to_cpu(rec.e_cpos) +
>>>> +			le16_to_cpu(rec.e_leaf_clusters);
>>>> +	}
>>>> +
>>>> +	if (cpos < mapping_end)
>>>> +		ret = 1;
>>>> +
>>>> +out_unlock2:
>>>
>>> I think 'out_up_read' is more readable than 'out_unlock2'.
>> OK, I will use a more readable label here.
>>>
>>>> +	brelse(di_bh);
>>>> +
>>>> +	up_read(&OCFS2_I(inode)->ip_alloc_sem);
>>>> +
>>>> +out_unlock1:
>>>
>>> We should release buffer head here.
>>>
>>>> +	ocfs2_inode_unlock(inode, 0);
>>>> +
>>>> +out:
>>>> +	return (ret ? 0 : 1);
>>>> +}
>>>> +
>>>>  int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int whence)
>>>>  {
>>>>  	struct inode *inode = file->f_mapping->host;
>>>> diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h
>>>> index 67ea57d..fd9e86a 100644
>>>> --- a/fs/ocfs2/extent_map.h
>>>> +++ b/fs/ocfs2/extent_map.h
>>>> @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 v_blkno, u64 *p_blkno,
>>>>  int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>>>>  		 u64 map_start, u64 map_len);
>>>>  
>>>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
>>>> +		       int wait);
>>>> +
>>>>  int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int origin);
>>>>  
>>>>  int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster,
>>>>
>> 
>> 
>> .
>> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function
@ 2017-11-28  7:38           ` Gang He
  0 siblings, 0 replies; 62+ messages in thread
From: Gang He @ 2017-11-28  7:38 UTC (permalink / raw)
  To: alex.chen
  Cc: jlbec, hch, ocfs2-devel, Goldwyn Rodrigues, mfasheh, linux-kernel

Hi Alex,


>>> 
> Hi Gang,
> 
> On 2017/11/28 13:33, Gang He wrote:
>> Hello Alex,
>> 
>> 
>>>>>
>>> Hi Gang,
>>>
>>> On 2017/11/27 17:46, Gang He wrote:
>>>> Add ocfs2_overwrite_io function, which is used to judge if
>>>> overwrite allocated blocks, otherwise, the write will bring extra
>>>> block allocation overhead.
>>>>
>>>> Signed-off-by: Gang He <ghe@suse.com>
>>>> ---
>>>>  fs/ocfs2/extent_map.c | 67 
>>> +++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>  fs/ocfs2/extent_map.h |  3 +++
>>>>  2 files changed, 70 insertions(+)
>>>>
>>>> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
>>>> index e4719e0..98bf325 100644
>>>> --- a/fs/ocfs2/extent_map.c
>>>> +++ b/fs/ocfs2/extent_map.c
>>>> @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct 
>>> fiemap_extent_info *fieinfo,
>>>>  	return ret;
>>>>  }
>>>>  
>>>> +/* Is IO overwriting allocated blocks? */
>>>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
>>>> +		       int wait)
>>>> +{
>>>> +	int ret = 0, is_last;
>>>> +	u32 mapping_end, cpos;
>>>> +	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>>>> +	struct buffer_head *di_bh = NULL;
>>>> +	struct ocfs2_extent_rec rec;
>>>> +
>>>> +	if (wait)
>>>> +		ret = ocfs2_inode_lock(inode, &di_bh, 0);
>>>> +	else
>>>> +		ret = ocfs2_try_inode_lock(inode, &di_bh, 0);
>>>> +	if (ret)
>>>> +		goto out;
>>>> +
>>>> +	if (wait)
>>>> +		down_read(&OCFS2_I(inode)->ip_alloc_sem);
>>>> +	else {
>>>> +		if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) {
>>>> +			ret = -EAGAIN;
>>>> +			goto out_unlock1;
>>>> +		}
>>>> +	}
>>>> +
>>>> +	if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
>>>> +	   ((map_start + map_len) <= i_size_read(inode)))
>>>> +		goto out_unlock2;
>>>> +
>>>> +	cpos = map_start >> osb->s_clustersize_bits;
>>>> +	mapping_end = ocfs2_clusters_for_bytes(inode->i_sb,
>>>> +					       map_start + map_len);
>>>> +	is_last = 0;
>>>> +	while (cpos < mapping_end && !is_last) {
>>>> +		ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos,
>>>> +						 NULL, &rec, &is_last);
>>>> +		if (ret) {
>>>> +			mlog_errno(ret);
>>>> +			goto out_unlock2;
>>>> +		}
>>>> +
>>>> +		if (rec.e_blkno == 0ULL)
>>>> +			break;
>>> I think here the blocks is not overwrite, because the hold is found and the 
>>> blocks
>>> should be allocated.
>> If the rec.e_blkno == NULL, this means there is a hole.
>> The file hole means that these blocks are not allocated, it does not like 
> unwritten block.
>> The unwritten blocks means that these blocks are allocated, but still have 
> not been unwritten. 
>> 
> If we break the loop when we find the hold, out of this function we will 
> allocate the blocks in
> ocfs2_file_write_iter()->..->ocfs2_direct_IO->__blockdev_direct_IO->..->ocfs2_dio_wr_g
> et_block()
> ->ocfs2_write_begin_nolock. Does this violate the semantics of 'IOCB_NOWAIT';
Yes, so we need to check whether this is an overwrite before doing direct I/O.

> 
> BTW, should we consider the down_write() and ocfs2_inode_lock() in 
> ocfs2_dio_wr_get_block() when
> the flag 'IOCB_NOWAIT' is set;
I think we should not take the locks in that layer into account; otherwise the code change will keep growing in size and complexity.
I also referred to the ext4 change for this feature (728fbc0e10b7f3ce2ee043b32e3453fd5201c055); it did not change anything in that layer.

Thanks
Gang

> 
>>>> +
>>>> +		if (rec.e_flags & OCFS2_EXT_REFCOUNTED)
>>>> +			break;
>>>> +
>>>> +		cpos = le32_to_cpu(rec.e_cpos) +
>>>> +			le16_to_cpu(rec.e_leaf_clusters);
>>>> +	}
>>>> +
>>>> +	if (cpos < mapping_end)
>>>> +		ret = 1;
>>>> +
>>>> +out_unlock2:
>>>
>>> I think the 'out_up_read' is more readable than the 'out_unlock2' .
>> Ok, I will use more readable tag here.
>>>
>>>> +	brelse(di_bh);
>>>> +
>>>> +	up_read(&OCFS2_I(inode)->ip_alloc_sem);
>>>> +
>>>> +out_unlock1:
>>>
>>> We should release buffer head here.
>>>
>>>> +	ocfs2_inode_unlock(inode, 0);
>>>> +
>>>> +out:
>>>> +	return (ret ? 0 : 1);
>>>> +}
>>>> +
>>>>  int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int 
>>> whence)
>>>>  {
>>>>  	struct inode *inode = file->f_mapping->host;
>>>> diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h
>>>> index 67ea57d..fd9e86a 100644
>>>> --- a/fs/ocfs2/extent_map.h
>>>> +++ b/fs/ocfs2/extent_map.h
>>>> @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 
>>> v_blkno, u64 *p_blkno,
>>>>  int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>>>>  		 u64 map_start, u64 map_len);
>>>>  
>>>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
>>>> +		       int wait);
>>>> +
>>>>  int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int 
>>> origin);
>>>>  
>>>>  int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster,
>>>>
>> 
>> 
>> .
>> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function
  2017-11-28  7:38           ` Gang He
@ 2017-11-28  8:11             ` alex chen
  -1 siblings, 0 replies; 62+ messages in thread
From: alex chen @ 2017-11-28  8:11 UTC (permalink / raw)
  To: Gang He; +Cc: jlbec, hch, ocfs2-devel, Goldwyn Rodrigues, mfasheh, linux-kernel

Hi Gang,

On 2017/11/28 15:38, Gang He wrote:
> Hi Alex,
> 
> 
>>>>
>> Hi Gang,
>>
>> On 2017/11/28 13:33, Gang He wrote:
>>> Hello Alex,
>>>
>>>
>>>>>>
>>>> Hi Gang,
>>>>
>>>> On 2017/11/27 17:46, Gang He wrote:
>>>>> Add ocfs2_overwrite_io function, which is used to judge if
>>>>> overwrite allocated blocks, otherwise, the write will bring extra
>>>>> block allocation overhead.
>>>>>
>>>>> Signed-off-by: Gang He <ghe@suse.com>
>>>>> ---
>>>>>  fs/ocfs2/extent_map.c | 67 
>>>> +++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>  fs/ocfs2/extent_map.h |  3 +++
>>>>>  2 files changed, 70 insertions(+)
>>>>>
>>>>> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
>>>>> index e4719e0..98bf325 100644
>>>>> --- a/fs/ocfs2/extent_map.c
>>>>> +++ b/fs/ocfs2/extent_map.c
>>>>> @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct 
>>>> fiemap_extent_info *fieinfo,
>>>>>  	return ret;
>>>>>  }
>>>>>  
>>>>> +/* Is IO overwriting allocated blocks? */
>>>>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
>>>>> +		       int wait)
>>>>> +{
>>>>> +	int ret = 0, is_last;
>>>>> +	u32 mapping_end, cpos;
>>>>> +	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>>>>> +	struct buffer_head *di_bh = NULL;
>>>>> +	struct ocfs2_extent_rec rec;
>>>>> +
>>>>> +	if (wait)
>>>>> +		ret = ocfs2_inode_lock(inode, &di_bh, 0);
>>>>> +	else
>>>>> +		ret = ocfs2_try_inode_lock(inode, &di_bh, 0);
>>>>> +	if (ret)
>>>>> +		goto out;
>>>>> +
>>>>> +	if (wait)
>>>>> +		down_read(&OCFS2_I(inode)->ip_alloc_sem);
>>>>> +	else {
>>>>> +		if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) {
>>>>> +			ret = -EAGAIN;
>>>>> +			goto out_unlock1;
>>>>> +		}
>>>>> +	}
>>>>> +
>>>>> +	if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
>>>>> +	   ((map_start + map_len) <= i_size_read(inode)))
>>>>> +		goto out_unlock2;
>>>>> +
>>>>> +	cpos = map_start >> osb->s_clustersize_bits;
>>>>> +	mapping_end = ocfs2_clusters_for_bytes(inode->i_sb,
>>>>> +					       map_start + map_len);
>>>>> +	is_last = 0;
>>>>> +	while (cpos < mapping_end && !is_last) {
>>>>> +		ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos,
>>>>> +						 NULL, &rec, &is_last);
>>>>> +		if (ret) {
>>>>> +			mlog_errno(ret);
>>>>> +			goto out_unlock2;
>>>>> +		}
>>>>> +
>>>>> +		if (rec.e_blkno == 0ULL)
>>>>> +			break;
>>>> I think here the blocks is not overwrite, because the hold is found and the 
>>>> blocks
>>>> should be allocated.
>>> If the rec.e_blkno == NULL, this means there is a hole.
>>> The file hole means that these blocks are not allocated, it does not like 
>> unwritten block.
>>> The unwritten blocks means that these blocks are allocated, but still have 
>> not been unwritten. 
>>>
>> If we break the loop when we find the hold, out of this function we will 
>> allocate the blocks in
>> ocfs2_file_write_iter()->..->ocfs2_direct_IO->__blockdev_direct_IO->..->ocfs2_dio_wr_g
>> et_block()
>> ->ocfs2_write_begin_nolock. Does this violate the semantics of 'IOCB_NOWAIT';
> Yes, then we need to check if this is a overwrite before doing direct-io.
>

I mean that here we should return 0 instead of breaking out of the loop, and we should
immediately return -EAGAIN to the upper layers. Otherwise some block allocation will
happen, which violates the semantics of 'IOCB_NOWAIT'.
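
To make the point concrete, here is a rough userspace sketch of the check (not
kernel code). The struct and the model_* names are made up for illustration and
only model an extent map in cluster units; a hole or a refcounted extent
anywhere in the range means the write is not a pure overwrite, and a nowait
caller then gets -EAGAIN:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an extent record (cluster units). */
struct model_extent {
	uint32_t cpos;      /* first logical cluster covered */
	uint32_t clusters;  /* number of clusters covered */
	uint64_t blkno;     /* 0 means a hole (nothing allocated) */
	int refcounted;     /* non-zero: shared extent, a write needs CoW */
};

/*
 * Returns 1 if clusters [start, end) are fully backed by allocated,
 * non-refcounted extents, 0 otherwise. Extents must be sorted by cpos.
 */
static int model_is_overwrite(const struct model_extent *ext, size_t n,
			      uint32_t start, uint32_t end)
{
	uint32_t cpos = start;
	size_t i;

	assert(start < end);

	for (i = 0; i < n && cpos < end; i++) {
		if (ext[i].cpos + ext[i].clusters <= cpos)
			continue;	/* extent lies entirely before cpos */
		if (ext[i].cpos > cpos)
			return 0;	/* gap in the map: treat as a hole */
		if (ext[i].blkno == 0 || ext[i].refcounted)
			return 0;	/* hole or shared extent: must allocate */
		cpos = ext[i].cpos + ext[i].clusters;
	}
	return cpos >= end;
}

/* Caller side: a nowait write that would allocate fails with -EAGAIN. */
static int model_nowait_write_check(const struct model_extent *ext, size_t n,
				    uint32_t start, uint32_t end)
{
	return model_is_overwrite(ext, n, start, end) ? 0 : -EAGAIN;
}
```

The important part is that the hole case never falls through to the allocating
path when the request is nowait.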

Thanks,
Alex

>>
>> BTW, should we consider the down_write() and ocfs2_inode_lock() in 
>> ocfs2_dio_wr_get_block() when
>> the flag 'IOCB_NOWAIT' is set;
> I think that we should not consider that layer lock, otherwise, the code change will become more and more complex and big.
> I also refer to ext4 file system code change for this feature(728fbc0e10b7f3ce2ee043b32e3453fd5201c055), they did not do any change in that layer.
> 

OK.

> Thanks
> Gang
> 
>>
>>>>> +
>>>>> +		if (rec.e_flags & OCFS2_EXT_REFCOUNTED)
>>>>> +			break;
>>>>> +
>>>>> +		cpos = le32_to_cpu(rec.e_cpos) +
>>>>> +			le16_to_cpu(rec.e_leaf_clusters);
>>>>> +	}
>>>>> +
>>>>> +	if (cpos < mapping_end)
>>>>> +		ret = 1;
>>>>> +
>>>>> +out_unlock2:
>>>>
>>>> I think the 'out_up_read' is more readable than the 'out_unlock2' .
>>> Ok, I will use more readable tag here.
>>>>
>>>>> +	brelse(di_bh);
>>>>> +
>>>>> +	up_read(&OCFS2_I(inode)->ip_alloc_sem);
>>>>> +
>>>>> +out_unlock1:
>>>>
>>>> We should release buffer head here.
>>>>
>>>>> +	ocfs2_inode_unlock(inode, 0);
>>>>> +
>>>>> +out:
>>>>> +	return (ret ? 0 : 1);
>>>>> +}
>>>>> +
>>>>>  int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int 
>>>> whence)
>>>>>  {
>>>>>  	struct inode *inode = file->f_mapping->host;
>>>>> diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h
>>>>> index 67ea57d..fd9e86a 100644
>>>>> --- a/fs/ocfs2/extent_map.h
>>>>> +++ b/fs/ocfs2/extent_map.h
>>>>> @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 
>>>> v_blkno, u64 *p_blkno,
>>>>>  int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>>>>>  		 u64 map_start, u64 map_len);
>>>>>  
>>>>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
>>>>> +		       int wait);
>>>>> +
>>>>>  int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int 
>>>> origin);
>>>>>  
>>>>>  int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster,
>>>>>
>>>
>>>
>>> .
>>>
> 
> .
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function
  2017-11-28  8:11             ` alex chen
@ 2017-11-28  8:32               ` Gang He
  -1 siblings, 0 replies; 62+ messages in thread
From: Gang He @ 2017-11-28  8:32 UTC (permalink / raw)
  To: alex.chen
  Cc: jlbec, hch, ocfs2-devel, Goldwyn Rodrigues, mfasheh, linux-kernel

Hi Alex,


>>> 
> Hi Gang,
> 
> On 2017/11/28 15:38, Gang He wrote:
>> Hi Alex,
>> 
>> 
>>>>>
>>> Hi Gang,
>>>
>>> On 2017/11/28 13:33, Gang He wrote:
>>>> Hello Alex,
>>>>
>>>>
>>>>>>>
>>>>> Hi Gang,
>>>>>
>>>>> On 2017/11/27 17:46, Gang He wrote:
>>>>>> Add ocfs2_overwrite_io function, which is used to judge if
>>>>>> overwrite allocated blocks, otherwise, the write will bring extra
>>>>>> block allocation overhead.
>>>>>>
>>>>>> Signed-off-by: Gang He <ghe@suse.com>
>>>>>> ---
>>>>>>  fs/ocfs2/extent_map.c | 67 
>>>>> +++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>  fs/ocfs2/extent_map.h |  3 +++
>>>>>>  2 files changed, 70 insertions(+)
>>>>>>
>>>>>> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
>>>>>> index e4719e0..98bf325 100644
>>>>>> --- a/fs/ocfs2/extent_map.c
>>>>>> +++ b/fs/ocfs2/extent_map.c
>>>>>> @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct 
>>>>> fiemap_extent_info *fieinfo,
>>>>>>  	return ret;
>>>>>>  }
>>>>>>  
>>>>>> +/* Is IO overwriting allocated blocks? */
>>>>>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
>>>>>> +		       int wait)
>>>>>> +{
>>>>>> +	int ret = 0, is_last;
>>>>>> +	u32 mapping_end, cpos;
>>>>>> +	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>>>>>> +	struct buffer_head *di_bh = NULL;
>>>>>> +	struct ocfs2_extent_rec rec;
>>>>>> +
>>>>>> +	if (wait)
>>>>>> +		ret = ocfs2_inode_lock(inode, &di_bh, 0);
>>>>>> +	else
>>>>>> +		ret = ocfs2_try_inode_lock(inode, &di_bh, 0);
>>>>>> +	if (ret)
>>>>>> +		goto out;
>>>>>> +
>>>>>> +	if (wait)
>>>>>> +		down_read(&OCFS2_I(inode)->ip_alloc_sem);
>>>>>> +	else {
>>>>>> +		if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) {
>>>>>> +			ret = -EAGAIN;
>>>>>> +			goto out_unlock1;
>>>>>> +		}
>>>>>> +	}
>>>>>> +
>>>>>> +	if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
>>>>>> +	   ((map_start + map_len) <= i_size_read(inode)))
>>>>>> +		goto out_unlock2;
>>>>>> +
>>>>>> +	cpos = map_start >> osb->s_clustersize_bits;
>>>>>> +	mapping_end = ocfs2_clusters_for_bytes(inode->i_sb,
>>>>>> +					       map_start + map_len);
>>>>>> +	is_last = 0;
>>>>>> +	while (cpos < mapping_end && !is_last) {
>>>>>> +		ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos,
>>>>>> +						 NULL, &rec, &is_last);
>>>>>> +		if (ret) {
>>>>>> +			mlog_errno(ret);
>>>>>> +			goto out_unlock2;
>>>>>> +		}
>>>>>> +
>>>>>> +		if (rec.e_blkno == 0ULL)
>>>>>> +			break;
>>>>> I think here the blocks is not overwrite, because the hold is found and the 
>>>>> blocks
>>>>> should be allocated.
>>>> If the rec.e_blkno == NULL, this means there is a hole.
>>>> The file hole means that these blocks are not allocated, it does not like 
>>> unwritten block.
>>>> The unwritten blocks means that these blocks are allocated, but still have 
>>> not been unwritten. 
>>>>
>>> If we break the loop when we find the hold, out of this function we will 
>>> allocate the blocks in
>>> ocfs2_file_write_iter()->..->ocfs2_direct_IO->__blockdev_direct_IO->..->ocfs2_dio_wr_g
>>> et_block()
>>> ->ocfs2_write_begin_nolock. Does this violate the semantics of 'IOCB_NOWAIT';
>> Yes, then we need to check if this is a overwrite before doing direct-io.
>>
> 
> I mean here we should return 0 instead of break and we should immediately 
> return -EAGAIN
> to upper apps, otherwise, some block allocation will be happen, which 
> violates the
> semantics of 'IOCB_NOWAIT'.
Before we do the direct I/O, we check whether it only overwrites allocated blocks.
If not, we return -EAGAIN in 'IOCB_NOWAIT' mode, so this should not trigger any block allocation.
I am not sure I fully understand your concern.
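
As a minimal sketch of that ordering (userspace model only; the function name
and its two boolean inputs are hypothetical, with 'nowait' standing in for
IOCB_NOWAIT being set and 'is_overwrite' for the result of the overwrite
check), the decision made before submitting the direct write looks like this:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical pre-check run before submitting a direct write.
 * 'nowait' models IOCB_NOWAIT being set on the request;
 * 'is_overwrite' is the answer from the overwrite check.
 */
static int model_dio_write_precheck(bool nowait, bool is_overwrite)
{
	if (nowait && !is_overwrite)
		return -EAGAIN;	/* would need allocation, so refuse up front */
	return 0;		/* safe to submit; blocking writes may allocate */
}
```

Since the check runs before the direct I/O is issued, a nowait request that
is not a pure overwrite is rejected without reaching the allocating path.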

Thanks
Gang 

> 
> Thanks,
> Alex
> 
>>>
>>> BTW, should we consider the down_write() and ocfs2_inode_lock() in 
>>> ocfs2_dio_wr_get_block() when
>>> the flag 'IOCB_NOWAIT' is set;
>> I think that we should not consider that layer lock, otherwise, the code 
> change will become more and more complex and big.
>> I also refer to ext4 file system code change for this 
> feature(728fbc0e10b7f3ce2ee043b32e3453fd5201c055), they did not do any change 
> in that layer.
>> 
> 
> OK.
> 
>> Thanks
>> Gang
>> 
>>>
>>>>>> +
>>>>>> +		if (rec.e_flags & OCFS2_EXT_REFCOUNTED)
>>>>>> +			break;
>>>>>> +
>>>>>> +		cpos = le32_to_cpu(rec.e_cpos) +
>>>>>> +			le16_to_cpu(rec.e_leaf_clusters);
>>>>>> +	}
>>>>>> +
>>>>>> +	if (cpos < mapping_end)
>>>>>> +		ret = 1;
>>>>>> +
>>>>>> +out_unlock2:
>>>>>
>>>>> I think the 'out_up_read' is more readable than the 'out_unlock2' .
>>>> Ok, I will use more readable tag here.
>>>>>
>>>>>> +	brelse(di_bh);
>>>>>> +
>>>>>> +	up_read(&OCFS2_I(inode)->ip_alloc_sem);
>>>>>> +
>>>>>> +out_unlock1:
>>>>>
>>>>> We should release buffer head here.
>>>>>
>>>>>> +	ocfs2_inode_unlock(inode, 0);
>>>>>> +
>>>>>> +out:
>>>>>> +	return (ret ? 0 : 1);
>>>>>> +}
>>>>>> +
>>>>>>  int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int 
>>>>> whence)
>>>>>>  {
>>>>>>  	struct inode *inode = file->f_mapping->host;
>>>>>> diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h
>>>>>> index 67ea57d..fd9e86a 100644
>>>>>> --- a/fs/ocfs2/extent_map.h
>>>>>> +++ b/fs/ocfs2/extent_map.h
>>>>>> @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 
>>>>> v_blkno, u64 *p_blkno,
>>>>>>  int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>>>>>>  		 u64 map_start, u64 map_len);
>>>>>>  
>>>>>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
>>>>>> +		       int wait);
>>>>>> +
>>>>>>  int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int 
>>>>> origin);
>>>>>>  
>>>>>>  int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster,
>>>>>>
>>>>
>>>>
>>>> .
>>>>
>> 
>> .
>> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function
  2017-11-28  7:24           ` Gang He
@ 2017-11-28  8:40             ` Joseph Qi
  -1 siblings, 0 replies; 62+ messages in thread
From: Joseph Qi @ 2017-11-28  8:40 UTC (permalink / raw)
  To: Gang He, jlbec, hch, Goldwyn Rodrigues, mfasheh; +Cc: ocfs2-devel, linux-kernel



On 17/11/28 15:24, Gang He wrote:
> Hello Joseph,
> 
> 
>>>>
> 
>>
>> On 17/11/28 11:35, Gang He wrote:
>>> Hello Joseph,
>>>
>>>
>>>>>>
>>>> Hi Gang,
>>>>
>>>> On 17/11/27 17:46, Gang He wrote:
>>>>> Add ocfs2_overwrite_io function, which is used to judge if
>>>>> overwrite allocated blocks, otherwise, the write will bring extra
>>>>> block allocation overhead.
>>>>>
>>>>> Signed-off-by: Gang He <ghe@suse.com>
>>>>> ---
>>>>>  fs/ocfs2/extent_map.c | 67 
>>>> +++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>  fs/ocfs2/extent_map.h |  3 +++
>>>>>  2 files changed, 70 insertions(+)
>>>>>
>>>>> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
>>>>> index e4719e0..98bf325 100644
>>>>> --- a/fs/ocfs2/extent_map.c
>>>>> +++ b/fs/ocfs2/extent_map.c
>>>>> @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct 
>>>> fiemap_extent_info *fieinfo,
>>>>>  	return ret;
>>>>>  }
>>>>>  
>>>>> +/* Is IO overwriting allocated blocks? */
>>>>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
>>>>> +		       int wait)
>>>>> +{
>>>>> +	int ret = 0, is_last;
>>>>> +	u32 mapping_end, cpos;
>>>>> +	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>>>>> +	struct buffer_head *di_bh = NULL;
>>>>> +	struct ocfs2_extent_rec rec;
>>>>> +
>>>>> +	if (wait)
>>>>> +		ret = ocfs2_inode_lock(inode, &di_bh, 0);
>>>>> +	else
>>>>> +		ret = ocfs2_try_inode_lock(inode, &di_bh, 0);
>>>>> +	if (ret)
>>>>> +		goto out;
>>>>> +
>>>>> +	if (wait)
>>>>> +		down_read(&OCFS2_I(inode)->ip_alloc_sem);
>>>>> +	else {
>>>>> +		if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) {
>>>>> +			ret = -EAGAIN;
>>>>> +			goto out_unlock1;
>>>>> +		}
>>>>> +	}
>>>>> +
>>>>> +	if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
>>>>> +	   ((map_start + map_len) <= i_size_read(inode)))
>>>>> +		goto out_unlock2;
>>>>> +
>>>>> +	cpos = map_start >> osb->s_clustersize_bits;
>>>>> +	mapping_end = ocfs2_clusters_for_bytes(inode->i_sb,
>>>>> +					       map_start + map_len);
>>>>> +	is_last = 0;
>>>>> +	while (cpos < mapping_end && !is_last) {
>>>>> +		ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos,
>>>>> +						 NULL, &rec, &is_last);
>>>>> +		if (ret) {
>>>>> +			mlog_errno(ret);
>>>>> +			goto out_unlock2;
>>>>> +		}
>>>>> +
>>>>> +		if (rec.e_blkno == 0ULL)
>>>>> +			break;
>>>>> +
>>>>> +		if (rec.e_flags & OCFS2_EXT_REFCOUNTED)
>>>>> +			break;
>>>>> +
>>>>> +		cpos = le32_to_cpu(rec.e_cpos) +
>>>>> +			le16_to_cpu(rec.e_leaf_clusters);
>>>>> +	}
>>>>> +
>>>>> +	if (cpos < mapping_end)
>>>>> +		ret = 1;
>>>>> +
>>>>> +out_unlock2:
>>>>> +	brelse(di_bh);
>>>>> +
>>>>> +	up_read(&OCFS2_I(inode)->ip_alloc_sem);
>>>>> +
>>>>> +out_unlock1:
>>>> Should brelse(di_bh) be here?
>>> If the code jumps to out_unlock1 directly, the di_bh pointer should be NULL, 
>> it is not necessary to release.
>>>
>> Umm... No, once going out here, we have already taken inode lock. So
>> di_bh should be released.
> Sorry, you are right.
> 
>>
>>>>
>>>>> +	ocfs2_inode_unlock(inode, 0);
>>>>> +
>>>>> +out:
>>>>> +	return (ret ? 0 : 1);
>>>> I don't think EAGAIN and other error code can be handled the same. We
>>>> have to distinguish them.
>>> Ok, I think we can add one line log to report the error in case the error is 
>> not EAGAIN. 
>>>
>> My point is, there is no need to try again in several cases, e.g. EROFS
>> returned by ocfs2_get_clusters_nocache.
> ocfs2_overwrite_io() can only return true (1) or false (0), so I think we can only print an error before returning true/false.
> It is not necessary to return another value, but we should let the user know about any error.
This is because you just ignore the error and convert it to 0 or 1.
But in your next patch, if !ocfs2_overwrite_io(), it will return EAGAIN
to the upper layer and let it try again.
But in some cases, e.g. EROFS, trying again is meaningless. That's why
we can't simply return 0 or 1 here. We also have to distinguish the
error code in the next patch.
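
The problem described above can be shown with a small userspace sketch. The function names are hypothetical and only model the behaviour discussed in this thread: the posted helper collapses every non-zero result into "not an overwrite", and a nowait caller built on it can then only answer -EAGAIN, even when the underlying failure was -EROFS.

```c
#include <assert.h>
#include <errno.h>

/* Boolean convention from the posted patch: return (ret ? 0 : 1). */
static int is_overwrite_bool(int inner_ret)
{
	return inner_ret ? 0 : 1;
}

/* Hypothetical nowait caller built on the boolean helper. */
static int nowait_caller_bool(int inner_ret)
{
	if (!is_overwrite_bool(inner_ret))
		return -EAGAIN;	/* -EROFS is reported as "try again" too */
	return 0;
}

/* Hypothetical nowait caller when the helper returns 0 or a negative errno. */
static int nowait_caller_errno(int inner_ret)
{
	return inner_ret;	/* -EAGAIN and -EROFS stay distinguishable */
}
```

In the first caller a read-only filesystem error would make userspace retry forever; in the second the caller sees the original error and can give up.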

> Thanks
> Gang 
> 
>>
>>>>
>>>> Thanks,
>>>> Joseph
>>>>
>>>>> +}
>>>>> +
>>>>>  int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int 
>>>> whence)
>>>>>  {
>>>>>  	struct inode *inode = file->f_mapping->host;
>>>>> diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h
>>>>> index 67ea57d..fd9e86a 100644
>>>>> --- a/fs/ocfs2/extent_map.h
>>>>> +++ b/fs/ocfs2/extent_map.h
>>>>> @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 
>>>> v_blkno, u64 *p_blkno,
>>>>>  int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>>>>>  		 u64 map_start, u64 map_len);
>>>>>  
>>>>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
>>>>> +		       int wait);
>>>>> +
>>>>>  int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int 
>>>> origin);
>>>>>  
>>>>>  int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster,
>>>>>
>>>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function
  2017-11-28  8:40             ` Joseph Qi
@ 2017-11-28  8:54               ` Gang He
  -1 siblings, 0 replies; 62+ messages in thread
From: Gang He @ 2017-11-28  8:54 UTC (permalink / raw)
  To: jlbec, jiangqi903, hch, Goldwyn Rodrigues, mfasheh
  Cc: ocfs2-devel, linux-kernel

Hi Joseph,


>>> 

> 
> On 17/11/28 15:24, Gang He wrote:
>> Hello Joseph,
>> 
>> 
>>>>>
>> 
>>>
>>> On 17/11/28 11:35, Gang He wrote:
>>>> Hello Joseph,
>>>>
>>>>
>>>>>>>
>>>>> Hi Gang,
>>>>>
>>>>> On 17/11/27 17:46, Gang He wrote:
>>>>>> Add ocfs2_overwrite_io function, which is used to judge if
>>>>>> overwrite allocated blocks, otherwise, the write will bring extra
>>>>>> block allocation overhead.
>>>>>>
>>>>>> Signed-off-by: Gang He <ghe@suse.com>
>>>>>> ---
>>>>>>  fs/ocfs2/extent_map.c | 67 
>>>>> +++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>  fs/ocfs2/extent_map.h |  3 +++
>>>>>>  2 files changed, 70 insertions(+)
>>>>>>
>>>>>> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
>>>>>> index e4719e0..98bf325 100644
>>>>>> --- a/fs/ocfs2/extent_map.c
>>>>>> +++ b/fs/ocfs2/extent_map.c
>>>>>> @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct 
>>>>> fiemap_extent_info *fieinfo,
>>>>>>  	return ret;
>>>>>>  }
>>>>>>  
>>>>>> +/* Is IO overwriting allocated blocks? */
>>>>>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
>>>>>> +		       int wait)
>>>>>> +{
>>>>>> +	int ret = 0, is_last;
>>>>>> +	u32 mapping_end, cpos;
>>>>>> +	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>>>>>> +	struct buffer_head *di_bh = NULL;
>>>>>> +	struct ocfs2_extent_rec rec;
>>>>>> +
>>>>>> +	if (wait)
>>>>>> +		ret = ocfs2_inode_lock(inode, &di_bh, 0);
>>>>>> +	else
>>>>>> +		ret = ocfs2_try_inode_lock(inode, &di_bh, 0);
>>>>>> +	if (ret)
>>>>>> +		goto out;
>>>>>> +
>>>>>> +	if (wait)
>>>>>> +		down_read(&OCFS2_I(inode)->ip_alloc_sem);
>>>>>> +	else {
>>>>>> +		if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) {
>>>>>> +			ret = -EAGAIN;
>>>>>> +			goto out_unlock1;
>>>>>> +		}
>>>>>> +	}
>>>>>> +
>>>>>> +	if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
>>>>>> +	   ((map_start + map_len) <= i_size_read(inode)))
>>>>>> +		goto out_unlock2;
>>>>>> +
>>>>>> +	cpos = map_start >> osb->s_clustersize_bits;
>>>>>> +	mapping_end = ocfs2_clusters_for_bytes(inode->i_sb,
>>>>>> +					       map_start + map_len);
>>>>>> +	is_last = 0;
>>>>>> +	while (cpos < mapping_end && !is_last) {
>>>>>> +		ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos,
>>>>>> +						 NULL, &rec, &is_last);
>>>>>> +		if (ret) {
>>>>>> +			mlog_errno(ret);
>>>>>> +			goto out_unlock2;
>>>>>> +		}
>>>>>> +
>>>>>> +		if (rec.e_blkno == 0ULL)
>>>>>> +			break;
>>>>>> +
>>>>>> +		if (rec.e_flags & OCFS2_EXT_REFCOUNTED)
>>>>>> +			break;
>>>>>> +
>>>>>> +		cpos = le32_to_cpu(rec.e_cpos) +
>>>>>> +			le16_to_cpu(rec.e_leaf_clusters);
>>>>>> +	}
>>>>>> +
>>>>>> +	if (cpos < mapping_end)
>>>>>> +		ret = 1;
>>>>>> +
>>>>>> +out_unlock2:
>>>>>> +	brelse(di_bh);
>>>>>> +
>>>>>> +	up_read(&OCFS2_I(inode)->ip_alloc_sem);
>>>>>> +
>>>>>> +out_unlock1:
>>>>> Should brelse(di_bh) be here?
>>>> If the code jumps to out_unlock1 directly, the di_bh pointer should be NULL, it is not necessary to release.
>>>>
>>> Umm... No, once going out here, we have already taken inode lock. So
>>> di_bh should be released.
>> Sorry, you are right.
>> 
>>>
>>>>>
>>>>>> +	ocfs2_inode_unlock(inode, 0);
>>>>>> +
>>>>>> +out:
>>>>>> +	return (ret ? 0 : 1);
>>>>> I don't think EAGAIN and other error code can be handled the same. We
>>>>> have to distinguish them.
>>>> Ok, I think we can add one line log to report the error in case the error is not EAGAIN.
>>>>
>>> My point is, there is no need to try again in several cases, e.g. EROFS
>>> returned by ocfs2_get_clusters_nocache.
>> ocfs2_overwrite_io() can only return true (1) or false (0), so I think we can only print an error before returning true/false.
>> It is not necessary to return another value, but we should let the user know about any error.
> This is because you just ignore the error and convert it to 0 or 1.
> But in your next patch, if !ocfs2_overwrite_io(), it will return EAGAIN
> to the upper layer and let it try again.
> But in some cases, e.g. EROFS, trying again is meaningless. That's why
> we can't simply return 0 or 1 here. We also have to distinguish the
> error code in the next patch.
I think we have to use the return value if we want to propagate the errno to the caller.
I will change the meaning of the return value of ocfs2_overwrite_io():
return 0 means the IO overwrites already-allocated blocks.
return -EAGAIN means some blocks are not allocated.
return any other -ERRNO means another error happened.
Does that make sense?
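
The convention proposed above could look roughly like the following userspace model. It is only a sketch under stated assumptions, not the next version of the patch: `model_extent` and `model_overwrite_io` are hypothetical names, the extent list is assumed to be sorted by cluster offset, and treating a refcounted extent the same way as a hole (-EAGAIN) is my assumption based on the loop in the posted code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one extent; not the real ocfs2_extent_rec. */
struct model_extent {
	unsigned int cpos;	/* first cluster covered */
	unsigned int clusters;	/* number of clusters covered */
	bool allocated;		/* false models a hole (e_blkno == 0) */
	bool refcounted;	/* true models OCFS2_EXT_REFCOUNTED */
};

/*
 * Model of the proposed return values:
 *   0       every cluster in [start, end) is allocated and not shared
 *   -EAGAIN a hole or refcounted extent was found
 *   -errno  the lookup failed; the error is passed through unchanged
 */
static int model_overwrite_io(const struct model_extent *ext, size_t n,
			      unsigned int start, unsigned int end,
			      int lookup_err)
{
	unsigned int cpos = start;
	size_t i;

	assert(start <= end);

	if (lookup_err)
		return lookup_err;

	for (i = 0; i < n && cpos < end; i++) {
		if (ext[i].cpos > cpos)
			break;		/* gap in the map, i.e. a hole */
		if (ext[i].cpos + ext[i].clusters <= cpos)
			continue;	/* extent lies before the range */
		if (!ext[i].allocated || ext[i].refcounted)
			break;
		cpos = ext[i].cpos + ext[i].clusters;
	}

	return cpos < end ? -EAGAIN : 0;
}
```

A caller can then pass the result straight up: 0 lets the nowait write proceed, -EAGAIN asks userspace to retry, and anything else is a real failure.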

Thanks
Gang

> 
>> Thanks
>> Gang 
>> 
>>>
>>>>>
>>>>> Thanks,
>>>>> Joseph
>>>>>
>>>>>> +}
>>>>>> +
>>>>>>  int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int 
>>>>> whence)
>>>>>>  {
>>>>>>  	struct inode *inode = file->f_mapping->host;
>>>>>> diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h
>>>>>> index 67ea57d..fd9e86a 100644
>>>>>> --- a/fs/ocfs2/extent_map.h
>>>>>> +++ b/fs/ocfs2/extent_map.h
>>>>>> @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 
>>>>> v_blkno, u64 *p_blkno,
>>>>>>  int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>>>>>>  		 u64 map_start, u64 map_len);
>>>>>>  
>>>>>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
>>>>>> +		       int wait);
>>>>>> +
>>>>>>  int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int 
>>>>> origin);
>>>>>>  
>>>>>>  int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster,
>>>>>>
>>>>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function
  2017-11-28  8:54               ` Gang He
@ 2017-11-28  9:03                 ` Joseph Qi
  -1 siblings, 0 replies; 62+ messages in thread
From: Joseph Qi @ 2017-11-28  9:03 UTC (permalink / raw)
  To: Gang He, jlbec, hch, Goldwyn Rodrigues, mfasheh; +Cc: ocfs2-devel, linux-kernel



On 17/11/28 16:54, Gang He wrote:
> Hi Joseph,
> 
> 
>>>>
> 
>>
>> On 17/11/28 15:24, Gang He wrote:
>>> Hello Joseph,
>>>
>>>
>>>>>>
>>>
>>>>
>>>> On 17/11/28 11:35, Gang He wrote:
>>>>> Hello Joseph,
>>>>>
>>>>>
>>>>>>>>
>>>>>> Hi Gang,
>>>>>>
>>>>>> On 17/11/27 17:46, Gang He wrote:
>>>>>>> Add ocfs2_overwrite_io function, which is used to judge if
>>>>>>> overwrite allocated blocks, otherwise, the write will bring extra
>>>>>>> block allocation overhead.
>>>>>>>
>>>>>>> Signed-off-by: Gang He <ghe@suse.com>
>>>>>>> ---
>>>>>>>  fs/ocfs2/extent_map.c | 67 
>>>>>> +++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>  fs/ocfs2/extent_map.h |  3 +++
>>>>>>>  2 files changed, 70 insertions(+)
>>>>>>>
>>>>>>> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
>>>>>>> index e4719e0..98bf325 100644
>>>>>>> --- a/fs/ocfs2/extent_map.c
>>>>>>> +++ b/fs/ocfs2/extent_map.c
>>>>>>> @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct 
>>>>>> fiemap_extent_info *fieinfo,
>>>>>>>  	return ret;
>>>>>>>  }
>>>>>>>  
>>>>>>> +/* Is IO overwriting allocated blocks? */
>>>>>>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
>>>>>>> +		       int wait)
>>>>>>> +{
>>>>>>> +	int ret = 0, is_last;
>>>>>>> +	u32 mapping_end, cpos;
>>>>>>> +	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>>>>>>> +	struct buffer_head *di_bh = NULL;
>>>>>>> +	struct ocfs2_extent_rec rec;
>>>>>>> +
>>>>>>> +	if (wait)
>>>>>>> +		ret = ocfs2_inode_lock(inode, &di_bh, 0);
>>>>>>> +	else
>>>>>>> +		ret = ocfs2_try_inode_lock(inode, &di_bh, 0);
>>>>>>> +	if (ret)
>>>>>>> +		goto out;
>>>>>>> +
>>>>>>> +	if (wait)
>>>>>>> +		down_read(&OCFS2_I(inode)->ip_alloc_sem);
>>>>>>> +	else {
>>>>>>> +		if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) {
>>>>>>> +			ret = -EAGAIN;
>>>>>>> +			goto out_unlock1;
>>>>>>> +		}
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
>>>>>>> +	   ((map_start + map_len) <= i_size_read(inode)))
>>>>>>> +		goto out_unlock2;
>>>>>>> +
>>>>>>> +	cpos = map_start >> osb->s_clustersize_bits;
>>>>>>> +	mapping_end = ocfs2_clusters_for_bytes(inode->i_sb,
>>>>>>> +					       map_start + map_len);
>>>>>>> +	is_last = 0;
>>>>>>> +	while (cpos < mapping_end && !is_last) {
>>>>>>> +		ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos,
>>>>>>> +						 NULL, &rec, &is_last);
>>>>>>> +		if (ret) {
>>>>>>> +			mlog_errno(ret);
>>>>>>> +			goto out_unlock2;
>>>>>>> +		}
>>>>>>> +
>>>>>>> +		if (rec.e_blkno == 0ULL)
>>>>>>> +			break;
>>>>>>> +
>>>>>>> +		if (rec.e_flags & OCFS2_EXT_REFCOUNTED)
>>>>>>> +			break;
>>>>>>> +
>>>>>>> +		cpos = le32_to_cpu(rec.e_cpos) +
>>>>>>> +			le16_to_cpu(rec.e_leaf_clusters);
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	if (cpos < mapping_end)
>>>>>>> +		ret = 1;
>>>>>>> +
>>>>>>> +out_unlock2:
>>>>>>> +	brelse(di_bh);
>>>>>>> +
>>>>>>> +	up_read(&OCFS2_I(inode)->ip_alloc_sem);
>>>>>>> +
>>>>>>> +out_unlock1:
>>>>>> Should brelse(di_bh) be here?
>>>>> If the code jumps to out_unlock1 directly, the di_bh pointer should be NULL, it is not necessary to release.
>>>>>
>>>> Umm... No, once going out here, we have already taken inode lock. So
>>>> di_bh should be released.
>>> Sorry, you are right.
>>>
>>>>
>>>>>>
>>>>>>> +	ocfs2_inode_unlock(inode, 0);
>>>>>>> +
>>>>>>> +out:
>>>>>>> +	return (ret ? 0 : 1);
>>>>>> I don't think EAGAIN and other error code can be handled the same. We
>>>>>> have to distinguish them.
>>>>> Ok, I think we can add one line log to report the error in case the error is not EAGAIN.
>>>>>
>>>> My point is, there is no need to try again in several cases, e.g. EROFS
>>>> returned by ocfs2_get_clusters_nocache.
>>> ocfs2_overwrite_io() can only return true (1) or false (0), so I think we can only print an error before returning true/false.
>>> It is not necessary to return another value, but we should let the user know about any error.
>> This is because you just ignore the error and convert it to 0 or 1.
>> But in your next patch, if !ocfs2_overwrite_io(), it will return EAGAIN
>> to the upper layer and let it try again.
>> But in some cases, e.g. EROFS, trying again is meaningless. That's why
>> we can't simply return 0 or 1 here. We also have to distinguish the
>> error code in the next patch.
> I think we have to use the return value if we want to propagate the errno to the caller.
> I will change the meaning of the return value of ocfs2_overwrite_io():
> return 0 means the IO overwrites already-allocated blocks.
> return -EAGAIN means some blocks are not allocated.
> return any other -ERRNO means another error happened.
> Does that make sense?
> 
Yes, that looks fine to me.
We have to make sure the EAGAIN returned to the upper layer is really
*EAGAIN*.

>>>>>>
>>>>>>> +}
>>>>>>> +
>>>>>>>  int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int 
>>>>>> whence)
>>>>>>>  {
>>>>>>>  	struct inode *inode = file->f_mapping->host;
>>>>>>> diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h
>>>>>>> index 67ea57d..fd9e86a 100644
>>>>>>> --- a/fs/ocfs2/extent_map.h
>>>>>>> +++ b/fs/ocfs2/extent_map.h
>>>>>>> @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 
>>>>>> v_blkno, u64 *p_blkno,
>>>>>>>  int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>>>>>>>  		 u64 map_start, u64 map_len);
>>>>>>>  
>>>>>>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
>>>>>>> +		       int wait);
>>>>>>> +
>>>>>>>  int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int 
>>>>>> origin);
>>>>>>>  
>>>>>>>  int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster,
>>>>>>>
>>>>>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function
  2017-11-28  8:32               ` Gang He
@ 2017-11-28 13:22                 ` alex chen
  -1 siblings, 0 replies; 62+ messages in thread
From: alex chen @ 2017-11-28 13:22 UTC (permalink / raw)
  To: Gang He; +Cc: jlbec, hch, ocfs2-devel, Goldwyn Rodrigues, mfasheh, linux-kernel

Hi Gang,

On 2017/11/28 16:32, Gang He wrote:
> Hi Alex,
> 
> 
>>>>
>> Hi Gang,
>>
>> On 2017/11/28 15:38, Gang He wrote:
>>> Hi Alex,
>>>
>>>
>>>>>>
>>>> Hi Gang,
>>>>
>>>> On 2017/11/28 13:33, Gang He wrote:
>>>>> Hello Alex,
>>>>>
>>>>>
>>>>>>>>
>>>>>> Hi Gang,
>>>>>>
>>>>>> On 2017/11/27 17:46, Gang He wrote:
>>>>>>> Add ocfs2_overwrite_io function, which is used to judge if
>>>>>>> overwrite allocated blocks, otherwise, the write will bring extra
>>>>>>> block allocation overhead.
>>>>>>>
>>>>>>> Signed-off-by: Gang He <ghe@suse.com>
>>>>>>> ---
>>>>>>>  fs/ocfs2/extent_map.c | 67 
>>>>>> +++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>  fs/ocfs2/extent_map.h |  3 +++
>>>>>>>  2 files changed, 70 insertions(+)
>>>>>>>
>>>>>>> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
>>>>>>> index e4719e0..98bf325 100644
>>>>>>> --- a/fs/ocfs2/extent_map.c
>>>>>>> +++ b/fs/ocfs2/extent_map.c
>>>>>>> @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>>>>>>>  	return ret;
>>>>>>>  }
>>>>>>>  
>>>>>>> +/* Is IO overwriting allocated blocks? */
>>>>>>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
>>>>>>> +		       int wait)
>>>>>>> +{
>>>>>>> +	int ret = 0, is_last;
>>>>>>> +	u32 mapping_end, cpos;
>>>>>>> +	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>>>>>>> +	struct buffer_head *di_bh = NULL;
>>>>>>> +	struct ocfs2_extent_rec rec;
>>>>>>> +
>>>>>>> +	if (wait)
>>>>>>> +		ret = ocfs2_inode_lock(inode, &di_bh, 0);
>>>>>>> +	else
>>>>>>> +		ret = ocfs2_try_inode_lock(inode, &di_bh, 0);
>>>>>>> +	if (ret)
>>>>>>> +		goto out;
>>>>>>> +
>>>>>>> +	if (wait)
>>>>>>> +		down_read(&OCFS2_I(inode)->ip_alloc_sem);
>>>>>>> +	else {
>>>>>>> +		if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) {
>>>>>>> +			ret = -EAGAIN;
>>>>>>> +			goto out_unlock1;
>>>>>>> +		}
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
>>>>>>> +	   ((map_start + map_len) <= i_size_read(inode)))
>>>>>>> +		goto out_unlock2;
>>>>>>> +
>>>>>>> +	cpos = map_start >> osb->s_clustersize_bits;
>>>>>>> +	mapping_end = ocfs2_clusters_for_bytes(inode->i_sb,
>>>>>>> +					       map_start + map_len);
>>>>>>> +	is_last = 0;
>>>>>>> +	while (cpos < mapping_end && !is_last) {
>>>>>>> +		ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos,
>>>>>>> +						 NULL, &rec, &is_last);
>>>>>>> +		if (ret) {
>>>>>>> +			mlog_errno(ret);
>>>>>>> +			goto out_unlock2;
>>>>>>> +		}
>>>>>>> +
>>>>>>> +		if (rec.e_blkno == 0ULL)
>>>>>>> +			break;
>>>>>> I think this is not an overwrite here: a hole has been found, so the
>>>>>> blocks would have to be allocated.
>>>>> If rec.e_blkno == 0, this means there is a hole.
>>>>> A file hole means that these blocks are not allocated; it is not like an
>>>>> unwritten block.
>>>>> Unwritten blocks are blocks that are allocated but have not been
>>>>> written yet.
>>>>>
>>>> If we break out of the loop when we find the hole, then outside this
>>>> function we will allocate the blocks in
>>>> ocfs2_file_write_iter()->..->ocfs2_direct_IO->__blockdev_direct_IO->..->ocfs2_dio_wr_get_block()
>>>> ->ocfs2_write_begin_nolock. Does this violate the semantics of 'IOCB_NOWAIT'?
>>> Yes, so we need to check whether this is an overwrite before doing direct I/O.
>>>
>>
>> I mean that here we should return 0 instead of breaking, and we should
>> immediately return -EAGAIN to the upper-layer applications; otherwise some
>> block allocation will happen, which violates the semantics of
>> 'IOCB_NOWAIT'.
> Before we do a direct I/O, I need to check whether it overwrites allocated blocks.
> If not, we will return -EAGAIN in 'IOCB_NOWAIT' mode; this should not trigger any block allocation.
> I am not sure I have fully understood your concern.
> 

Yes, your description is correct.
So we should return 0 instead of breaking when we find the hole in ocfs2_overwrite_io().
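
As a rough illustration of the point above, here is a small userspace
model of the extent walk. It is only a sketch, not the kernel code:
struct extent, lookup(), is_overwrite() and sample_map are hypothetical
names, and the extent map is a plain array instead of the on-disk
extent tree. The walk stops and reports "not an overwrite" as soon as
it meets a hole or a refcounted extent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct ocfs2_extent_rec. */
struct extent {
	uint32_t cpos;       /* first logical cluster covered */
	uint32_t clusters;   /* length in clusters */
	uint64_t blkno;      /* physical start; 0 is treated as a hole */
	bool     refcounted; /* shared extent, a write would need CoW */
};

/* Sample map: clusters 0-3 and 8-11 allocated, 12-13 refcounted, 4-7 a hole. */
static const struct extent sample_map[] = {
	{ 0,  4, 1000, false },
	{ 8,  4, 2000, false },
	{ 12, 2, 3000, true  },
};

/* Return the extent that contains cpos, or NULL if cpos falls in a hole. */
static const struct extent *lookup(const struct extent *map, size_t n,
				   uint32_t cpos)
{
	for (size_t i = 0; i < n; i++)
		if (cpos >= map[i].cpos && cpos < map[i].cpos + map[i].clusters)
			return &map[i];
	return NULL;
}

/* True only if every cluster in [start, end) is allocated and not shared. */
static bool is_overwrite(const struct extent *map, size_t n,
			 uint32_t start, uint32_t end)
{
	uint32_t cpos = start;

	assert(start <= end);
	while (cpos < end) {
		const struct extent *e = lookup(map, n, cpos);

		/* Hole or refcounted extent: the write would allocate. */
		if (!e || e->blkno == 0 || e->refcounted)
			return false;
		cpos = e->cpos + e->clusters; /* jump to the next extent */
	}
	return true;
}
```

With sample_map, a write to clusters 0-3 counts as an overwrite, while a
write to clusters 2-9 does not, because it crosses the hole at 4-7. In
the patch as posted, the break is followed by the 'cpos < mapping_end'
check, which also ends in a return value of 0; the sketch only makes the
early exit explicit.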

> Thanks
> Gang 
> 
>>
>> Thanks,
>> Alex
>>
>>>>
>>>> BTW, should we consider the down_write() and ocfs2_inode_lock() in
>>>> ocfs2_dio_wr_get_block() when the flag 'IOCB_NOWAIT' is set?
>>> I think we should not consider the locks in that layer; otherwise the code
>>> change will become more and more complex and large.
>>> I also referred to the ext4 code change for this feature
>>> (728fbc0e10b7f3ce2ee043b32e3453fd5201c055); they did not make any change
>>> in that layer.
>>>
>>
>> OK.
>>
>>> Thanks
>>> Gang
>>>
>>>>
>>>>>>> +
>>>>>>> +		if (rec.e_flags & OCFS2_EXT_REFCOUNTED)
>>>>>>> +			break;
>>>>>>> +
>>>>>>> +		cpos = le32_to_cpu(rec.e_cpos) +
>>>>>>> +			le16_to_cpu(rec.e_leaf_clusters);
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	if (cpos < mapping_end)
>>>>>>> +		ret = 1;
>>>>>>> +
>>>>>>> +out_unlock2:
>>>>>>
>>>>>> I think 'out_up_read' is more readable than 'out_unlock2'.
>>>>> OK, I will use a more readable label here.
>>>>>>
>>>>>>> +	brelse(di_bh);
>>>>>>> +
>>>>>>> +	up_read(&OCFS2_I(inode)->ip_alloc_sem);
>>>>>>> +
>>>>>>> +out_unlock1:
>>>>>>
>>>>>> We should release the buffer head here.
>>>>>>
>>>>>>> +	ocfs2_inode_unlock(inode, 0);
>>>>>>> +
>>>>>>> +out:
>>>>>>> +	return (ret ? 0 : 1);
>>>>>>> +}
>>>>>>> +
>>>>>>>  int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int whence)
>>>>>>>  {
>>>>>>>  	struct inode *inode = file->f_mapping->host;
>>>>>>> diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h
>>>>>>> index 67ea57d..fd9e86a 100644
>>>>>>> --- a/fs/ocfs2/extent_map.h
>>>>>>> +++ b/fs/ocfs2/extent_map.h
>>>>>>> @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 v_blkno, u64 *p_blkno,
>>>>>>>  int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>>>>>>>  		 u64 map_start, u64 map_len);
>>>>>>>  
>>>>>>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
>>>>>>> +		       int wait);
>>>>>>> +
>>>>>>>  int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int origin);
>>>>>>>  
>>>>>>>  int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster,
>>>>>>>
>>>>>
>>>>>
>>>>> .
>>>>>
>>>
>>> .
>>>
> 
> .
> 
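
On the buffer head comment in the quoted patch: the inode lock call
hands back di_bh, so a failed trylock on ip_alloc_sem must not skip the
brelse(). Below is a minimal userspace model of that unwinding order. It
is a sketch under stated assumptions: struct model_inode and all
model_*() helpers are hypothetical, and plain flags and counters stand
in for the cluster lock, the rw semaphore and the buffer head reference.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the resources the nowait path touches. */
struct model_inode {
	bool cluster_locked; /* stands in for the ocfs2 inode lock */
	bool sem_write_held; /* another task holds ip_alloc_sem for write */
	int  sem_readers;    /* readers currently holding ip_alloc_sem */
	int  bh_refs;        /* references held on the inode buffer head */
};

/* Like a try-lock on the inode: never waits, takes a bh ref on success. */
static int model_try_inode_lock(struct model_inode *in)
{
	if (in->cluster_locked)
		return -EAGAIN;
	in->cluster_locked = true;
	in->bh_refs++;
	return 0;
}

/* Like down_read_trylock(): fails instead of sleeping behind a writer. */
static bool model_down_read_trylock(struct model_inode *in)
{
	if (in->sem_write_held)
		return false;
	in->sem_readers++;
	return true;
}

/* Nowait path: 0 on success, -EAGAIN if any step would have blocked. */
static int model_check_nowait(struct model_inode *in)
{
	int ret;

	assert(in != NULL);
	ret = model_try_inode_lock(in);
	if (ret)
		goto out;

	if (!model_down_read_trylock(in)) {
		ret = -EAGAIN;
		goto out_unlock;
	}

	/* The extent walk would run here, under both locks. */

	in->sem_readers--;          /* models up_read() */
out_unlock:
	in->bh_refs--;              /* models brelse(): reached on both paths */
	in->cluster_locked = false; /* models the inode unlock */
out:
	return ret;
}
```

Placing the buffer head release under the label that the trylock failure
jumps to means that both the success path and the -EAGAIN path drop the
reference exactly once.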


end of thread, other threads:[~2017-11-28 13:22 UTC | newest]

Thread overview: 62+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-27  9:46 [PATCH 0/3] ocfs2: add nowait aio support Gang He
2017-11-27  9:46 ` [Ocfs2-devel] " Gang He
2017-11-27  9:46 ` [PATCH 1/3] ocfs2: add ocfs2_try_rw_lock and ocfs2_try_inode_lock Gang He
2017-11-27  9:46   ` [Ocfs2-devel] " Gang He
2017-11-28  1:32   ` piaojun
2017-11-28  1:32     ` piaojun
2017-11-28  5:05     ` Gang He
2017-11-28  5:05       ` Gang He
2017-11-28  1:52   ` Changwei Ge
2017-11-28  1:52     ` Changwei Ge
2017-11-28  5:26     ` Gang He
2017-11-28  5:26       ` Gang He
2017-11-27  9:46 ` [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function Gang He
2017-11-27  9:46   ` [Ocfs2-devel] " Gang He
2017-11-28  1:13   ` Joseph Qi
2017-11-28  1:13     ` Joseph Qi
2017-11-28  3:35     ` Gang He
2017-11-28  3:35       ` Gang He
2017-11-28  6:51       ` Joseph Qi
2017-11-28  6:51         ` Joseph Qi
2017-11-28  7:24         ` Gang He
2017-11-28  7:24           ` Gang He
2017-11-28  8:40           ` Joseph Qi
2017-11-28  8:40             ` Joseph Qi
2017-11-28  8:54             ` Gang He
2017-11-28  8:54               ` Gang He
2017-11-28  9:03               ` Joseph Qi
2017-11-28  9:03                 ` Joseph Qi
2017-11-28  1:50   ` piaojun
2017-11-28  1:50     ` piaojun
2017-11-28  2:10     ` Changwei Ge
2017-11-28  2:10       ` Changwei Ge
2017-11-28  5:27       ` Gang He
2017-11-28  5:27         ` Gang He
2017-11-28  5:07     ` Gang He
2017-11-28  5:07       ` Gang He
2017-11-28  2:19   ` alex chen
2017-11-28  2:19     ` alex chen
2017-11-28  5:33     ` Gang He
2017-11-28  5:33       ` Gang He
2017-11-28  6:19       ` alex chen
2017-11-28  6:19         ` alex chen
2017-11-28  7:38         ` Gang He
2017-11-28  7:38           ` Gang He
2017-11-28  8:11           ` alex chen
2017-11-28  8:11             ` alex chen
2017-11-28  8:32             ` Gang He
2017-11-28  8:32               ` Gang He
2017-11-28 13:22               ` alex chen
2017-11-28 13:22                 ` alex chen
2017-11-28  2:48   ` Changwei Ge
2017-11-28  2:48     ` Changwei Ge
2017-11-28  5:40     ` Gang He
2017-11-28  5:40       ` Gang He
2017-11-28  5:48       ` Changwei Ge
2017-11-28  5:48         ` Changwei Ge
2017-11-27  9:46 ` [PATCH 3/3] ocfs2: nowait aio support Gang He
2017-11-27  9:46   ` [Ocfs2-devel] " Gang He
2017-11-28  2:51   ` alex chen
2017-11-28  2:51     ` alex chen
2017-11-28  5:59     ` Gang He
2017-11-28  5:59       ` Gang He
