* [PATCH v2 0/3] ocfs2: add nowait aio support
@ 2017-11-29 8:36 ` Gang He
0 siblings, 0 replies; 20+ messages in thread
From: Gang He @ 2017-11-29 8:36 UTC (permalink / raw)
To: mfasheh, jlbec, rgoldwyn, hch; +Cc: Gang He, linux-kernel, ocfs2-devel, akpm
As you know, VFS layer has introduced non-block aio
flag IOCB_NOWAIT, which informs kernel to bail out
if an AIO request will block for reasons such as file
allocations, or a writeback triggered, or would block
while allocating requests while performing direct I/O.
Subsequent, pwritev2/preadv2 also can leverage this
part kernel code.
So far, ext4/xfs/btrfs have supported this feature,
I'd like to add the related code for ocfs2 file system.
Compare with v1, some changes are as below,
use osb pointer in ocfs2_try_rw_lock() function,
modify ocfs2_overwrite_io() function to make all error
value can be returned to the upper code,
move invoking ocfs2_overwrite_io() function from
ocfs2_file_write_iter() to ocfs2_prepare_inode_for_write(),
this change can combine acquiring the related locks.
Gang He (3):
ocfs2: add ocfs2_try_rw_lock and ocfs2_try_inode_lock
ocfs2: add ocfs2_overwrite_io function
ocfs2: nowait aio support
fs/ocfs2/dir.c | 2 +-
fs/ocfs2/dlmglue.c | 41 +++++++++++++++++++---
fs/ocfs2/dlmglue.h | 6 +++-
fs/ocfs2/extent_map.c | 41 ++++++++++++++++++++++
fs/ocfs2/extent_map.h | 3 ++
fs/ocfs2/file.c | 95 +++++++++++++++++++++++++++++++++++++++-----------
fs/ocfs2/mmap.c | 2 +-
fs/ocfs2/ocfs2_trace.h | 10 +++---
8 files changed, 168 insertions(+), 32 deletions(-)
--
1.8.5.6
^ permalink raw reply [flat|nested] 20+ messages in thread
* [Ocfs2-devel] [PATCH v2 0/3] ocfs2: add nowait aio support
@ 2017-11-29 8:36 ` Gang He
0 siblings, 0 replies; 20+ messages in thread
From: Gang He @ 2017-11-29 8:36 UTC (permalink / raw)
To: mfasheh, jlbec, rgoldwyn, hch; +Cc: Gang He, linux-kernel, ocfs2-devel, akpm
As you know, VFS layer has introduced non-block aio
flag IOCB_NOWAIT, which informs kernel to bail out
if an AIO request will block for reasons such as file
allocations, or a writeback triggered, or would block
while allocating requests while performing direct I/O.
Subsequent, pwritev2/preadv2 also can leverage this
part kernel code.
So far, ext4/xfs/btrfs have supported this feature,
I'd like to add the related code for ocfs2 file system.
Compare with v1, some changes are as below,
use osb pointer in ocfs2_try_rw_lock() function,
modify ocfs2_overwrite_io() function to make all error
value can be returned to the upper code,
move invoking ocfs2_overwrite_io() function from
ocfs2_file_write_iter() to ocfs2_prepare_inode_for_write(),
this change can combine acquiring the related locks.
Gang He (3):
ocfs2: add ocfs2_try_rw_lock and ocfs2_try_inode_lock
ocfs2: add ocfs2_overwrite_io function
ocfs2: nowait aio support
fs/ocfs2/dir.c | 2 +-
fs/ocfs2/dlmglue.c | 41 +++++++++++++++++++---
fs/ocfs2/dlmglue.h | 6 +++-
fs/ocfs2/extent_map.c | 41 ++++++++++++++++++++++
fs/ocfs2/extent_map.h | 3 ++
fs/ocfs2/file.c | 95 +++++++++++++++++++++++++++++++++++++++-----------
fs/ocfs2/mmap.c | 2 +-
fs/ocfs2/ocfs2_trace.h | 10 +++---
8 files changed, 168 insertions(+), 32 deletions(-)
--
1.8.5.6
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH v2 1/3] ocfs2: add ocfs2_try_rw_lock and ocfs2_try_inode_lock
2017-11-29 8:36 ` [Ocfs2-devel] " Gang He
@ 2017-11-29 8:36 ` Gang He
-1 siblings, 0 replies; 20+ messages in thread
From: Gang He @ 2017-11-29 8:36 UTC (permalink / raw)
To: mfasheh, jlbec, rgoldwyn, hch; +Cc: Gang He, linux-kernel, ocfs2-devel, akpm
Add ocfs2_try_rw_lock and ocfs2_try_inode_lock functions, which
will be used in non-block IO scenarios.
Signed-off-by: Gang He <ghe@suse.com>
---
fs/ocfs2/dlmglue.c | 21 +++++++++++++++++++++
fs/ocfs2/dlmglue.h | 4 ++++
2 files changed, 25 insertions(+)
diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 4689940..a68efa3 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -1742,6 +1742,27 @@ int ocfs2_rw_lock(struct inode *inode, int write)
return status;
}
+int ocfs2_try_rw_lock(struct inode *inode, int write)
+{
+ int status, level;
+ struct ocfs2_lock_res *lockres;
+ struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
+
+ mlog(0, "inode %llu try to take %s RW lock\n",
+ (unsigned long long)OCFS2_I(inode)->ip_blkno,
+ write ? "EXMODE" : "PRMODE");
+
+ if (ocfs2_mount_local(osb))
+ return 0;
+
+ lockres = &OCFS2_I(inode)->ip_rw_lockres;
+
+ level = write ? DLM_LOCK_EX : DLM_LOCK_PR;
+
+ status = ocfs2_cluster_lock(osb, lockres, level, DLM_LKF_NOQUEUE, 0);
+ return status;
+}
+
void ocfs2_rw_unlock(struct inode *inode, int write)
{
int level = write ? DLM_LOCK_EX : DLM_LOCK_PR;
diff --git a/fs/ocfs2/dlmglue.h b/fs/ocfs2/dlmglue.h
index a7fc18b..05910fc 100644
--- a/fs/ocfs2/dlmglue.h
+++ b/fs/ocfs2/dlmglue.h
@@ -116,6 +116,7 @@ void ocfs2_refcount_lock_res_init(struct ocfs2_lock_res *lockres,
int ocfs2_create_new_inode_locks(struct inode *inode);
int ocfs2_drop_inode_locks(struct inode *inode);
int ocfs2_rw_lock(struct inode *inode, int write);
+int ocfs2_try_rw_lock(struct inode *inode, int write);
void ocfs2_rw_unlock(struct inode *inode, int write);
int ocfs2_open_lock(struct inode *inode);
int ocfs2_try_open_lock(struct inode *inode, int write);
@@ -140,6 +141,9 @@ int ocfs2_inode_lock_with_page(struct inode *inode,
/* 99% of the time we don't want to supply any additional flags --
* those are for very specific cases only. */
#define ocfs2_inode_lock(i, b, e) ocfs2_inode_lock_full_nested(i, b, e, 0, OI_LS_NORMAL)
+#define ocfs2_try_inode_lock(i, b, e)\
+ ocfs2_inode_lock_full_nested(i, b, e, OCFS2_META_LOCK_NOQUEUE,\
+ OI_LS_NORMAL)
void ocfs2_inode_unlock(struct inode *inode,
int ex);
int ocfs2_super_lock(struct ocfs2_super *osb,
--
1.8.5.6
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [Ocfs2-devel] [PATCH v2 1/3] ocfs2: add ocfs2_try_rw_lock and ocfs2_try_inode_lock
@ 2017-11-29 8:36 ` Gang He
0 siblings, 0 replies; 20+ messages in thread
From: Gang He @ 2017-11-29 8:36 UTC (permalink / raw)
To: mfasheh, jlbec, rgoldwyn, hch; +Cc: Gang He, linux-kernel, ocfs2-devel, akpm
Add ocfs2_try_rw_lock and ocfs2_try_inode_lock functions, which
will be used in non-block IO scenarios.
Signed-off-by: Gang He <ghe@suse.com>
---
fs/ocfs2/dlmglue.c | 21 +++++++++++++++++++++
fs/ocfs2/dlmglue.h | 4 ++++
2 files changed, 25 insertions(+)
diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 4689940..a68efa3 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -1742,6 +1742,27 @@ int ocfs2_rw_lock(struct inode *inode, int write)
return status;
}
+int ocfs2_try_rw_lock(struct inode *inode, int write)
+{
+ int status, level;
+ struct ocfs2_lock_res *lockres;
+ struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
+
+ mlog(0, "inode %llu try to take %s RW lock\n",
+ (unsigned long long)OCFS2_I(inode)->ip_blkno,
+ write ? "EXMODE" : "PRMODE");
+
+ if (ocfs2_mount_local(osb))
+ return 0;
+
+ lockres = &OCFS2_I(inode)->ip_rw_lockres;
+
+ level = write ? DLM_LOCK_EX : DLM_LOCK_PR;
+
+ status = ocfs2_cluster_lock(osb, lockres, level, DLM_LKF_NOQUEUE, 0);
+ return status;
+}
+
void ocfs2_rw_unlock(struct inode *inode, int write)
{
int level = write ? DLM_LOCK_EX : DLM_LOCK_PR;
diff --git a/fs/ocfs2/dlmglue.h b/fs/ocfs2/dlmglue.h
index a7fc18b..05910fc 100644
--- a/fs/ocfs2/dlmglue.h
+++ b/fs/ocfs2/dlmglue.h
@@ -116,6 +116,7 @@ void ocfs2_refcount_lock_res_init(struct ocfs2_lock_res *lockres,
int ocfs2_create_new_inode_locks(struct inode *inode);
int ocfs2_drop_inode_locks(struct inode *inode);
int ocfs2_rw_lock(struct inode *inode, int write);
+int ocfs2_try_rw_lock(struct inode *inode, int write);
void ocfs2_rw_unlock(struct inode *inode, int write);
int ocfs2_open_lock(struct inode *inode);
int ocfs2_try_open_lock(struct inode *inode, int write);
@@ -140,6 +141,9 @@ int ocfs2_inode_lock_with_page(struct inode *inode,
/* 99% of the time we don't want to supply any additional flags --
* those are for very specific cases only. */
#define ocfs2_inode_lock(i, b, e) ocfs2_inode_lock_full_nested(i, b, e, 0, OI_LS_NORMAL)
+#define ocfs2_try_inode_lock(i, b, e)\
+ ocfs2_inode_lock_full_nested(i, b, e, OCFS2_META_LOCK_NOQUEUE,\
+ OI_LS_NORMAL)
void ocfs2_inode_unlock(struct inode *inode,
int ex);
int ocfs2_super_lock(struct ocfs2_super *osb,
--
1.8.5.6
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v2 2/3] ocfs2: add ocfs2_overwrite_io function
2017-11-29 8:36 ` [Ocfs2-devel] " Gang He
@ 2017-11-29 8:36 ` Gang He
-1 siblings, 0 replies; 20+ messages in thread
From: Gang He @ 2017-11-29 8:36 UTC (permalink / raw)
To: mfasheh, jlbec, rgoldwyn, hch; +Cc: Gang He, linux-kernel, ocfs2-devel, akpm
Add ocfs2_overwrite_io function, which is used to judge if
overwrite allocated blocks, otherwise, the write will bring extra
block allocation overhead.
Signed-off-by: Gang He <ghe@suse.com>
---
fs/ocfs2/extent_map.c | 41 +++++++++++++++++++++++++++++++++++++++++
fs/ocfs2/extent_map.h | 3 +++
2 files changed, 44 insertions(+)
diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
index e4719e0..4da33e9 100644
--- a/fs/ocfs2/extent_map.c
+++ b/fs/ocfs2/extent_map.c
@@ -832,6 +832,47 @@ int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
return ret;
}
+/* Is IO overwriting allocated blocks? */
+int ocfs2_overwrite_io(struct inode *inode, struct buffer_head *di_bh,
+ u64 map_start, u64 map_len)
+{
+ int ret = 0, is_last;
+ u32 mapping_end, cpos;
+ struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
+ struct ocfs2_extent_rec rec;
+
+ if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
+ ((map_start + map_len) <= i_size_read(inode)))
+ goto out;
+
+ cpos = map_start >> osb->s_clustersize_bits;
+ mapping_end = ocfs2_clusters_for_bytes(inode->i_sb,
+ map_start + map_len);
+ is_last = 0;
+ while (cpos < mapping_end && !is_last) {
+ ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos,
+ NULL, &rec, &is_last);
+ if (ret) {
+ mlog_errno(ret);
+ goto out;
+ }
+
+ if (rec.e_blkno == 0ULL)
+ break;
+
+ if (rec.e_flags & OCFS2_EXT_REFCOUNTED)
+ break;
+
+ cpos = le32_to_cpu(rec.e_cpos) +
+ le16_to_cpu(rec.e_leaf_clusters);
+ }
+
+ if (cpos < mapping_end)
+ ret = -EAGAIN;
+out:
+ return ret;
+}
+
int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int whence)
{
struct inode *inode = file->f_mapping->host;
diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h
index 67ea57d..1057586 100644
--- a/fs/ocfs2/extent_map.h
+++ b/fs/ocfs2/extent_map.h
@@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 v_blkno, u64 *p_blkno,
int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
u64 map_start, u64 map_len);
+int ocfs2_overwrite_io(struct inode *inode, struct buffer_head *di_bh,
+ u64 map_start, u64 map_len);
+
int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int origin);
int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster,
--
1.8.5.6
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [Ocfs2-devel] [PATCH v2 2/3] ocfs2: add ocfs2_overwrite_io function
@ 2017-11-29 8:36 ` Gang He
0 siblings, 0 replies; 20+ messages in thread
From: Gang He @ 2017-11-29 8:36 UTC (permalink / raw)
To: mfasheh, jlbec, rgoldwyn, hch; +Cc: Gang He, linux-kernel, ocfs2-devel, akpm
Add ocfs2_overwrite_io function, which is used to judge if
overwrite allocated blocks, otherwise, the write will bring extra
block allocation overhead.
Signed-off-by: Gang He <ghe@suse.com>
---
fs/ocfs2/extent_map.c | 41 +++++++++++++++++++++++++++++++++++++++++
fs/ocfs2/extent_map.h | 3 +++
2 files changed, 44 insertions(+)
diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
index e4719e0..4da33e9 100644
--- a/fs/ocfs2/extent_map.c
+++ b/fs/ocfs2/extent_map.c
@@ -832,6 +832,47 @@ int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
return ret;
}
+/* Is IO overwriting allocated blocks? */
+int ocfs2_overwrite_io(struct inode *inode, struct buffer_head *di_bh,
+ u64 map_start, u64 map_len)
+{
+ int ret = 0, is_last;
+ u32 mapping_end, cpos;
+ struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
+ struct ocfs2_extent_rec rec;
+
+ if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
+ ((map_start + map_len) <= i_size_read(inode)))
+ goto out;
+
+ cpos = map_start >> osb->s_clustersize_bits;
+ mapping_end = ocfs2_clusters_for_bytes(inode->i_sb,
+ map_start + map_len);
+ is_last = 0;
+ while (cpos < mapping_end && !is_last) {
+ ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos,
+ NULL, &rec, &is_last);
+ if (ret) {
+ mlog_errno(ret);
+ goto out;
+ }
+
+ if (rec.e_blkno == 0ULL)
+ break;
+
+ if (rec.e_flags & OCFS2_EXT_REFCOUNTED)
+ break;
+
+ cpos = le32_to_cpu(rec.e_cpos) +
+ le16_to_cpu(rec.e_leaf_clusters);
+ }
+
+ if (cpos < mapping_end)
+ ret = -EAGAIN;
+out:
+ return ret;
+}
+
int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int whence)
{
struct inode *inode = file->f_mapping->host;
diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h
index 67ea57d..1057586 100644
--- a/fs/ocfs2/extent_map.h
+++ b/fs/ocfs2/extent_map.h
@@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 v_blkno, u64 *p_blkno,
int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
u64 map_start, u64 map_len);
+int ocfs2_overwrite_io(struct inode *inode, struct buffer_head *di_bh,
+ u64 map_start, u64 map_len);
+
int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int origin);
int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster,
--
1.8.5.6
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v2 3/3] ocfs2: nowait aio support
2017-11-29 8:36 ` [Ocfs2-devel] " Gang He
@ 2017-11-29 8:36 ` Gang He
-1 siblings, 0 replies; 20+ messages in thread
From: Gang He @ 2017-11-29 8:36 UTC (permalink / raw)
To: mfasheh, jlbec, rgoldwyn, hch; +Cc: Gang He, linux-kernel, ocfs2-devel, akpm
Return -EAGAIN if any of the following checks fail for
direct I/O with nowait flag:
Can not get the related locks immediately,
Blocks are not allocated at the write location, it will trigger
block allocation, this will block IO operations.
Signed-off-by: Gang He <ghe@suse.com>
---
fs/ocfs2/dir.c | 2 +-
fs/ocfs2/dlmglue.c | 20 ++++++++---
fs/ocfs2/dlmglue.h | 2 +-
fs/ocfs2/file.c | 95 +++++++++++++++++++++++++++++++++++++++-----------
fs/ocfs2/mmap.c | 2 +-
fs/ocfs2/ocfs2_trace.h | 10 +++---
6 files changed, 99 insertions(+), 32 deletions(-)
diff --git a/fs/ocfs2/dir.c b/fs/ocfs2/dir.c
index febe631..ea50901 100644
--- a/fs/ocfs2/dir.c
+++ b/fs/ocfs2/dir.c
@@ -1957,7 +1957,7 @@ int ocfs2_readdir(struct file *file, struct dir_context *ctx)
trace_ocfs2_readdir((unsigned long long)OCFS2_I(inode)->ip_blkno);
- error = ocfs2_inode_lock_atime(inode, file->f_path.mnt, &lock_level);
+ error = ocfs2_inode_lock_atime(inode, file->f_path.mnt, &lock_level, 1);
if (lock_level && error >= 0) {
/* We release EX lock which used to update atime
* and get PR lock again to reduce contention
diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index a68efa3..07e169f 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -2515,13 +2515,18 @@ int ocfs2_inode_lock_with_page(struct inode *inode,
int ocfs2_inode_lock_atime(struct inode *inode,
struct vfsmount *vfsmnt,
- int *level)
+ int *level, int wait)
{
int ret;
- ret = ocfs2_inode_lock(inode, NULL, 0);
+ if (wait)
+ ret = ocfs2_inode_lock(inode, NULL, 0);
+ else
+ ret = ocfs2_try_inode_lock(inode, NULL, 0);
+
if (ret < 0) {
- mlog_errno(ret);
+ if (ret != -EAGAIN)
+ mlog_errno(ret);
return ret;
}
@@ -2533,9 +2538,14 @@ int ocfs2_inode_lock_atime(struct inode *inode,
struct buffer_head *bh = NULL;
ocfs2_inode_unlock(inode, 0);
- ret = ocfs2_inode_lock(inode, &bh, 1);
+ if (wait)
+ ret = ocfs2_inode_lock(inode, &bh, 1);
+ else
+ ret = ocfs2_try_inode_lock(inode, &bh, 1);
+
if (ret < 0) {
- mlog_errno(ret);
+ if (ret != -EAGAIN)
+ mlog_errno(ret);
return ret;
}
*level = 1;
diff --git a/fs/ocfs2/dlmglue.h b/fs/ocfs2/dlmglue.h
index 05910fc..c83dbb5 100644
--- a/fs/ocfs2/dlmglue.h
+++ b/fs/ocfs2/dlmglue.h
@@ -123,7 +123,7 @@ void ocfs2_refcount_lock_res_init(struct ocfs2_lock_res *lockres,
void ocfs2_open_unlock(struct inode *inode);
int ocfs2_inode_lock_atime(struct inode *inode,
struct vfsmount *vfsmnt,
- int *level);
+ int *level, int wait);
int ocfs2_inode_lock_full_nested(struct inode *inode,
struct buffer_head **ret_bh,
int ex,
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index dc455d4..51b7f06 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -140,6 +140,8 @@ static int ocfs2_file_open(struct inode *inode, struct file *file)
spin_unlock(&oi->ip_lock);
}
+ file->f_mode |= FMODE_NOWAIT;
+
leave:
return status;
}
@@ -2132,12 +2134,12 @@ static int ocfs2_prepare_inode_for_refcount(struct inode *inode,
}
static int ocfs2_prepare_inode_for_write(struct file *file,
- loff_t pos,
- size_t count)
+ loff_t pos, size_t count, int wait)
{
- int ret = 0, meta_level = 0;
+ int ret = 0, meta_level = 0, overwrite_io = 0;
struct dentry *dentry = file->f_path.dentry;
struct inode *inode = d_inode(dentry);
+ struct buffer_head *di_bh = NULL;
loff_t end;
/*
@@ -2145,13 +2147,40 @@ static int ocfs2_prepare_inode_for_write(struct file *file,
* if we need to make modifications here.
*/
for(;;) {
- ret = ocfs2_inode_lock(inode, NULL, meta_level);
+ if (wait)
+ ret = ocfs2_inode_lock(inode, NULL, meta_level);
+ else
+ ret = ocfs2_try_inode_lock(inode,
+ overwrite_io ? NULL : &di_bh, meta_level);
if (ret < 0) {
meta_level = -1;
- mlog_errno(ret);
+ if (ret != -EAGAIN)
+ mlog_errno(ret);
goto out;
}
+ /*
+ * Check if IO will overwrite allocated blocks in case
+ * IOCB_NOWAIT flag is set.
+ */
+ if (!wait && !overwrite_io) {
+ overwrite_io = 1;
+ if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) {
+ ret = -EAGAIN;
+ goto out_unlock;
+ }
+
+ ret = ocfs2_overwrite_io(inode, di_bh, pos, count);
+ brelse(di_bh);
+ di_bh = NULL;
+ up_read(&OCFS2_I(inode)->ip_alloc_sem);
+ if (ret < 0) {
+ if (ret != -EAGAIN)
+ mlog_errno(ret);
+ goto out_unlock;
+ }
+ }
+
/* Clear suid / sgid if necessary. We do this here
* instead of later in the write path because
* remove_suid() calls ->setattr without any hint that
@@ -2199,7 +2228,9 @@ static int ocfs2_prepare_inode_for_write(struct file *file,
out_unlock:
trace_ocfs2_prepare_inode_for_write(OCFS2_I(inode)->ip_blkno,
- pos, count);
+ pos, count, wait);
+
+ brelse(di_bh);
if (meta_level >= 0)
ocfs2_inode_unlock(inode, meta_level);
@@ -2211,7 +2242,7 @@ static int ocfs2_prepare_inode_for_write(struct file *file,
static ssize_t ocfs2_file_write_iter(struct kiocb *iocb,
struct iov_iter *from)
{
- int direct_io, rw_level;
+ int rw_level;
ssize_t written = 0;
ssize_t ret;
size_t count = iov_iter_count(from);
@@ -2223,6 +2254,8 @@ static ssize_t ocfs2_file_write_iter(struct kiocb *iocb,
void *saved_ki_complete = NULL;
int append_write = ((iocb->ki_pos + count) >=
i_size_read(inode) ? 1 : 0);
+ int direct_io = iocb->ki_flags & IOCB_DIRECT ? 1 : 0;
+ int nowait = iocb->ki_flags & IOCB_NOWAIT ? 1 : 0;
trace_ocfs2_file_aio_write(inode, file, file->f_path.dentry,
(unsigned long long)OCFS2_I(inode)->ip_blkno,
@@ -2230,12 +2263,17 @@ static ssize_t ocfs2_file_write_iter(struct kiocb *iocb,
file->f_path.dentry->d_name.name,
(unsigned int)from->nr_segs); /* GRRRRR */
+ if (!direct_io && nowait)
+ return -EOPNOTSUPP;
+
if (count == 0)
return 0;
- direct_io = iocb->ki_flags & IOCB_DIRECT ? 1 : 0;
-
- inode_lock(inode);
+ if (direct_io && nowait) {
+ if (!inode_trylock(inode))
+ return -EAGAIN;
+ } else
+ inode_lock(inode);
/*
* Concurrent O_DIRECT writes are allowed with
@@ -2244,9 +2282,13 @@ static ssize_t ocfs2_file_write_iter(struct kiocb *iocb,
*/
rw_level = (!direct_io || full_coherency || append_write);
- ret = ocfs2_rw_lock(inode, rw_level);
+ if (direct_io && nowait)
+ ret = ocfs2_try_rw_lock(inode, rw_level);
+ else
+ ret = ocfs2_rw_lock(inode, rw_level);
if (ret < 0) {
- mlog_errno(ret);
+ if (ret != -EAGAIN)
+ mlog_errno(ret);
goto out_mutex;
}
@@ -2260,9 +2302,13 @@ static ssize_t ocfs2_file_write_iter(struct kiocb *iocb,
* other nodes to drop their caches. Buffered I/O
* already does this in write_begin().
*/
- ret = ocfs2_inode_lock(inode, NULL, 1);
+ if (nowait)
+ ret = ocfs2_try_inode_lock(inode, NULL, 1);
+ else
+ ret = ocfs2_inode_lock(inode, NULL, 1);
if (ret < 0) {
- mlog_errno(ret);
+ if (ret != -EAGAIN)
+ mlog_errno(ret);
goto out;
}
@@ -2277,9 +2323,10 @@ static ssize_t ocfs2_file_write_iter(struct kiocb *iocb,
}
count = ret;
- ret = ocfs2_prepare_inode_for_write(file, iocb->ki_pos, count);
+ ret = ocfs2_prepare_inode_for_write(file, iocb->ki_pos, count, !nowait);
if (ret < 0) {
- mlog_errno(ret);
+ if (ret != -EAGAIN)
+ mlog_errno(ret);
goto out;
}
@@ -2355,6 +2402,7 @@ static ssize_t ocfs2_file_read_iter(struct kiocb *iocb,
int ret = 0, rw_level = -1, lock_level = 0;
struct file *filp = iocb->ki_filp;
struct inode *inode = file_inode(filp);
+ int nowait = iocb->ki_flags & IOCB_NOWAIT ? 1 : 0;
trace_ocfs2_file_aio_read(inode, filp, filp->f_path.dentry,
(unsigned long long)OCFS2_I(inode)->ip_blkno,
@@ -2374,9 +2422,14 @@ static ssize_t ocfs2_file_read_iter(struct kiocb *iocb,
* need locks to protect pending reads from racing with truncate.
*/
if (iocb->ki_flags & IOCB_DIRECT) {
- ret = ocfs2_rw_lock(inode, 0);
+ if (nowait)
+ ret = ocfs2_try_rw_lock(inode, 0);
+ else
+ ret = ocfs2_rw_lock(inode, 0);
+
if (ret < 0) {
- mlog_errno(ret);
+ if (ret != -EAGAIN)
+ mlog_errno(ret);
goto bail;
}
rw_level = 0;
@@ -2393,9 +2446,11 @@ static ssize_t ocfs2_file_read_iter(struct kiocb *iocb,
* like i_size. This allows the checks down below
* generic_file_aio_read() a chance of actually working.
*/
- ret = ocfs2_inode_lock_atime(inode, filp->f_path.mnt, &lock_level);
+ ret = ocfs2_inode_lock_atime(inode, filp->f_path.mnt, &lock_level,
+ !nowait);
if (ret < 0) {
- mlog_errno(ret);
+ if (ret != -EAGAIN)
+ mlog_errno(ret);
goto bail;
}
ocfs2_inode_unlock(inode, lock_level);
diff --git a/fs/ocfs2/mmap.c b/fs/ocfs2/mmap.c
index 098f5c7..fb9a20e 100644
--- a/fs/ocfs2/mmap.c
+++ b/fs/ocfs2/mmap.c
@@ -184,7 +184,7 @@ int ocfs2_mmap(struct file *file, struct vm_area_struct *vma)
int ret = 0, lock_level = 0;
ret = ocfs2_inode_lock_atime(file_inode(file),
- file->f_path.mnt, &lock_level);
+ file->f_path.mnt, &lock_level, 1);
if (ret < 0) {
mlog_errno(ret);
goto out;
diff --git a/fs/ocfs2/ocfs2_trace.h b/fs/ocfs2/ocfs2_trace.h
index a0b5d00..e2a11aa 100644
--- a/fs/ocfs2/ocfs2_trace.h
+++ b/fs/ocfs2/ocfs2_trace.h
@@ -1449,20 +1449,22 @@
TRACE_EVENT(ocfs2_prepare_inode_for_write,
TP_PROTO(unsigned long long ino, unsigned long long saved_pos,
- unsigned long count),
- TP_ARGS(ino, saved_pos, count),
+ unsigned long count, int wait),
+ TP_ARGS(ino, saved_pos, count, wait),
TP_STRUCT__entry(
__field(unsigned long long, ino)
__field(unsigned long long, saved_pos)
__field(unsigned long, count)
+ __field(int, wait)
),
TP_fast_assign(
__entry->ino = ino;
__entry->saved_pos = saved_pos;
__entry->count = count;
+ __entry->wait = wait;
),
- TP_printk("%llu %llu %lu", __entry->ino,
- __entry->saved_pos, __entry->count)
+ TP_printk("%llu %llu %lu %d", __entry->ino,
+ __entry->saved_pos, __entry->count, __entry->wait)
);
DEFINE_OCFS2_INT_EVENT(generic_file_aio_read_ret);
--
1.8.5.6
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [Ocfs2-devel] [PATCH v2 3/3] ocfs2: nowait aio support
@ 2017-11-29 8:36 ` Gang He
0 siblings, 0 replies; 20+ messages in thread
From: Gang He @ 2017-11-29 8:36 UTC (permalink / raw)
To: mfasheh, jlbec, rgoldwyn, hch; +Cc: Gang He, linux-kernel, ocfs2-devel, akpm
Return -EAGAIN if any of the following checks fail for
direct I/O with nowait flag:
Can not get the related locks immediately,
Blocks are not allocated at the write location, it will trigger
block allocation, this will block IO operations.
Signed-off-by: Gang He <ghe@suse.com>
---
fs/ocfs2/dir.c | 2 +-
fs/ocfs2/dlmglue.c | 20 ++++++++---
fs/ocfs2/dlmglue.h | 2 +-
fs/ocfs2/file.c | 95 +++++++++++++++++++++++++++++++++++++++-----------
fs/ocfs2/mmap.c | 2 +-
fs/ocfs2/ocfs2_trace.h | 10 +++---
6 files changed, 99 insertions(+), 32 deletions(-)
diff --git a/fs/ocfs2/dir.c b/fs/ocfs2/dir.c
index febe631..ea50901 100644
--- a/fs/ocfs2/dir.c
+++ b/fs/ocfs2/dir.c
@@ -1957,7 +1957,7 @@ int ocfs2_readdir(struct file *file, struct dir_context *ctx)
trace_ocfs2_readdir((unsigned long long)OCFS2_I(inode)->ip_blkno);
- error = ocfs2_inode_lock_atime(inode, file->f_path.mnt, &lock_level);
+ error = ocfs2_inode_lock_atime(inode, file->f_path.mnt, &lock_level, 1);
if (lock_level && error >= 0) {
/* We release EX lock which used to update atime
* and get PR lock again to reduce contention
diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index a68efa3..07e169f 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -2515,13 +2515,18 @@ int ocfs2_inode_lock_with_page(struct inode *inode,
int ocfs2_inode_lock_atime(struct inode *inode,
struct vfsmount *vfsmnt,
- int *level)
+ int *level, int wait)
{
int ret;
- ret = ocfs2_inode_lock(inode, NULL, 0);
+ if (wait)
+ ret = ocfs2_inode_lock(inode, NULL, 0);
+ else
+ ret = ocfs2_try_inode_lock(inode, NULL, 0);
+
if (ret < 0) {
- mlog_errno(ret);
+ if (ret != -EAGAIN)
+ mlog_errno(ret);
return ret;
}
@@ -2533,9 +2538,14 @@ int ocfs2_inode_lock_atime(struct inode *inode,
struct buffer_head *bh = NULL;
ocfs2_inode_unlock(inode, 0);
- ret = ocfs2_inode_lock(inode, &bh, 1);
+ if (wait)
+ ret = ocfs2_inode_lock(inode, &bh, 1);
+ else
+ ret = ocfs2_try_inode_lock(inode, &bh, 1);
+
if (ret < 0) {
- mlog_errno(ret);
+ if (ret != -EAGAIN)
+ mlog_errno(ret);
return ret;
}
*level = 1;
diff --git a/fs/ocfs2/dlmglue.h b/fs/ocfs2/dlmglue.h
index 05910fc..c83dbb5 100644
--- a/fs/ocfs2/dlmglue.h
+++ b/fs/ocfs2/dlmglue.h
@@ -123,7 +123,7 @@ void ocfs2_refcount_lock_res_init(struct ocfs2_lock_res *lockres,
void ocfs2_open_unlock(struct inode *inode);
int ocfs2_inode_lock_atime(struct inode *inode,
struct vfsmount *vfsmnt,
- int *level);
+ int *level, int wait);
int ocfs2_inode_lock_full_nested(struct inode *inode,
struct buffer_head **ret_bh,
int ex,
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index dc455d4..51b7f06 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -140,6 +140,8 @@ static int ocfs2_file_open(struct inode *inode, struct file *file)
spin_unlock(&oi->ip_lock);
}
+ file->f_mode |= FMODE_NOWAIT;
+
leave:
return status;
}
@@ -2132,12 +2134,12 @@ static int ocfs2_prepare_inode_for_refcount(struct inode *inode,
}
static int ocfs2_prepare_inode_for_write(struct file *file,
- loff_t pos,
- size_t count)
+ loff_t pos, size_t count, int wait)
{
- int ret = 0, meta_level = 0;
+ int ret = 0, meta_level = 0, overwrite_io = 0;
struct dentry *dentry = file->f_path.dentry;
struct inode *inode = d_inode(dentry);
+ struct buffer_head *di_bh = NULL;
loff_t end;
/*
@@ -2145,13 +2147,40 @@ static int ocfs2_prepare_inode_for_write(struct file *file,
* if we need to make modifications here.
*/
for(;;) {
- ret = ocfs2_inode_lock(inode, NULL, meta_level);
+ if (wait)
+ ret = ocfs2_inode_lock(inode, NULL, meta_level);
+ else
+ ret = ocfs2_try_inode_lock(inode,
+ overwrite_io ? NULL : &di_bh, meta_level);
if (ret < 0) {
meta_level = -1;
- mlog_errno(ret);
+ if (ret != -EAGAIN)
+ mlog_errno(ret);
goto out;
}
+ /*
+ * Check if IO will overwrite allocated blocks in case
+ * IOCB_NOWAIT flag is set.
+ */
+ if (!wait && !overwrite_io) {
+ overwrite_io = 1;
+ if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) {
+ ret = -EAGAIN;
+ goto out_unlock;
+ }
+
+ ret = ocfs2_overwrite_io(inode, di_bh, pos, count);
+ brelse(di_bh);
+ di_bh = NULL;
+ up_read(&OCFS2_I(inode)->ip_alloc_sem);
+ if (ret < 0) {
+ if (ret != -EAGAIN)
+ mlog_errno(ret);
+ goto out_unlock;
+ }
+ }
+
/* Clear suid / sgid if necessary. We do this here
* instead of later in the write path because
* remove_suid() calls ->setattr without any hint that
@@ -2199,7 +2228,9 @@ static int ocfs2_prepare_inode_for_write(struct file *file,
out_unlock:
trace_ocfs2_prepare_inode_for_write(OCFS2_I(inode)->ip_blkno,
- pos, count);
+ pos, count, wait);
+
+ brelse(di_bh);
if (meta_level >= 0)
ocfs2_inode_unlock(inode, meta_level);
@@ -2211,7 +2242,7 @@ static int ocfs2_prepare_inode_for_write(struct file *file,
static ssize_t ocfs2_file_write_iter(struct kiocb *iocb,
struct iov_iter *from)
{
- int direct_io, rw_level;
+ int rw_level;
ssize_t written = 0;
ssize_t ret;
size_t count = iov_iter_count(from);
@@ -2223,6 +2254,8 @@ static ssize_t ocfs2_file_write_iter(struct kiocb *iocb,
void *saved_ki_complete = NULL;
int append_write = ((iocb->ki_pos + count) >=
i_size_read(inode) ? 1 : 0);
+ int direct_io = iocb->ki_flags & IOCB_DIRECT ? 1 : 0;
+ int nowait = iocb->ki_flags & IOCB_NOWAIT ? 1 : 0;
trace_ocfs2_file_aio_write(inode, file, file->f_path.dentry,
(unsigned long long)OCFS2_I(inode)->ip_blkno,
@@ -2230,12 +2263,17 @@ static ssize_t ocfs2_file_write_iter(struct kiocb *iocb,
file->f_path.dentry->d_name.name,
(unsigned int)from->nr_segs); /* GRRRRR */
+ if (!direct_io && nowait)
+ return -EOPNOTSUPP;
+
if (count == 0)
return 0;
- direct_io = iocb->ki_flags & IOCB_DIRECT ? 1 : 0;
-
- inode_lock(inode);
+ if (direct_io && nowait) {
+ if (!inode_trylock(inode))
+ return -EAGAIN;
+ } else
+ inode_lock(inode);
/*
* Concurrent O_DIRECT writes are allowed with
@@ -2244,9 +2282,13 @@ static ssize_t ocfs2_file_write_iter(struct kiocb *iocb,
*/
rw_level = (!direct_io || full_coherency || append_write);
- ret = ocfs2_rw_lock(inode, rw_level);
+ if (direct_io && nowait)
+ ret = ocfs2_try_rw_lock(inode, rw_level);
+ else
+ ret = ocfs2_rw_lock(inode, rw_level);
if (ret < 0) {
- mlog_errno(ret);
+ if (ret != -EAGAIN)
+ mlog_errno(ret);
goto out_mutex;
}
@@ -2260,9 +2302,13 @@ static ssize_t ocfs2_file_write_iter(struct kiocb *iocb,
* other nodes to drop their caches. Buffered I/O
* already does this in write_begin().
*/
- ret = ocfs2_inode_lock(inode, NULL, 1);
+ if (nowait)
+ ret = ocfs2_try_inode_lock(inode, NULL, 1);
+ else
+ ret = ocfs2_inode_lock(inode, NULL, 1);
if (ret < 0) {
- mlog_errno(ret);
+ if (ret != -EAGAIN)
+ mlog_errno(ret);
goto out;
}
@@ -2277,9 +2323,10 @@ static ssize_t ocfs2_file_write_iter(struct kiocb *iocb,
}
count = ret;
- ret = ocfs2_prepare_inode_for_write(file, iocb->ki_pos, count);
+ ret = ocfs2_prepare_inode_for_write(file, iocb->ki_pos, count, !nowait);
if (ret < 0) {
- mlog_errno(ret);
+ if (ret != -EAGAIN)
+ mlog_errno(ret);
goto out;
}
@@ -2355,6 +2402,7 @@ static ssize_t ocfs2_file_read_iter(struct kiocb *iocb,
int ret = 0, rw_level = -1, lock_level = 0;
struct file *filp = iocb->ki_filp;
struct inode *inode = file_inode(filp);
+ int nowait = iocb->ki_flags & IOCB_NOWAIT ? 1 : 0;
trace_ocfs2_file_aio_read(inode, filp, filp->f_path.dentry,
(unsigned long long)OCFS2_I(inode)->ip_blkno,
@@ -2374,9 +2422,14 @@ static ssize_t ocfs2_file_read_iter(struct kiocb *iocb,
* need locks to protect pending reads from racing with truncate.
*/
if (iocb->ki_flags & IOCB_DIRECT) {
- ret = ocfs2_rw_lock(inode, 0);
+ if (nowait)
+ ret = ocfs2_try_rw_lock(inode, 0);
+ else
+ ret = ocfs2_rw_lock(inode, 0);
+
if (ret < 0) {
- mlog_errno(ret);
+ if (ret != -EAGAIN)
+ mlog_errno(ret);
goto bail;
}
rw_level = 0;
@@ -2393,9 +2446,11 @@ static ssize_t ocfs2_file_read_iter(struct kiocb *iocb,
* like i_size. This allows the checks down below
* generic_file_aio_read() a chance of actually working.
*/
- ret = ocfs2_inode_lock_atime(inode, filp->f_path.mnt, &lock_level);
+ ret = ocfs2_inode_lock_atime(inode, filp->f_path.mnt, &lock_level,
+ !nowait);
if (ret < 0) {
- mlog_errno(ret);
+ if (ret != -EAGAIN)
+ mlog_errno(ret);
goto bail;
}
ocfs2_inode_unlock(inode, lock_level);
diff --git a/fs/ocfs2/mmap.c b/fs/ocfs2/mmap.c
index 098f5c7..fb9a20e 100644
--- a/fs/ocfs2/mmap.c
+++ b/fs/ocfs2/mmap.c
@@ -184,7 +184,7 @@ int ocfs2_mmap(struct file *file, struct vm_area_struct *vma)
int ret = 0, lock_level = 0;
ret = ocfs2_inode_lock_atime(file_inode(file),
- file->f_path.mnt, &lock_level);
+ file->f_path.mnt, &lock_level, 1);
if (ret < 0) {
mlog_errno(ret);
goto out;
diff --git a/fs/ocfs2/ocfs2_trace.h b/fs/ocfs2/ocfs2_trace.h
index a0b5d00..e2a11aa 100644
--- a/fs/ocfs2/ocfs2_trace.h
+++ b/fs/ocfs2/ocfs2_trace.h
@@ -1449,20 +1449,22 @@
TRACE_EVENT(ocfs2_prepare_inode_for_write,
TP_PROTO(unsigned long long ino, unsigned long long saved_pos,
- unsigned long count),
- TP_ARGS(ino, saved_pos, count),
+ unsigned long count, int wait),
+ TP_ARGS(ino, saved_pos, count, wait),
TP_STRUCT__entry(
__field(unsigned long long, ino)
__field(unsigned long long, saved_pos)
__field(unsigned long, count)
+ __field(int, wait)
),
TP_fast_assign(
__entry->ino = ino;
__entry->saved_pos = saved_pos;
__entry->count = count;
+ __entry->wait = wait;
),
- TP_printk("%llu %llu %lu", __entry->ino,
- __entry->saved_pos, __entry->count)
+ TP_printk("%llu %llu %lu %d", __entry->ino,
+ __entry->saved_pos, __entry->count, __entry->wait)
);
DEFINE_OCFS2_INT_EVENT(generic_file_aio_read_ret);
--
1.8.5.6
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [Ocfs2-devel] [PATCH v2 1/3] ocfs2: add ocfs2_try_rw_lock and ocfs2_try_inode_lock
2017-11-29 8:36 ` [Ocfs2-devel] " Gang He
@ 2017-11-29 9:05 ` piaojun
-1 siblings, 0 replies; 20+ messages in thread
From: piaojun @ 2017-11-29 9:05 UTC (permalink / raw)
To: Gang He, mfasheh, jlbec, rgoldwyn, hch; +Cc: linux-kernel, ocfs2-devel
On 2017/11/29 16:36, Gang He wrote:
> Add ocfs2_try_rw_lock and ocfs2_try_inode_lock functions, which
> will be used in non-block IO scenarios.
>
> Signed-off-by: Gang He <ghe@suse.com>
Reviewed-by: Jun Piao <piaojun@huawei.com>
> ---
> fs/ocfs2/dlmglue.c | 21 +++++++++++++++++++++
> fs/ocfs2/dlmglue.h | 4 ++++
> 2 files changed, 25 insertions(+)
>
> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
> index 4689940..a68efa3 100644
> --- a/fs/ocfs2/dlmglue.c
> +++ b/fs/ocfs2/dlmglue.c
> @@ -1742,6 +1742,27 @@ int ocfs2_rw_lock(struct inode *inode, int write)
> return status;
> }
>
> +int ocfs2_try_rw_lock(struct inode *inode, int write)
> +{
> + int status, level;
> + struct ocfs2_lock_res *lockres;
> + struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
> +
> + mlog(0, "inode %llu try to take %s RW lock\n",
> + (unsigned long long)OCFS2_I(inode)->ip_blkno,
> + write ? "EXMODE" : "PRMODE");
> +
> + if (ocfs2_mount_local(osb))
> + return 0;
> +
> + lockres = &OCFS2_I(inode)->ip_rw_lockres;
> +
> + level = write ? DLM_LOCK_EX : DLM_LOCK_PR;
> +
> + status = ocfs2_cluster_lock(osb, lockres, level, DLM_LKF_NOQUEUE, 0);
> + return status;
> +}
> +
> void ocfs2_rw_unlock(struct inode *inode, int write)
> {
> int level = write ? DLM_LOCK_EX : DLM_LOCK_PR;
> diff --git a/fs/ocfs2/dlmglue.h b/fs/ocfs2/dlmglue.h
> index a7fc18b..05910fc 100644
> --- a/fs/ocfs2/dlmglue.h
> +++ b/fs/ocfs2/dlmglue.h
> @@ -116,6 +116,7 @@ void ocfs2_refcount_lock_res_init(struct ocfs2_lock_res *lockres,
> int ocfs2_create_new_inode_locks(struct inode *inode);
> int ocfs2_drop_inode_locks(struct inode *inode);
> int ocfs2_rw_lock(struct inode *inode, int write);
> +int ocfs2_try_rw_lock(struct inode *inode, int write);
> void ocfs2_rw_unlock(struct inode *inode, int write);
> int ocfs2_open_lock(struct inode *inode);
> int ocfs2_try_open_lock(struct inode *inode, int write);
> @@ -140,6 +141,9 @@ int ocfs2_inode_lock_with_page(struct inode *inode,
> /* 99% of the time we don't want to supply any additional flags --
> * those are for very specific cases only. */
> #define ocfs2_inode_lock(i, b, e) ocfs2_inode_lock_full_nested(i, b, e, 0, OI_LS_NORMAL)
> +#define ocfs2_try_inode_lock(i, b, e)\
> + ocfs2_inode_lock_full_nested(i, b, e, OCFS2_META_LOCK_NOQUEUE,\
> + OI_LS_NORMAL)
> void ocfs2_inode_unlock(struct inode *inode,
> int ex);
> int ocfs2_super_lock(struct ocfs2_super *osb,
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* [Ocfs2-devel] [PATCH v2 1/3] ocfs2: add ocfs2_try_rw_lock and ocfs2_try_inode_lock
@ 2017-11-29 9:05 ` piaojun
0 siblings, 0 replies; 20+ messages in thread
From: piaojun @ 2017-11-29 9:05 UTC (permalink / raw)
To: Gang He, mfasheh, jlbec, rgoldwyn, hch; +Cc: linux-kernel, ocfs2-devel
On 2017/11/29 16:36, Gang He wrote:
> Add ocfs2_try_rw_lock and ocfs2_try_inode_lock functions, which
> will be used in non-block IO scenarios.
>
> Signed-off-by: Gang He <ghe@suse.com>
Reviewed-by: Jun Piao <piaojun@huawei.com>
> ---
> fs/ocfs2/dlmglue.c | 21 +++++++++++++++++++++
> fs/ocfs2/dlmglue.h | 4 ++++
> 2 files changed, 25 insertions(+)
>
> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
> index 4689940..a68efa3 100644
> --- a/fs/ocfs2/dlmglue.c
> +++ b/fs/ocfs2/dlmglue.c
> @@ -1742,6 +1742,27 @@ int ocfs2_rw_lock(struct inode *inode, int write)
> return status;
> }
>
> +int ocfs2_try_rw_lock(struct inode *inode, int write)
> +{
> + int status, level;
> + struct ocfs2_lock_res *lockres;
> + struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
> +
> + mlog(0, "inode %llu try to take %s RW lock\n",
> + (unsigned long long)OCFS2_I(inode)->ip_blkno,
> + write ? "EXMODE" : "PRMODE");
> +
> + if (ocfs2_mount_local(osb))
> + return 0;
> +
> + lockres = &OCFS2_I(inode)->ip_rw_lockres;
> +
> + level = write ? DLM_LOCK_EX : DLM_LOCK_PR;
> +
> + status = ocfs2_cluster_lock(osb, lockres, level, DLM_LKF_NOQUEUE, 0);
> + return status;
> +}
> +
> void ocfs2_rw_unlock(struct inode *inode, int write)
> {
> int level = write ? DLM_LOCK_EX : DLM_LOCK_PR;
> diff --git a/fs/ocfs2/dlmglue.h b/fs/ocfs2/dlmglue.h
> index a7fc18b..05910fc 100644
> --- a/fs/ocfs2/dlmglue.h
> +++ b/fs/ocfs2/dlmglue.h
> @@ -116,6 +116,7 @@ void ocfs2_refcount_lock_res_init(struct ocfs2_lock_res *lockres,
> int ocfs2_create_new_inode_locks(struct inode *inode);
> int ocfs2_drop_inode_locks(struct inode *inode);
> int ocfs2_rw_lock(struct inode *inode, int write);
> +int ocfs2_try_rw_lock(struct inode *inode, int write);
> void ocfs2_rw_unlock(struct inode *inode, int write);
> int ocfs2_open_lock(struct inode *inode);
> int ocfs2_try_open_lock(struct inode *inode, int write);
> @@ -140,6 +141,9 @@ int ocfs2_inode_lock_with_page(struct inode *inode,
> /* 99% of the time we don't want to supply any additional flags --
> * those are for very specific cases only. */
> #define ocfs2_inode_lock(i, b, e) ocfs2_inode_lock_full_nested(i, b, e, 0, OI_LS_NORMAL)
> +#define ocfs2_try_inode_lock(i, b, e)\
> + ocfs2_inode_lock_full_nested(i, b, e, OCFS2_META_LOCK_NOQUEUE,\
> + OI_LS_NORMAL)
> void ocfs2_inode_unlock(struct inode *inode,
> int ex);
> int ocfs2_super_lock(struct ocfs2_super *osb,
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Ocfs2-devel] [PATCH v2 1/3] ocfs2: add ocfs2_try_rw_lock and ocfs2_try_inode_lock
2017-11-29 8:36 ` [Ocfs2-devel] " Gang He
@ 2017-11-29 12:02 ` Changwei Ge
-1 siblings, 0 replies; 20+ messages in thread
From: Changwei Ge @ 2017-11-29 12:02 UTC (permalink / raw)
To: Gang He, mfasheh, jlbec, rgoldwyn, hch; +Cc: linux-kernel, ocfs2-devel
On 2017/11/29 16:38, Gang He wrote:
> Add ocfs2_try_rw_lock and ocfs2_try_inode_lock functions, which
> will be used in non-block IO scenarios.
>
> Signed-off-by: Gang He <ghe@suse.com>
> ---
> fs/ocfs2/dlmglue.c | 21 +++++++++++++++++++++
> fs/ocfs2/dlmglue.h | 4 ++++
> 2 files changed, 25 insertions(+)
>
> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
> index 4689940..a68efa3 100644
> --- a/fs/ocfs2/dlmglue.c
> +++ b/fs/ocfs2/dlmglue.c
> @@ -1742,6 +1742,27 @@ int ocfs2_rw_lock(struct inode *inode, int write)
> return status;
> }
>
> +int ocfs2_try_rw_lock(struct inode *inode, int write)
> +{
> + int status, level;
> + struct ocfs2_lock_res *lockres;
> + struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
> +
> + mlog(0, "inode %llu try to take %s RW lock\n",
> + (unsigned long long)OCFS2_I(inode)->ip_blkno,
> + write ? "EXMODE" : "PRMODE");
> +
> + if (ocfs2_mount_local(osb))
> + return 0;
> +
> + lockres = &OCFS2_I(inode)->ip_rw_lockres;
> +
> + level = write ? DLM_LOCK_EX : DLM_LOCK_PR;
> +
> + status = ocfs2_cluster_lock(osb, lockres, level, DLM_LKF_NOQUEUE, 0);
Hi Gang,
Should we consider about passing a flag - OCFS2_LOCK_NONBLOCK to
ocfs2_cluster_lock. Otherwise a cluster locking progress may be waiting
for accomplishment of DC, which I think violates _NO_WAIT_ semantics.
Thanks,
Changwei.
> + return status;
> +}
> +
> void ocfs2_rw_unlock(struct inode *inode, int write)
> {
> int level = write ? DLM_LOCK_EX : DLM_LOCK_PR;
> diff --git a/fs/ocfs2/dlmglue.h b/fs/ocfs2/dlmglue.h
> index a7fc18b..05910fc 100644
> --- a/fs/ocfs2/dlmglue.h
> +++ b/fs/ocfs2/dlmglue.h
> @@ -116,6 +116,7 @@ void ocfs2_refcount_lock_res_init(struct ocfs2_lock_res *lockres,
> int ocfs2_create_new_inode_locks(struct inode *inode);
> int ocfs2_drop_inode_locks(struct inode *inode);
> int ocfs2_rw_lock(struct inode *inode, int write);
> +int ocfs2_try_rw_lock(struct inode *inode, int write);
> void ocfs2_rw_unlock(struct inode *inode, int write);
> int ocfs2_open_lock(struct inode *inode);
> int ocfs2_try_open_lock(struct inode *inode, int write);
> @@ -140,6 +141,9 @@ int ocfs2_inode_lock_with_page(struct inode *inode,
> /* 99% of the time we don't want to supply any additional flags --
> * those are for very specific cases only. */
> #define ocfs2_inode_lock(i, b, e) ocfs2_inode_lock_full_nested(i, b, e, 0, OI_LS_NORMAL)
> +#define ocfs2_try_inode_lock(i, b, e)\
> + ocfs2_inode_lock_full_nested(i, b, e, OCFS2_META_LOCK_NOQUEUE,\
> + OI_LS_NORMAL)
> void ocfs2_inode_unlock(struct inode *inode,
> int ex);
> int ocfs2_super_lock(struct ocfs2_super *osb,
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* [Ocfs2-devel] [PATCH v2 1/3] ocfs2: add ocfs2_try_rw_lock and ocfs2_try_inode_lock
@ 2017-11-29 12:02 ` Changwei Ge
0 siblings, 0 replies; 20+ messages in thread
From: Changwei Ge @ 2017-11-29 12:02 UTC (permalink / raw)
To: Gang He, mfasheh, jlbec, rgoldwyn, hch; +Cc: linux-kernel, ocfs2-devel
On 2017/11/29 16:38, Gang He wrote:
> Add ocfs2_try_rw_lock and ocfs2_try_inode_lock functions, which
> will be used in non-block IO scenarios.
>
> Signed-off-by: Gang He <ghe@suse.com>
> ---
> fs/ocfs2/dlmglue.c | 21 +++++++++++++++++++++
> fs/ocfs2/dlmglue.h | 4 ++++
> 2 files changed, 25 insertions(+)
>
> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
> index 4689940..a68efa3 100644
> --- a/fs/ocfs2/dlmglue.c
> +++ b/fs/ocfs2/dlmglue.c
> @@ -1742,6 +1742,27 @@ int ocfs2_rw_lock(struct inode *inode, int write)
> return status;
> }
>
> +int ocfs2_try_rw_lock(struct inode *inode, int write)
> +{
> + int status, level;
> + struct ocfs2_lock_res *lockres;
> + struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
> +
> + mlog(0, "inode %llu try to take %s RW lock\n",
> + (unsigned long long)OCFS2_I(inode)->ip_blkno,
> + write ? "EXMODE" : "PRMODE");
> +
> + if (ocfs2_mount_local(osb))
> + return 0;
> +
> + lockres = &OCFS2_I(inode)->ip_rw_lockres;
> +
> + level = write ? DLM_LOCK_EX : DLM_LOCK_PR;
> +
> + status = ocfs2_cluster_lock(osb, lockres, level, DLM_LKF_NOQUEUE, 0);
Hi Gang,
Should we consider about passing a flag - OCFS2_LOCK_NONBLOCK to
ocfs2_cluster_lock. Otherwise a cluster locking progress may be waiting
for accomplishment of DC, which I think violates _NO_WAIT_ semantics.
Thanks,
Changwei.
> + return status;
> +}
> +
> void ocfs2_rw_unlock(struct inode *inode, int write)
> {
> int level = write ? DLM_LOCK_EX : DLM_LOCK_PR;
> diff --git a/fs/ocfs2/dlmglue.h b/fs/ocfs2/dlmglue.h
> index a7fc18b..05910fc 100644
> --- a/fs/ocfs2/dlmglue.h
> +++ b/fs/ocfs2/dlmglue.h
> @@ -116,6 +116,7 @@ void ocfs2_refcount_lock_res_init(struct ocfs2_lock_res *lockres,
> int ocfs2_create_new_inode_locks(struct inode *inode);
> int ocfs2_drop_inode_locks(struct inode *inode);
> int ocfs2_rw_lock(struct inode *inode, int write);
> +int ocfs2_try_rw_lock(struct inode *inode, int write);
> void ocfs2_rw_unlock(struct inode *inode, int write);
> int ocfs2_open_lock(struct inode *inode);
> int ocfs2_try_open_lock(struct inode *inode, int write);
> @@ -140,6 +141,9 @@ int ocfs2_inode_lock_with_page(struct inode *inode,
> /* 99% of the time we don't want to supply any additional flags --
> * those are for very specific cases only. */
> #define ocfs2_inode_lock(i, b, e) ocfs2_inode_lock_full_nested(i, b, e, 0, OI_LS_NORMAL)
> +#define ocfs2_try_inode_lock(i, b, e)\
> + ocfs2_inode_lock_full_nested(i, b, e, OCFS2_META_LOCK_NOQUEUE,\
> + OI_LS_NORMAL)
> void ocfs2_inode_unlock(struct inode *inode,
> int ex);
> int ocfs2_super_lock(struct ocfs2_super *osb,
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Ocfs2-devel] [PATCH v2 2/3] ocfs2: add ocfs2_overwrite_io function
2017-11-29 8:36 ` [Ocfs2-devel] " Gang He
@ 2017-11-29 12:05 ` Changwei Ge
-1 siblings, 0 replies; 20+ messages in thread
From: Changwei Ge @ 2017-11-29 12:05 UTC (permalink / raw)
To: Gang He, mfasheh, jlbec, rgoldwyn, hch; +Cc: linux-kernel, ocfs2-devel
It looks fine to me.
On 2017/11/29 16:39, Gang He wrote:
> Add ocfs2_overwrite_io function, which is used to judge if
> overwrite allocated blocks, otherwise, the write will bring extra
> block allocation overhead.
>
> Signed-off-by: Gang He <ghe@suse.com>
Reviewed-by: Changwei Ge <ge.changwei@h3c.com>
> ---
> fs/ocfs2/extent_map.c | 41 +++++++++++++++++++++++++++++++++++++++++
> fs/ocfs2/extent_map.h | 3 +++
> 2 files changed, 44 insertions(+)
>
> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
> index e4719e0..4da33e9 100644
> --- a/fs/ocfs2/extent_map.c
> +++ b/fs/ocfs2/extent_map.c
> @@ -832,6 +832,47 @@ int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
> return ret;
> }
>
> +/* Is IO overwriting allocated blocks? */
> +int ocfs2_overwrite_io(struct inode *inode, struct buffer_head *di_bh,
> + u64 map_start, u64 map_len)
> +{
> + int ret = 0, is_last;
> + u32 mapping_end, cpos;
> + struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
> + struct ocfs2_extent_rec rec;
> +
> + if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
> + ((map_start + map_len) <= i_size_read(inode)))
> + goto out;
> +
> + cpos = map_start >> osb->s_clustersize_bits;
> + mapping_end = ocfs2_clusters_for_bytes(inode->i_sb,
> + map_start + map_len);
> + is_last = 0;
> + while (cpos < mapping_end && !is_last) {
> + ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos,
> + NULL, &rec, &is_last);
> + if (ret) {
> + mlog_errno(ret);
> + goto out;
> + }
> +
> + if (rec.e_blkno == 0ULL)
> + break;
> +
> + if (rec.e_flags & OCFS2_EXT_REFCOUNTED)
> + break;
> +
> + cpos = le32_to_cpu(rec.e_cpos) +
> + le16_to_cpu(rec.e_leaf_clusters);
> + }
> +
> + if (cpos < mapping_end)
> + ret = -EAGAIN;
> +out:
> + return ret;
> +}
> +
> int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int whence)
> {
> struct inode *inode = file->f_mapping->host;
> diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h
> index 67ea57d..1057586 100644
> --- a/fs/ocfs2/extent_map.h
> +++ b/fs/ocfs2/extent_map.h
> @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 v_blkno, u64 *p_blkno,
> int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
> u64 map_start, u64 map_len);
>
> +int ocfs2_overwrite_io(struct inode *inode, struct buffer_head *di_bh,
> + u64 map_start, u64 map_len);
> +
> int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int origin);
>
> int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster,
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* [Ocfs2-devel] [PATCH v2 2/3] ocfs2: add ocfs2_overwrite_io function
@ 2017-11-29 12:05 ` Changwei Ge
0 siblings, 0 replies; 20+ messages in thread
From: Changwei Ge @ 2017-11-29 12:05 UTC (permalink / raw)
To: Gang He, mfasheh, jlbec, rgoldwyn, hch; +Cc: linux-kernel, ocfs2-devel
It looks fine to me.
On 2017/11/29 16:39, Gang He wrote:
> Add ocfs2_overwrite_io function, which is used to judge if
> overwrite allocated blocks, otherwise, the write will bring extra
> block allocation overhead.
>
> Signed-off-by: Gang He <ghe@suse.com>
Reviewed-by: Changwei Ge <ge.changwei@h3c.com>
> ---
> fs/ocfs2/extent_map.c | 41 +++++++++++++++++++++++++++++++++++++++++
> fs/ocfs2/extent_map.h | 3 +++
> 2 files changed, 44 insertions(+)
>
> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
> index e4719e0..4da33e9 100644
> --- a/fs/ocfs2/extent_map.c
> +++ b/fs/ocfs2/extent_map.c
> @@ -832,6 +832,47 @@ int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
> return ret;
> }
>
> +/* Is IO overwriting allocated blocks? */
> +int ocfs2_overwrite_io(struct inode *inode, struct buffer_head *di_bh,
> + u64 map_start, u64 map_len)
> +{
> + int ret = 0, is_last;
> + u32 mapping_end, cpos;
> + struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
> + struct ocfs2_extent_rec rec;
> +
> + if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
> + ((map_start + map_len) <= i_size_read(inode)))
> + goto out;
> +
> + cpos = map_start >> osb->s_clustersize_bits;
> + mapping_end = ocfs2_clusters_for_bytes(inode->i_sb,
> + map_start + map_len);
> + is_last = 0;
> + while (cpos < mapping_end && !is_last) {
> + ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos,
> + NULL, &rec, &is_last);
> + if (ret) {
> + mlog_errno(ret);
> + goto out;
> + }
> +
> + if (rec.e_blkno == 0ULL)
> + break;
> +
> + if (rec.e_flags & OCFS2_EXT_REFCOUNTED)
> + break;
> +
> + cpos = le32_to_cpu(rec.e_cpos) +
> + le16_to_cpu(rec.e_leaf_clusters);
> + }
> +
> + if (cpos < mapping_end)
> + ret = -EAGAIN;
> +out:
> + return ret;
> +}
> +
> int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int whence)
> {
> struct inode *inode = file->f_mapping->host;
> diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h
> index 67ea57d..1057586 100644
> --- a/fs/ocfs2/extent_map.h
> +++ b/fs/ocfs2/extent_map.h
> @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 v_blkno, u64 *p_blkno,
> int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
> u64 map_start, u64 map_len);
>
> +int ocfs2_overwrite_io(struct inode *inode, struct buffer_head *di_bh,
> + u64 map_start, u64 map_len);
> +
> int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int origin);
>
> int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster,
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Ocfs2-devel] [PATCH v2 1/3] ocfs2: add ocfs2_try_rw_lock and ocfs2_try_inode_lock
2017-11-29 12:02 ` Changwei Ge
@ 2017-11-30 2:44 ` Gang He
-1 siblings, 0 replies; 20+ messages in thread
From: Gang He @ 2017-11-30 2:44 UTC (permalink / raw)
To: jlbec, ge.changwei, hch, Goldwyn Rodrigues, mfasheh
Cc: ocfs2-devel, linux-kernel
Hello Changwei,
>>>
> On 2017/11/29 16:38, Gang He wrote:
>> Add ocfs2_try_rw_lock and ocfs2_try_inode_lock functions, which
>> will be used in non-block IO scenarios.
>>
>> Signed-off-by: Gang He <ghe@suse.com>
>> ---
>> fs/ocfs2/dlmglue.c | 21 +++++++++++++++++++++
>> fs/ocfs2/dlmglue.h | 4 ++++
>> 2 files changed, 25 insertions(+)
>>
>> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
>> index 4689940..a68efa3 100644
>> --- a/fs/ocfs2/dlmglue.c
>> +++ b/fs/ocfs2/dlmglue.c
>> @@ -1742,6 +1742,27 @@ int ocfs2_rw_lock(struct inode *inode, int write)
>> return status;
>> }
>>
>> +int ocfs2_try_rw_lock(struct inode *inode, int write)
>> +{
>> + int status, level;
>> + struct ocfs2_lock_res *lockres;
>> + struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>> +
>> + mlog(0, "inode %llu try to take %s RW lock\n",
>> + (unsigned long long)OCFS2_I(inode)->ip_blkno,
>> + write ? "EXMODE" : "PRMODE");
>> +
>> + if (ocfs2_mount_local(osb))
>> + return 0;
>> +
>> + lockres = &OCFS2_I(inode)->ip_rw_lockres;
>> +
>> + level = write ? DLM_LOCK_EX : DLM_LOCK_PR;
>> +
>> + status = ocfs2_cluster_lock(osb, lockres, level, DLM_LKF_NOQUEUE, 0);
>
> Hi Gang,
> Should we consider about passing a flag - OCFS2_LOCK_NONBLOCK to
> ocfs2_cluster_lock. Otherwise a cluster locking progress may be waiting
> for accomplishment of DC, which I think violates _NO_WAIT_ semantics.
If ocfs2 is a local file system, we should not wait for any condition, but for a cluster file system,
we cannot avoid this totally according to the current DLM lock design, we need to wait for a little to get lock for the first time.
Why do we not use OCFS2_LOCK_NONBLOCK flag to get a lock?
since this flag is not stable to get a lock no matter this lock is occupied by other nodes, or not.
If you use OCFS2_LOCK_NONBLOCK flag to get a fresh lock, you possibly fail or success, depends on when the lock acquisition callback happens.
So, I think DLM_LKF_NOQUEUE flag should be more matched to _NO_WAIT_ semantics.
we always get a fresh lock successfully, always failed if the lock is/was occupied by other nodes.
This flag can give us a consistent locking behavior.
Thanks
Gang
>
> Thanks,
> Changwei.
>
>> + return status;
>> +}
>> +
>> void ocfs2_rw_unlock(struct inode *inode, int write)
>> {
>> int level = write ? DLM_LOCK_EX : DLM_LOCK_PR;
>> diff --git a/fs/ocfs2/dlmglue.h b/fs/ocfs2/dlmglue.h
>> index a7fc18b..05910fc 100644
>> --- a/fs/ocfs2/dlmglue.h
>> +++ b/fs/ocfs2/dlmglue.h
>> @@ -116,6 +116,7 @@ void ocfs2_refcount_lock_res_init(struct ocfs2_lock_res
> *lockres,
>> int ocfs2_create_new_inode_locks(struct inode *inode);
>> int ocfs2_drop_inode_locks(struct inode *inode);
>> int ocfs2_rw_lock(struct inode *inode, int write);
>> +int ocfs2_try_rw_lock(struct inode *inode, int write);
>> void ocfs2_rw_unlock(struct inode *inode, int write);
>> int ocfs2_open_lock(struct inode *inode);
>> int ocfs2_try_open_lock(struct inode *inode, int write);
>> @@ -140,6 +141,9 @@ int ocfs2_inode_lock_with_page(struct inode *inode,
>> /* 99% of the time we don't want to supply any additional flags --
>> * those are for very specific cases only. */
>> #define ocfs2_inode_lock(i, b, e) ocfs2_inode_lock_full_nested(i, b, e, 0,
> OI_LS_NORMAL)
>> +#define ocfs2_try_inode_lock(i, b, e)\
>> + ocfs2_inode_lock_full_nested(i, b, e, OCFS2_META_LOCK_NOQUEUE,\
>> + OI_LS_NORMAL)
>> void ocfs2_inode_unlock(struct inode *inode,
>> int ex);
>> int ocfs2_super_lock(struct ocfs2_super *osb,
>>
^ permalink raw reply [flat|nested] 20+ messages in thread
* [Ocfs2-devel] [PATCH v2 1/3] ocfs2: add ocfs2_try_rw_lock and ocfs2_try_inode_lock
@ 2017-11-30 2:44 ` Gang He
0 siblings, 0 replies; 20+ messages in thread
From: Gang He @ 2017-11-30 2:44 UTC (permalink / raw)
To: jlbec, ge.changwei, hch, Goldwyn Rodrigues, mfasheh
Cc: ocfs2-devel, linux-kernel
Hello Changwei,
>>>
> On 2017/11/29 16:38, Gang He wrote:
>> Add ocfs2_try_rw_lock and ocfs2_try_inode_lock functions, which
>> will be used in non-block IO scenarios.
>>
>> Signed-off-by: Gang He <ghe@suse.com>
>> ---
>> fs/ocfs2/dlmglue.c | 21 +++++++++++++++++++++
>> fs/ocfs2/dlmglue.h | 4 ++++
>> 2 files changed, 25 insertions(+)
>>
>> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
>> index 4689940..a68efa3 100644
>> --- a/fs/ocfs2/dlmglue.c
>> +++ b/fs/ocfs2/dlmglue.c
>> @@ -1742,6 +1742,27 @@ int ocfs2_rw_lock(struct inode *inode, int write)
>> return status;
>> }
>>
>> +int ocfs2_try_rw_lock(struct inode *inode, int write)
>> +{
>> + int status, level;
>> + struct ocfs2_lock_res *lockres;
>> + struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>> +
>> + mlog(0, "inode %llu try to take %s RW lock\n",
>> + (unsigned long long)OCFS2_I(inode)->ip_blkno,
>> + write ? "EXMODE" : "PRMODE");
>> +
>> + if (ocfs2_mount_local(osb))
>> + return 0;
>> +
>> + lockres = &OCFS2_I(inode)->ip_rw_lockres;
>> +
>> + level = write ? DLM_LOCK_EX : DLM_LOCK_PR;
>> +
>> + status = ocfs2_cluster_lock(osb, lockres, level, DLM_LKF_NOQUEUE, 0);
>
> Hi Gang,
> Should we consider about passing a flag - OCFS2_LOCK_NONBLOCK to
> ocfs2_cluster_lock. Otherwise a cluster locking progress may be waiting
> for accomplishment of DC, which I think violates _NO_WAIT_ semantics.
If ocfs2 is a local file system, we should not wait for any condition, but for a cluster file system,
we cannot avoid this totally according to the current DLM lock design, we need to wait for a little to get lock for the first time.
Why do we not use OCFS2_LOCK_NONBLOCK flag to get a lock?
since this flag is not stable to get a lock no matter this lock is occupied by other nodes, or not.
If you use OCFS2_LOCK_NONBLOCK flag to get a fresh lock, you possibly fail or success, depends on when the lock acquisition callback happens.
So, I think DLM_LKF_NOQUEUE flag should be more matched to _NO_WAIT_ semantics.
we always get a fresh lock successfully, always failed if the lock is/was occupied by other nodes.
This flag can give us a consistent locking behavior.
Thanks
Gang
>
> Thanks,
> Changwei.
>
>> + return status;
>> +}
>> +
>> void ocfs2_rw_unlock(struct inode *inode, int write)
>> {
>> int level = write ? DLM_LOCK_EX : DLM_LOCK_PR;
>> diff --git a/fs/ocfs2/dlmglue.h b/fs/ocfs2/dlmglue.h
>> index a7fc18b..05910fc 100644
>> --- a/fs/ocfs2/dlmglue.h
>> +++ b/fs/ocfs2/dlmglue.h
>> @@ -116,6 +116,7 @@ void ocfs2_refcount_lock_res_init(struct ocfs2_lock_res
> *lockres,
>> int ocfs2_create_new_inode_locks(struct inode *inode);
>> int ocfs2_drop_inode_locks(struct inode *inode);
>> int ocfs2_rw_lock(struct inode *inode, int write);
>> +int ocfs2_try_rw_lock(struct inode *inode, int write);
>> void ocfs2_rw_unlock(struct inode *inode, int write);
>> int ocfs2_open_lock(struct inode *inode);
>> int ocfs2_try_open_lock(struct inode *inode, int write);
>> @@ -140,6 +141,9 @@ int ocfs2_inode_lock_with_page(struct inode *inode,
>> /* 99% of the time we don't want to supply any additional flags --
>> * those are for very specific cases only. */
>> #define ocfs2_inode_lock(i, b, e) ocfs2_inode_lock_full_nested(i, b, e, 0,
> OI_LS_NORMAL)
>> +#define ocfs2_try_inode_lock(i, b, e)\
>> + ocfs2_inode_lock_full_nested(i, b, e, OCFS2_META_LOCK_NOQUEUE,\
>> + OI_LS_NORMAL)
>> void ocfs2_inode_unlock(struct inode *inode,
>> int ex);
>> int ocfs2_super_lock(struct ocfs2_super *osb,
>>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Ocfs2-devel] [PATCH v2 1/3] ocfs2: add ocfs2_try_rw_lock and ocfs2_try_inode_lock
2017-11-30 2:44 ` Gang He
@ 2017-11-30 3:01 ` Changwei Ge
-1 siblings, 0 replies; 20+ messages in thread
From: Changwei Ge @ 2017-11-30 3:01 UTC (permalink / raw)
To: Gang He, jlbec, hch, Goldwyn Rodrigues, mfasheh; +Cc: ocfs2-devel, linux-kernel
Hi Gang,
On 2017/11/30 10:45, Gang He wrote:
> Hello Changwei,
>
>
>>>>
>> On 2017/11/29 16:38, Gang He wrote:
>>> Add ocfs2_try_rw_lock and ocfs2_try_inode_lock functions, which
>>> will be used in non-block IO scenarios.
>>>
>>> Signed-off-by: Gang He <ghe@suse.com>
>>> ---
>>> fs/ocfs2/dlmglue.c | 21 +++++++++++++++++++++
>>> fs/ocfs2/dlmglue.h | 4 ++++
>>> 2 files changed, 25 insertions(+)
>>>
>>> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
>>> index 4689940..a68efa3 100644
>>> --- a/fs/ocfs2/dlmglue.c
>>> +++ b/fs/ocfs2/dlmglue.c
>>> @@ -1742,6 +1742,27 @@ int ocfs2_rw_lock(struct inode *inode, int write)
>>> return status;
>>> }
>>>
>>> +int ocfs2_try_rw_lock(struct inode *inode, int write)
>>> +{
>>> + int status, level;
>>> + struct ocfs2_lock_res *lockres;
>>> + struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>>> +
>>> + mlog(0, "inode %llu try to take %s RW lock\n",
>>> + (unsigned long long)OCFS2_I(inode)->ip_blkno,
>>> + write ? "EXMODE" : "PRMODE");
>>> +
>>> + if (ocfs2_mount_local(osb))
>>> + return 0;
>>> +
>>> + lockres = &OCFS2_I(inode)->ip_rw_lockres;
>>> +
>>> + level = write ? DLM_LOCK_EX : DLM_LOCK_PR;
>>> +
>>> + status = ocfs2_cluster_lock(osb, lockres, level, DLM_LKF_NOQUEUE, 0);
>>
>> Hi Gang,
>> Should we consider about passing a flag - OCFS2_LOCK_NONBLOCK to
>> ocfs2_cluster_lock. Otherwise a cluster locking progress may be waiting
>> for accomplishment of DC, which I think violates _NO_WAIT_ semantics.
>
> If ocfs2 is a local file system, we should not wait for any condition, but for a cluster file system,
> we cannot avoid this totally according to the current DLM lock design, we need to wait for a little to get lock for the first time.
> Why do we not use OCFS2_LOCK_NONBLOCK flag to get a lock?
> since this flag is not stable to get a lock no matter this lock is occupied by other nodes, or not.
I suppose local node must be granted under the condition that it is
marked with *OCFS2_LOCK_BLOCKED*. And the control flag
_OCFS2_LOCK_NONBLOCK_ will make lock progress directly return -EAGAIN
without any waiting.
> If you use OCFS2_LOCK_NONBLOCK flag to get a fresh lock, you possibly fail or success, depends on when the lock acquisition callback happens.
> So, I think DLM_LKF_NOQUEUE flag should be more matched to _NO_WAIT_ semantics.
I thinks OCFS2_LOCK_NONBLOCK doesn't conflict DLM_LKF_NOQUEUE, they are
the forth and fifth argument respectively.
> we always get a fresh lock successfully, always failed if the lock is/was occupied by other nodes.
What do you mean by a fresh lock? A lock that is never granted or
acquired? If a lock is marked with OCFS2_LOCK_BLOCKED, local node must
has acquired it.
Thanks,
Changwei
> This flag can give us a consistent locking behavior.
>
> Thanks
> Gang
>
>
>
>>
>> Thanks,
>> Changwei.
>>
>>> + return status;
>>> +}
>>> +
>>> void ocfs2_rw_unlock(struct inode *inode, int write)
>>> {
>>> int level = write ? DLM_LOCK_EX : DLM_LOCK_PR;
>>> diff --git a/fs/ocfs2/dlmglue.h b/fs/ocfs2/dlmglue.h
>>> index a7fc18b..05910fc 100644
>>> --- a/fs/ocfs2/dlmglue.h
>>> +++ b/fs/ocfs2/dlmglue.h
>>> @@ -116,6 +116,7 @@ void ocfs2_refcount_lock_res_init(struct ocfs2_lock_res
>> *lockres,
>>> int ocfs2_create_new_inode_locks(struct inode *inode);
>>> int ocfs2_drop_inode_locks(struct inode *inode);
>>> int ocfs2_rw_lock(struct inode *inode, int write);
>>> +int ocfs2_try_rw_lock(struct inode *inode, int write);
>>> void ocfs2_rw_unlock(struct inode *inode, int write);
>>> int ocfs2_open_lock(struct inode *inode);
>>> int ocfs2_try_open_lock(struct inode *inode, int write);
>>> @@ -140,6 +141,9 @@ int ocfs2_inode_lock_with_page(struct inode *inode,
>>> /* 99% of the time we don't want to supply any additional flags --
>>> * those are for very specific cases only. */
>>> #define ocfs2_inode_lock(i, b, e) ocfs2_inode_lock_full_nested(i, b, e, 0,
>> OI_LS_NORMAL)
>>> +#define ocfs2_try_inode_lock(i, b, e)\
>>> + ocfs2_inode_lock_full_nested(i, b, e, OCFS2_META_LOCK_NOQUEUE,\
>>> + OI_LS_NORMAL)
>>> void ocfs2_inode_unlock(struct inode *inode,
>>> int ex);
>>> int ocfs2_super_lock(struct ocfs2_super *osb,
>>>
>
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* [Ocfs2-devel] [PATCH v2 1/3] ocfs2: add ocfs2_try_rw_lock and ocfs2_try_inode_lock
@ 2017-11-30 3:01 ` Changwei Ge
0 siblings, 0 replies; 20+ messages in thread
From: Changwei Ge @ 2017-11-30 3:01 UTC (permalink / raw)
To: Gang He, jlbec, hch, Goldwyn Rodrigues, mfasheh; +Cc: ocfs2-devel, linux-kernel
Hi Gang,
On 2017/11/30 10:45, Gang He wrote:
> Hello Changwei,
>
>
>>>>
>> On 2017/11/29 16:38, Gang He wrote:
>>> Add ocfs2_try_rw_lock and ocfs2_try_inode_lock functions, which
>>> will be used in non-block IO scenarios.
>>>
>>> Signed-off-by: Gang He <ghe@suse.com>
>>> ---
>>> fs/ocfs2/dlmglue.c | 21 +++++++++++++++++++++
>>> fs/ocfs2/dlmglue.h | 4 ++++
>>> 2 files changed, 25 insertions(+)
>>>
>>> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
>>> index 4689940..a68efa3 100644
>>> --- a/fs/ocfs2/dlmglue.c
>>> +++ b/fs/ocfs2/dlmglue.c
>>> @@ -1742,6 +1742,27 @@ int ocfs2_rw_lock(struct inode *inode, int write)
>>> return status;
>>> }
>>>
>>> +int ocfs2_try_rw_lock(struct inode *inode, int write)
>>> +{
>>> + int status, level;
>>> + struct ocfs2_lock_res *lockres;
>>> + struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>>> +
>>> + mlog(0, "inode %llu try to take %s RW lock\n",
>>> + (unsigned long long)OCFS2_I(inode)->ip_blkno,
>>> + write ? "EXMODE" : "PRMODE");
>>> +
>>> + if (ocfs2_mount_local(osb))
>>> + return 0;
>>> +
>>> + lockres = &OCFS2_I(inode)->ip_rw_lockres;
>>> +
>>> + level = write ? DLM_LOCK_EX : DLM_LOCK_PR;
>>> +
>>> + status = ocfs2_cluster_lock(osb, lockres, level, DLM_LKF_NOQUEUE, 0);
>>
>> Hi Gang,
>> Should we consider about passing a flag - OCFS2_LOCK_NONBLOCK to
>> ocfs2_cluster_lock. Otherwise a cluster locking progress may be waiting
>> for accomplishment of DC, which I think violates _NO_WAIT_ semantics.
>
> If ocfs2 is a local file system, we should not wait for any condition, but for a cluster file system,
> we cannot avoid this totally according to the current DLM lock design, we need to wait for a little to get lock for the first time.
> Why do we not use OCFS2_LOCK_NONBLOCK flag to get a lock?
> since this flag is not stable to get a lock no matter this lock is occupied by other nodes, or not.
I suppose local node must be granted under the condition that it is
marked with *OCFS2_LOCK_BLOCKED*. And the control flag
_OCFS2_LOCK_NONBLOCK_ will make lock progress directly return -EAGAIN
without any waiting.
> If you use OCFS2_LOCK_NONBLOCK flag to get a fresh lock, you possibly fail or success, depends on when the lock acquisition callback happens.
> So, I think DLM_LKF_NOQUEUE flag should be more matched to _NO_WAIT_ semantics.
I thinks OCFS2_LOCK_NONBLOCK doesn't conflict DLM_LKF_NOQUEUE, they are
the forth and fifth argument respectively.
> we always get a fresh lock successfully, always failed if the lock is/was occupied by other nodes.
What do you mean by a fresh lock? A lock that is never granted or
acquired? If a lock is marked with OCFS2_LOCK_BLOCKED, local node must
has acquired it.
Thanks,
Changwei
> This flag can give us a consistent locking behavior.
>
> Thanks
> Gang
>
>
>
>>
>> Thanks,
>> Changwei.
>>
>>> + return status;
>>> +}
>>> +
>>> void ocfs2_rw_unlock(struct inode *inode, int write)
>>> {
>>> int level = write ? DLM_LOCK_EX : DLM_LOCK_PR;
>>> diff --git a/fs/ocfs2/dlmglue.h b/fs/ocfs2/dlmglue.h
>>> index a7fc18b..05910fc 100644
>>> --- a/fs/ocfs2/dlmglue.h
>>> +++ b/fs/ocfs2/dlmglue.h
>>> @@ -116,6 +116,7 @@ void ocfs2_refcount_lock_res_init(struct ocfs2_lock_res
>> *lockres,
>>> int ocfs2_create_new_inode_locks(struct inode *inode);
>>> int ocfs2_drop_inode_locks(struct inode *inode);
>>> int ocfs2_rw_lock(struct inode *inode, int write);
>>> +int ocfs2_try_rw_lock(struct inode *inode, int write);
>>> void ocfs2_rw_unlock(struct inode *inode, int write);
>>> int ocfs2_open_lock(struct inode *inode);
>>> int ocfs2_try_open_lock(struct inode *inode, int write);
>>> @@ -140,6 +141,9 @@ int ocfs2_inode_lock_with_page(struct inode *inode,
>>> /* 99% of the time we don't want to supply any additional flags --
>>> * those are for very specific cases only. */
>>> #define ocfs2_inode_lock(i, b, e) ocfs2_inode_lock_full_nested(i, b, e, 0,
>> OI_LS_NORMAL)
>>> +#define ocfs2_try_inode_lock(i, b, e)\
>>> + ocfs2_inode_lock_full_nested(i, b, e, OCFS2_META_LOCK_NOQUEUE,\
>>> + OI_LS_NORMAL)
>>> void ocfs2_inode_unlock(struct inode *inode,
>>> int ex);
>>> int ocfs2_super_lock(struct ocfs2_super *osb,
>>>
>
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Ocfs2-devel] [PATCH v2 1/3] ocfs2: add ocfs2_try_rw_lock and ocfs2_try_inode_lock
2017-11-30 3:01 ` Changwei Ge
@ 2017-11-30 3:21 ` Gang He
-1 siblings, 0 replies; 20+ messages in thread
From: Gang He @ 2017-11-30 3:21 UTC (permalink / raw)
To: jlbec, ge.changwei, hch, Goldwyn Rodrigues, mfasheh
Cc: ocfs2-devel, linux-kernel
Hi Changwei,
>>>
> Hi Gang,
>
> On 2017/11/30 10:45, Gang He wrote:
>> Hello Changwei,
>>
>>
>>>>>
>>> On 2017/11/29 16:38, Gang He wrote:
>>>> Add ocfs2_try_rw_lock and ocfs2_try_inode_lock functions, which
>>>> will be used in non-block IO scenarios.
>>>>
>>>> Signed-off-by: Gang He <ghe@suse.com>
>>>> ---
>>>> fs/ocfs2/dlmglue.c | 21 +++++++++++++++++++++
>>>> fs/ocfs2/dlmglue.h | 4 ++++
>>>> 2 files changed, 25 insertions(+)
>>>>
>>>> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
>>>> index 4689940..a68efa3 100644
>>>> --- a/fs/ocfs2/dlmglue.c
>>>> +++ b/fs/ocfs2/dlmglue.c
>>>> @@ -1742,6 +1742,27 @@ int ocfs2_rw_lock(struct inode *inode, int write)
>>>> return status;
>>>> }
>>>>
>>>> +int ocfs2_try_rw_lock(struct inode *inode, int write)
>>>> +{
>>>> + int status, level;
>>>> + struct ocfs2_lock_res *lockres;
>>>> + struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>>>> +
>>>> + mlog(0, "inode %llu try to take %s RW lock\n",
>>>> + (unsigned long long)OCFS2_I(inode)->ip_blkno,
>>>> + write ? "EXMODE" : "PRMODE");
>>>> +
>>>> + if (ocfs2_mount_local(osb))
>>>> + return 0;
>>>> +
>>>> + lockres = &OCFS2_I(inode)->ip_rw_lockres;
>>>> +
>>>> + level = write ? DLM_LOCK_EX : DLM_LOCK_PR;
>>>> +
>>>> + status = ocfs2_cluster_lock(osb, lockres, level, DLM_LKF_NOQUEUE, 0);
>>>
>>> Hi Gang,
>>> Should we consider about passing a flag - OCFS2_LOCK_NONBLOCK to
>>> ocfs2_cluster_lock. Otherwise a cluster locking progress may be waiting
>>> for accomplishment of DC, which I think violates _NO_WAIT_ semantics.
>>
>> If ocfs2 is a local file system, we should not wait for any condition, but
> for a cluster file system,
>> we cannot avoid this totally according to the current DLM lock design, we
> need to wait for a little to get lock for the first time.
>> Why do we not use OCFS2_LOCK_NONBLOCK flag to get a lock?
>> since this flag is not stable to get a lock no matter this lock is occupied
> by other nodes, or not.
> I suppose local node must be granted under the condition that it is
> marked with *OCFS2_LOCK_BLOCKED*. And the control flag
> _OCFS2_LOCK_NONBLOCK_ will make lock progress directly return -EAGAIN
> without any waiting.
>
>> If you use OCFS2_LOCK_NONBLOCK flag to get a fresh lock, you possibly fail
> or success, depends on when the lock acquisition callback happens.
>> So, I think DLM_LKF_NOQUEUE flag should be more matched to _NO_WAIT_
> semantics.
>
> I thinks OCFS2_LOCK_NONBLOCK doesn't conflict DLM_LKF_NOQUEUE, they are
> the forth and fifth argument respectively.
Two arguments doesn't conflict,
but the _OCFS2_LOCK_NONBLOCK_ flag will bring the uncertainty of lock acquisition, since depends on when the lock acquisition callback happens.
In the other words, you possibly do not get a lock for one time if passing the _OCFS2_LOCK_NONBLOCK_ flag.
I think that this flag is suitable to ocfs2_inode_lock_with_page case, since when we can not get the lock immediately, we can try it again and again.
In actually, introducing this flag is to avoid a dead-lock case, a very tricky case.
Thanks
Gang
>
>> we always get a fresh lock successfully, always failed if the lock is/was
> occupied by other nodes.
>
> What do you mean by a fresh lock? A lock that is never granted or
> acquired? If a lock is marked with OCFS2_LOCK_BLOCKED, local node must
> has acquired it.
>
> Thanks,
> Changwei
>
>> This flag can give us a consistent locking behavior.
>>
>> Thanks
>> Gang
>>
>>
>>
>>>
>>> Thanks,
>>> Changwei.
>>>
>>>> + return status;
>>>> +}
>>>> +
>>>> void ocfs2_rw_unlock(struct inode *inode, int write)
>>>> {
>>>> int level = write ? DLM_LOCK_EX : DLM_LOCK_PR;
>>>> diff --git a/fs/ocfs2/dlmglue.h b/fs/ocfs2/dlmglue.h
>>>> index a7fc18b..05910fc 100644
>>>> --- a/fs/ocfs2/dlmglue.h
>>>> +++ b/fs/ocfs2/dlmglue.h
>>>> @@ -116,6 +116,7 @@ void ocfs2_refcount_lock_res_init(struct ocfs2_lock_res
>>> *lockres,
>>>> int ocfs2_create_new_inode_locks(struct inode *inode);
>>>> int ocfs2_drop_inode_locks(struct inode *inode);
>>>> int ocfs2_rw_lock(struct inode *inode, int write);
>>>> +int ocfs2_try_rw_lock(struct inode *inode, int write);
>>>> void ocfs2_rw_unlock(struct inode *inode, int write);
>>>> int ocfs2_open_lock(struct inode *inode);
>>>> int ocfs2_try_open_lock(struct inode *inode, int write);
>>>> @@ -140,6 +141,9 @@ int ocfs2_inode_lock_with_page(struct inode *inode,
>>>> /* 99% of the time we don't want to supply any additional flags --
>>>> * those are for very specific cases only. */
>>>> #define ocfs2_inode_lock(i, b, e) ocfs2_inode_lock_full_nested(i, b, e,
> 0,
>>> OI_LS_NORMAL)
>>>> +#define ocfs2_try_inode_lock(i, b, e)\
>>>> + ocfs2_inode_lock_full_nested(i, b, e, OCFS2_META_LOCK_NOQUEUE,\
>>>> + OI_LS_NORMAL)
>>>> void ocfs2_inode_unlock(struct inode *inode,
>>>> int ex);
>>>> int ocfs2_super_lock(struct ocfs2_super *osb,
>>>>
>>
>>
^ permalink raw reply [flat|nested] 20+ messages in thread
* [Ocfs2-devel] [PATCH v2 1/3] ocfs2: add ocfs2_try_rw_lock and ocfs2_try_inode_lock
@ 2017-11-30 3:21 ` Gang He
0 siblings, 0 replies; 20+ messages in thread
From: Gang He @ 2017-11-30 3:21 UTC (permalink / raw)
To: jlbec, ge.changwei, hch, Goldwyn Rodrigues, mfasheh
Cc: ocfs2-devel, linux-kernel
Hi Changwei,
>>>
> Hi Gang,
>
> On 2017/11/30 10:45, Gang He wrote:
>> Hello Changwei,
>>
>>
>>>>>
>>> On 2017/11/29 16:38, Gang He wrote:
>>>> Add ocfs2_try_rw_lock and ocfs2_try_inode_lock functions, which
>>>> will be used in non-block IO scenarios.
>>>>
>>>> Signed-off-by: Gang He <ghe@suse.com>
>>>> ---
>>>> fs/ocfs2/dlmglue.c | 21 +++++++++++++++++++++
>>>> fs/ocfs2/dlmglue.h | 4 ++++
>>>> 2 files changed, 25 insertions(+)
>>>>
>>>> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
>>>> index 4689940..a68efa3 100644
>>>> --- a/fs/ocfs2/dlmglue.c
>>>> +++ b/fs/ocfs2/dlmglue.c
>>>> @@ -1742,6 +1742,27 @@ int ocfs2_rw_lock(struct inode *inode, int write)
>>>> return status;
>>>> }
>>>>
>>>> +int ocfs2_try_rw_lock(struct inode *inode, int write)
>>>> +{
>>>> + int status, level;
>>>> + struct ocfs2_lock_res *lockres;
>>>> + struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>>>> +
>>>> + mlog(0, "inode %llu try to take %s RW lock\n",
>>>> + (unsigned long long)OCFS2_I(inode)->ip_blkno,
>>>> + write ? "EXMODE" : "PRMODE");
>>>> +
>>>> + if (ocfs2_mount_local(osb))
>>>> + return 0;
>>>> +
>>>> + lockres = &OCFS2_I(inode)->ip_rw_lockres;
>>>> +
>>>> + level = write ? DLM_LOCK_EX : DLM_LOCK_PR;
>>>> +
>>>> + status = ocfs2_cluster_lock(osb, lockres, level, DLM_LKF_NOQUEUE, 0);
>>>
>>> Hi Gang,
>>> Should we consider about passing a flag - OCFS2_LOCK_NONBLOCK to
>>> ocfs2_cluster_lock. Otherwise a cluster locking progress may be waiting
>>> for accomplishment of DC, which I think violates _NO_WAIT_ semantics.
>>
>> If ocfs2 is a local file system, we should not wait for any condition, but
> for a cluster file system,
>> we cannot avoid this totally according to the current DLM lock design, we
> need to wait for a little to get lock for the first time.
>> Why do we not use OCFS2_LOCK_NONBLOCK flag to get a lock?
>> since this flag is not stable to get a lock no matter this lock is occupied
> by other nodes, or not.
> I suppose local node must be granted under the condition that it is
> marked with *OCFS2_LOCK_BLOCKED*. And the control flag
> _OCFS2_LOCK_NONBLOCK_ will make lock progress directly return -EAGAIN
> without any waiting.
>
>> If you use OCFS2_LOCK_NONBLOCK flag to get a fresh lock, you possibly fail
> or success, depends on when the lock acquisition callback happens.
>> So, I think DLM_LKF_NOQUEUE flag should be more matched to _NO_WAIT_
> semantics.
>
> I thinks OCFS2_LOCK_NONBLOCK doesn't conflict DLM_LKF_NOQUEUE, they are
> the forth and fifth argument respectively.
Two arguments doesn't conflict,
but the _OCFS2_LOCK_NONBLOCK_ flag will bring the uncertainty of lock acquisition, since depends on when the lock acquisition callback happens.
In the other words, you possibly do not get a lock for one time if passing the _OCFS2_LOCK_NONBLOCK_ flag.
I think that this flag is suitable to ocfs2_inode_lock_with_page case, since when we can not get the lock immediately, we can try it again and again.
In actually, introducing this flag is to avoid a dead-lock case, a very tricky case.
Thanks
Gang
>
>> we always get a fresh lock successfully, always failed if the lock is/was
> occupied by other nodes.
>
> What do you mean by a fresh lock? A lock that is never granted or
> acquired? If a lock is marked with OCFS2_LOCK_BLOCKED, local node must
> has acquired it.
>
> Thanks,
> Changwei
>
>> This flag can give us a consistent locking behavior.
>>
>> Thanks
>> Gang
>>
>>
>>
>>>
>>> Thanks,
>>> Changwei.
>>>
>>>> + return status;
>>>> +}
>>>> +
>>>> void ocfs2_rw_unlock(struct inode *inode, int write)
>>>> {
>>>> int level = write ? DLM_LOCK_EX : DLM_LOCK_PR;
>>>> diff --git a/fs/ocfs2/dlmglue.h b/fs/ocfs2/dlmglue.h
>>>> index a7fc18b..05910fc 100644
>>>> --- a/fs/ocfs2/dlmglue.h
>>>> +++ b/fs/ocfs2/dlmglue.h
>>>> @@ -116,6 +116,7 @@ void ocfs2_refcount_lock_res_init(struct ocfs2_lock_res
>>> *lockres,
>>>> int ocfs2_create_new_inode_locks(struct inode *inode);
>>>> int ocfs2_drop_inode_locks(struct inode *inode);
>>>> int ocfs2_rw_lock(struct inode *inode, int write);
>>>> +int ocfs2_try_rw_lock(struct inode *inode, int write);
>>>> void ocfs2_rw_unlock(struct inode *inode, int write);
>>>> int ocfs2_open_lock(struct inode *inode);
>>>> int ocfs2_try_open_lock(struct inode *inode, int write);
>>>> @@ -140,6 +141,9 @@ int ocfs2_inode_lock_with_page(struct inode *inode,
>>>> /* 99% of the time we don't want to supply any additional flags --
>>>> * those are for very specific cases only. */
>>>> #define ocfs2_inode_lock(i, b, e) ocfs2_inode_lock_full_nested(i, b, e,
> 0,
>>> OI_LS_NORMAL)
>>>> +#define ocfs2_try_inode_lock(i, b, e)\
>>>> + ocfs2_inode_lock_full_nested(i, b, e, OCFS2_META_LOCK_NOQUEUE,\
>>>> + OI_LS_NORMAL)
>>>> void ocfs2_inode_unlock(struct inode *inode,
>>>> int ex);
>>>> int ocfs2_super_lock(struct ocfs2_super *osb,
>>>>
>>
>>
^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2017-11-30 3:21 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-29 8:36 [PATCH v2 0/3] ocfs2: add nowait aio support Gang He
2017-11-29 8:36 ` [Ocfs2-devel] " Gang He
2017-11-29 8:36 ` [PATCH v2 1/3] ocfs2: add ocfs2_try_rw_lock and ocfs2_try_inode_lock Gang He
2017-11-29 8:36 ` [Ocfs2-devel] " Gang He
2017-11-29 9:05 ` piaojun
2017-11-29 9:05 ` piaojun
2017-11-29 12:02 ` Changwei Ge
2017-11-29 12:02 ` Changwei Ge
2017-11-30 2:44 ` Gang He
2017-11-30 2:44 ` Gang He
2017-11-30 3:01 ` Changwei Ge
2017-11-30 3:01 ` Changwei Ge
2017-11-30 3:21 ` Gang He
2017-11-30 3:21 ` Gang He
2017-11-29 8:36 ` [PATCH v2 2/3] ocfs2: add ocfs2_overwrite_io function Gang He
2017-11-29 8:36 ` [Ocfs2-devel] " Gang He
2017-11-29 12:05 ` Changwei Ge
2017-11-29 12:05 ` Changwei Ge
2017-11-29 8:36 ` [PATCH v2 3/3] ocfs2: nowait aio support Gang He
2017-11-29 8:36 ` [Ocfs2-devel] " Gang He
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.