From: Gang He <ghe@suse.com>
To: mark@fasheh.com, jlbec@evilplan.org, joseph.qi@linux.alibaba.com
Cc: linux-kernel@vger.kernel.org, ocfs2-devel@oss.oracle.com
Subject: Re: [Ocfs2-devel] [PATCH] ocfs2: avoid getting dlm lock of the target directory multiple times during reflink process
Date: Tue, 31 Aug 2021 14:25:08 +0800
Message-ID: <744d756c-7640-d312-37ef-126755324e8a@suse.com>
In-Reply-To: <20210826075941.28480-1-ghe@suse.com>
Hello Joseph and Wengang,
When you have time, please help review this patch.
For the deadlock problem caused by the ocfs2_downconvert_lock failure,
we already have a fix patch, and that fix is critical.
But I think this patch is still useful as an optimization. The use
case is reflinking files into the same directory concurrently: our
users usually back up files (via reflink) from the cluster nodes
concurrently (via crontab) every day or every hour.
In the current design, a node acquires and releases the dlm lock of
the target directory multiple times during the reflink process, which
is very inefficient for concurrent reflinks.
Thanks
Gang
On 2021/8/26 15:59, Gang He wrote:
> During the reflink process, we should acquire the target directory
> inode dlm lock at the beginning, and hold this dlm lock until the end
> of the function.
> With this patch, we avoid the dlm lock ping-pong effect when cloning
> files to the same directory simultaneously from multiple nodes.
> A typical user scenario is that users regularly back up files
> to a specified directory through the reflink feature from
> multiple nodes.
>
> Signed-off-by: Gang He <ghe@suse.com>
> ---
> fs/ocfs2/namei.c | 32 +++++++++++++-------------------
> fs/ocfs2/namei.h | 2 ++
> fs/ocfs2/refcounttree.c | 15 +++++++++++----
> fs/ocfs2/xattr.c | 12 +-----------
> fs/ocfs2/xattr.h | 1 +
> 5 files changed, 28 insertions(+), 34 deletions(-)
>
> diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c
> index 2c46ff6ba4ea..f8bbb22cc60b 100644
> --- a/fs/ocfs2/namei.c
> +++ b/fs/ocfs2/namei.c
> @@ -2489,6 +2489,7 @@ static int ocfs2_prep_new_orphaned_file(struct inode *dir,
> }
>
> int ocfs2_create_inode_in_orphan(struct inode *dir,
> + struct buffer_head **dir_bh,
> int mode,
> struct inode **new_inode)
> {
> @@ -2597,13 +2598,16 @@ int ocfs2_create_inode_in_orphan(struct inode *dir,
>
> brelse(new_di_bh);
>
> - if (!status)
> - *new_inode = inode;
> -
> ocfs2_free_dir_lookup_result(&orphan_insert);
>
> - ocfs2_inode_unlock(dir, 1);
> - brelse(parent_di_bh);
> + if (!status) {
> + *new_inode = inode;
> + *dir_bh = parent_di_bh;
> + } else {
> + ocfs2_inode_unlock(dir, 1);
> + brelse(parent_di_bh);
> + }
> +
> return status;
> }
>
> @@ -2760,11 +2764,11 @@ int ocfs2_del_inode_from_orphan(struct ocfs2_super *osb,
> }
>
> int ocfs2_mv_orphaned_inode_to_new(struct inode *dir,
> + struct buffer_head *dir_bh,
> struct inode *inode,
> struct dentry *dentry)
> {
> int status = 0;
> - struct buffer_head *parent_di_bh = NULL;
> handle_t *handle = NULL;
> struct ocfs2_super *osb = OCFS2_SB(dir->i_sb);
> struct ocfs2_dinode *dir_di, *di;
> @@ -2778,14 +2782,7 @@ int ocfs2_mv_orphaned_inode_to_new(struct inode *dir,
> (unsigned long long)OCFS2_I(dir)->ip_blkno,
> (unsigned long long)OCFS2_I(inode)->ip_blkno);
>
> - status = ocfs2_inode_lock(dir, &parent_di_bh, 1);
> - if (status < 0) {
> - if (status != -ENOENT)
> - mlog_errno(status);
> - return status;
> - }
> -
> - dir_di = (struct ocfs2_dinode *) parent_di_bh->b_data;
> + dir_di = (struct ocfs2_dinode *) dir_bh->b_data;
> if (!dir_di->i_links_count) {
> /* can't make a file in a deleted directory. */
> status = -ENOENT;
> @@ -2798,7 +2795,7 @@ int ocfs2_mv_orphaned_inode_to_new(struct inode *dir,
> goto leave;
>
> /* get a spot inside the dir. */
> - status = ocfs2_prepare_dir_for_insert(osb, dir, parent_di_bh,
> + status = ocfs2_prepare_dir_for_insert(osb, dir, dir_bh,
> dentry->d_name.name,
> dentry->d_name.len, &lookup);
> if (status < 0) {
> @@ -2862,7 +2859,7 @@ int ocfs2_mv_orphaned_inode_to_new(struct inode *dir,
> ocfs2_journal_dirty(handle, di_bh);
>
> status = ocfs2_add_entry(handle, dentry, inode,
> - OCFS2_I(inode)->ip_blkno, parent_di_bh,
> + OCFS2_I(inode)->ip_blkno, dir_bh,
> &lookup);
> if (status < 0) {
> mlog_errno(status);
> @@ -2886,10 +2883,7 @@ int ocfs2_mv_orphaned_inode_to_new(struct inode *dir,
> iput(orphan_dir_inode);
> leave:
>
> - ocfs2_inode_unlock(dir, 1);
> -
> brelse(di_bh);
> - brelse(parent_di_bh);
> brelse(orphan_dir_bh);
>
> ocfs2_free_dir_lookup_result(&lookup);
> diff --git a/fs/ocfs2/namei.h b/fs/ocfs2/namei.h
> index 9cc891eb874e..03a2c526e2c1 100644
> --- a/fs/ocfs2/namei.h
> +++ b/fs/ocfs2/namei.h
> @@ -24,6 +24,7 @@ int ocfs2_orphan_del(struct ocfs2_super *osb,
> struct buffer_head *orphan_dir_bh,
> bool dio);
> int ocfs2_create_inode_in_orphan(struct inode *dir,
> + struct buffer_head **dir_bh,
> int mode,
> struct inode **new_inode);
> int ocfs2_add_inode_to_orphan(struct ocfs2_super *osb,
> @@ -32,6 +33,7 @@ int ocfs2_del_inode_from_orphan(struct ocfs2_super *osb,
> struct inode *inode, struct buffer_head *di_bh,
> int update_isize, loff_t end);
> int ocfs2_mv_orphaned_inode_to_new(struct inode *dir,
> + struct buffer_head *dir_bh,
> struct inode *new_inode,
> struct dentry *new_dentry);
>
> diff --git a/fs/ocfs2/refcounttree.c b/fs/ocfs2/refcounttree.c
> index 7f6355cbb587..a9a0c7c37e8e 100644
> --- a/fs/ocfs2/refcounttree.c
> +++ b/fs/ocfs2/refcounttree.c
> @@ -4250,7 +4250,7 @@ static int ocfs2_reflink(struct dentry *old_dentry, struct inode *dir,
> {
> int error, had_lock;
> struct inode *inode = d_inode(old_dentry);
> - struct buffer_head *old_bh = NULL;
> + struct buffer_head *old_bh = NULL, *dir_bh = NULL;
> struct inode *new_orphan_inode = NULL;
> struct ocfs2_lock_holder oh;
>
> @@ -4258,7 +4258,7 @@ static int ocfs2_reflink(struct dentry *old_dentry, struct inode *dir,
> return -EOPNOTSUPP;
>
>
> - error = ocfs2_create_inode_in_orphan(dir, inode->i_mode,
> + error = ocfs2_create_inode_in_orphan(dir, &dir_bh, inode->i_mode,
> &new_orphan_inode);
> if (error) {
> mlog_errno(error);
> @@ -4304,13 +4304,15 @@ static int ocfs2_reflink(struct dentry *old_dentry, struct inode *dir,
>
> /* If the security isn't preserved, we need to re-initialize them. */
> if (!preserve) {
> - error = ocfs2_init_security_and_acl(dir, new_orphan_inode,
> + error = ocfs2_init_security_and_acl(dir, dir_bh,
> + new_orphan_inode,
> &new_dentry->d_name);
> if (error)
> mlog_errno(error);
> }
> if (!error) {
> - error = ocfs2_mv_orphaned_inode_to_new(dir, new_orphan_inode,
> + error = ocfs2_mv_orphaned_inode_to_new(dir, dir_bh,
> + new_orphan_inode,
> new_dentry);
> if (error)
> mlog_errno(error);
> @@ -4328,6 +4330,11 @@ static int ocfs2_reflink(struct dentry *old_dentry, struct inode *dir,
> iput(new_orphan_inode);
> }
>
> + if (dir_bh) {
> + ocfs2_inode_unlock(dir, 1);
> + brelse(dir_bh);
> + }
> +
> return error;
> }
>
> diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
> index dd784eb0cd7c..3f23e3a5018c 100644
> --- a/fs/ocfs2/xattr.c
> +++ b/fs/ocfs2/xattr.c
> @@ -7203,16 +7203,13 @@ int ocfs2_reflink_xattrs(struct inode *old_inode,
> /*
> * Initialize security and acl for a already created inode.
> * Used for reflink a non-preserve-security file.
> - *
> - * It uses common api like ocfs2_xattr_set, so the caller
> - * must not hold any lock expect i_mutex.
> */
> int ocfs2_init_security_and_acl(struct inode *dir,
> + struct buffer_head *dir_bh,
> struct inode *inode,
> const struct qstr *qstr)
> {
> int ret = 0;
> - struct buffer_head *dir_bh = NULL;
>
> ret = ocfs2_init_security_get(inode, dir, qstr, NULL);
> if (ret) {
> @@ -7220,17 +7217,10 @@ int ocfs2_init_security_and_acl(struct inode *dir,
> goto leave;
> }
>
> - ret = ocfs2_inode_lock(dir, &dir_bh, 0);
> - if (ret) {
> - mlog_errno(ret);
> - goto leave;
> - }
> ret = ocfs2_init_acl(NULL, inode, dir, NULL, dir_bh, NULL, NULL);
> if (ret)
> mlog_errno(ret);
>
> - ocfs2_inode_unlock(dir, 0);
> - brelse(dir_bh);
> leave:
> return ret;
> }
> diff --git a/fs/ocfs2/xattr.h b/fs/ocfs2/xattr.h
> index 00308b57f64f..b27fd8ba0019 100644
> --- a/fs/ocfs2/xattr.h
> +++ b/fs/ocfs2/xattr.h
> @@ -83,6 +83,7 @@ int ocfs2_reflink_xattrs(struct inode *old_inode,
> struct buffer_head *new_bh,
> bool preserve_security);
> int ocfs2_init_security_and_acl(struct inode *dir,
> + struct buffer_head *dir_bh,
> struct inode *inode,
> const struct qstr *qstr);
> #endif /* OCFS2_XATTR_H */
>