[Ocfs2-devel] [PATCH 0/7] [RESEND] Fix some deadlocks in quota code and implement lockdep for cluster locks

All of lore.kernel.org
 help / color / mirror / Atom feed

* [Ocfs2-devel] [PATCH 0/7] [RESEND] Fix some deadlocks in quota code and implement lockdep for cluster locks
@ 2009-06-02 12:23 Jan Kara
  2009-06-02 12:23 ` [Ocfs2-devel] [PATCH] ocfs2: Fix possible deadlock in ocfs2_global_read_dquot() Jan Kara
                   ` (8 more replies)
  0 siblings, 9 replies; 20+ messages in thread
From: Jan Kara @ 2009-06-02 12:23 UTC (permalink / raw)
  To: ocfs2-devel

  Hi,

  I'm resending this patch series. It's rediffed against linux-next branch of
Joel's git tree. The first four patches are obvious fixes of deadlocks in quota
code and should go in as soon as possible. The other three patches implement
lockdep support for OCFS2 cluster locks. So you can have a look whether the
code make sence to you and possibly merge them. They should be NOP when lockdep
is disabled and help a lot when debugging various locking problems (with these
patches I found the problems fixed in the beginning of the series).
  BTW: Currently lockdep complains about some configfs issue and turns itself
off so that's why I didn't get to finding more deadlocks ;)

									Honza

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2: Fix possible deadlock in ocfs2_global_read_dquot()
  2009-06-02 12:23 [Ocfs2-devel] [PATCH 0/7] [RESEND] Fix some deadlocks in quota code and implement lockdep for cluster locks Jan Kara
@ 2009-06-02 12:23 ` Jan Kara
  2009-06-03  0:15   ` Joel Becker
  2009-06-02 12:24 ` [Ocfs2-devel] [PATCH] ocfs2: Fix lock inversion in ocfs2_local_read_info() Jan Kara
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Jan Kara @ 2009-06-02 12:23 UTC (permalink / raw)
  To: ocfs2-devel

It is not possible to get a read lock and then try to get the same write lock
in one thread as that can block on downconvert being requested by other node
leading to deadlock. So first drop the quota lock for reading and only after
that get it for writing.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ocfs2/quota_global.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/fs/ocfs2/quota_global.c b/fs/ocfs2/quota_global.c
index 1ed0f7c..edfa60c 100644
--- a/fs/ocfs2/quota_global.c
+++ b/fs/ocfs2/quota_global.c
@@ -421,6 +421,7 @@ int ocfs2_global_read_dquot(struct dquot *dquot)
 	OCFS2_DQUOT(dquot)->dq_originodes = dquot->dq_dqb.dqb_curinodes;
 	if (!dquot->dq_off) {	/* No real quota entry? */
 		/* Upgrade to exclusive lock for allocation */
+		ocfs2_qinfo_unlock(info, 0);
 		err = ocfs2_qinfo_lock(info, 1);
 		if (err < 0)
 			goto out_qlock;
@@ -435,7 +436,8 @@ int ocfs2_global_read_dquot(struct dquot *dquot)
 out_qlock:
 	if (ex)
 		ocfs2_qinfo_unlock(info, 1);
-	ocfs2_qinfo_unlock(info, 0);
+	else
+		ocfs2_qinfo_unlock(info, 0);
 out:
 	if (err < 0)
 		mlog_errno(err);
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2: Fix lock inversion in ocfs2_local_read_info()
  2009-06-02 12:23 [Ocfs2-devel] [PATCH 0/7] [RESEND] Fix some deadlocks in quota code and implement lockdep for cluster locks Jan Kara
  2009-06-02 12:23 ` [Ocfs2-devel] [PATCH] ocfs2: Fix possible deadlock in ocfs2_global_read_dquot() Jan Kara
@ 2009-06-02 12:24 ` Jan Kara
  2009-06-02 12:24 ` [Ocfs2-devel] [PATCH] ocfs2: Fix possible deadlock with quotas in ocfs2_setattr() Jan Kara
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 20+ messages in thread
From: Jan Kara @ 2009-06-02 12:24 UTC (permalink / raw)
  To: ocfs2-devel

This function is called with dqio_mutex held but it has to acquire lock
from global quota file which ranks above this lock. This is not deadlockable
lock inversion since this code path is take only during mount when noone
else can race with us but let's clean this up to silence lockdep.

We just drop the dqio_mutex in the beginning of the function and reacquire
it in the end since we don't need it - noone can race with us at this moment.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ocfs2/quota_local.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/fs/ocfs2/quota_local.c b/fs/ocfs2/quota_local.c
index 07deec5..71cf410 100644
--- a/fs/ocfs2/quota_local.c
+++ b/fs/ocfs2/quota_local.c
@@ -655,6 +655,9 @@ static int ocfs2_local_read_info(struct super_block *sb, int type)
 	struct ocfs2_quota_recovery *rec;
 	int locked = 0;
 
+	/* We don't need the lock and we have to acquire quota file locks
+	 * which will later depend on this lock */
+	mutex_unlock(&sb_dqopt(sb)->dqio_mutex);
 	info->dqi_maxblimit = 0x7fffffffffffffffLL;
 	info->dqi_maxilimit = 0x7fffffffffffffffLL;
 	oinfo = kmalloc(sizeof(struct ocfs2_mem_dqinfo), GFP_NOFS);
@@ -733,6 +736,7 @@ static int ocfs2_local_read_info(struct super_block *sb, int type)
 		goto out_err;
 	}
 
+	mutex_lock(&sb_dqopt(sb)->dqio_mutex);
 	return 0;
 out_err:
 	if (oinfo) {
@@ -746,6 +750,7 @@ out_err:
 		kfree(oinfo);
 	}
 	brelse(bh);
+	mutex_lock(&sb_dqopt(sb)->dqio_mutex);
 	return -1;
 }
 
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2: Fix possible deadlock with quotas in ocfs2_setattr()
  2009-06-02 12:23 [Ocfs2-devel] [PATCH 0/7] [RESEND] Fix some deadlocks in quota code and implement lockdep for cluster locks Jan Kara
  2009-06-02 12:23 ` [Ocfs2-devel] [PATCH] ocfs2: Fix possible deadlock in ocfs2_global_read_dquot() Jan Kara
  2009-06-02 12:24 ` [Ocfs2-devel] [PATCH] ocfs2: Fix lock inversion in ocfs2_local_read_info() Jan Kara
@ 2009-06-02 12:24 ` Jan Kara
  2009-06-02 12:24 ` [Ocfs2-devel] [PATCH] ocfs2: Fix possible deadlock in quota recovery Jan Kara
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 20+ messages in thread
From: Jan Kara @ 2009-06-02 12:24 UTC (permalink / raw)
  To: ocfs2-devel

We called vfs_dq_transfer() with global quota file lock held. This can lead
to deadlocks as if vfs_dq_transfer() has to allocate new quota structure,
it calls ocfs2_dquot_acquire() which tries to get quota file lock again and
this can block if another node requested the lock in the mean time.

Since we have to call vfs_dq_transfer() with transaction already started
and quota file lock ranks above the transaction start, we cannot just rely
on ocfs2_dquot_acquire() or ocfs2_dquot_release() on getting the lock
if they need it. We fix the problem by acquiring pointers to all quota
structures needed by vfs_dq_transfer() already before calling the function.
By this we are sure that all quota structures are properly allocated and
they can be freed only after we drop references to them. Thus we don't need
quota file lock anywhere inside vfs_dq_transfer().

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ocfs2/file.c |   53 ++++++++++++++++++++++++++++++-----------------------
 1 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index c2a87c8..1a96cac 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -894,9 +894,9 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
 	struct ocfs2_super *osb = OCFS2_SB(sb);
 	struct buffer_head *bh = NULL;
 	handle_t *handle = NULL;
-	int locked[MAXQUOTAS] = {0, 0};
-	int credits, qtype;
-	struct ocfs2_mem_dqinfo *oinfo;
+	int qtype;
+	struct dquot *transfer_from[MAXQUOTAS] = { };
+	struct dquot *transfer_to[MAXQUOTAS] = { };
 
 	mlog_entry("(0x%p, '%.*s')\n", dentry,
 	           dentry->d_name.len, dentry->d_name.name);
@@ -969,30 +969,37 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
 
 	if ((attr->ia_valid & ATTR_UID && attr->ia_uid != inode->i_uid) ||
 	    (attr->ia_valid & ATTR_GID && attr->ia_gid != inode->i_gid)) {
-		credits = OCFS2_INODE_UPDATE_CREDITS;
+		/*
+		 * Gather pointers to quota structures so that allocation /
+		 * freeing of quota structures happens here and not inside
+		 * vfs_dq_transfer() where we have problems with lock ordering
+		 */
 		if (attr->ia_valid & ATTR_UID && attr->ia_uid != inode->i_uid
 		    && OCFS2_HAS_RO_COMPAT_FEATURE(sb,
 		    OCFS2_FEATURE_RO_COMPAT_USRQUOTA)) {
-			oinfo = sb_dqinfo(sb, USRQUOTA)->dqi_priv;
-			status = ocfs2_lock_global_qf(oinfo, 1);
-			if (status < 0)
+			transfer_to[USRQUOTA] = dqget(sb, attr->ia_uid,
+						      USRQUOTA);
+			transfer_from[USRQUOTA] = dqget(sb, inode->i_uid,
+							USRQUOTA);
+			if (!transfer_to[USRQUOTA] || !transfer_from[USRQUOTA]) {
+				status = -ESRCH;
 				goto bail_unlock;
-			credits += ocfs2_calc_qinit_credits(sb, USRQUOTA) +
-				ocfs2_calc_qdel_credits(sb, USRQUOTA);
-			locked[USRQUOTA] = 1;
+			}
 		}
 		if (attr->ia_valid & ATTR_GID && attr->ia_gid != inode->i_gid
 		    && OCFS2_HAS_RO_COMPAT_FEATURE(sb,
 		    OCFS2_FEATURE_RO_COMPAT_GRPQUOTA)) {
-			oinfo = sb_dqinfo(sb, GRPQUOTA)->dqi_priv;
-			status = ocfs2_lock_global_qf(oinfo, 1);
-			if (status < 0)
+			transfer_to[GRPQUOTA] = dqget(sb, attr->ia_gid,
+						      GRPQUOTA);
+			transfer_from[GRPQUOTA] = dqget(sb, inode->i_gid,
+							GRPQUOTA);
+			if (!transfer_to[GRPQUOTA] || !transfer_from[GRPQUOTA]) {
+				status = -ESRCH;
 				goto bail_unlock;
-			credits += ocfs2_calc_qinit_credits(sb, GRPQUOTA) +
-				   ocfs2_calc_qdel_credits(sb, GRPQUOTA);
-			locked[GRPQUOTA] = 1;
+			}
 		}
-		handle = ocfs2_start_trans(osb, credits);
+		handle = ocfs2_start_trans(osb, OCFS2_INODE_UPDATE_CREDITS +
+					   2 * ocfs2_quota_trans_credits(sb));
 		if (IS_ERR(handle)) {
 			status = PTR_ERR(handle);
 			mlog_errno(status);
@@ -1030,12 +1037,6 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
 bail_commit:
 	ocfs2_commit_trans(osb, handle);
 bail_unlock:
-	for (qtype = 0; qtype < MAXQUOTAS; qtype++) {
-		if (!locked[qtype])
-			continue;
-		oinfo = sb_dqinfo(sb, qtype)->dqi_priv;
-		ocfs2_unlock_global_qf(oinfo, 1);
-	}
 	ocfs2_inode_unlock(inode, 1);
 bail_unlock_rw:
 	if (size_change)
@@ -1043,6 +1044,12 @@ bail_unlock_rw:
 bail:
 	brelse(bh);
 
+	/* Release quota pointers in case we acquired them */
+	for (qtype = 0; qtype < MAXQUOTAS; qtype++) {
+		dqput(transfer_to[qtype]);
+		dqput(transfer_from[qtype]);
+	}
+
 	if (!status && attr->ia_valid & ATTR_MODE) {
 		status = ocfs2_acl_chmod(inode);
 		if (status < 0)
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2: Fix possible deadlock in quota recovery
  2009-06-02 12:23 [Ocfs2-devel] [PATCH 0/7] [RESEND] Fix some deadlocks in quota code and implement lockdep for cluster locks Jan Kara
                   ` (2 preceding siblings ...)
  2009-06-02 12:24 ` [Ocfs2-devel] [PATCH] ocfs2: Fix possible deadlock with quotas in ocfs2_setattr() Jan Kara
@ 2009-06-02 12:24 ` Jan Kara
  2009-06-02 12:24 ` [Ocfs2-devel] [PATCH] ocfs2: Correct ordering of ip_alloc_sem and localloc locks for directories Jan Kara
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 20+ messages in thread
From: Jan Kara @ 2009-06-02 12:24 UTC (permalink / raw)
  To: ocfs2-devel

In ocfs2_finish_quota_recovery() we acquired global quota file lock and started
recovering local quota file. During this process we need to get quota
structures, which calls ocfs2_dquot_acquire() which gets global quota file lock
again. This second lock can block in case some other node has requested the
quota file lock in the mean time. Fix the problem by moving quota file locking
down into the function where it is really needed.  Then dqget() or dqput()
won't be called with the lock held.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ocfs2/quota_local.c |   16 +++++++++-------
 1 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/fs/ocfs2/quota_local.c b/fs/ocfs2/quota_local.c
index 71cf410..5a460fa 100644
--- a/fs/ocfs2/quota_local.c
+++ b/fs/ocfs2/quota_local.c
@@ -444,10 +444,6 @@ static int ocfs2_recover_local_quota_file(struct inode *lqinode,
 
 	mlog_entry("ino=%lu type=%u", (unsigned long)lqinode->i_ino, type);
 
-	status = ocfs2_lock_global_qf(oinfo, 1);
-	if (status < 0)
-		goto out;
-
 	list_for_each_entry_safe(rchunk, next, &(rec->r_list[type]), rc_list) {
 		chunk = rchunk->rc_chunk;
 		hbh = NULL;
@@ -480,12 +476,18 @@ static int ocfs2_recover_local_quota_file(struct inode *lqinode,
 				     type);
 				goto out_put_bh;
 			}
+			status = ocfs2_lock_global_qf(oinfo, 1);
+			if (status < 0) {
+				mlog_errno(status);
+				goto out_put_dquot;
+			}
+
 			handle = ocfs2_start_trans(OCFS2_SB(sb),
 						   OCFS2_QSYNC_CREDITS);
 			if (IS_ERR(handle)) {
 				status = PTR_ERR(handle);
 				mlog_errno(status);
-				goto out_put_dquot;
+				goto out_drop_lock;
 			}
 			mutex_lock(&sb_dqopt(sb)->dqio_mutex);
 			spin_lock(&dq_data_lock);
@@ -523,6 +525,8 @@ static int ocfs2_recover_local_quota_file(struct inode *lqinode,
 out_commit:
 			mutex_unlock(&sb_dqopt(sb)->dqio_mutex);
 			ocfs2_commit_trans(OCFS2_SB(sb), handle);
+out_drop_lock:
+			ocfs2_unlock_global_qf(oinfo, 1);
 out_put_dquot:
 			dqput(dquot);
 out_put_bh:
@@ -537,8 +541,6 @@ out_put_bh:
 		if (status < 0)
 			break;
 	}
-	ocfs2_unlock_global_qf(oinfo, 1);
-out:
 	if (status < 0)
 		free_recovery_list(&(rec->r_list[type]));
 	mlog_exit(status);
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2: Correct ordering of ip_alloc_sem and localloc locks for directories
  2009-06-02 12:23 [Ocfs2-devel] [PATCH 0/7] [RESEND] Fix some deadlocks in quota code and implement lockdep for cluster locks Jan Kara
                   ` (3 preceding siblings ...)
  2009-06-02 12:24 ` [Ocfs2-devel] [PATCH] ocfs2: Fix possible deadlock in quota recovery Jan Kara
@ 2009-06-02 12:24 ` Jan Kara
  2009-06-03 21:05   ` Mark Fasheh
  2009-06-02 12:24 ` [Ocfs2-devel] [PATCH] vfs: Set special lockdep map for dirs only if not set by fs Jan Kara
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Jan Kara @ 2009-06-02 12:24 UTC (permalink / raw)
  To: ocfs2-devel

We use ordering ip_alloc_sem -> local alloc locks in ocfs2_write_begin().
So change lock ordering in ocfs2_extend_dir() and ocfs2_expand_inline_dir()
to also use this lock ordering.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ocfs2/dir.c |   21 ++++++++++-----------
 1 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/fs/ocfs2/dir.c b/fs/ocfs2/dir.c
index c575230..b358f3b 100644
--- a/fs/ocfs2/dir.c
+++ b/fs/ocfs2/dir.c
@@ -2900,6 +2900,8 @@ static int ocfs2_expand_inline_dir(struct inode *dir, struct buffer_head *di_bh,
 	alloc = ocfs2_clusters_for_bytes(sb, bytes);
 	dx_alloc = 0;
 
+	down_write(&oi->ip_alloc_sem);
+
 	if (ocfs2_supports_indexed_dirs(osb)) {
 		credits += ocfs2_add_dir_index_credits(sb);
 
@@ -2940,8 +2942,6 @@ static int ocfs2_expand_inline_dir(struct inode *dir, struct buffer_head *di_bh,
 		goto out;
 	}
 
-	down_write(&oi->ip_alloc_sem);
-
 	/*
 	 * Prepare for worst case allocation scenario of two separate
 	 * extents in the unindexed tree.
@@ -2953,7 +2953,7 @@ static int ocfs2_expand_inline_dir(struct inode *dir, struct buffer_head *di_bh,
 	if (IS_ERR(handle)) {
 		ret = PTR_ERR(handle);
 		mlog_errno(ret);
-		goto out_sem;
+		goto out;
 	}
 
 	if (vfs_dq_alloc_space_nodirty(dir,
@@ -3172,10 +3172,8 @@ out_commit:
 
 	ocfs2_commit_trans(osb, handle);
 
-out_sem:
-	up_write(&oi->ip_alloc_sem);
-
 out:
+	up_write(&oi->ip_alloc_sem);
 	if (data_ac)
 		ocfs2_free_alloc_context(data_ac);
 	if (meta_ac)
@@ -3322,11 +3320,15 @@ static int ocfs2_extend_dir(struct ocfs2_super *osb,
 		brelse(new_bh);
 		new_bh = NULL;
 
+		down_write(&OCFS2_I(dir)->ip_alloc_sem);
+		drop_alloc_sem = 1;
 		dir_i_size = i_size_read(dir);
 		credits = OCFS2_SIMPLE_DIR_EXTEND_CREDITS;
 		goto do_extend;
 	}
 
+	down_write(&OCFS2_I(dir)->ip_alloc_sem);
+	drop_alloc_sem = 1;
 	dir_i_size = i_size_read(dir);
 	mlog(0, "extending dir %llu (i_size = %lld)\n",
 	     (unsigned long long)OCFS2_I(dir)->ip_blkno, dir_i_size);
@@ -3370,9 +3372,6 @@ do_extend:
 		credits++; /* For attaching the new dirent block to the
 			    * dx_root */
 
-	down_write(&OCFS2_I(dir)->ip_alloc_sem);
-	drop_alloc_sem = 1;
-
 	handle = ocfs2_start_trans(osb, credits);
 	if (IS_ERR(handle)) {
 		status = PTR_ERR(handle);
@@ -3435,10 +3434,10 @@ bail_bh:
 	*new_de_bh = new_bh;
 	get_bh(*new_de_bh);
 bail:
-	if (drop_alloc_sem)
-		up_write(&OCFS2_I(dir)->ip_alloc_sem);
 	if (handle)
 		ocfs2_commit_trans(osb, handle);
+	if (drop_alloc_sem)
+		up_write(&OCFS2_I(dir)->ip_alloc_sem);
 
 	if (data_ac)
 		ocfs2_free_alloc_context(data_ac);
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Ocfs2-devel] [PATCH] vfs: Set special lockdep map for dirs only if not set by fs
  2009-06-02 12:23 [Ocfs2-devel] [PATCH 0/7] [RESEND] Fix some deadlocks in quota code and implement lockdep for cluster locks Jan Kara
                   ` (4 preceding siblings ...)
  2009-06-02 12:24 ` [Ocfs2-devel] [PATCH] ocfs2: Correct ordering of ip_alloc_sem and localloc locks for directories Jan Kara
@ 2009-06-02 12:24 ` Jan Kara
  2009-06-02 12:53   ` Peter Zijlstra
  2009-06-02 12:24 ` [Ocfs2-devel] [PATCH] ocfs2: Add lockdep annotations Jan Kara
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Jan Kara @ 2009-06-02 12:24 UTC (permalink / raw)
  To: ocfs2-devel

Some filesystems need to set lockdep map for i_mutex differently for
different directories. For example OCFS2 has system directories (for
orphan inode tracking and for gathering all system files like journal
or quota files into a single place) which have different locking
locking rules than standard directories. For a filesystem setting
lockdep map is naturaly done when the inode is read but we have to
modify unlock_new_inode() not to overwrite the lockdep map the filesystem
has set.

CC: peterz at infradead.org
CC: mingo at redhat.com
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/inode.c              |   17 +++++++++++------
 include/linux/lockdep.h |   15 +++++++++++++++
 2 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 6ad14a1..44e1848 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -654,12 +654,17 @@ void unlock_new_inode(struct inode *inode)
 	if (inode->i_mode & S_IFDIR) {
 		struct file_system_type *type = inode->i_sb->s_type;
 
-		/*
-		 * ensure nobody is actually holding i_mutex
-		 */
-		mutex_destroy(&inode->i_mutex);
-		mutex_init(&inode->i_mutex);
-		lockdep_set_class(&inode->i_mutex, &type->i_mutex_dir_key);
+		/* Set new key only if filesystem hasn't already changed it */
+		if (!lockdep_match_class(&inode->i_mutex,
+		    &type->i_mutex_key)) {
+			/*
+			 * ensure nobody is actually holding i_mutex
+			 */
+			mutex_destroy(&inode->i_mutex);
+			mutex_init(&inode->i_mutex);
+			lockdep_set_class(&inode->i_mutex,
+					  &type->i_mutex_dir_key);
+		}
 	}
 #endif
 	/*
diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index da5a5a1..b25d1b5 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -258,6 +258,16 @@ extern void lockdep_init_map(struct lockdep_map *lock, const char *name,
 #define lockdep_set_subclass(lock, sub)	\
 		lockdep_init_map(&(lock)->dep_map, #lock, \
 				 (lock)->dep_map.key, sub)
+/*
+ * Compare locking classes
+ */
+#define lockdep_match_class(lock, key) lockdep_match_key(&(lock)->dep_map, key)
+
+static inline int lockdep_match_key(struct lockdep_map *lock,
+				    struct lock_class_key *key)
+{
+	return lock->key == key;
+}
 
 /*
  * Acquire a lock.
@@ -326,6 +336,11 @@ static inline void lockdep_on(void)
 #define lockdep_set_class_and_subclass(lock, key, sub) \
 		do { (void)(key); } while (0)
 #define lockdep_set_subclass(lock, sub)		do { } while (0)
+/*
+ * We don't define lockdep_match_class() and lockdep_match_key() for !LOCKDEP
+ * case since the result is not well defined and the caller should rather
+ * #ifdef the call himself.
+ */
 
 # define INIT_LOCKDEP
 # define lockdep_reset()		do { debug_locks = 1; } while (0)
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2: Add lockdep annotations
  2009-06-02 12:23 [Ocfs2-devel] [PATCH 0/7] [RESEND] Fix some deadlocks in quota code and implement lockdep for cluster locks Jan Kara
                   ` (5 preceding siblings ...)
  2009-06-02 12:24 ` [Ocfs2-devel] [PATCH] vfs: Set special lockdep map for dirs only if not set by fs Jan Kara
@ 2009-06-02 12:24 ` Jan Kara
  2009-06-03 16:31   ` Joel Becker
  2009-06-02 18:43 ` [Ocfs2-devel] [PATCH 0/7] [RESEND] Fix some deadlocks in quota code and implement lockdep for cluster locks Joel Becker
  2009-06-04  2:26 ` Joel Becker
  8 siblings, 1 reply; 20+ messages in thread
From: Jan Kara @ 2009-06-02 12:24 UTC (permalink / raw)
  To: ocfs2-devel

Add lockdep support to OCFS2. The support also covers all of the cluster
locks except for open locks, journal locks, and local quotafile locks. These
are special because they are acquired for a node, not for a particular process
and lockdep cannot deal with such type of locking.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ocfs2/dlmglue.c |   99 ++++++++++++++++++++++++++++++++++++++++++++--------
 fs/ocfs2/dlmglue.h |   12 +++++--
 fs/ocfs2/inode.c   |   11 ++++++
 fs/ocfs2/inode.h   |    8 ++++
 fs/ocfs2/namei.c   |   15 +++++---
 fs/ocfs2/ocfs2.h   |    4 ++
 fs/ocfs2/sysfile.c |   17 +++++++++
 7 files changed, 142 insertions(+), 24 deletions(-)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index e15fc7d..5cf731f 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -92,6 +92,9 @@ struct ocfs2_unblock_ctl {
 	enum ocfs2_unblock_action unblock_action;
 };
 
+/* Lockdep class keys */
+struct lock_class_key lockdep_keys[OCFS2_NUM_LOCK_TYPES];
+
 static int ocfs2_check_meta_downconvert(struct ocfs2_lock_res *lockres,
 					int new_level);
 static void ocfs2_set_meta_lvb(struct ocfs2_lock_res *lockres);
@@ -313,9 +316,16 @@ static int ocfs2_lock_create(struct ocfs2_super *osb,
 			     u32 dlm_flags);
 static inline int ocfs2_may_continue_on_blocked_lock(struct ocfs2_lock_res *lockres,
 						     int wanted);
-static void ocfs2_cluster_unlock(struct ocfs2_super *osb,
-				 struct ocfs2_lock_res *lockres,
-				 int level);
+static void __ocfs2_cluster_unlock(struct ocfs2_super *osb,
+				   struct ocfs2_lock_res *lockres,
+				   int level, unsigned long ip);
+static inline void ocfs2_cluster_unlock(struct ocfs2_super *osb,
+					struct ocfs2_lock_res *lockres,
+					int level)
+{
+	__ocfs2_cluster_unlock(osb, lockres, level, _RET_IP_);
+}
+
 static inline void ocfs2_generic_handle_downconvert_action(struct ocfs2_lock_res *lockres);
 static inline void ocfs2_generic_handle_convert_action(struct ocfs2_lock_res *lockres);
 static inline void ocfs2_generic_handle_attach_action(struct ocfs2_lock_res *lockres);
@@ -485,6 +495,13 @@ static void ocfs2_lock_res_init_common(struct ocfs2_super *osb,
 	ocfs2_add_lockres_tracking(res, osb->osb_dlm_debug);
 
 	ocfs2_init_lock_stats(res);
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	if (type != OCFS2_LOCK_TYPE_OPEN)
+		lockdep_init_map(&res->l_lockdep_map, ocfs2_lock_type_strings[type],
+				 &lockdep_keys[type], 0);
+	else
+		res->l_lockdep_map.key = NULL;
+#endif
 }
 
 void ocfs2_lock_res_init_once(struct ocfs2_lock_res *res)
@@ -1239,11 +1256,21 @@ static int ocfs2_wait_for_mask_interruptible(struct ocfs2_mask_waiter *mw,
 	return ret;
 }
 
-static int ocfs2_cluster_lock(struct ocfs2_super *osb,
-			      struct ocfs2_lock_res *lockres,
-			      int level,
-			      u32 lkm_flags,
-			      int arg_flags)
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+static int __ocfs2_cluster_lock(struct ocfs2_super *osb,
+				struct ocfs2_lock_res *lockres,
+				int level,
+				u32 lkm_flags,
+				int arg_flags,
+				int l_subclass,
+				unsigned long ip)
+#else
+static int __ocfs2_cluster_lock(struct ocfs2_super *osb,
+				struct ocfs2_lock_res *lockres,
+				int level,
+				u32 lkm_flags,
+				int arg_flags)
+#endif
 {
 	struct ocfs2_mask_waiter mw;
 	int wait, catch_signals = !(osb->s_mount_opt & OCFS2_MOUNT_NOINTR);
@@ -1386,13 +1413,45 @@ out:
 	}
 	ocfs2_update_lock_stats(lockres, level, &mw, ret);
 
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	if (!ret && lockres->l_lockdep_map.key != NULL) {
+		if (level == DLM_LOCK_PR)
+			rwsem_acquire_read(&lockres->l_lockdep_map, l_subclass,
+				!!(arg_flags & OCFS2_META_LOCK_NOQUEUE), ip);
+		else
+			rwsem_acquire(&lockres->l_lockdep_map, l_subclass,
+				!!(arg_flags & OCFS2_META_LOCK_NOQUEUE), ip);
+	}
+#endif
 	mlog_exit(ret);
 	return ret;
 }
 
-static void ocfs2_cluster_unlock(struct ocfs2_super *osb,
-				 struct ocfs2_lock_res *lockres,
-				 int level)
+static inline int ocfs2_cluster_lock(struct ocfs2_super *osb,
+				     struct ocfs2_lock_res *lockres,
+				     int level,
+				     u32 lkm_flags,
+				     int arg_flags)
+{
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	return __ocfs2_cluster_lock(osb, lockres, level, lkm_flags, arg_flags,
+				    0, _RET_IP_);
+#else
+	return __ocfs2_cluster_lock(osb, lockres, level, lkm_flags, arg_flags);
+#endif
+}
+
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+static void __ocfs2_cluster_unlock(struct ocfs2_super *osb,
+				   struct ocfs2_lock_res *lockres,
+				   int level,
+				   unsigned long ip)
+#else
+static void __ocfs2_cluster_unlock(struct ocfs2_super *osb,
+				   struct ocfs2_lock_res *lockres,
+				   int level)
+#endif
 {
 	unsigned long flags;
 
@@ -1401,6 +1460,10 @@ static void ocfs2_cluster_unlock(struct ocfs2_super *osb,
 	ocfs2_dec_holders(lockres, level);
 	ocfs2_downconvert_on_unlock(osb, lockres);
 	spin_unlock_irqrestore(&lockres->l_lock, flags);
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	if (lockres->l_lockdep_map.key != NULL)
+		rwsem_release(&lockres->l_lockdep_map, 1, ip);
+#endif
 	mlog_exit_void();
 }
 
@@ -2145,10 +2208,11 @@ static int ocfs2_assign_bh(struct inode *inode,
  * returns < 0 error if the callback will never be called, otherwise
  * the result of the lock will be communicated via the callback.
  */
-int ocfs2_inode_lock_full(struct inode *inode,
-			 struct buffer_head **ret_bh,
-			 int ex,
-			 int arg_flags)
+int ocfs2_inode_lock_full_nested(struct inode *inode,
+				 struct buffer_head **ret_bh,
+				 int ex,
+				 int arg_flags,
+				 int subclass)
 {
 	int status, level, acquired;
 	u32 dlm_flags;
@@ -2186,7 +2250,12 @@ int ocfs2_inode_lock_full(struct inode *inode,
 	if (arg_flags & OCFS2_META_LOCK_NOQUEUE)
 		dlm_flags |= DLM_LKF_NOQUEUE;
 
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	status = __ocfs2_cluster_lock(osb, lockres, level, dlm_flags,
+				      arg_flags, subclass, _RET_IP_);
+#else
 	status = ocfs2_cluster_lock(osb, lockres, level, dlm_flags, arg_flags);
+#endif
 	if (status < 0) {
 		if (status != -EAGAIN && status != -EIOCBRETRY)
 			mlog_errno(status);
diff --git a/fs/ocfs2/dlmglue.h b/fs/ocfs2/dlmglue.h
index e1fd572..9bb9eb5 100644
--- a/fs/ocfs2/dlmglue.h
+++ b/fs/ocfs2/dlmglue.h
@@ -96,17 +96,23 @@ void ocfs2_open_unlock(struct inode *inode);
 int ocfs2_inode_lock_atime(struct inode *inode,
 			  struct vfsmount *vfsmnt,
 			  int *level);
-int ocfs2_inode_lock_full(struct inode *inode,
+int ocfs2_inode_lock_full_nested(struct inode *inode,
 			 struct buffer_head **ret_bh,
 			 int ex,
-			 int arg_flags);
+			 int arg_flags,
+			 int subclass);
 int ocfs2_inode_lock_with_page(struct inode *inode,
 			      struct buffer_head **ret_bh,
 			      int ex,
 			      struct page *page);
+/* Variants without special locking class or flags */
+#define ocfs2_inode_lock_full(i, r, e, f)\
+		ocfs2_inode_lock_full_nested(i, r, e, f, 0)
+#define ocfs2_inode_lock_nested(i, b, e, s)\
+		ocfs2_inode_lock_full_nested(i, b, e, 0, s)
 /* 99% of the time we don't want to supply any additional flags --
  * those are for very specific cases only. */
-#define ocfs2_inode_lock(i, b, e) ocfs2_inode_lock_full(i, b, e, 0)
+#define ocfs2_inode_lock(i, b, e) ocfs2_inode_lock_full_nested(i, b, e, 0, 0)
 void ocfs2_inode_unlock(struct inode *inode,
 		       int ex);
 int ocfs2_super_lock(struct ocfs2_super *osb,
diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
index 10e1fa8..4dc8890 100644
--- a/fs/ocfs2/inode.c
+++ b/fs/ocfs2/inode.c
@@ -215,6 +215,8 @@ bail:
 static int ocfs2_init_locked_inode(struct inode *inode, void *opaque)
 {
 	struct ocfs2_find_inode_args *args = opaque;
+	static struct lock_class_key ocfs2_quota_ip_alloc_sem_key,
+				     ocfs2_file_ip_alloc_sem_key;
 
 	mlog_entry("inode = %p, opaque = %p\n", inode, opaque);
 
@@ -223,6 +225,15 @@ static int ocfs2_init_locked_inode(struct inode *inode, void *opaque)
 	if (args->fi_sysfile_type != 0)
 		lockdep_set_class(&inode->i_mutex,
 			&ocfs2_sysfile_lock_key[args->fi_sysfile_type]);
+	if (args->fi_sysfile_type == USER_QUOTA_SYSTEM_INODE ||
+	    args->fi_sysfile_type == GROUP_QUOTA_SYSTEM_INODE ||
+	    args->fi_sysfile_type == LOCAL_USER_QUOTA_SYSTEM_INODE ||
+	    args->fi_sysfile_type == LOCAL_GROUP_QUOTA_SYSTEM_INODE)
+		lockdep_set_class(&OCFS2_I(inode)->ip_alloc_sem,
+				  &ocfs2_quota_ip_alloc_sem_key);
+	else
+		lockdep_set_class(&OCFS2_I(inode)->ip_alloc_sem,
+				  &ocfs2_file_ip_alloc_sem_key);
 
 	mlog_exit(0);
 	return 0;
diff --git a/fs/ocfs2/inode.h b/fs/ocfs2/inode.h
index ea71525..7096516 100644
--- a/fs/ocfs2/inode.h
+++ b/fs/ocfs2/inode.h
@@ -109,6 +109,14 @@ struct ocfs2_inode_info
 /* Indicates that the metadata cache should be used as an array. */
 #define OCFS2_INODE_CACHE_INLINE	0x00000080
 
+/* Locking subclasses of inode cluster lock */
+enum {
+	OI_LS_NORMAL,
+	OI_LS_PARENT,
+	OI_LS_RENAME1,
+	OI_LS_RENAME2,
+};
+
 static inline struct ocfs2_inode_info *OCFS2_I(struct inode *inode)
 {
 	return container_of(inode, struct ocfs2_inode_info, vfs_inode);
diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c
index 33464c6..8601f93 100644
--- a/fs/ocfs2/namei.c
+++ b/fs/ocfs2/namei.c
@@ -118,7 +118,7 @@ static struct dentry *ocfs2_lookup(struct inode *dir, struct dentry *dentry,
 	mlog(0, "find name %.*s in directory %llu\n", dentry->d_name.len,
 	     dentry->d_name.name, (unsigned long long)OCFS2_I(dir)->ip_blkno);
 
-	status = ocfs2_inode_lock(dir, NULL, 0);
+	status = ocfs2_inode_lock_nested(dir, NULL, 0, OI_LS_PARENT);
 	if (status < 0) {
 		if (status != -ENOENT)
 			mlog_errno(status);
@@ -636,7 +636,7 @@ static int ocfs2_link(struct dentry *old_dentry,
 	if (S_ISDIR(inode->i_mode))
 		return -EPERM;
 
-	err = ocfs2_inode_lock(dir, &parent_fe_bh, 1);
+	err = ocfs2_inode_lock_nested(dir, &parent_fe_bh, 1, OI_LS_PARENT);
 	if (err < 0) {
 		if (err != -ENOENT)
 			mlog_errno(err);
@@ -800,7 +800,8 @@ static int ocfs2_unlink(struct inode *dir,
 		return -EPERM;
 	}
 
-	status = ocfs2_inode_lock(dir, &parent_node_bh, 1);
+	status = ocfs2_inode_lock_nested(dir, &parent_node_bh, 1,
+					 OI_LS_PARENT);
 	if (status < 0) {
 		if (status != -ENOENT)
 			mlog_errno(status);
@@ -978,7 +979,8 @@ static int ocfs2_double_lock(struct ocfs2_super *osb,
 			inode1 = tmpinode;
 		}
 		/* lock id2 */
-		status = ocfs2_inode_lock(inode2, bh2, 1);
+		status = ocfs2_inode_lock_nested(inode2, bh2, 1,
+						 OI_LS_RENAME1);
 		if (status < 0) {
 			if (status != -ENOENT)
 				mlog_errno(status);
@@ -987,7 +989,7 @@ static int ocfs2_double_lock(struct ocfs2_super *osb,
 	}
 
 	/* lock id1 */
-	status = ocfs2_inode_lock(inode1, bh1, 1);
+	status = ocfs2_inode_lock_nested(inode1, bh1, 1, OI_LS_RENAME2);
 	if (status < 0) {
 		/*
 		 * An error return must mean that no cluster locks
@@ -1103,7 +1105,8 @@ static int ocfs2_rename(struct inode *old_dir,
 	 * won't have to concurrently downconvert the inode and the
 	 * dentry locks.
 	 */
-	status = ocfs2_inode_lock(old_inode, &old_inode_bh, 1);
+	status = ocfs2_inode_lock_nested(old_inode, &old_inode_bh, 1,
+					 OI_LS_PARENT);
 	if (status < 0) {
 		if (status != -ENOENT)
 			mlog_errno(status);
diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
index 1386281..99e89ba 100644
--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -34,6 +34,7 @@
 #include <linux/workqueue.h>
 #include <linux/kref.h>
 #include <linux/mutex.h>
+#include <linux/lockdep.h>
 #ifndef CONFIG_OCFS2_COMPAT_JBD
 # include <linux/jbd2.h>
 #else
@@ -149,6 +150,9 @@ struct ocfs2_lock_res {
 	unsigned int		 l_lock_max_exmode; 	   /* Max wait for EX */
 	unsigned int		 l_lock_refresh;	   /* Disk refreshes */
 #endif
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	struct lockdep_map	 l_lockdep_map;
+#endif
 };
 
 struct ocfs2_dlm_debug {
diff --git a/fs/ocfs2/sysfile.c b/fs/ocfs2/sysfile.c
index ab713eb..6f53f5e 100644
--- a/fs/ocfs2/sysfile.c
+++ b/fs/ocfs2/sysfile.c
@@ -50,6 +50,8 @@ static inline int is_in_system_inode_array(struct ocfs2_super *osb,
 					   int type,
 					   u32 slot);
 
+static struct lock_class_key ocfs2_sysfile_cluster_lock_key[NUM_SYSTEM_INODES];
+
 static inline int is_global_system_inode(int type)
 {
 	return type >= OCFS2_FIRST_ONLINE_SYSTEM_INODE &&
@@ -118,6 +120,21 @@ static struct inode * _ocfs2_get_system_file_inode(struct ocfs2_super *osb,
 		inode = NULL;
 		goto bail;
 	}
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	if (type == LOCAL_USER_QUOTA_SYSTEM_INODE ||
+	    type == LOCAL_GROUP_QUOTA_SYSTEM_INODE ||
+	    type == JOURNAL_SYSTEM_INODE) {
+		/* Ignore inode lock on these inodes as the lock does not
+		 * really belong to any process and lockdep cannot handle
+		 * that */
+		OCFS2_I(inode)->ip_inode_lockres.l_lockdep_map.key = NULL;
+	} else {
+		lockdep_init_map(&OCFS2_I(inode)->ip_inode_lockres.
+								l_lockdep_map,
+				 ocfs2_system_inodes[type].si_name,
+				 &ocfs2_sysfile_cluster_lock_key[type], 0);
+	}
+#endif
 bail:
 
 	return inode;
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Ocfs2-devel] [PATCH] vfs: Set special lockdep map for dirs only if not set by fs
  2009-06-02 12:24 ` [Ocfs2-devel] [PATCH] vfs: Set special lockdep map for dirs only if not set by fs Jan Kara
@ 2009-06-02 12:53   ` Peter Zijlstra
  0 siblings, 0 replies; 20+ messages in thread
From: Peter Zijlstra @ 2009-06-02 12:53 UTC (permalink / raw)
  To: ocfs2-devel

On Tue, 2009-06-02 at 14:24 +0200, Jan Kara wrote:
> Some filesystems need to set lockdep map for i_mutex differently for
> different directories. For example OCFS2 has system directories (for
> orphan inode tracking and for gathering all system files like journal
> or quota files into a single place) which have different locking
> locking rules than standard directories. For a filesystem setting
> lockdep map is naturaly done when the inode is read but we have to
> modify unlock_new_inode() not to overwrite the lockdep map the filesystem
> has set.

Still looks good ;-)

Acked-by: peterz at infradead.org

> CC: mingo at redhat.com
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/inode.c              |   17 +++++++++++------
>  include/linux/lockdep.h |   15 +++++++++++++++
>  2 files changed, 26 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index 6ad14a1..44e1848 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -654,12 +654,17 @@ void unlock_new_inode(struct inode *inode)
>  	if (inode->i_mode & S_IFDIR) {
>  		struct file_system_type *type = inode->i_sb->s_type;
>  
> -		/*
> -		 * ensure nobody is actually holding i_mutex
> -		 */
> -		mutex_destroy(&inode->i_mutex);
> -		mutex_init(&inode->i_mutex);
> -		lockdep_set_class(&inode->i_mutex, &type->i_mutex_dir_key);
> +		/* Set new key only if filesystem hasn't already changed it */
> +		if (!lockdep_match_class(&inode->i_mutex,
> +		    &type->i_mutex_key)) {
> +			/*
> +			 * ensure nobody is actually holding i_mutex
> +			 */
> +			mutex_destroy(&inode->i_mutex);
> +			mutex_init(&inode->i_mutex);
> +			lockdep_set_class(&inode->i_mutex,
> +					  &type->i_mutex_dir_key);
> +		}
>  	}
>  #endif
>  	/*
> diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
> index da5a5a1..b25d1b5 100644
> --- a/include/linux/lockdep.h
> +++ b/include/linux/lockdep.h
> @@ -258,6 +258,16 @@ extern void lockdep_init_map(struct lockdep_map *lock, const char *name,
>  #define lockdep_set_subclass(lock, sub)	\
>  		lockdep_init_map(&(lock)->dep_map, #lock, \
>  				 (lock)->dep_map.key, sub)
> +/*
> + * Compare locking classes
> + */
> +#define lockdep_match_class(lock, key) lockdep_match_key(&(lock)->dep_map, key)
> +
> +static inline int lockdep_match_key(struct lockdep_map *lock,
> +				    struct lock_class_key *key)
> +{
> +	return lock->key == key;
> +}
>  
>  /*
>   * Acquire a lock.
> @@ -326,6 +336,11 @@ static inline void lockdep_on(void)
>  #define lockdep_set_class_and_subclass(lock, key, sub) \
>  		do { (void)(key); } while (0)
>  #define lockdep_set_subclass(lock, sub)		do { } while (0)
> +/*
> + * We don't define lockdep_match_class() and lockdep_match_key() for !LOCKDEP
> + * case since the result is not well defined and the caller should rather
> + * #ifdef the call himself.
> + */
>  
>  # define INIT_LOCKDEP
>  # define lockdep_reset()		do { debug_locks = 1; } while (0)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Ocfs2-devel] [PATCH 0/7] [RESEND] Fix some deadlocks in quota code and implement lockdep for cluster locks
  2009-06-02 12:23 [Ocfs2-devel] [PATCH 0/7] [RESEND] Fix some deadlocks in quota code and implement lockdep for cluster locks Jan Kara
                   ` (6 preceding siblings ...)
  2009-06-02 12:24 ` [Ocfs2-devel] [PATCH] ocfs2: Add lockdep annotations Jan Kara
@ 2009-06-02 18:43 ` Joel Becker
  2009-06-04  2:26 ` Joel Becker
  8 siblings, 0 replies; 20+ messages in thread
From: Joel Becker @ 2009-06-02 18:43 UTC (permalink / raw)
  To: ocfs2-devel

On Tue, Jun 02, 2009 at 02:23:58PM +0200, Jan Kara wrote:
>   BTW: Currently lockdep complains about some configfs issue and turns itself
> off so that's why I didn't get to finding more deadlocks ;)

	That's in linux-next of the configfs tree :-)

Joel

-- 

"Hell is oneself, hell is alone, the other figures in it, merely projections."
        - T. S. Eliot

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2: Fix possible deadlock in ocfs2_global_read_dquot()
  2009-06-02 12:23 ` [Ocfs2-devel] [PATCH] ocfs2: Fix possible deadlock in ocfs2_global_read_dquot() Jan Kara
@ 2009-06-03  0:15   ` Joel Becker
  2009-06-03 13:31     ` Jan Kara
  0 siblings, 1 reply; 20+ messages in thread
From: Joel Becker @ 2009-06-03  0:15 UTC (permalink / raw)
  To: ocfs2-devel

On Tue, Jun 02, 2009 at 02:23:59PM +0200, Jan Kara wrote:
> It is not possible to get a read lock and then try to get the same write lock
> in one thread as that can block on downconvert being requested by other node
> leading to deadlock. So first drop the quota lock for reading and only after
> that get it for writing.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/ocfs2/quota_global.c |    4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/ocfs2/quota_global.c b/fs/ocfs2/quota_global.c
> index 1ed0f7c..edfa60c 100644
> --- a/fs/ocfs2/quota_global.c
> +++ b/fs/ocfs2/quota_global.c
> @@ -421,6 +421,7 @@ int ocfs2_global_read_dquot(struct dquot *dquot)
>  	OCFS2_DQUOT(dquot)->dq_originodes = dquot->dq_dqb.dqb_curinodes;
>  	if (!dquot->dq_off) {	/* No real quota entry? */
>  		/* Upgrade to exclusive lock for allocation */
> +		ocfs2_qinfo_unlock(info, 0);
>  		err = ocfs2_qinfo_lock(info, 1);
>  		if (err < 0)
>  			goto out_qlock;

	Can other nodes change the state of dq_off while you're
getting the exclusive?  I'm wondering if you have to retry the
qtree_read_dquot(), and I don't know enough about the code to say yes or
no.

Joel

-- 

	f/8 and be there.

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2: Fix possible deadlock in ocfs2_global_read_dquot()
  2009-06-03  0:15   ` Joel Becker
@ 2009-06-03 13:31     ` Jan Kara
  0 siblings, 0 replies; 20+ messages in thread
From: Jan Kara @ 2009-06-03 13:31 UTC (permalink / raw)
  To: ocfs2-devel

On Tue 02-06-09 17:15:54, Joel Becker wrote:
> On Tue, Jun 02, 2009 at 02:23:59PM +0200, Jan Kara wrote:
> > It is not possible to get a read lock and then try to get the same write lock
> > in one thread as that can block on downconvert being requested by other node
> > leading to deadlock. So first drop the quota lock for reading and only after
> > that get it for writing.
> > 
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  fs/ocfs2/quota_global.c |    4 +++-
> >  1 files changed, 3 insertions(+), 1 deletions(-)
> > 
> > diff --git a/fs/ocfs2/quota_global.c b/fs/ocfs2/quota_global.c
> > index 1ed0f7c..edfa60c 100644
> > --- a/fs/ocfs2/quota_global.c
> > +++ b/fs/ocfs2/quota_global.c
> > @@ -421,6 +421,7 @@ int ocfs2_global_read_dquot(struct dquot *dquot)
> >  	OCFS2_DQUOT(dquot)->dq_originodes = dquot->dq_dqb.dqb_curinodes;
> >  	if (!dquot->dq_off) {	/* No real quota entry? */
> >  		/* Upgrade to exclusive lock for allocation */
> > +		ocfs2_qinfo_unlock(info, 0);
> >  		err = ocfs2_qinfo_lock(info, 1);
> >  		if (err < 0)
> >  			goto out_qlock;
> 
> 	Can other nodes change the state of dq_off while you're
> getting the exclusive?  I'm wondering if you have to retry the
> qtree_read_dquot(), and I don't know enough about the code to say yes or
> no.
  Good question. No, we don't have to restart it since we also hold a
global quotafile lock which protects us from other nodes adding the quota
entry. There's a comment about it just above ocfs2_global_read_dquot()...

									Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2: Add lockdep annotations
  2009-06-02 12:24 ` [Ocfs2-devel] [PATCH] ocfs2: Add lockdep annotations Jan Kara
@ 2009-06-03 16:31   ` Joel Becker
  2009-06-03 21:48     ` Jan Kara
  0 siblings, 1 reply; 20+ messages in thread
From: Joel Becker @ 2009-06-03 16:31 UTC (permalink / raw)
  To: ocfs2-devel

On Tue, Jun 02, 2009 at 02:24:05PM +0200, Jan Kara wrote:
> Add lockdep support to OCFS2. The support also covers all of the cluster
> locks except for open locks, journal locks, and local quotafile locks. These
> are special because they are acquired for a node, not for a particular process
> and lockdep cannot deal with such type of locking.

	I like the idea of ocfs2 having lockdep on everything.

> -static void ocfs2_cluster_unlock(struct ocfs2_super *osb,
> -				 struct ocfs2_lock_res *lockres,
> -				 int level);
> +static void __ocfs2_cluster_unlock(struct ocfs2_super *osb,
> +				   struct ocfs2_lock_res *lockres,
> +				   int level, unsigned long ip);
> +static inline void ocfs2_cluster_unlock(struct ocfs2_super *osb,
> +					struct ocfs2_lock_res *lockres,
> +					int level)
> +{
> +	__ocfs2_cluster_unlock(osb, lockres, level, _RET_IP_);
> +}

	I don't know why 'ip' needs to be part of the function prototype
- every place you use it, you just pass _RET_IP_.

> @@ -1239,11 +1256,21 @@ static int ocfs2_wait_for_mask_interruptible(struct ocfs2_mask_waiter *mw,
>  	return ret;
>  }
>  
> -static int ocfs2_cluster_lock(struct ocfs2_super *osb,
> -			      struct ocfs2_lock_res *lockres,
> -			      int level,
> -			      u32 lkm_flags,
> -			      int arg_flags)
> +#ifdef CONFIG_DEBUG_LOCK_ALLOC
> +static int __ocfs2_cluster_lock(struct ocfs2_super *osb,
> +				struct ocfs2_lock_res *lockres,
> +				int level,
> +				u32 lkm_flags,
> +				int arg_flags,
> +				int l_subclass,
> +				unsigned long ip)
> +#else
> +static int __ocfs2_cluster_lock(struct ocfs2_super *osb,
> +				struct ocfs2_lock_res *lockres,
> +				int level,
> +				u32 lkm_flags,
> +				int arg_flags)
> +#endif
>  {
>  	struct ocfs2_mask_waiter mw;
>  	int wait, catch_signals = !(osb->s_mount_opt & OCFS2_MOUNT_NOINTR);
> @@ -1386,13 +1413,45 @@ out:
>  	}
>  	ocfs2_update_lock_stats(lockres, level, &mw, ret);
>  
> +#ifdef CONFIG_DEBUG_LOCK_ALLOC
> +	if (!ret && lockres->l_lockdep_map.key != NULL) {
> +		if (level == DLM_LOCK_PR)
> +			rwsem_acquire_read(&lockres->l_lockdep_map, l_subclass,
> +				!!(arg_flags & OCFS2_META_LOCK_NOQUEUE), ip);
> +		else
> +			rwsem_acquire(&lockres->l_lockdep_map, l_subclass,
> +				!!(arg_flags & OCFS2_META_LOCK_NOQUEUE), ip);
> +	}
> +#endif
>  	mlog_exit(ret);
>  	return ret;
>  }

	Here again you've wrapped an ifdef around whether 'ip' is in the
argument list or not.  Since it's always _RET_IP_, why not just leave it
out of the argument list and just pas _RET_IP_ to the rwsem_acquire
calls?  That also eliminates the ugly ifdefs and need for
__ocfs2_cluster_lock().

> +#ifdef CONFIG_DEBUG_LOCK_ALLOC
> +static void __ocfs2_cluster_unlock(struct ocfs2_super *osb,
> +				   struct ocfs2_lock_res *lockres,
> +				   int level,
> +				   unsigned long ip)
> +#else
> +static void __ocfs2_cluster_unlock(struct ocfs2_super *osb,
> +				   struct ocfs2_lock_res *lockres,
> +				   int level)
> +#endif
>  {
>  	unsigned long flags;
>  
> @@ -1401,6 +1460,10 @@ static void ocfs2_cluster_unlock(struct ocfs2_super *osb,
>  	ocfs2_dec_holders(lockres, level);
>  	ocfs2_downconvert_on_unlock(osb, lockres);
>  	spin_unlock_irqrestore(&lockres->l_lock, flags);
> +#ifdef CONFIG_DEBUG_LOCK_ALLOC
> +	if (lockres->l_lockdep_map.key != NULL)
> +		rwsem_release(&lockres->l_lockdep_map, 1, ip);
> +#endif
>  	mlog_exit_void();
>  }

	Same here.

> @@ -2145,10 +2208,11 @@ static int ocfs2_assign_bh(struct inode *inode,
>   * returns < 0 error if the callback will never be called, otherwise
>   * the result of the lock will be communicated via the callback.
>   */
> -int ocfs2_inode_lock_full(struct inode *inode,
> -			 struct buffer_head **ret_bh,
> -			 int ex,
> -			 int arg_flags)
> +int ocfs2_inode_lock_full_nested(struct inode *inode,
> +				 struct buffer_head **ret_bh,
> +				 int ex,
> +				 int arg_flags,
> +				 int subclass)
>  {
>  	int status, level, acquired;
>  	u32 dlm_flags;
> @@ -2186,7 +2250,12 @@ int ocfs2_inode_lock_full(struct inode *inode,
>  	if (arg_flags & OCFS2_META_LOCK_NOQUEUE)
>  		dlm_flags |= DLM_LKF_NOQUEUE;
>  
> +#ifdef CONFIG_DEBUG_LOCK_ALLOC
> +	status = __ocfs2_cluster_lock(osb, lockres, level, dlm_flags,
> +				      arg_flags, subclass, _RET_IP_);
> +#else
>  	status = ocfs2_cluster_lock(osb, lockres, level, dlm_flags, arg_flags);
> +#endif

	This hunk wouldn't be necessary either.

> --- a/fs/ocfs2/inode.c
> +++ b/fs/ocfs2/inode.c
> @@ -215,6 +215,8 @@ bail:
>  static int ocfs2_init_locked_inode(struct inode *inode, void *opaque)
>  {
>  	struct ocfs2_find_inode_args *args = opaque;
> +	static struct lock_class_key ocfs2_quota_ip_alloc_sem_key,
> +				     ocfs2_file_ip_alloc_sem_key;
>  
>  	mlog_entry("inode = %p, opaque = %p\n", inode, opaque);
>  
> @@ -223,6 +225,15 @@ static int ocfs2_init_locked_inode(struct inode *inode, void *opaque)
>  	if (args->fi_sysfile_type != 0)
>  		lockdep_set_class(&inode->i_mutex,
>  			&ocfs2_sysfile_lock_key[args->fi_sysfile_type]);
> +	if (args->fi_sysfile_type == USER_QUOTA_SYSTEM_INODE ||
> +	    args->fi_sysfile_type == GROUP_QUOTA_SYSTEM_INODE ||
> +	    args->fi_sysfile_type == LOCAL_USER_QUOTA_SYSTEM_INODE ||
> +	    args->fi_sysfile_type == LOCAL_GROUP_QUOTA_SYSTEM_INODE)
> +		lockdep_set_class(&OCFS2_I(inode)->ip_alloc_sem,
> +				  &ocfs2_quota_ip_alloc_sem_key);
> +	else
> +		lockdep_set_class(&OCFS2_I(inode)->ip_alloc_sem,
> +				  &ocfs2_file_ip_alloc_sem_key);

	These keys don't need to be intialized?

> diff --git a/fs/ocfs2/inode.h b/fs/ocfs2/inode.h
> index ea71525..7096516 100644
> --- a/fs/ocfs2/inode.h
> +++ b/fs/ocfs2/inode.h
> @@ -109,6 +109,14 @@ struct ocfs2_inode_info
>  /* Indicates that the metadata cache should be used as an array. */
>  #define OCFS2_INODE_CACHE_INLINE	0x00000080
>  
> +/* Locking subclasses of inode cluster lock */
> +enum {
> +	OI_LS_NORMAL,
> +	OI_LS_PARENT,
> +	OI_LS_RENAME1,
> +	OI_LS_RENAME2,
> +};
> +

	I couldn't see where OI_LS_NORMAL was ever used, until I 
realized that your #define of ocfs2_inode_lock_full hardcodes
OI_LS_NORMAL as '0'.  It should really use the subclass name, right?

Joel

-- 

"Senator let's be sincere,
 As much as you can."

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2: Correct ordering of ip_alloc_sem and localloc locks for directories
  2009-06-02 12:24 ` [Ocfs2-devel] [PATCH] ocfs2: Correct ordering of ip_alloc_sem and localloc locks for directories Jan Kara
@ 2009-06-03 21:05   ` Mark Fasheh
  0 siblings, 0 replies; 20+ messages in thread
From: Mark Fasheh @ 2009-06-03 21:05 UTC (permalink / raw)
  To: ocfs2-devel

On Tue, Jun 02, 2009 at 02:24:03PM +0200, Jan Kara wrote:
> We use ordering ip_alloc_sem -> local alloc locks in ocfs2_write_begin().
> So change lock ordering in ocfs2_extend_dir() and ocfs2_expand_inline_dir()
> to also use this lock ordering.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Mark Fasheh <mfasheh@suse.com>

--
Mark Fasheh

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2: Add lockdep annotations
  2009-06-03 16:31   ` Joel Becker
@ 2009-06-03 21:48     ` Jan Kara
  2009-06-03 22:42       ` Joel Becker
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Kara @ 2009-06-03 21:48 UTC (permalink / raw)
  To: ocfs2-devel

  Hi Joel,

  thanks for review.

On Wed 03-06-09 09:31:56, Joel Becker wrote:
> On Tue, Jun 02, 2009 at 02:24:05PM +0200, Jan Kara wrote:
> > Add lockdep support to OCFS2. The support also covers all of the cluster
> > locks except for open locks, journal locks, and local quotafile locks. These
> > are special because they are acquired for a node, not for a particular process
> > and lockdep cannot deal with such type of locking.
> 
> 	I like the idea of ocfs2 having lockdep on everything.
> 
> > -static void ocfs2_cluster_unlock(struct ocfs2_super *osb,
> > -				 struct ocfs2_lock_res *lockres,
> > -				 int level);
> > +static void __ocfs2_cluster_unlock(struct ocfs2_super *osb,
> > +				   struct ocfs2_lock_res *lockres,
> > +				   int level, unsigned long ip);
> > +static inline void ocfs2_cluster_unlock(struct ocfs2_super *osb,
> > +					struct ocfs2_lock_res *lockres,
> > +					int level)
> > +{
> > +	__ocfs2_cluster_unlock(osb, lockres, level, _RET_IP_);
> > +}
> 
> 	I don't know why 'ip' needs to be part of the function prototype
> - every place you use it, you just pass _RET_IP_.
  The problem is following: I need to give to lockdep code an IP of a
function which acquires a lock. For this to be useful I don't want to show
"ocfs2_cluster_lock" or "ocfs2_rw_lock" all the time. So I have to get IP
of the function calling e.g. ocfs2_rw_lock and that's why there's this
ip argument because it's hard to find what the IP of the caller of our
caller is...

> > @@ -1239,11 +1256,21 @@ static int ocfs2_wait_for_mask_interruptible(struct ocfs2_mask_waiter *mw,
> >  	return ret;
> >  }
> >  
> > -static int ocfs2_cluster_lock(struct ocfs2_super *osb,
> > -			      struct ocfs2_lock_res *lockres,
> > -			      int level,
> > -			      u32 lkm_flags,
> > -			      int arg_flags)
> > +#ifdef CONFIG_DEBUG_LOCK_ALLOC
> > +static int __ocfs2_cluster_lock(struct ocfs2_super *osb,
> > +				struct ocfs2_lock_res *lockres,
> > +				int level,
> > +				u32 lkm_flags,
> > +				int arg_flags,
> > +				int l_subclass,
> > +				unsigned long ip)
> > +#else
> > +static int __ocfs2_cluster_lock(struct ocfs2_super *osb,
> > +				struct ocfs2_lock_res *lockres,
> > +				int level,
> > +				u32 lkm_flags,
> > +				int arg_flags)
> > +#endif
> >  {
> >  	struct ocfs2_mask_waiter mw;
> >  	int wait, catch_signals = !(osb->s_mount_opt & OCFS2_MOUNT_NOINTR);
> > @@ -1386,13 +1413,45 @@ out:
> >  	}
> >  	ocfs2_update_lock_stats(lockres, level, &mw, ret);
> >  
> > +#ifdef CONFIG_DEBUG_LOCK_ALLOC
> > +	if (!ret && lockres->l_lockdep_map.key != NULL) {
> > +		if (level == DLM_LOCK_PR)
> > +			rwsem_acquire_read(&lockres->l_lockdep_map, l_subclass,
> > +				!!(arg_flags & OCFS2_META_LOCK_NOQUEUE), ip);
> > +		else
> > +			rwsem_acquire(&lockres->l_lockdep_map, l_subclass,
> > +				!!(arg_flags & OCFS2_META_LOCK_NOQUEUE), ip);
> > +	}
> > +#endif
> >  	mlog_exit(ret);
> >  	return ret;
> >  }
> 
> 	Here again you've wrapped an ifdef around whether 'ip' is in the
> argument list or not.  Since it's always _RET_IP_, why not just leave it
> out of the argument list and just pas _RET_IP_ to the rwsem_acquire
> calls?  That also eliminates the ugly ifdefs and need for
> __ocfs2_cluster_lock().
  I agree ifdefs aren't nice but I wanted to avoid the unnecessary argument
in !LOCKDEP case. I could also just always pass _RET_IP_ and hope that
compiler optimizes this in case we don't use the value of ip argument in
!LOCKDEP case.
  I can do that if you think it's better.
 
> > +#ifdef CONFIG_DEBUG_LOCK_ALLOC
> > +static void __ocfs2_cluster_unlock(struct ocfs2_super *osb,
> > +				   struct ocfs2_lock_res *lockres,
> > +				   int level,
> > +				   unsigned long ip)
> > +#else
> > +static void __ocfs2_cluster_unlock(struct ocfs2_super *osb,
> > +				   struct ocfs2_lock_res *lockres,
> > +				   int level)
> > +#endif
> >  {
> >  	unsigned long flags;
> >  
> > @@ -1401,6 +1460,10 @@ static void ocfs2_cluster_unlock(struct ocfs2_super *osb,
> >  	ocfs2_dec_holders(lockres, level);
> >  	ocfs2_downconvert_on_unlock(osb, lockres);
> >  	spin_unlock_irqrestore(&lockres->l_lock, flags);
> > +#ifdef CONFIG_DEBUG_LOCK_ALLOC
> > +	if (lockres->l_lockdep_map.key != NULL)
> > +		rwsem_release(&lockres->l_lockdep_map, 1, ip);
> > +#endif
> >  	mlog_exit_void();
> >  }
> 
> 	Same here.
> 
> > @@ -2145,10 +2208,11 @@ static int ocfs2_assign_bh(struct inode *inode,
> >   * returns < 0 error if the callback will never be called, otherwise
> >   * the result of the lock will be communicated via the callback.
> >   */
> > -int ocfs2_inode_lock_full(struct inode *inode,
> > -			 struct buffer_head **ret_bh,
> > -			 int ex,
> > -			 int arg_flags)
> > +int ocfs2_inode_lock_full_nested(struct inode *inode,
> > +				 struct buffer_head **ret_bh,
> > +				 int ex,
> > +				 int arg_flags,
> > +				 int subclass)
> >  {
> >  	int status, level, acquired;
> >  	u32 dlm_flags;
> > @@ -2186,7 +2250,12 @@ int ocfs2_inode_lock_full(struct inode *inode,
> >  	if (arg_flags & OCFS2_META_LOCK_NOQUEUE)
> >  		dlm_flags |= DLM_LKF_NOQUEUE;
> >  
> > +#ifdef CONFIG_DEBUG_LOCK_ALLOC
> > +	status = __ocfs2_cluster_lock(osb, lockres, level, dlm_flags,
> > +				      arg_flags, subclass, _RET_IP_);
> > +#else
> >  	status = ocfs2_cluster_lock(osb, lockres, level, dlm_flags, arg_flags);
> > +#endif
> 
> 	This hunk wouldn't be necessary either.
> 
> > --- a/fs/ocfs2/inode.c
> > +++ b/fs/ocfs2/inode.c
> > @@ -215,6 +215,8 @@ bail:
> >  static int ocfs2_init_locked_inode(struct inode *inode, void *opaque)
> >  {
> >  	struct ocfs2_find_inode_args *args = opaque;
> > +	static struct lock_class_key ocfs2_quota_ip_alloc_sem_key,
> > +				     ocfs2_file_ip_alloc_sem_key;
> >  
> >  	mlog_entry("inode = %p, opaque = %p\n", inode, opaque);
> >  
> > @@ -223,6 +225,15 @@ static int ocfs2_init_locked_inode(struct inode *inode, void *opaque)
> >  	if (args->fi_sysfile_type != 0)
> >  		lockdep_set_class(&inode->i_mutex,
> >  			&ocfs2_sysfile_lock_key[args->fi_sysfile_type]);
> > +	if (args->fi_sysfile_type == USER_QUOTA_SYSTEM_INODE ||
> > +	    args->fi_sysfile_type == GROUP_QUOTA_SYSTEM_INODE ||
> > +	    args->fi_sysfile_type == LOCAL_USER_QUOTA_SYSTEM_INODE ||
> > +	    args->fi_sysfile_type == LOCAL_GROUP_QUOTA_SYSTEM_INODE)
> > +		lockdep_set_class(&OCFS2_I(inode)->ip_alloc_sem,
> > +				  &ocfs2_quota_ip_alloc_sem_key);
> > +	else
> > +		lockdep_set_class(&OCFS2_I(inode)->ip_alloc_sem,
> > +				  &ocfs2_file_ip_alloc_sem_key);
> 
> 	These keys don't need to be intialized?
  No. The key here is the fixed adress of that variable - see
include/linux/lockdep.h:
/*
 * Lock-classes are keyed via unique addresses, by embedding the
 * lockclass-key into the kernel (or module) .data section. (For
 * static locks we use the lock address itself as the key.)
 */
struct lockdep_subclass_key {
        char __one_byte;
} __attribute__ ((__packed__));

struct lock_class_key {
        struct lockdep_subclass_key     subkeys[MAX_LOCKDEP_SUBCLASSES];
};

> > diff --git a/fs/ocfs2/inode.h b/fs/ocfs2/inode.h
> > index ea71525..7096516 100644
> > --- a/fs/ocfs2/inode.h
> > +++ b/fs/ocfs2/inode.h
> > @@ -109,6 +109,14 @@ struct ocfs2_inode_info
> >  /* Indicates that the metadata cache should be used as an array. */
> >  #define OCFS2_INODE_CACHE_INLINE	0x00000080
> >  
> > +/* Locking subclasses of inode cluster lock */
> > +enum {
> > +	OI_LS_NORMAL,
> > +	OI_LS_PARENT,
> > +	OI_LS_RENAME1,
> > +	OI_LS_RENAME2,
> > +};
> > +
> 
> 	I couldn't see where OI_LS_NORMAL was ever used, until I 
> realized that your #define of ocfs2_inode_lock_full hardcodes
> OI_LS_NORMAL as '0'.  It should really use the subclass name, right?
  Ah, right. Thanks for spotting this. Fixed.

									Honza

-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2: Add lockdep annotations
  2009-06-03 21:48     ` Jan Kara
@ 2009-06-03 22:42       ` Joel Becker
  2009-06-04 10:22         ` Jan Kara
  0 siblings, 1 reply; 20+ messages in thread
From: Joel Becker @ 2009-06-03 22:42 UTC (permalink / raw)
  To: ocfs2-devel

On Wed, Jun 03, 2009 at 11:48:52PM +0200, Jan Kara wrote:
> On Wed 03-06-09 09:31:56, Joel Becker wrote:
> > On Tue, Jun 02, 2009 at 02:24:05PM +0200, Jan Kara wrote:
> > > Add lockdep support to OCFS2. The support also covers all of the cluster
> > > locks except for open locks, journal locks, and local quotafile locks. These
> > > are special because they are acquired for a node, not for a particular process
> > > and lockdep cannot deal with such type of locking.
> > 
> > 	I like the idea of ocfs2 having lockdep on everything.
> > 
> > > -static void ocfs2_cluster_unlock(struct ocfs2_super *osb,
> > > -				 struct ocfs2_lock_res *lockres,
> > > -				 int level);
> > > +static void __ocfs2_cluster_unlock(struct ocfs2_super *osb,
> > > +				   struct ocfs2_lock_res *lockres,
> > > +				   int level, unsigned long ip);
> > > +static inline void ocfs2_cluster_unlock(struct ocfs2_super *osb,
> > > +					struct ocfs2_lock_res *lockres,
> > > +					int level)
> > > +{
> > > +	__ocfs2_cluster_unlock(osb, lockres, level, _RET_IP_);
> > > +}
> > 
> > 	I don't know why 'ip' needs to be part of the function prototype
> > - every place you use it, you just pass _RET_IP_.
>   The problem is following: I need to give to lockdep code an IP of a
> function which acquires a lock. For this to be useful I don't want to show
> "ocfs2_cluster_lock" or "ocfs2_rw_lock" all the time. So I have to get IP
> of the function calling e.g. ocfs2_rw_lock and that's why there's this
> ip argument because it's hard to find what the IP of the caller of our
> caller is...

	Doesn't lockdep provide backtraces when things go wrong?  Or is
this used to differentiate between callers internally in lockdep?

> > > @@ -1239,11 +1256,21 @@ static int ocfs2_wait_for_mask_interruptible(struct ocfs2_mask_waiter *mw,
> > >  	return ret;
> > >  }
> > >  
> > > -static int ocfs2_cluster_lock(struct ocfs2_super *osb,
> > > -			      struct ocfs2_lock_res *lockres,
> > > -			      int level,
> > > -			      u32 lkm_flags,
> > > -			      int arg_flags)
> > > +#ifdef CONFIG_DEBUG_LOCK_ALLOC
> > > +static int __ocfs2_cluster_lock(struct ocfs2_super *osb,
> > > +				struct ocfs2_lock_res *lockres,
> > > +				int level,
> > > +				u32 lkm_flags,
> > > +				int arg_flags,
> > > +				int l_subclass,
> > > +				unsigned long ip)
> > > +#else
> > > +static int __ocfs2_cluster_lock(struct ocfs2_super *osb,
> > > +				struct ocfs2_lock_res *lockres,
> > > +				int level,
> > > +				u32 lkm_flags,
> > > +				int arg_flags)
> > > +#endif
> > >  {
> > >  	struct ocfs2_mask_waiter mw;
> > >  	int wait, catch_signals = !(osb->s_mount_opt & OCFS2_MOUNT_NOINTR);
> > > @@ -1386,13 +1413,45 @@ out:
> > >  	}
> > >  	ocfs2_update_lock_stats(lockres, level, &mw, ret);
> > >  
> > > +#ifdef CONFIG_DEBUG_LOCK_ALLOC
> > > +	if (!ret && lockres->l_lockdep_map.key != NULL) {
> > > +		if (level == DLM_LOCK_PR)
> > > +			rwsem_acquire_read(&lockres->l_lockdep_map, l_subclass,
> > > +				!!(arg_flags & OCFS2_META_LOCK_NOQUEUE), ip);
> > > +		else
> > > +			rwsem_acquire(&lockres->l_lockdep_map, l_subclass,
> > > +				!!(arg_flags & OCFS2_META_LOCK_NOQUEUE), ip);
> > > +	}
> > > +#endif
> > >  	mlog_exit(ret);
> > >  	return ret;
> > >  }
> > 
> > 	Here again you've wrapped an ifdef around whether 'ip' is in the
> > argument list or not.  Since it's always _RET_IP_, why not just leave it
> > out of the argument list and just pas _RET_IP_ to the rwsem_acquire
> > calls?  That also eliminates the ugly ifdefs and need for
> > __ocfs2_cluster_lock().
>   I agree ifdefs aren't nice but I wanted to avoid the unnecessary argument
> in !LOCKDEP case. I could also just always pass _RET_IP_ and hope that
> compiler optimizes this in case we don't use the value of ip argument in
> !LOCKDEP case.
>   I can do that if you think it's better.

	I would much rather you just rename to __ocfs2.., add the
argument, call it "caller_ip", and then wrap it with a static inline.
This keeps the ifdefs in one place and much more readable.  You can pass
0 when lockdep isn't enabled:

#ifdef CONFIG_DEBUG_LOCK_ALLOC
static inline int ocfs2_cluster_lock(struct ocfs2_super *osb,
				     struct ocfs2_lock_res *lockres,
				     int level,
				     u32 lkm_flags,
				     int arg_flags)
{
	return __ocfs2_cluster_lock(osb, lockres, level, lkm_flags,
				    arg_flags, _RET_IP_);
}

static inline void ocfs2_cluster_unlock(struct ocfs2_super *osb,
					struct ocfs2_lock_res *lockres,
					int level)
{
	return __ocfs2_cluster_unlock(osb, lockres, level, _RET_IP_);
}
#else
static inline int ocfs2_cluster_lock(struct ocfs2_super *osb,
				     struct ocfs2_lock_res *lockres,
				     int level,
				     u32 lkm_flags,
				     int arg_flags)
{
	return __ocfs2_cluster_lock(osb, lockres, level, lkm_flags,
				    arg_flags, 0);
}

static inline void ocfs2_cluster_unlock(struct ocfs2_super *osb,
					struct ocfs2_lock_res *lockres,
					int level)
{
	return __ocfs2_cluster_unlock(osb, lockres, level, 0);
}
#endif

	Don't worry about the efficiency of passing a sixth argument.
That generally fits on x86, and ocfs2 is one of the worst abusers of
12-argument functions anyway.  Locking code really, really wants
readability first.

> > > @@ -223,6 +225,15 @@ static int ocfs2_init_locked_inode(struct inode *inode, void *opaque)
> > >  	if (args->fi_sysfile_type != 0)
> > >  		lockdep_set_class(&inode->i_mutex,
> > >  			&ocfs2_sysfile_lock_key[args->fi_sysfile_type]);
> > > +	if (args->fi_sysfile_type == USER_QUOTA_SYSTEM_INODE ||
> > > +	    args->fi_sysfile_type == GROUP_QUOTA_SYSTEM_INODE ||
> > > +	    args->fi_sysfile_type == LOCAL_USER_QUOTA_SYSTEM_INODE ||
> > > +	    args->fi_sysfile_type == LOCAL_GROUP_QUOTA_SYSTEM_INODE)
> > > +		lockdep_set_class(&OCFS2_I(inode)->ip_alloc_sem,
> > > +				  &ocfs2_quota_ip_alloc_sem_key);
> > > +	else
> > > +		lockdep_set_class(&OCFS2_I(inode)->ip_alloc_sem,
> > > +				  &ocfs2_file_ip_alloc_sem_key);
> > 
> > 	These keys don't need to be intialized?
>   No. The key here is the fixed adress of that variable - see
> include/linux/lockdep.h:
> /*
>  * Lock-classes are keyed via unique addresses, by embedding the
>  * lockclass-key into the kernel (or module) .data section. (For
>  * static locks we use the lock address itself as the key.)
>  */
> struct lockdep_subclass_key {
>         char __one_byte;
> } __attribute__ ((__packed__));
> 
> struct lock_class_key {
>         struct lockdep_subclass_key     subkeys[MAX_LOCKDEP_SUBCLASSES];
> };

	Ok, got it.

Joel

-- 

"Heav'n hath no rage like love to hatred turn'd, nor Hell a fury,
 like a woman scorn'd."
        - William Congreve

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Ocfs2-devel] [PATCH 0/7] [RESEND] Fix some deadlocks in quota code and implement lockdep for cluster locks
  2009-06-02 12:23 [Ocfs2-devel] [PATCH 0/7] [RESEND] Fix some deadlocks in quota code and implement lockdep for cluster locks Jan Kara
                   ` (7 preceding siblings ...)
  2009-06-02 18:43 ` [Ocfs2-devel] [PATCH 0/7] [RESEND] Fix some deadlocks in quota code and implement lockdep for cluster locks Joel Becker
@ 2009-06-04  2:26 ` Joel Becker
  2009-06-04 10:09   ` Jan Kara
  8 siblings, 1 reply; 20+ messages in thread
From: Joel Becker @ 2009-06-04  2:26 UTC (permalink / raw)
  To: ocfs2-devel

On Tue, Jun 02, 2009 at 02:23:58PM +0200, Jan Kara wrote:
>   I'm resending this patch series. It's rediffed against linux-next branch of
> Joel's git tree. The first four patches are obvious fixes of deadlocks in quota
> code and should go in as soon as possible. The other three patches implement
> lockdep support for OCFS2 cluster locks. So you can have a look whether the
> code make sence to you and possibly merge them. They should be NOP when lockdep
> is disabled and help a lot when debugging various locking problems (with these
> patches I found the problems fixed in the beginning of the series).
>   BTW: Currently lockdep complains about some configfs issue and turns itself
> off so that's why I didn't get to finding more deadlocks ;)

	The fixes are now in the merge-window branch of ocfs2.git.  They
also live on the quota-fixes topic branch.  The lockdep changes still
need some marinating.

Joel

-- 

Life's Little Instruction Book #274

	"Leave everything a little better than you found it."

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Ocfs2-devel] [PATCH 0/7] [RESEND] Fix some deadlocks in quota code and implement lockdep for cluster locks
  2009-06-04  2:26 ` Joel Becker
@ 2009-06-04 10:09   ` Jan Kara
  2009-06-09 17:05     ` Joel Becker
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Kara @ 2009-06-04 10:09 UTC (permalink / raw)
  To: ocfs2-devel

On Wed 03-06-09 19:26:46, Joel Becker wrote:
> On Tue, Jun 02, 2009 at 02:23:58PM +0200, Jan Kara wrote:
> >   I'm resending this patch series. It's rediffed against linux-next branch of
> > Joel's git tree. The first four patches are obvious fixes of deadlocks in quota
> > code and should go in as soon as possible. The other three patches implement
> > lockdep support for OCFS2 cluster locks. So you can have a look whether the
> > code make sence to you and possibly merge them. They should be NOP when lockdep
> > is disabled and help a lot when debugging various locking problems (with these
> > patches I found the problems fixed in the beginning of the series).
> >   BTW: Currently lockdep complains about some configfs issue and turns itself
> > off so that's why I didn't get to finding more deadlocks ;)
> 
> 	The fixes are now in the merge-window branch of ocfs2.git.  They
> also live on the quota-fixes topic branch.  The lockdep changes still
> need some marinating.
  Thanks. With lockdep patch, there's this 'ip' parameter thing causing
ifdefs but after we agree how to resolve this, it can go into linux-next
branch, right?

									Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2: Add lockdep annotations
  2009-06-03 22:42       ` Joel Becker
@ 2009-06-04 10:22         ` Jan Kara
  0 siblings, 0 replies; 20+ messages in thread
From: Jan Kara @ 2009-06-04 10:22 UTC (permalink / raw)
  To: ocfs2-devel

On Wed 03-06-09 15:42:41, Joel Becker wrote:
> On Wed, Jun 03, 2009 at 11:48:52PM +0200, Jan Kara wrote:
> > On Wed 03-06-09 09:31:56, Joel Becker wrote:
> > > On Tue, Jun 02, 2009 at 02:24:05PM +0200, Jan Kara wrote:
> > > > Add lockdep support to OCFS2. The support also covers all of the cluster
> > > > locks except for open locks, journal locks, and local quotafile locks. These
> > > > are special because they are acquired for a node, not for a particular process
> > > > and lockdep cannot deal with such type of locking.
> > > 
> > > 	I like the idea of ocfs2 having lockdep on everything.
> > > 
> > > > -static void ocfs2_cluster_unlock(struct ocfs2_super *osb,
> > > > -				 struct ocfs2_lock_res *lockres,
> > > > -				 int level);
> > > > +static void __ocfs2_cluster_unlock(struct ocfs2_super *osb,
> > > > +				   struct ocfs2_lock_res *lockres,
> > > > +				   int level, unsigned long ip);
> > > > +static inline void ocfs2_cluster_unlock(struct ocfs2_super *osb,
> > > > +					struct ocfs2_lock_res *lockres,
> > > > +					int level)
> > > > +{
> > > > +	__ocfs2_cluster_unlock(osb, lockres, level, _RET_IP_);
> > > > +}
> > > 
> > > 	I don't know why 'ip' needs to be part of the function prototype
> > > - every place you use it, you just pass _RET_IP_.
> >   The problem is following: I need to give to lockdep code an IP of a
> > function which acquires a lock. For this to be useful I don't want to show
> > "ocfs2_cluster_lock" or "ocfs2_rw_lock" all the time. So I have to get IP
> > of the function calling e.g. ocfs2_rw_lock and that's why there's this
> > ip argument because it's hard to find what the IP of the caller of our
> > caller is...
> 
> 	Doesn't lockdep provide backtraces when things go wrong?  Or is
> this used to differentiate between callers internally in lockdep?
  Yes, it does. But this IP is for the message like:
 =======================================================
 [ INFO: possible circular locking dependency detected ]
 2.6.25-0.161.rc7.fc9.i686 #1
 -------------------------------------------------------
 NetworkManager/2308 is trying to acquire lock:
  (events){--..}, at: [flush_workqueue+0/133] flush_workqueue+0x0/0x85
                       ^^^^^^^^^^^^^^^^^^^^^^ HERE
 
 but task is already holding lock:
  (rtnl_mutex){--..}, at: [rtnetlink_rcv+18/38] rtnetlink_rcv+0x12/0x26
                          ^^^^^^^^^^^^^^^^^^^^ HERE
 
 which lock already depends on the new lock.
 =======================================================

So it's usually possible to find all the needed information in the
backtraces but it's much more convenient to find it here at the beginning.
BTW: What I did just mimics which IP is passed for ordinary mutexes so it's
also more consistent.

> > > > @@ -1239,11 +1256,21 @@ static int ocfs2_wait_for_mask_interruptible(struct ocfs2_mask_waiter *mw,
> > > >  	return ret;
> > > >  }
> > > >  
> > > > -static int ocfs2_cluster_lock(struct ocfs2_super *osb,
> > > > -			      struct ocfs2_lock_res *lockres,
> > > > -			      int level,
> > > > -			      u32 lkm_flags,
> > > > -			      int arg_flags)
> > > > +#ifdef CONFIG_DEBUG_LOCK_ALLOC
> > > > +static int __ocfs2_cluster_lock(struct ocfs2_super *osb,
> > > > +				struct ocfs2_lock_res *lockres,
> > > > +				int level,
> > > > +				u32 lkm_flags,
> > > > +				int arg_flags,
> > > > +				int l_subclass,
> > > > +				unsigned long ip)
> > > > +#else
> > > > +static int __ocfs2_cluster_lock(struct ocfs2_super *osb,
> > > > +				struct ocfs2_lock_res *lockres,
> > > > +				int level,
> > > > +				u32 lkm_flags,
> > > > +				int arg_flags)
> > > > +#endif
> > > >  {
> > > >  	struct ocfs2_mask_waiter mw;
> > > >  	int wait, catch_signals = !(osb->s_mount_opt & OCFS2_MOUNT_NOINTR);
> > > > @@ -1386,13 +1413,45 @@ out:
> > > >  	}
> > > >  	ocfs2_update_lock_stats(lockres, level, &mw, ret);
> > > >  
> > > > +#ifdef CONFIG_DEBUG_LOCK_ALLOC
> > > > +	if (!ret && lockres->l_lockdep_map.key != NULL) {
> > > > +		if (level == DLM_LOCK_PR)
> > > > +			rwsem_acquire_read(&lockres->l_lockdep_map, l_subclass,
> > > > +				!!(arg_flags & OCFS2_META_LOCK_NOQUEUE), ip);
> > > > +		else
> > > > +			rwsem_acquire(&lockres->l_lockdep_map, l_subclass,
> > > > +				!!(arg_flags & OCFS2_META_LOCK_NOQUEUE), ip);
> > > > +	}
> > > > +#endif
> > > >  	mlog_exit(ret);
> > > >  	return ret;
> > > >  }
> > > 
> > > 	Here again you've wrapped an ifdef around whether 'ip' is in the
> > > argument list or not.  Since it's always _RET_IP_, why not just leave it
> > > out of the argument list and just pas _RET_IP_ to the rwsem_acquire
> > > calls?  That also eliminates the ugly ifdefs and need for
> > > __ocfs2_cluster_lock().
> >   I agree ifdefs aren't nice but I wanted to avoid the unnecessary argument
> > in !LOCKDEP case. I could also just always pass _RET_IP_ and hope that
> > compiler optimizes this in case we don't use the value of ip argument in
> > !LOCKDEP case.
> >   I can do that if you think it's better.
> 
> 	I would much rather you just rename to __ocfs2.., add the
> argument, call it "caller_ip", and then wrap it with a static inline.
> This keeps the ifdefs in one place and much more readable.  You can pass
> 0 when lockdep isn't enabled:
> 
> #ifdef CONFIG_DEBUG_LOCK_ALLOC
> static inline int ocfs2_cluster_lock(struct ocfs2_super *osb,
> 				     struct ocfs2_lock_res *lockres,
> 				     int level,
> 				     u32 lkm_flags,
> 				     int arg_flags)
> {
> 	return __ocfs2_cluster_lock(osb, lockres, level, lkm_flags,
> 				    arg_flags, _RET_IP_);
> }
> 
> static inline void ocfs2_cluster_unlock(struct ocfs2_super *osb,
> 					struct ocfs2_lock_res *lockres,
> 					int level)
> {
> 	return __ocfs2_cluster_unlock(osb, lockres, level, _RET_IP_);
> }
> #else
> static inline int ocfs2_cluster_lock(struct ocfs2_super *osb,
> 				     struct ocfs2_lock_res *lockres,
> 				     int level,
> 				     u32 lkm_flags,
> 				     int arg_flags)
> {
> 	return __ocfs2_cluster_lock(osb, lockres, level, lkm_flags,
> 				    arg_flags, 0);
> }
> 
> static inline void ocfs2_cluster_unlock(struct ocfs2_super *osb,
> 					struct ocfs2_lock_res *lockres,
> 					int level)
> {
> 	return __ocfs2_cluster_unlock(osb, lockres, level, 0);
> }
> #endif
> 
> 	Don't worry about the efficiency of passing a sixth argument.
> That generally fits on x86, and ocfs2 is one of the worst abusers of
> 12-argument functions anyway.  Locking code really, really wants
> readability first.
  OK, I like what you suggest. Will do.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Ocfs2-devel] [PATCH 0/7] [RESEND] Fix some deadlocks in quota code and implement lockdep for cluster locks
  2009-06-04 10:09   ` Jan Kara
@ 2009-06-09 17:05     ` Joel Becker
  0 siblings, 0 replies; 20+ messages in thread
From: Joel Becker @ 2009-06-09 17:05 UTC (permalink / raw)
  To: ocfs2-devel

On Thu, Jun 04, 2009 at 12:09:36PM +0200, Jan Kara wrote:
> On Wed 03-06-09 19:26:46, Joel Becker wrote:
> > 	The fixes are now in the merge-window branch of ocfs2.git.  They
> > also live on the quota-fixes topic branch.  The lockdep changes still
> > need some marinating.
>   Thanks. With lockdep patch, there's this 'ip' parameter thing causing
> ifdefs but after we agree how to resolve this, it can go into linux-next
> branch, right?

	Yeah.  I'm going to hold off on merging it until the 2.6.32
window so that it can get some testing, as it's a significant change to
our locking code.  The rest of the fixes are going for 2.6.31.

Joel

-- 

"Sometimes when reading Goethe I have the paralyzing suspicion
 that he is trying to be funny."
         - Guy Davenport

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2009-06-09 17:05 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-06-02 12:23 [Ocfs2-devel] [PATCH 0/7] [RESEND] Fix some deadlocks in quota code and implement lockdep for cluster locks Jan Kara
2009-06-02 12:23 ` [Ocfs2-devel] [PATCH] ocfs2: Fix possible deadlock in ocfs2_global_read_dquot() Jan Kara
2009-06-03  0:15   ` Joel Becker
2009-06-03 13:31     ` Jan Kara
2009-06-02 12:24 ` [Ocfs2-devel] [PATCH] ocfs2: Fix lock inversion in ocfs2_local_read_info() Jan Kara
2009-06-02 12:24 ` [Ocfs2-devel] [PATCH] ocfs2: Fix possible deadlock with quotas in ocfs2_setattr() Jan Kara
2009-06-02 12:24 ` [Ocfs2-devel] [PATCH] ocfs2: Fix possible deadlock in quota recovery Jan Kara
2009-06-02 12:24 ` [Ocfs2-devel] [PATCH] ocfs2: Correct ordering of ip_alloc_sem and localloc locks for directories Jan Kara
2009-06-03 21:05   ` Mark Fasheh
2009-06-02 12:24 ` [Ocfs2-devel] [PATCH] vfs: Set special lockdep map for dirs only if not set by fs Jan Kara
2009-06-02 12:53   ` Peter Zijlstra
2009-06-02 12:24 ` [Ocfs2-devel] [PATCH] ocfs2: Add lockdep annotations Jan Kara
2009-06-03 16:31   ` Joel Becker
2009-06-03 21:48     ` Jan Kara
2009-06-03 22:42       ` Joel Becker
2009-06-04 10:22         ` Jan Kara
2009-06-02 18:43 ` [Ocfs2-devel] [PATCH 0/7] [RESEND] Fix some deadlocks in quota code and implement lockdep for cluster locks Joel Becker
2009-06-04  2:26 ` Joel Becker
2009-06-04 10:09   ` Jan Kara
2009-06-09 17:05     ` Joel Becker

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.