* [Cluster-devel] [PATCH 0/8] GFS2 shrinker deadlock
@ 2017-05-31 15:03 Andreas Gruenbacher
  2017-05-31 15:03 ` [Cluster-devel] [PATCH 1/8] gfs2: Get rid of flush_delayed_work in gfs2_evict_inode Andreas Gruenbacher
                   ` (7 more replies)
  0 siblings, 8 replies; 20+ messages in thread
From: Andreas Gruenbacher @ 2017-05-31 15:03 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patch queue fixes a long-standing GFS2 bug that can cause cluster nodes to
lock up under memory pressure.

Right now, when inodes are evicted, GFS2 calls into DLM.  Inode eviction can be
triggered by memory pressure, and it can happen in the context of an arbitrary
user-space process.  If DLM happens to be blocked on the process in question
(for example, if that process is a fence agent), this will deadlock.

This patch queue stops GFS2 from calling into DLM on the inode evict path when
under memory pressure.  It does so by first decoupling the destruction of
inodes from putting their associated glocks, which is what ends up calling into
DLM.  Second, when under memory pressure, it moves the glock puts into work
queue context, where they cannot block DLM.
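
In rough terms, the decision added to gfs2_evict_inode looks like this (a
condensed sketch of what patches 7 and 8 below do, with most of the function
omitted):

	if (current->flags & PF_MEMALLOC) {
		/*
		 * Shrinker context: DLM may depend on us, so don't call into
		 * DLM.  Skip deallocation; the next inode allocation in the
		 * same resource group will clean up the unlinked inode.
		 */
		fs_warn(sdp, "not deleting inode %llu under memory pressure\n",
			(unsigned long long)ip->i_no_addr);
		goto out;
	}
	...
	/* When dropping the glock reference, defer to the work queue. */
	if (current->flags & PF_MEMALLOC)
		gfs2_glock_queue_put(ip->i_gl);
	else
		gfs2_glock_put(ip->i_gl);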

This patch queue is a bit complicated, so careful reviews would be very
welcome ;-)

Thanks,
Andreas

Andreas Gruenbacher (8):
  gfs2: Get rid of flush_delayed_work in gfs2_evict_inode
  gfs2: Protect gl->gl_object by spin lock
  gfs2: Clean up glock work enqueuing
  gfs2: Always check block type in gfs2_evict_inode
  gfs2: gfs2_glock_get: Wait on freeing glocks
  gfs2: Get rid of gfs2_set_nlink
  gfs2: gfs2_evict_inode: Put glocks asynchronously
  gfs2: Warn when not deleting inodes under memory pressure

 fs/gfs2/bmap.c   |   7 +-
 fs/gfs2/dir.c    |   6 +-
 fs/gfs2/glock.c  | 247 ++++++++++++++++++++++++++++++++++++++++---------------
 fs/gfs2/glock.h  |   9 ++
 fs/gfs2/glops.c  |  77 +++++++++--------
 fs/gfs2/incore.h |   2 +
 fs/gfs2/inode.c  |  32 +++----
 fs/gfs2/lops.c   |   9 +-
 fs/gfs2/rgrp.c   |   6 +-
 fs/gfs2/super.c  |  57 ++++++++++---
 fs/gfs2/xattr.c  |   6 +-
 11 files changed, 322 insertions(+), 136 deletions(-)

-- 
2.7.4




* [Cluster-devel] [PATCH 1/8] gfs2: Get rid of flush_delayed_work in gfs2_evict_inode
  2017-05-31 15:03 [Cluster-devel] [PATCH 0/8] GFS2 shrinker deadlock Andreas Gruenbacher
@ 2017-05-31 15:03 ` Andreas Gruenbacher
  2017-06-01 15:13   ` Steven Whitehouse
  2017-05-31 15:03 ` [Cluster-devel] [PATCH 2/8] gfs2: Protect gl->gl_object by spin lock Andreas Gruenbacher
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Andreas Gruenbacher @ 2017-05-31 15:03 UTC (permalink / raw)
  To: cluster-devel.redhat.com

So far, gfs2_evict_inode clears gl->gl_object and then flushes the glock
work queue to make sure that inode glops which dereference gl->gl_object
have finished running before the inode is destroyed.  However, flushing
the work queue may do more work than needed, and in particular, it may
call into DLM, which we want to avoid here.  Instead, use a bit lock
(GIF_GLOP_PENDING) to synchronize between the inode glops and
gfs2_evict_inode, which gets rid of the flushing.
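
The resulting synchronization pattern, roughly (condensed from the hunks
below, not new code):

	/* glops side (e.g. inode_go_sync, inode_go_inval): */
	ip = gfs2_glock2inode(gl);	/* sets GIF_GLOP_PENDING under the lock */
	/* ... safely dereference ip ... */
	gfs2_clear_glop_pending(ip);	/* clear_bit_unlock + wake_up_bit */

	/* gfs2_evict_inode side: detach the inode, then wait for pending glops */
	glock_set_object(ip->i_gl, NULL);
	wait_on_bit_io(&ip->i_flags, GIF_GLOP_PENDING, TASK_UNINTERRUPTIBLE);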

In addition, flush the work queues of existing glocks before reusing
them for new inodes to get those glocks into a known state: the glock
state engine currently doesn't handle glock re-appropriation correctly.
(We may be able to fix the glock state engine instead later.)

Based on a patch by Steven Whitehouse <swhiteho@redhat.com>.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/glock.h  |  7 +++++++
 fs/gfs2/glops.c  | 39 ++++++++++++++++++++++++++++++++-------
 fs/gfs2/incore.h |  1 +
 fs/gfs2/inode.c  |  7 ++++---
 fs/gfs2/super.c  |  4 ++--
 5 files changed, 46 insertions(+), 12 deletions(-)

diff --git a/fs/gfs2/glock.h b/fs/gfs2/glock.h
index ab1ef32..9ad4a6a 100644
--- a/fs/gfs2/glock.h
+++ b/fs/gfs2/glock.h
@@ -257,4 +257,11 @@ static inline bool gfs2_holder_initialized(struct gfs2_holder *gh)
 	return gh->gh_gl;
 }
 
+static inline void glock_set_object(struct gfs2_glock *gl, void *object)
+{
+	spin_lock(&gl->gl_lockref.lock);
+	gl->gl_object = object;
+	spin_unlock(&gl->gl_lockref.lock);
+}
+
 #endif /* __GLOCK_DOT_H__ */
diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index 5db59d4..7449b19 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -197,6 +197,27 @@ static void rgrp_go_inval(struct gfs2_glock *gl, int flags)
 		rgd->rd_flags &= ~GFS2_RDF_UPTODATE;
 }
 
+static struct gfs2_inode *gfs2_glock2inode(struct gfs2_glock *gl)
+{
+	struct gfs2_inode *ip;
+
+	spin_lock(&gl->gl_lockref.lock);
+	ip = gl->gl_object;
+	if (ip)
+		set_bit(GIF_GLOP_PENDING, &ip->i_flags);
+	spin_unlock(&gl->gl_lockref.lock);
+	return ip;
+}
+
+static void gfs2_clear_glop_pending(struct gfs2_inode *ip)
+{
+	if (!ip)
+		return;
+
+	clear_bit_unlock(GIF_GLOP_PENDING, &ip->i_flags);
+	wake_up_bit(&ip->i_flags, GIF_GLOP_PENDING);
+}
+
 /**
  * inode_go_sync - Sync the dirty data and/or metadata for an inode glock
  * @gl: the glock protecting the inode
@@ -205,25 +226,24 @@ static void rgrp_go_inval(struct gfs2_glock *gl, int flags)
 
 static void inode_go_sync(struct gfs2_glock *gl)
 {
-	struct gfs2_inode *ip = gl->gl_object;
+	struct gfs2_inode *ip = gfs2_glock2inode(gl);
+	int isreg = ip && S_ISREG(ip->i_inode.i_mode);
 	struct address_space *metamapping = gfs2_glock2aspace(gl);
 	int error;
 
-	if (ip && !S_ISREG(ip->i_inode.i_mode))
-		ip = NULL;
-	if (ip) {
+	if (isreg) {
 		if (test_and_clear_bit(GIF_SW_PAGED, &ip->i_flags))
 			unmap_shared_mapping_range(ip->i_inode.i_mapping, 0, 0);
 		inode_dio_wait(&ip->i_inode);
 	}
 	if (!test_and_clear_bit(GLF_DIRTY, &gl->gl_flags))
-		return;
+		goto out;
 
 	GLOCK_BUG_ON(gl, gl->gl_state != LM_ST_EXCLUSIVE);
 
 	gfs2_log_flush(gl->gl_name.ln_sbd, gl, NORMAL_FLUSH);
 	filemap_fdatawrite(metamapping);
-	if (ip) {
+	if (isreg) {
 		struct address_space *mapping = ip->i_inode.i_mapping;
 		filemap_fdatawrite(mapping);
 		error = filemap_fdatawait(mapping);
@@ -238,6 +258,9 @@ static void inode_go_sync(struct gfs2_glock *gl)
 	 */
 	smp_mb__before_atomic();
 	clear_bit(GLF_DIRTY, &gl->gl_flags);
+
+out:
+	gfs2_clear_glop_pending(ip);
 }
 
 /**
@@ -253,7 +276,7 @@ static void inode_go_sync(struct gfs2_glock *gl)
 
 static void inode_go_inval(struct gfs2_glock *gl, int flags)
 {
-	struct gfs2_inode *ip = gl->gl_object;
+	struct gfs2_inode *ip = gfs2_glock2inode(gl);
 
 	gfs2_assert_withdraw(gl->gl_name.ln_sbd, !atomic_read(&gl->gl_ail_count));
 
@@ -274,6 +297,8 @@ static void inode_go_inval(struct gfs2_glock *gl, int flags)
 	}
 	if (ip && S_ISREG(ip->i_inode.i_mode))
 		truncate_inode_pages(ip->i_inode.i_mapping, 0);
+
+	gfs2_clear_glop_pending(ip);
 }
 
 /**
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index b7cf65d..b6f5b8d 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -386,6 +386,7 @@ enum {
 	GIF_SW_PAGED		= 3,
 	GIF_ORDERED		= 4,
 	GIF_FREE_VFS_INODE      = 5,
+	GIF_GLOP_PENDING	= 6,
 };
 
 struct gfs2_inode {
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 9f605ea..912d4e6 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -144,7 +144,8 @@ struct inode *gfs2_inode_lookup(struct super_block *sb, unsigned int type,
 		error = gfs2_glock_get(sdp, no_addr, &gfs2_inode_glops, CREATE, &ip->i_gl);
 		if (unlikely(error))
 			goto fail;
-		ip->i_gl->gl_object = ip;
+		flush_delayed_work(&ip->i_gl->gl_work);
+		glock_set_object(ip->i_gl, ip);
 
 		error = gfs2_glock_get(sdp, no_addr, &gfs2_iopen_glops, CREATE, &io_gl);
 		if (unlikely(error))
@@ -173,8 +174,8 @@ struct inode *gfs2_inode_lookup(struct super_block *sb, unsigned int type,
 		error = gfs2_glock_nq_init(io_gl, LM_ST_SHARED, GL_EXACT, &ip->i_iopen_gh);
 		if (unlikely(error))
 			goto fail_put;
-
-		ip->i_iopen_gh.gh_gl->gl_object = ip;
+		flush_delayed_work(&ip->i_iopen_gh.gh_gl->gl_work);
+		glock_set_object(ip->i_iopen_gh.gh_gl, ip);
 		gfs2_glock_put(io_gl);
 		io_gl = NULL;
 
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 29b0473..7d12c12 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -1631,8 +1631,8 @@ static void gfs2_evict_inode(struct inode *inode)
 	gfs2_ordered_del_inode(ip);
 	clear_inode(inode);
 	gfs2_dir_hash_inval(ip);
-	ip->i_gl->gl_object = NULL;
-	flush_delayed_work(&ip->i_gl->gl_work);
+	glock_set_object(ip->i_gl, NULL);
+	wait_on_bit_io(&ip->i_flags, GIF_GLOP_PENDING, TASK_UNINTERRUPTIBLE);
 	gfs2_glock_add_to_lru(ip->i_gl);
 	gfs2_glock_put(ip->i_gl);
 	ip->i_gl = NULL;
-- 
2.7.4




* [Cluster-devel] [PATCH 2/8] gfs2: Protect gl->gl_object by spin lock
  2017-05-31 15:03 [Cluster-devel] [PATCH 0/8] GFS2 shrinker deadlock Andreas Gruenbacher
  2017-05-31 15:03 ` [Cluster-devel] [PATCH 1/8] gfs2: Get rid of flush_delayed_work in gfs2_evict_inode Andreas Gruenbacher
@ 2017-05-31 15:03 ` Andreas Gruenbacher
  2017-06-01 15:16   ` Steven Whitehouse
  2017-05-31 15:03 ` [Cluster-devel] [PATCH 3/8] gfs2: Clean up glock work enqueuing Andreas Gruenbacher
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Andreas Gruenbacher @ 2017-05-31 15:03 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Put all remaining accesses to gl->gl_object under the
gl->gl_lockref.lock spinlock to prevent races.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/bmap.c  |  7 ++++++-
 fs/gfs2/dir.c   |  6 +++++-
 fs/gfs2/glops.c | 10 ++++++++--
 fs/gfs2/inode.c |  8 ++++----
 fs/gfs2/lops.c  |  9 +++++++--
 fs/gfs2/rgrp.c  |  6 ++----
 fs/gfs2/super.c | 11 ++++++++---
 fs/gfs2/xattr.c |  6 +++++-
 8 files changed, 45 insertions(+), 18 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 4d810be..e43decb 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -970,7 +970,12 @@ static int sweep_bh_for_rgrps(struct gfs2_inode *ip, struct gfs2_holder *rd_gh,
 			continue;
 		bn = be64_to_cpu(*p);
 		if (gfs2_holder_initialized(rd_gh)) {
-			rgd = (struct gfs2_rgrpd *)rd_gh->gh_gl->gl_object;
+			struct gfs2_glock *gl = rd_gh->gh_gl;
+
+			spin_lock(&gl->gl_lockref.lock);
+			rgd = rd_gh->gh_gl->gl_object;
+			spin_unlock(&gl->gl_lockref.lock);
+
 			gfs2_assert_withdraw(sdp,
 				     gfs2_glock_is_locked_by_me(rd_gh->gh_gl));
 		} else {
diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c
index 7911321..965f1c7 100644
--- a/fs/gfs2/dir.c
+++ b/fs/gfs2/dir.c
@@ -2031,9 +2031,13 @@ static int leaf_dealloc(struct gfs2_inode *dip, u32 index, u32 len,
 	gfs2_rlist_alloc(&rlist, LM_ST_EXCLUSIVE);
 
 	for (x = 0; x < rlist.rl_rgrps; x++) {
+		struct gfs2_glock *gl = rlist.rl_ghs[x].gh_gl;
 		struct gfs2_rgrpd *rgd;
-		rgd = rlist.rl_ghs[x].gh_gl->gl_object;
+
+		spin_lock(&gl->gl_lockref.lock);
+		rgd = gl->gl_object;
 		rg_blocks += rgd->rd_length;
+		spin_unlock(&gl->gl_lockref.lock);
 	}
 
 	error = gfs2_glock_nq_m(rlist.rl_rgrps, rlist.rl_ghs);
diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index 7449b19..238a78c 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -184,17 +184,23 @@ static void rgrp_go_inval(struct gfs2_glock *gl, int flags)
 {
 	struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
 	struct address_space *mapping = &sdp->sd_aspace;
-	struct gfs2_rgrpd *rgd = gl->gl_object;
+	struct gfs2_rgrpd *rgd;
 
+	spin_lock(&gl->gl_lockref.lock);
+	rgd = gl->gl_object;
 	if (rgd)
 		gfs2_rgrp_brelse(rgd);
+	spin_unlock(&gl->gl_lockref.lock);
 
 	WARN_ON_ONCE(!(flags & DIO_METADATA));
 	gfs2_assert_withdraw(sdp, !atomic_read(&gl->gl_ail_count));
 	truncate_inode_pages_range(mapping, gl->gl_vm.start, gl->gl_vm.end);
 
+	spin_lock(&gl->gl_lockref.lock);
+	rgd = gl->gl_object;
 	if (rgd)
 		rgd->rd_flags &= ~GFS2_RDF_UPTODATE;
+	spin_unlock(&gl->gl_lockref.lock);
 }
 
 static struct gfs2_inode *gfs2_glock2inode(struct gfs2_glock *gl)
@@ -566,7 +572,7 @@ static int freeze_go_demote_ok(const struct gfs2_glock *gl)
  */
 static void iopen_go_callback(struct gfs2_glock *gl, bool remote)
 {
-	struct gfs2_inode *ip = (struct gfs2_inode *)gl->gl_object;
+	struct gfs2_inode *ip = gl->gl_object;
 	struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
 
 	if (!remote || (sdp->sd_vfs->s_flags & MS_RDONLY))
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 912d4e6..50108fa 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -202,14 +202,14 @@ struct inode *gfs2_inode_lookup(struct super_block *sb, unsigned int type,
 
 fail_refresh:
 	ip->i_iopen_gh.gh_flags |= GL_NOCACHE;
-	ip->i_iopen_gh.gh_gl->gl_object = NULL;
+	glock_set_object(ip->i_iopen_gh.gh_gl, NULL);
 	gfs2_glock_dq_uninit(&ip->i_iopen_gh);
 fail_put:
 	if (io_gl)
 		gfs2_glock_put(io_gl);
 	if (gfs2_holder_initialized(&i_gh))
 		gfs2_glock_dq_uninit(&i_gh);
-	ip->i_gl->gl_object = NULL;
+	glock_set_object(ip->i_gl, NULL);
 fail:
 	iget_failed(inode);
 	return ERR_PTR(error);
@@ -706,7 +706,7 @@ static int gfs2_create_inode(struct inode *dir, struct dentry *dentry,
 	if (error)
 		goto fail_free_inode;
 
-	ip->i_gl->gl_object = ip;
+	glock_set_object(ip->i_gl, ip);
 	error = gfs2_glock_nq_init(ip->i_gl, LM_ST_EXCLUSIVE, GL_SKIP, ghs + 1);
 	if (error)
 		goto fail_free_inode;
@@ -732,7 +732,7 @@ static int gfs2_create_inode(struct inode *dir, struct dentry *dentry,
 	if (error)
 		goto fail_gunlock2;
 
-	ip->i_iopen_gh.gh_gl->gl_object = ip;
+	glock_set_object(ip->i_iopen_gh.gh_gl, ip);
 	gfs2_glock_put(io_gl);
 	gfs2_set_iop(inode);
 	insert_inode_hash(inode);
diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index b1f9144..93c1074 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -71,10 +71,15 @@ static void maybe_release_space(struct gfs2_bufdata *bd)
 {
 	struct gfs2_glock *gl = bd->bd_gl;
 	struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
-	struct gfs2_rgrpd *rgd = gl->gl_object;
+	struct gfs2_rgrpd *rgd;
 	unsigned int index = bd->bd_bh->b_blocknr - gl->gl_name.ln_number;
-	struct gfs2_bitmap *bi = rgd->rd_bits + index;
+	struct gfs2_bitmap *bi;
 
+	spin_lock(&gl->gl_lockref.lock);
+	rgd = gl->gl_object;
+	spin_unlock(&gl->gl_lockref.lock);
+
+	bi = rgd->rd_bits + index;
 	if (bi->bi_clone == NULL)
 		return;
 	if (sdp->sd_args.ar_discard)
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 83c9909..836e38b 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -705,9 +705,7 @@ void gfs2_clear_rgrpd(struct gfs2_sbd *sdp)
 		rb_erase(n, &sdp->sd_rindex_tree);
 
 		if (gl) {
-			spin_lock(&gl->gl_lockref.lock);
-			gl->gl_object = NULL;
-			spin_unlock(&gl->gl_lockref.lock);
+			glock_set_object(gl, NULL);
 			gfs2_glock_add_to_lru(gl);
 			gfs2_glock_put(gl);
 		}
@@ -917,7 +915,7 @@ static int read_rindex_entry(struct gfs2_inode *ip)
 	error = rgd_insert(rgd);
 	spin_unlock(&sdp->sd_rindex_spin);
 	if (!error) {
-		rgd->rd_gl->gl_object = rgd;
+		glock_set_object(rgd->rd_gl, rgd);
 		rgd->rd_gl->gl_vm.start = (rgd->rd_addr * bsize) & PAGE_MASK;
 		rgd->rd_gl->gl_vm.end = PAGE_ALIGN((rgd->rd_addr +
 						    rgd->rd_length) * bsize) - 1;
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 7d12c12..554bd0c 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -1105,9 +1105,14 @@ static int gfs2_statfs_slow(struct gfs2_sbd *sdp, struct gfs2_statfs_change_host
 					gfs2_holder_uninit(gh);
 					error = err;
 				} else {
+					struct gfs2_glock *gl = gh->gh_gl;
+					struct gfs2_rgrpd *rgd;
+
+					spin_lock(&gl->gl_lockref.lock);
+					rgd = gl->gl_object;
+					spin_unlock(&gl->gl_lockref.lock);
 					if (!error)
-						error = statfs_slow_fill(
-							gh->gh_gl->gl_object, sc);
+						error = statfs_slow_fill(rgd, sc);
 					gfs2_glock_dq_uninit(gh);
 				}
 			}
@@ -1637,7 +1642,7 @@ static void gfs2_evict_inode(struct inode *inode)
 	gfs2_glock_put(ip->i_gl);
 	ip->i_gl = NULL;
 	if (gfs2_holder_initialized(&ip->i_iopen_gh)) {
-		ip->i_iopen_gh.gh_gl->gl_object = NULL;
+		glock_set_object(ip->i_iopen_gh.gh_gl, NULL);
 		ip->i_iopen_gh.gh_flags |= GL_NOCACHE;
 		gfs2_glock_dq_uninit(&ip->i_iopen_gh);
 	}
diff --git a/fs/gfs2/xattr.c b/fs/gfs2/xattr.c
index d87721a..4774584 100644
--- a/fs/gfs2/xattr.c
+++ b/fs/gfs2/xattr.c
@@ -1327,9 +1327,13 @@ static int ea_dealloc_indirect(struct gfs2_inode *ip)
 	gfs2_rlist_alloc(&rlist, LM_ST_EXCLUSIVE);
 
 	for (x = 0; x < rlist.rl_rgrps; x++) {
+		struct gfs2_glock *gl = rlist.rl_ghs[x].gh_gl;
 		struct gfs2_rgrpd *rgd;
-		rgd = rlist.rl_ghs[x].gh_gl->gl_object;
+
+		spin_lock(&gl->gl_lockref.lock);
+		rgd = gl->gl_object;
 		rg_blocks += rgd->rd_length;
+		spin_unlock(&gl->gl_lockref.lock);
 	}
 
 	error = gfs2_glock_nq_m(rlist.rl_rgrps, rlist.rl_ghs);
-- 
2.7.4




* [Cluster-devel] [PATCH 3/8] gfs2: Clean up glock work enqueuing
  2017-05-31 15:03 [Cluster-devel] [PATCH 0/8] GFS2 shrinker deadlock Andreas Gruenbacher
  2017-05-31 15:03 ` [Cluster-devel] [PATCH 1/8] gfs2: Get rid of flush_delayed_work in gfs2_evict_inode Andreas Gruenbacher
  2017-05-31 15:03 ` [Cluster-devel] [PATCH 2/8] gfs2: Protect gl->gl_object by spin lock Andreas Gruenbacher
@ 2017-05-31 15:03 ` Andreas Gruenbacher
  2017-06-01 15:21   ` Steven Whitehouse
  2017-05-31 15:03 ` [Cluster-devel] [PATCH 4/8] gfs2: Always check block type in gfs2_evict_inode Andreas Gruenbacher
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Andreas Gruenbacher @ 2017-05-31 15:03 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/glock.c | 102 +++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 61 insertions(+), 41 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 959a19c..6d7147c8 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -152,6 +152,29 @@ static void gfs2_glock_remove_from_lru(struct gfs2_glock *gl)
 	spin_unlock(&lru_lock);
 }
 
+/*
+ * Enqueue the glock on the work queue.  Passes one glock reference on to the
+ * work queue.
+ */
+static void __gfs2_glock_queue_work(struct gfs2_glock *gl, unsigned long delay) {
+	if (!queue_delayed_work(glock_workqueue, &gl->gl_work, delay)) {
+		/*
+		 * We are holding the lockref spinlock, and the work was still
+		 * queued above.  The queued work (glock_work_func) takes that
+		 * spinlock before dropping its glock reference(s), so it
+		 * cannot have dropped them in the meantime.
+		 */
+		GLOCK_BUG_ON(gl, gl->gl_lockref.count < 2);
+		gl->gl_lockref.count--;
+	}
+}
+
+static void gfs2_glock_queue_work(struct gfs2_glock *gl, unsigned long delay) {
+	spin_lock(&gl->gl_lockref.lock);
+	__gfs2_glock_queue_work(gl, delay);
+	spin_unlock(&gl->gl_lockref.lock);
+}
+
 /**
  * gfs2_glock_put() - Decrement reference count on glock
  * @gl: The glock to put
@@ -482,8 +505,7 @@ __acquires(&gl->gl_lockref.lock)
 		    target == LM_ST_UNLOCKED &&
 		    test_bit(SDF_SKIP_DLM_UNLOCK, &sdp->sd_flags)) {
 			finish_xmote(gl, target);
-			if (queue_delayed_work(glock_workqueue, &gl->gl_work, 0) == 0)
-				gfs2_glock_put(gl);
+			gfs2_glock_queue_work(gl, 0);
 		}
 		else if (ret) {
 			pr_err("lm_lock ret %d\n", ret);
@@ -492,8 +514,7 @@ __acquires(&gl->gl_lockref.lock)
 		}
 	} else { /* lock_nolock */
 		finish_xmote(gl, target);
-		if (queue_delayed_work(glock_workqueue, &gl->gl_work, 0) == 0)
-			gfs2_glock_put(gl);
+		gfs2_glock_queue_work(gl, 0);
 	}
 
 	spin_lock(&gl->gl_lockref.lock);
@@ -565,8 +586,7 @@ __acquires(&gl->gl_lockref.lock)
 	clear_bit(GLF_LOCK, &gl->gl_flags);
 	smp_mb__after_atomic();
 	gl->gl_lockref.count++;
-	if (queue_delayed_work(glock_workqueue, &gl->gl_work, 0) == 0)
-		gl->gl_lockref.count--;
+	__gfs2_glock_queue_work(gl, 0);
 	return;
 
 out_unlock:
@@ -601,11 +621,11 @@ static void glock_work_func(struct work_struct *work)
 {
 	unsigned long delay = 0;
 	struct gfs2_glock *gl = container_of(work, struct gfs2_glock, gl_work.work);
-	int drop_ref = 0;
+	unsigned int drop_refs = 1;
 
 	if (test_and_clear_bit(GLF_REPLY_PENDING, &gl->gl_flags)) {
 		finish_xmote(gl, gl->gl_reply);
-		drop_ref = 1;
+		drop_refs++;
 	}
 	spin_lock(&gl->gl_lockref.lock);
 	if (test_bit(GLF_PENDING_DEMOTE, &gl->gl_flags) &&
@@ -623,16 +643,25 @@ static void glock_work_func(struct work_struct *work)
 		}
 	}
 	run_queue(gl, 0);
-	spin_unlock(&gl->gl_lockref.lock);
-	if (!delay)
-		gfs2_glock_put(gl);
-	else {
+	if (delay) {
+		/* Keep one glock reference for the work we requeue. */
+		drop_refs--;
 		if (gl->gl_name.ln_type != LM_TYPE_INODE)
 			delay = 0;
-		if (queue_delayed_work(glock_workqueue, &gl->gl_work, delay) == 0)
-			gfs2_glock_put(gl);
+		__gfs2_glock_queue_work(gl, delay);
+	}
+	if (drop_refs > 1) {
+		gl->gl_lockref.count -= drop_refs - 1;
+		drop_refs = 1;
 	}
-	if (drop_ref)
+	spin_unlock(&gl->gl_lockref.lock);
+
+	/*
+	 * Drop the remaining glock reference after grabbing (and releasing)
+	 * the lockref spinlock: this allows tasks to safely drop glock
+	 * references under that spinlock (see __gfs2_glock_queue_work).
+	 */
+	if (drop_refs)
 		gfs2_glock_put(gl);
 }
 
@@ -986,8 +1015,7 @@ int gfs2_glock_nq(struct gfs2_holder *gh)
 		     test_and_clear_bit(GLF_FROZEN, &gl->gl_flags))) {
 		set_bit(GLF_REPLY_PENDING, &gl->gl_flags);
 		gl->gl_lockref.count++;
-		if (queue_delayed_work(glock_workqueue, &gl->gl_work, 0) == 0)
-			gl->gl_lockref.count--;
+		__gfs2_glock_queue_work(gl, 0);
 	}
 	run_queue(gl, 1);
 	spin_unlock(&gl->gl_lockref.lock);
@@ -1047,17 +1075,15 @@ void gfs2_glock_dq(struct gfs2_holder *gh)
 		gfs2_glock_add_to_lru(gl);
 
 	trace_gfs2_glock_queue(gh, 0);
+	if (unlikely(!fast_path)) {
+		gl->gl_lockref.count++;
+		if (test_bit(GLF_PENDING_DEMOTE, &gl->gl_flags) &&
+		    !test_bit(GLF_DEMOTE, &gl->gl_flags) &&
+		    gl->gl_name.ln_type == LM_TYPE_INODE)
+			delay = gl->gl_hold_time;
+		__gfs2_glock_queue_work(gl, delay);
+	}
 	spin_unlock(&gl->gl_lockref.lock);
-	if (likely(fast_path))
-		return;
-
-	gfs2_glock_hold(gl);
-	if (test_bit(GLF_PENDING_DEMOTE, &gl->gl_flags) &&
-	    !test_bit(GLF_DEMOTE, &gl->gl_flags) &&
-	    gl->gl_name.ln_type == LM_TYPE_INODE)
-		delay = gl->gl_hold_time;
-	if (queue_delayed_work(glock_workqueue, &gl->gl_work, delay) == 0)
-		gfs2_glock_put(gl);
 }
 
 void gfs2_glock_dq_wait(struct gfs2_holder *gh)
@@ -1233,9 +1259,8 @@ void gfs2_glock_cb(struct gfs2_glock *gl, unsigned int state)
 
 	spin_lock(&gl->gl_lockref.lock);
 	handle_callback(gl, state, delay, true);
+	__gfs2_glock_queue_work(gl, delay);
 	spin_unlock(&gl->gl_lockref.lock);
-	if (queue_delayed_work(glock_workqueue, &gl->gl_work, delay) == 0)
-		gfs2_glock_put(gl);
 }
 
 /**
@@ -1294,10 +1319,8 @@ void gfs2_glock_complete(struct gfs2_glock *gl, int ret)
 
 	gl->gl_lockref.count++;
 	set_bit(GLF_REPLY_PENDING, &gl->gl_flags);
+	__gfs2_glock_queue_work(gl, 0);
 	spin_unlock(&gl->gl_lockref.lock);
-
-	if (queue_delayed_work(glock_workqueue, &gl->gl_work, 0) == 0)
-		gfs2_glock_put(gl);
 }
 
 static int glock_cmp(void *priv, struct list_head *a, struct list_head *b)
@@ -1355,8 +1378,7 @@ __acquires(&lru_lock)
 		if (demote_ok(gl))
 			handle_callback(gl, LM_ST_UNLOCKED, 0, false);
 		WARN_ON(!test_and_clear_bit(GLF_LOCK, &gl->gl_flags));
-		if (queue_delayed_work(glock_workqueue, &gl->gl_work, 0) == 0)
-			gl->gl_lockref.count--;
+		__gfs2_glock_queue_work(gl, 0);
 		spin_unlock(&gl->gl_lockref.lock);
 		cond_resched_lock(&lru_lock);
 	}
@@ -1462,13 +1484,12 @@ static void glock_hash_walk(glock_examiner examiner, const struct gfs2_sbd *sdp)
 
 static void thaw_glock(struct gfs2_glock *gl)
 {
-	if (!test_and_clear_bit(GLF_FROZEN, &gl->gl_flags))
-		goto out;
-	set_bit(GLF_REPLY_PENDING, &gl->gl_flags);
-	if (queue_delayed_work(glock_workqueue, &gl->gl_work, 0) == 0) {
-out:
+	if (!test_and_clear_bit(GLF_FROZEN, &gl->gl_flags)) {
 		gfs2_glock_put(gl);
+		return;
 	}
+	set_bit(GLF_REPLY_PENDING, &gl->gl_flags);
+	gfs2_glock_queue_work(gl, 0);
 }
 
 /**
@@ -1484,9 +1505,8 @@ static void clear_glock(struct gfs2_glock *gl)
 	spin_lock(&gl->gl_lockref.lock);
 	if (gl->gl_state != LM_ST_UNLOCKED)
 		handle_callback(gl, LM_ST_UNLOCKED, 0, false);
+	__gfs2_glock_queue_work(gl, 0);
 	spin_unlock(&gl->gl_lockref.lock);
-	if (queue_delayed_work(glock_workqueue, &gl->gl_work, 0) == 0)
-		gfs2_glock_put(gl);
 }
 
 /**
-- 
2.7.4




* [Cluster-devel] [PATCH 4/8] gfs2: Always check block type in gfs2_evict_inode
  2017-05-31 15:03 [Cluster-devel] [PATCH 0/8] GFS2 shrinker deadlock Andreas Gruenbacher
                   ` (2 preceding siblings ...)
  2017-05-31 15:03 ` [Cluster-devel] [PATCH 3/8] gfs2: Clean up glock work enqueuing Andreas Gruenbacher
@ 2017-05-31 15:03 ` Andreas Gruenbacher
  2017-05-31 15:03 ` [Cluster-devel] [PATCH 5/8] gfs2: gfs2_glock_get: Wait on freeing glocks Andreas Gruenbacher
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Andreas Gruenbacher @ 2017-05-31 15:03 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Commit 40ac218f introduced the GIF_ALLOC_FAILED flag, which indicates that
the inode has likely already been allocated in the bitmap (block type
GFS2_BLKST_DINODE) but that the inode allocation eventually failed.  The
flag is used to skip the block type check when destroying inodes in
gfs2_evict_inode.  This is problematic because we do not keep lock
protection between gfs2_create_inode and gfs2_evict_inode.

Fix this by changing the logic in gfs2_create_inode so that the
GIF_FREE_VFS_INODE / GIF_ALLOC_FAILED flags indicate the actual expected
block type, and always check the block type before deleting or freeing an
inode in gfs2_evict_inode.

(In addition, fix a related typo in a comment of gfs2_set_iop.)

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/inode.c | 17 +++++++++--------
 fs/gfs2/super.c | 11 ++++++-----
 2 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 50108fa..96ebb06 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -109,7 +109,7 @@ static void gfs2_set_iop(struct inode *inode)
  * @no_addr: The inode number
  * @no_formal_ino: The inode generation number
  * @blktype: Requested block type (GFS2_BLKST_DINODE or GFS2_BLKST_UNLINKED;
- *           GFS2_BLKST_FREE do indicate not to verify)
+ *           GFS2_BLKST_FREE to indicate not to verify)
  *
  * If @type is DT_UNKNOWN, the inode type is fetched from disk.
  *
@@ -589,7 +589,7 @@ static int gfs2_create_inode(struct inode *dir, struct dentry *dentry,
 	struct gfs2_inode *dip = GFS2_I(dir), *ip;
 	struct gfs2_sbd *sdp = GFS2_SB(&dip->i_inode);
 	struct gfs2_glock *io_gl = NULL;
-	int error, free_vfs_inode = 1;
+	int error, fail_bit = GIF_FREE_VFS_INODE;
 	u32 aflags = 0;
 	unsigned blocks = 1;
 	struct gfs2_diradd da = { .bh = NULL, .save_loc = 1, };
@@ -699,6 +699,7 @@ static int gfs2_create_inode(struct inode *dir, struct dentry *dentry,
 	error = alloc_dinode(ip, aflags, &blocks);
 	if (error)
 		goto fail_free_inode;
+	fail_bit = GIF_ALLOC_FAILED;
 
 	gfs2_set_inode_blocks(inode, blocks);
 
@@ -737,9 +738,9 @@ static int gfs2_create_inode(struct inode *dir, struct dentry *dentry,
 	gfs2_set_iop(inode);
 	insert_inode_hash(inode);
 
-	free_vfs_inode = 0; /* After this point, the inode is no longer
-			       considered free. Any failures need to undo
-			       the gfs2 structures. */
+	fail_bit = 0; /* After this point, the inode is no longer considered
+			 free. Any failures need to undo the gfs2 structures. */
+
 	if (default_acl) {
 		error = __gfs2_set_acl(inode, default_acl, ACL_TYPE_DEFAULT);
 		posix_acl_release(default_acl);
@@ -794,10 +795,10 @@ static int gfs2_create_inode(struct inode *dir, struct dentry *dentry,
 	gfs2_glock_dq_uninit(ghs);
 	if (inode && !IS_ERR(inode)) {
 		clear_nlink(inode);
-		if (!free_vfs_inode)
+		if (fail_bit != GIF_FREE_VFS_INODE)
 			mark_inode_dirty(inode);
-		set_bit(free_vfs_inode ? GIF_FREE_VFS_INODE : GIF_ALLOC_FAILED,
-			&GFS2_I(inode)->i_flags);
+		if (fail_bit)
+			set_bit(fail_bit, &GFS2_I(inode)->i_flags);
 		iput(inode);
 	}
 fail:
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 554bd0c..c651983 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -1530,6 +1530,7 @@ static void gfs2_evict_inode(struct inode *inode)
 	struct gfs2_inode *ip = GFS2_I(inode);
 	struct gfs2_holder gh;
 	struct address_space *metamapping;
+	unsigned int blk_type;
 	int error;
 
 	if (test_bit(GIF_FREE_VFS_INODE, &ip->i_flags)) {
@@ -1548,11 +1549,11 @@ static void gfs2_evict_inode(struct inode *inode)
 		goto out;
 	}
 
-	if (!test_bit(GIF_ALLOC_FAILED, &ip->i_flags)) {
-		error = gfs2_check_blk_type(sdp, ip->i_no_addr, GFS2_BLKST_UNLINKED);
-		if (error)
-			goto out_truncate;
-	}
+	blk_type = test_bit(GIF_ALLOC_FAILED, &ip->i_flags) ?
+		GFS2_BLKST_DINODE : GFS2_BLKST_UNLINKED;
+	error = gfs2_check_blk_type(sdp, ip->i_no_addr, blk_type);
+	if (error)
+		goto out_truncate;
 
 	if (test_bit(GIF_INVALID, &ip->i_flags)) {
 		error = gfs2_inode_refresh(ip);
-- 
2.7.4




* [Cluster-devel] [PATCH 5/8] gfs2: gfs2_glock_get: Wait on freeing glocks
  2017-05-31 15:03 [Cluster-devel] [PATCH 0/8] GFS2 shrinker deadlock Andreas Gruenbacher
                   ` (3 preceding siblings ...)
  2017-05-31 15:03 ` [Cluster-devel] [PATCH 4/8] gfs2: Always check block type in gfs2_evict_inode Andreas Gruenbacher
@ 2017-05-31 15:03 ` Andreas Gruenbacher
  2017-05-31 15:03 ` [Cluster-devel] [PATCH 6/8] gfs2: Get rid of gfs2_set_nlink Andreas Gruenbacher
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Andreas Gruenbacher @ 2017-05-31 15:03 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Keep glocks in their hash table until they are freed instead of removing
them when their last reference is dropped.  This allows gfs2_glock_get to
wait for a dying glock to go away before recreating it, so that the
previous use of the glock won't overlap with the next.
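
In essence, the free path and the lookup path now pair up as follows (a
condensed sketch of the code below):

	/* Free side: the name stays in the hash table until the glock is freed. */
	void gfs2_glock_free(struct gfs2_glock *gl)
	{
		rhashtable_remove_fast(&gl_hash_table, &gl->gl_node, ht_parms);
		wake_up_glock(gl);	/* wake waiters on this lock name */
		call_rcu(&gl->gl_rcu, gfs2_glock_dealloc);
		...
	}

	/* Lookup side (find_insert_glock): wait for dying glocks and retry. */
	gl = rhashtable_lookup_fast(&gl_hash_table, name, ht_parms);
	if (gl && !lockref_get_not_dead(&gl->gl_lockref)) {
		rcu_read_unlock();
		schedule();		/* woken by wake_up_glock() */
		goto again;
	}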

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/glock.c  | 135 +++++++++++++++++++++++++++++++++++++++++++++----------
 fs/gfs2/incore.h |   1 +
 2 files changed, 113 insertions(+), 23 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 6d7147c8..e6d32d2 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -15,6 +15,7 @@
 #include <linux/buffer_head.h>
 #include <linux/delay.h>
 #include <linux/sort.h>
+#include <linux/hash.h>
 #include <linux/jhash.h>
 #include <linux/kallsyms.h>
 #include <linux/gfs2_ondisk.h>
@@ -80,9 +81,69 @@ static struct rhashtable_params ht_parms = {
 
 static struct rhashtable gl_hash_table;
 
-void gfs2_glock_free(struct gfs2_glock *gl)
+#define GLOCK_WAIT_TABLE_BITS 12
+#define GLOCK_WAIT_TABLE_SIZE (1 << GLOCK_WAIT_TABLE_BITS)
+static wait_queue_head_t glock_wait_table[GLOCK_WAIT_TABLE_SIZE] __cacheline_aligned;
+
+struct wait_glock_queue {
+	struct lm_lockname *name;
+	wait_queue_t wait;
+};
+
+static int glock_wake_function(wait_queue_t *wait, unsigned int mode, int sync,
+			       void *key)
 {
-	struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
+	struct wait_glock_queue *wait_glock =
+		container_of(wait, struct wait_glock_queue, wait);
+	struct lm_lockname *wait_name = wait_glock->name;
+	struct lm_lockname *wake_name = key;
+
+	if (wake_name->ln_sbd != wait_name->ln_sbd ||
+	    wake_name->ln_number != wait_name->ln_number ||
+	    wake_name->ln_type != wait_name->ln_type)
+		return 0;
+	return autoremove_wake_function(wait, mode, sync, key);
+}
+
+static wait_queue_head_t *glock_waitqueue(struct lm_lockname *name)
+{
+	u32 hash = jhash2((u32 *)name, sizeof(*name) / 4, 0);
+
+	return glock_wait_table + hash_32(hash, GLOCK_WAIT_TABLE_BITS);
+}
+
+static void prepare_to_wait_on_glock(wait_queue_head_t **wq,
+				     struct wait_glock_queue *wait,
+				     struct lm_lockname *name)
+{
+	wait->name = name;
+	init_wait(&wait->wait);
+	wait->wait.func = glock_wake_function;
+	*wq = glock_waitqueue(name);
+	prepare_to_wait(*wq, &wait->wait, TASK_UNINTERRUPTIBLE);
+}
+
+static void finish_wait_on_glock(wait_queue_head_t *wq,
+				 struct wait_glock_queue *wait)
+{
+	finish_wait(wq, &wait->wait);
+}
+
+/**
+ * wake_up_glock  -  Wake up waiters on a glock
+ * @gl: the glock
+ */
+static void wake_up_glock(struct gfs2_glock *gl)
+{
+	wait_queue_head_t *wq = glock_waitqueue(&gl->gl_name);
+
+	if (waitqueue_active(wq))
+		__wake_up(wq, TASK_NORMAL, 1, &gl->gl_name);
+}
+
+static void gfs2_glock_dealloc(struct rcu_head *rcu)
+{
+	struct gfs2_glock *gl = container_of(rcu, struct gfs2_glock, gl_rcu);
 
 	if (gl->gl_ops->go_flags & GLOF_ASPACE) {
 		kmem_cache_free(gfs2_glock_aspace_cachep, gl);
@@ -90,6 +151,15 @@ void gfs2_glock_free(struct gfs2_glock *gl)
 		kfree(gl->gl_lksb.sb_lvbptr);
 		kmem_cache_free(gfs2_glock_cachep, gl);
 	}
+}
+
+void gfs2_glock_free(struct gfs2_glock *gl)
+{
+	struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
+
+	rhashtable_remove_fast(&gl_hash_table, &gl->gl_node, ht_parms);
+	wake_up_glock(gl);
+	call_rcu(&gl->gl_rcu, gfs2_glock_dealloc);
 	if (atomic_dec_and_test(&sdp->sd_glock_disposal))
 		wake_up(&sdp->sd_glock_wait);
 }
@@ -193,7 +263,6 @@ void gfs2_glock_put(struct gfs2_glock *gl)
 
 	gfs2_glock_remove_from_lru(gl);
 	spin_unlock(&gl->gl_lockref.lock);
-	rhashtable_remove_fast(&gl_hash_table, &gl->gl_node, ht_parms);
 	GLOCK_BUG_ON(gl, !list_empty(&gl->gl_holders));
 	GLOCK_BUG_ON(gl, mapping && mapping->nrpages);
 	trace_gfs2_glock_put(gl);
@@ -665,6 +734,36 @@ static void glock_work_func(struct work_struct *work)
 		gfs2_glock_put(gl);
 }
 
+static struct gfs2_glock *find_insert_glock(struct lm_lockname *name,
+					    struct gfs2_glock *new)
+{
+	struct wait_glock_queue wait;
+	wait_queue_head_t *wq;
+	struct gfs2_glock *gl;
+
+again:
+	prepare_to_wait_on_glock(&wq, &wait, name);
+	rcu_read_lock();
+	if (new) {
+		gl = rhashtable_lookup_get_insert_fast(&gl_hash_table,
+			&new->gl_node, ht_parms);
+		if (IS_ERR(gl))
+			goto out;
+	} else {
+		gl = rhashtable_lookup_fast(&gl_hash_table,
+			name, ht_parms);
+	}
+	if (gl && !lockref_get_not_dead(&gl->gl_lockref)) {
+		rcu_read_unlock();
+		schedule();
+		goto again;
+	}
+out:
+	rcu_read_unlock();
+	finish_wait_on_glock(wq, &wait);
+	return gl;
+}
+
 /**
  * gfs2_glock_get() - Get a glock, or create one if one doesn't exist
  * @sdp: The GFS2 superblock
@@ -691,15 +790,11 @@ int gfs2_glock_get(struct gfs2_sbd *sdp, u64 number,
 	struct kmem_cache *cachep;
 	int ret = 0;
 
-	rcu_read_lock();
-	gl = rhashtable_lookup_fast(&gl_hash_table, &name, ht_parms);
-	if (gl && !lockref_get_not_dead(&gl->gl_lockref))
-		gl = NULL;
-	rcu_read_unlock();
-
-	*glp = gl;
-	if (gl)
+	gl = find_insert_glock(&name, NULL);
+	if (gl) {
+		*glp = gl;
 		return 0;
+	}
 	if (!create)
 		return -ENOENT;
 
@@ -753,10 +848,7 @@ int gfs2_glock_get(struct gfs2_sbd *sdp, u64 number,
 		mapping->writeback_index = 0;
 	}
 
-again:
-	rcu_read_lock();
-	tmp = rhashtable_lookup_get_insert_fast(&gl_hash_table, &gl->gl_node,
-						ht_parms);
+	tmp = find_insert_glock(&name, gl);
 	if (!tmp) {
 		*glp = gl;
 		goto out;
@@ -765,13 +857,7 @@ int gfs2_glock_get(struct gfs2_sbd *sdp, u64 number,
 		ret = PTR_ERR(tmp);
 		goto out_free;
 	}
-	if (lockref_get_not_dead(&tmp->gl_lockref)) {
-		*glp = tmp;
-		goto out_free;
-	}
-	rcu_read_unlock();
-	cond_resched();
-	goto again;
+	*glp = tmp;
 
 out_free:
 	kfree(gl->gl_lksb.sb_lvbptr);
@@ -1792,7 +1878,7 @@ static int gfs2_sbstats_seq_show(struct seq_file *seq, void *iter_ptr)
 
 int __init gfs2_glock_init(void)
 {
-	int ret;
+	int i, ret;
 
 	ret = rhashtable_init(&gl_hash_table, &ht_parms);
 	if (ret < 0)
@@ -1821,6 +1907,9 @@ int __init gfs2_glock_init(void)
 		return ret;
 	}
 
+	for (i = 0; i < GLOCK_WAIT_TABLE_SIZE; i++)
+		init_waitqueue_head(glock_wait_table + i);
+
 	return 0;
 }
 
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index b6f5b8d..99706ed 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -374,6 +374,7 @@ struct gfs2_glock {
 			loff_t end;
 		} gl_vm;
 	};
+	struct rcu_head gl_rcu;
 	struct rhash_head gl_node;
 };
 
-- 
2.7.4




* [Cluster-devel] [PATCH 6/8] gfs2: Get rid of gfs2_set_nlink
  2017-05-31 15:03 [Cluster-devel] [PATCH 0/8] GFS2 shrinker deadlock Andreas Gruenbacher
                   ` (4 preceding siblings ...)
  2017-05-31 15:03 ` [Cluster-devel] [PATCH 5/8] gfs2: gfs2_glock_get: Wait on freeing glocks Andreas Gruenbacher
@ 2017-05-31 15:03 ` Andreas Gruenbacher
  2017-05-31 15:03 ` [Cluster-devel] [PATCH 7/8] gfs2: gfs2_evict_inode: Put glocks asynchronously Andreas Gruenbacher
  2017-05-31 15:03 ` [Cluster-devel] [PATCH 8/8] gfs2: Warn when not deleting inodes under memory pressure Andreas Gruenbacher
  7 siblings, 0 replies; 20+ messages in thread
From: Andreas Gruenbacher @ 2017-05-31 15:03 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Remove gfs2_set_nlink, which prevents the link count of an inode from
becoming non-zero once it has reached zero.  The following changes will
reduce the amount of waiting on glocks when an inode is evicted from
memory.  With that, it seems that an inode can be reallocated before all
the remote-unlink callbacks from a previous delete have been processed.
In that case, the link count will change from zero to non-zero.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/glops.c | 28 +---------------------------
 1 file changed, 1 insertion(+), 27 deletions(-)

diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index 238a78c..29d58fd 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -324,32 +324,6 @@ static int inode_go_demote_ok(const struct gfs2_glock *gl)
 	return 1;
 }
 
-/**
- * gfs2_set_nlink - Set the inode's link count based on on-disk info
- * @inode: The inode in question
- * @nlink: The link count
- *
- * If the link count has hit zero, it must never be raised, whatever the
- * on-disk inode might say. When new struct inodes are created the link
- * count is set to 1, so that we can safely use this test even when reading
- * in on disk information for the first time.
- */
-
-static void gfs2_set_nlink(struct inode *inode, u32 nlink)
-{
-	/*
-	 * We will need to review setting the nlink count here in the
-	 * light of the forthcoming ro bind mount work. This is a reminder
-	 * to do that.
-	 */
-	if ((inode->i_nlink != nlink) && (inode->i_nlink != 0)) {
-		if (nlink == 0)
-			clear_nlink(inode);
-		else
-			set_nlink(inode, nlink);
-	}
-}
-
 static int gfs2_dinode_in(struct gfs2_inode *ip, const void *buf)
 {
 	const struct gfs2_dinode *str = buf;
@@ -371,7 +345,7 @@ static int gfs2_dinode_in(struct gfs2_inode *ip, const void *buf)
 
 	i_uid_write(&ip->i_inode, be32_to_cpu(str->di_uid));
 	i_gid_write(&ip->i_inode, be32_to_cpu(str->di_gid));
-	gfs2_set_nlink(&ip->i_inode, be32_to_cpu(str->di_nlink));
+	set_nlink(&ip->i_inode, be32_to_cpu(str->di_nlink));
 	i_size_write(&ip->i_inode, be64_to_cpu(str->di_size));
 	gfs2_set_inode_blocks(&ip->i_inode, be64_to_cpu(str->di_blocks));
 	atime.tv_sec = be64_to_cpu(str->di_atime);
-- 
2.7.4




* [Cluster-devel] [PATCH 7/8] gfs2: gfs2_evict_inode: Put glocks asynchronously
  2017-05-31 15:03 [Cluster-devel] [PATCH 0/8] GFS2 shrinker deadlock Andreas Gruenbacher
                   ` (5 preceding siblings ...)
  2017-05-31 15:03 ` [Cluster-devel] [PATCH 6/8] gfs2: Get rid of gfs2_set_nlink Andreas Gruenbacher
@ 2017-05-31 15:03 ` Andreas Gruenbacher
  2017-06-01 10:51   ` Andreas Gruenbacher
  2017-06-01 15:37   ` Steven Whitehouse
  2017-05-31 15:03 ` [Cluster-devel] [PATCH 8/8] gfs2: Warn when not deleting inodes under memory pressure Andreas Gruenbacher
  7 siblings, 2 replies; 20+ messages in thread
From: Andreas Gruenbacher @ 2017-05-31 15:03 UTC (permalink / raw)
  To: cluster-devel.redhat.com

gfs2_evict_inode is called to free inodes under memory pressure.  The
function calls into DLM when an inode's last cluster-wide reference goes
away (remote unlink) and to release the glock and associated DLM lock
before finally destroying the inode.  However, if DLM is blocked waiting
for memory to become available, calling into DLM again will deadlock.

Avoid that by decoupling releasing glocks from destroying inodes in that
case: with gfs2_glock_queue_put, glocks will be dequeued asynchronously
in work queue context, when the associated inodes have most likely
already been destroyed.

With this change, it appears that inodes can end up being unlinked,
remote-unlink can be triggered, and then the inode can be reallocated
before all remote-unlink callbacks are processed.  Revalidate the link
count in gfs2_evict_inode to make sure we're not destroying an
allocated, referenced inode.

In addition, skip remote unlinks under memory pressure; the next inode
allocation in the same resource group will take care of destroying
unlinked inodes.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/glock.c | 10 +++++++++-
 fs/gfs2/glock.h |  2 ++
 fs/gfs2/super.c | 30 ++++++++++++++++++++++++++++--
 3 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index e6d32d2..4ba53e9 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -170,7 +170,7 @@ void gfs2_glock_free(struct gfs2_glock *gl)
  *
  */
 
-static void gfs2_glock_hold(struct gfs2_glock *gl)
+void gfs2_glock_hold(struct gfs2_glock *gl)
 {
 	GLOCK_BUG_ON(gl, __lockref_is_dead(&gl->gl_lockref));
 	lockref_get(&gl->gl_lockref);
@@ -269,6 +269,14 @@ void gfs2_glock_put(struct gfs2_glock *gl)
 	sdp->sd_lockstruct.ls_ops->lm_put_lock(gl);
 }
 
+/*
+ * Cause the glock to be put in work queue context.
+ */
+void gfs2_glock_queue_put(struct gfs2_glock *gl)
+{
+	gfs2_glock_queue_work(gl, 0);
+}
+
 /**
  * may_grant - check if its ok to grant a new lock
  * @gl: The glock
diff --git a/fs/gfs2/glock.h b/fs/gfs2/glock.h
index 9ad4a6a..33e0511 100644
--- a/fs/gfs2/glock.h
+++ b/fs/gfs2/glock.h
@@ -181,7 +181,9 @@ static inline struct address_space *gfs2_glock2aspace(struct gfs2_glock *gl)
 extern int gfs2_glock_get(struct gfs2_sbd *sdp, u64 number,
 			  const struct gfs2_glock_operations *glops,
 			  int create, struct gfs2_glock **glp);
+extern void gfs2_glock_hold(struct gfs2_glock *gl);
 extern void gfs2_glock_put(struct gfs2_glock *gl);
+extern void gfs2_glock_queue_put(struct gfs2_glock *gl);
 extern void gfs2_holder_init(struct gfs2_glock *gl, unsigned int state,
 			     u16 flags, struct gfs2_holder *gh);
 extern void gfs2_holder_reinit(unsigned int state, u16 flags,
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index c651983..ace4814 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -1541,6 +1541,16 @@ static void gfs2_evict_inode(struct inode *inode)
 	if (inode->i_nlink || (sb->s_flags & MS_RDONLY))
 		goto out;
 
+	/*
+	 * If we are in shrinker context, DLM may depend on us to make
+	 * progress.  In that case, calling into DLM again would deadlock.  To
+	 * prevent that from happening, skip deallocating the inode here; it
+	 * will be deallocated when another inode is allocated in the same
+	 * resource group.
+	 */
+	if (current->flags & PF_MEMALLOC)
+		goto out;
+
 	/* Must not read inode block until block type has been verified */
 	error = gfs2_glock_nq_init(ip->i_gl, LM_ST_EXCLUSIVE, GL_SKIP, &gh);
 	if (unlikely(error)) {
@@ -1561,6 +1571,12 @@ static void gfs2_evict_inode(struct inode *inode)
 			goto out_truncate;
 	}
 
+	/*
+	 * The inode may have been recreated in the meantime.
+	 */
+	if (inode->i_nlink)
+		goto out_truncate;
+
 	if (gfs2_holder_initialized(&ip->i_iopen_gh) &&
 	    test_bit(HIF_HOLDER, &ip->i_iopen_gh.gh_iflags)) {
 		ip->i_iopen_gh.gh_flags |= GL_NOCACHE;
@@ -1640,12 +1656,22 @@ static void gfs2_evict_inode(struct inode *inode)
 	glock_set_object(ip->i_gl, NULL);
 	wait_on_bit_io(&ip->i_flags, GIF_GLOP_PENDING, TASK_UNINTERRUPTIBLE);
 	gfs2_glock_add_to_lru(ip->i_gl);
-	gfs2_glock_put(ip->i_gl);
+	if (current->flags & PF_MEMALLOC)
+		gfs2_glock_queue_put(ip->i_gl);
+	else
+		gfs2_glock_put(ip->i_gl);
 	ip->i_gl = NULL;
 	if (gfs2_holder_initialized(&ip->i_iopen_gh)) {
-		glock_set_object(ip->i_iopen_gh.gh_gl, NULL);
+		struct gfs2_glock *gl = ip->i_iopen_gh.gh_gl;
+
+		glock_set_object(gl, NULL);
 		ip->i_iopen_gh.gh_flags |= GL_NOCACHE;
+		gfs2_glock_hold(gl);
 		gfs2_glock_dq_uninit(&ip->i_iopen_gh);
+		if (current->flags & PF_MEMALLOC)
+			gfs2_glock_queue_put(gl);
+		else
+			gfs2_glock_put(ip->i_gl);
 	}
 }
 
-- 
2.7.4




* [Cluster-devel] [PATCH 8/8] gfs2: Warn when not deleting inodes under memory pressure
  2017-05-31 15:03 [Cluster-devel] [PATCH 0/8] GFS2 shrinker deadlock Andreas Gruenbacher
                   ` (6 preceding siblings ...)
  2017-05-31 15:03 ` [Cluster-devel] [PATCH 7/8] gfs2: gfs2_evict_inode: Put glocks asynchronously Andreas Gruenbacher
@ 2017-05-31 15:03 ` Andreas Gruenbacher
  2017-06-01 15:39   ` Steven Whitehouse
  7 siblings, 1 reply; 20+ messages in thread
From: Andreas Gruenbacher @ 2017-05-31 15:03 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/super.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index ace4814..3637662 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -1548,8 +1548,11 @@ static void gfs2_evict_inode(struct inode *inode)
 	 * will be deallocated when another inode is allocated in the same
 	 * resource group.
 	 */
-	if (current->flags & PF_MEMALLOC)
+	if (current->flags & PF_MEMALLOC) {
+		fs_warn(sdp, "not deleting inode %llu under memory pressure\n",
+			(unsigned long long)ip->i_no_addr);
 		goto out;
+	}
 
 	/* Must not read inode block until block type has been verified */
 	error = gfs2_glock_nq_init(ip->i_gl, LM_ST_EXCLUSIVE, GL_SKIP, &gh);
-- 
2.7.4




* [Cluster-devel] [PATCH 7/8] gfs2: gfs2_evict_inode: Put glocks asynchronously
  2017-05-31 15:03 ` [Cluster-devel] [PATCH 7/8] gfs2: gfs2_evict_inode: Put glocks asynchronously Andreas Gruenbacher
@ 2017-06-01 10:51   ` Andreas Gruenbacher
  2017-06-01 15:37   ` Steven Whitehouse
  1 sibling, 0 replies; 20+ messages in thread
From: Andreas Gruenbacher @ 2017-06-01 10:51 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Wed, May 31, 2017 at 5:03 PM, Andreas Gruenbacher
<agruenba@redhat.com> wrote:
> gfs2_evict_inode is called to free inodes under memory pressure.  The
> function calls into DLM when an inode's last cluster-wide reference goes
> away (remote unlink) and to release the glock and associated DLM lock
> before finally destroying the inode.  However, if DLM is blocked waiting
> for memory to become available, calling into DLM again will deadlock.
>
> Avoid that by decoupling releasing glocks from destroying inodes in that
> case: with gfs2_glock_queue_put, glocks will be dequeued asynchronously
> in work queue context, when the associated inodes have most likely
> already been destroyed.
>
> With this change, it appears that inodes can end up being unlinked,
> remote-unlink can be triggered, and then the inode can be reallocated
> before all remote-unlink callbacks are processed.  Revalidate the link
> count in gfs2_evict_inode to make sure we're not destroying an
> allocated, referenced inode.
>
> In addition, skip remote unlinks under memory pressure; the next inode
> allocation in the same resource group will take care of destroying
> unlinked inodes.
>
> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
> ---
>  fs/gfs2/glock.c | 10 +++++++++-
>  fs/gfs2/glock.h |  2 ++
>  fs/gfs2/super.c | 30 ++++++++++++++++++++++++++++--
>  3 files changed, 39 insertions(+), 3 deletions(-)
>
> diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
> index e6d32d2..4ba53e9 100644
> --- a/fs/gfs2/glock.c
> +++ b/fs/gfs2/glock.c
> @@ -170,7 +170,7 @@ void gfs2_glock_free(struct gfs2_glock *gl)
>   *
>   */
>
> -static void gfs2_glock_hold(struct gfs2_glock *gl)
> +void gfs2_glock_hold(struct gfs2_glock *gl)
>  {
>         GLOCK_BUG_ON(gl, __lockref_is_dead(&gl->gl_lockref));
>         lockref_get(&gl->gl_lockref);
> @@ -269,6 +269,14 @@ void gfs2_glock_put(struct gfs2_glock *gl)
>         sdp->sd_lockstruct.ls_ops->lm_put_lock(gl);
>  }
>
> +/*
> + * Cause the glock to be put in work queue context.
> + */
> +void gfs2_glock_queue_put(struct gfs2_glock *gl)
> +{
> +       gfs2_glock_queue_work(gl, 0);
> +}
> +
>  /**
>   * may_grant - check if its ok to grant a new lock
>   * @gl: The glock
> diff --git a/fs/gfs2/glock.h b/fs/gfs2/glock.h
> index 9ad4a6a..33e0511 100644
> --- a/fs/gfs2/glock.h
> +++ b/fs/gfs2/glock.h
> @@ -181,7 +181,9 @@ static inline struct address_space *gfs2_glock2aspace(struct gfs2_glock *gl)
>  extern int gfs2_glock_get(struct gfs2_sbd *sdp, u64 number,
>                           const struct gfs2_glock_operations *glops,
>                           int create, struct gfs2_glock **glp);
> +extern void gfs2_glock_hold(struct gfs2_glock *gl);
>  extern void gfs2_glock_put(struct gfs2_glock *gl);
> +extern void gfs2_glock_queue_put(struct gfs2_glock *gl);
>  extern void gfs2_holder_init(struct gfs2_glock *gl, unsigned int state,
>                              u16 flags, struct gfs2_holder *gh);
>  extern void gfs2_holder_reinit(unsigned int state, u16 flags,
> diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
> index c651983..ace4814 100644
> --- a/fs/gfs2/super.c
> +++ b/fs/gfs2/super.c
> @@ -1541,6 +1541,16 @@ static void gfs2_evict_inode(struct inode *inode)
>         if (inode->i_nlink || (sb->s_flags & MS_RDONLY))
>                 goto out;
>
> +       /*
> +        * If we are in shrinker context, DLM may depend on us to make
> +        * progress.  In that case, calling into DLM again would deadlock.  To
> +        * prevent that from happening, skip deallocating the inode here; it
> +        * will be deallocated when another inode is allocated in the same
> +        * resource group.
> +        */
> +       if (current->flags & PF_MEMALLOC)
> +               goto out;
> +
>         /* Must not read inode block until block type has been verified */
>         error = gfs2_glock_nq_init(ip->i_gl, LM_ST_EXCLUSIVE, GL_SKIP, &gh);
>         if (unlikely(error)) {
> @@ -1561,6 +1571,12 @@ static void gfs2_evict_inode(struct inode *inode)
>                         goto out_truncate;
>         }
>
> +       /*
> +        * The inode may have been recreated in the meantime.
> +        */
> +       if (inode->i_nlink)
> +               goto out_truncate;
> +
>         if (gfs2_holder_initialized(&ip->i_iopen_gh) &&
>             test_bit(HIF_HOLDER, &ip->i_iopen_gh.gh_iflags)) {
>                 ip->i_iopen_gh.gh_flags |= GL_NOCACHE;
> @@ -1640,12 +1656,22 @@ static void gfs2_evict_inode(struct inode *inode)
>         glock_set_object(ip->i_gl, NULL);
>         wait_on_bit_io(&ip->i_flags, GIF_GLOP_PENDING, TASK_UNINTERRUPTIBLE);
>         gfs2_glock_add_to_lru(ip->i_gl);
> -       gfs2_glock_put(ip->i_gl);
> +       if (current->flags & PF_MEMALLOC)
> +               gfs2_glock_queue_put(ip->i_gl);
> +       else
> +               gfs2_glock_put(ip->i_gl);
>         ip->i_gl = NULL;
>         if (gfs2_holder_initialized(&ip->i_iopen_gh)) {
> -               glock_set_object(ip->i_iopen_gh.gh_gl, NULL);
> +               struct gfs2_glock *gl = ip->i_iopen_gh.gh_gl;
> +
> +               glock_set_object(gl, NULL);
>                 ip->i_iopen_gh.gh_flags |= GL_NOCACHE;
> +               gfs2_glock_hold(gl);
>                 gfs2_glock_dq_uninit(&ip->i_iopen_gh);
> +               if (current->flags & PF_MEMALLOC)
> +                       gfs2_glock_queue_put(gl);
> +               else
> +                       gfs2_glock_put(ip->i_gl);

This should have been gfs2_glock_put(gl), obviously.

>         }
>  }
>
> --
> 2.7.4
>




* [Cluster-devel] [PATCH 1/8] gfs2: Get rid of flush_delayed_work in gfs2_evict_inode
  2017-05-31 15:03 ` [Cluster-devel] [PATCH 1/8] gfs2: Get rid of flush_delayed_work in gfs2_evict_inode Andreas Gruenbacher
@ 2017-06-01 15:13   ` Steven Whitehouse
  0 siblings, 0 replies; 20+ messages in thread
From: Steven Whitehouse @ 2017-06-01 15:13 UTC (permalink / raw)
  To: cluster-devel.redhat.com



On 31/05/17 16:03, Andreas Gruenbacher wrote:
> So far, gfs2_evict_inode clears gl->gl_object and then flushes the glock
> work queue to make sure that inode glops which dereference gl->gl_object
> have finished running before the inode is destroyed.  However, flushing
> the work queue may do more work than needed, and in particular, it may
> call into DLM, which we want to avoid here.  Use a bit lock
> (GIF_GLOP_PENDING) to synchronize between the inode glops and
> gfs2_evict_inode instead to get rid of the flushing.
>
> In addition, flush the work queues of existing glocks before reusing
> them for new inodes to get those glocks into a known state: the glock
> state engine currently doesn't handle glock re-appropriation correctly.
> (We may be able to fix the glock state engine instead later.)
>
> Based on a patch by Steven Whitehouse <swhiteho@redhat.com>.
>
> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>

Steve.

> ---
>   fs/gfs2/glock.h  |  7 +++++++
>   fs/gfs2/glops.c  | 39 ++++++++++++++++++++++++++++++++-------
>   fs/gfs2/incore.h |  1 +
>   fs/gfs2/inode.c  |  7 ++++---
>   fs/gfs2/super.c  |  4 ++--
>   5 files changed, 46 insertions(+), 12 deletions(-)
>
> diff --git a/fs/gfs2/glock.h b/fs/gfs2/glock.h
> index ab1ef32..9ad4a6a 100644
> --- a/fs/gfs2/glock.h
> +++ b/fs/gfs2/glock.h
> @@ -257,4 +257,11 @@ static inline bool gfs2_holder_initialized(struct gfs2_holder *gh)
>   	return gh->gh_gl;
>   }
>   
> +static inline void glock_set_object(struct gfs2_glock *gl, void *object)
> +{
> +	spin_lock(&gl->gl_lockref.lock);
> +	gl->gl_object = object;
> +	spin_unlock(&gl->gl_lockref.lock);
> +}
> +
>   #endif /* __GLOCK_DOT_H__ */
> diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
> index 5db59d4..7449b19 100644
> --- a/fs/gfs2/glops.c
> +++ b/fs/gfs2/glops.c
> @@ -197,6 +197,27 @@ static void rgrp_go_inval(struct gfs2_glock *gl, int flags)
>   		rgd->rd_flags &= ~GFS2_RDF_UPTODATE;
>   }
>   
> +static struct gfs2_inode *gfs2_glock2inode(struct gfs2_glock *gl)
> +{
> +	struct gfs2_inode *ip;
> +
> +	spin_lock(&gl->gl_lockref.lock);
> +	ip = gl->gl_object;
> +	if (ip)
> +		set_bit(GIF_GLOP_PENDING, &ip->i_flags);
> +	spin_unlock(&gl->gl_lockref.lock);
> +	return ip;
> +}
> +
> +static void gfs2_clear_glop_pending(struct gfs2_inode *ip)
> +{
> +	if (!ip)
> +		return;
> +
> +	clear_bit_unlock(GIF_GLOP_PENDING, &ip->i_flags);
> +	wake_up_bit(&ip->i_flags, GIF_GLOP_PENDING);
> +}
> +
>   /**
>    * inode_go_sync - Sync the dirty data and/or metadata for an inode glock
>    * @gl: the glock protecting the inode
> @@ -205,25 +226,24 @@ static void rgrp_go_inval(struct gfs2_glock *gl, int flags)
>   
>   static void inode_go_sync(struct gfs2_glock *gl)
>   {
> -	struct gfs2_inode *ip = gl->gl_object;
> +	struct gfs2_inode *ip = gfs2_glock2inode(gl);
> +	int isreg = ip && S_ISREG(ip->i_inode.i_mode);
>   	struct address_space *metamapping = gfs2_glock2aspace(gl);
>   	int error;
>   
> -	if (ip && !S_ISREG(ip->i_inode.i_mode))
> -		ip = NULL;
> -	if (ip) {
> +	if (isreg) {
>   		if (test_and_clear_bit(GIF_SW_PAGED, &ip->i_flags))
>   			unmap_shared_mapping_range(ip->i_inode.i_mapping, 0, 0);
>   		inode_dio_wait(&ip->i_inode);
>   	}
>   	if (!test_and_clear_bit(GLF_DIRTY, &gl->gl_flags))
> -		return;
> +		goto out;
>   
>   	GLOCK_BUG_ON(gl, gl->gl_state != LM_ST_EXCLUSIVE);
>   
>   	gfs2_log_flush(gl->gl_name.ln_sbd, gl, NORMAL_FLUSH);
>   	filemap_fdatawrite(metamapping);
> -	if (ip) {
> +	if (isreg) {
>   		struct address_space *mapping = ip->i_inode.i_mapping;
>   		filemap_fdatawrite(mapping);
>   		error = filemap_fdatawait(mapping);
> @@ -238,6 +258,9 @@ static void inode_go_sync(struct gfs2_glock *gl)
>   	 */
>   	smp_mb__before_atomic();
>   	clear_bit(GLF_DIRTY, &gl->gl_flags);
> +
> +out:
> +	gfs2_clear_glop_pending(ip);
>   }
>   
>   /**
> @@ -253,7 +276,7 @@ static void inode_go_sync(struct gfs2_glock *gl)
>   
>   static void inode_go_inval(struct gfs2_glock *gl, int flags)
>   {
> -	struct gfs2_inode *ip = gl->gl_object;
> +	struct gfs2_inode *ip = gfs2_glock2inode(gl);
>   
>   	gfs2_assert_withdraw(gl->gl_name.ln_sbd, !atomic_read(&gl->gl_ail_count));
>   
> @@ -274,6 +297,8 @@ static void inode_go_inval(struct gfs2_glock *gl, int flags)
>   	}
>   	if (ip && S_ISREG(ip->i_inode.i_mode))
>   		truncate_inode_pages(ip->i_inode.i_mapping, 0);
> +
> +	gfs2_clear_glop_pending(ip);
>   }
>   
>   /**
> diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
> index b7cf65d..b6f5b8d 100644
> --- a/fs/gfs2/incore.h
> +++ b/fs/gfs2/incore.h
> @@ -386,6 +386,7 @@ enum {
>   	GIF_SW_PAGED		= 3,
>   	GIF_ORDERED		= 4,
>   	GIF_FREE_VFS_INODE      = 5,
> +	GIF_GLOP_PENDING	= 6,
>   };
>   
>   struct gfs2_inode {
> diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
> index 9f605ea..912d4e6 100644
> --- a/fs/gfs2/inode.c
> +++ b/fs/gfs2/inode.c
> @@ -144,7 +144,8 @@ struct inode *gfs2_inode_lookup(struct super_block *sb, unsigned int type,
>   		error = gfs2_glock_get(sdp, no_addr, &gfs2_inode_glops, CREATE, &ip->i_gl);
>   		if (unlikely(error))
>   			goto fail;
> -		ip->i_gl->gl_object = ip;
> +		flush_delayed_work(&ip->i_gl->gl_work);
> +		glock_set_object(ip->i_gl, ip);
>   
>   		error = gfs2_glock_get(sdp, no_addr, &gfs2_iopen_glops, CREATE, &io_gl);
>   		if (unlikely(error))
> @@ -173,8 +174,8 @@ struct inode *gfs2_inode_lookup(struct super_block *sb, unsigned int type,
>   		error = gfs2_glock_nq_init(io_gl, LM_ST_SHARED, GL_EXACT, &ip->i_iopen_gh);
>   		if (unlikely(error))
>   			goto fail_put;
> -
> -		ip->i_iopen_gh.gh_gl->gl_object = ip;
> +		flush_delayed_work(&ip->i_iopen_gh.gh_gl->gl_work);
> +		glock_set_object(ip->i_iopen_gh.gh_gl, ip);
>   		gfs2_glock_put(io_gl);
>   		io_gl = NULL;
>   
> diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
> index 29b0473..7d12c12 100644
> --- a/fs/gfs2/super.c
> +++ b/fs/gfs2/super.c
> @@ -1631,8 +1631,8 @@ static void gfs2_evict_inode(struct inode *inode)
>   	gfs2_ordered_del_inode(ip);
>   	clear_inode(inode);
>   	gfs2_dir_hash_inval(ip);
> -	ip->i_gl->gl_object = NULL;
> -	flush_delayed_work(&ip->i_gl->gl_work);
> +	glock_set_object(ip->i_gl, NULL);
> +	wait_on_bit_io(&ip->i_flags, GIF_GLOP_PENDING, TASK_UNINTERRUPTIBLE);
>   	gfs2_glock_add_to_lru(ip->i_gl);
>   	gfs2_glock_put(ip->i_gl);
>   	ip->i_gl = NULL;



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Cluster-devel] [PATCH 2/8] gfs2: Protect gl->gl_object by spin lock
  2017-05-31 15:03 ` [Cluster-devel] [PATCH 2/8] gfs2: Protect gl->gl_object by spin lock Andreas Gruenbacher
@ 2017-06-01 15:16   ` Steven Whitehouse
  0 siblings, 0 replies; 20+ messages in thread
From: Steven Whitehouse @ 2017-06-01 15:16 UTC (permalink / raw)
  To: cluster-devel.redhat.com



On 31/05/17 16:03, Andreas Gruenbacher wrote:
> Put all remaining accesses to gl->gl_object under the
> gl->gl_lockref.lock spinlock to prevent races.
>
> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
This looks like a good thing to do. However, since we now have 
gfs2_glock2inode(), should we not also have gfs2_glock2rgrp() and hide 
the locking behind that? It would be good to keep things consistent, I 
think.
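For illustration, such a helper might look something like this (only a 
sketch; the name mirrors gfs2_glock2inode() from patch 1, and where it 
would live is left open):

static inline struct gfs2_rgrpd *gfs2_glock2rgrp(struct gfs2_glock *gl)
{
	struct gfs2_rgrpd *rgd;

	spin_lock(&gl->gl_lockref.lock);
	rgd = gl->gl_object;
	spin_unlock(&gl->gl_lockref.lock);

	return rgd;
}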

Steve.

> ---
>   fs/gfs2/bmap.c  |  7 ++++++-
>   fs/gfs2/dir.c   |  6 +++++-
>   fs/gfs2/glops.c | 10 ++++++++--
>   fs/gfs2/inode.c |  8 ++++----
>   fs/gfs2/lops.c  |  9 +++++++--
>   fs/gfs2/rgrp.c  |  6 ++----
>   fs/gfs2/super.c | 11 ++++++++---
>   fs/gfs2/xattr.c |  6 +++++-
>   8 files changed, 45 insertions(+), 18 deletions(-)
>
> diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
> index 4d810be..e43decb 100644
> --- a/fs/gfs2/bmap.c
> +++ b/fs/gfs2/bmap.c
> @@ -970,7 +970,12 @@ static int sweep_bh_for_rgrps(struct gfs2_inode *ip, struct gfs2_holder *rd_gh,
>   			continue;
>   		bn = be64_to_cpu(*p);
>   		if (gfs2_holder_initialized(rd_gh)) {
> -			rgd = (struct gfs2_rgrpd *)rd_gh->gh_gl->gl_object;
> +			struct gfs2_glock *gl = rd_gh->gh_gl;
> +
> +			spin_lock(&gl->gl_lockref.lock);
> +			rgd = rd_gh->gh_gl->gl_object;
> +			spin_unlock(&gl->gl_lockref.lock);
> +
>   			gfs2_assert_withdraw(sdp,
>   				     gfs2_glock_is_locked_by_me(rd_gh->gh_gl));
>   		} else {
> diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c
> index 7911321..965f1c7 100644
> --- a/fs/gfs2/dir.c
> +++ b/fs/gfs2/dir.c
> @@ -2031,9 +2031,13 @@ static int leaf_dealloc(struct gfs2_inode *dip, u32 index, u32 len,
>   	gfs2_rlist_alloc(&rlist, LM_ST_EXCLUSIVE);
>   
>   	for (x = 0; x < rlist.rl_rgrps; x++) {
> +		struct gfs2_glock *gl = rlist.rl_ghs[x].gh_gl;
>   		struct gfs2_rgrpd *rgd;
> -		rgd = rlist.rl_ghs[x].gh_gl->gl_object;
> +
> +		spin_lock(&gl->gl_lockref.lock);
> +		rgd = gl->gl_object;
>   		rg_blocks += rgd->rd_length;
> +		spin_unlock(&gl->gl_lockref.lock);
>   	}
>   
>   	error = gfs2_glock_nq_m(rlist.rl_rgrps, rlist.rl_ghs);
> diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
> index 7449b19..238a78c 100644
> --- a/fs/gfs2/glops.c
> +++ b/fs/gfs2/glops.c
> @@ -184,17 +184,23 @@ static void rgrp_go_inval(struct gfs2_glock *gl, int flags)
>   {
>   	struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
>   	struct address_space *mapping = &sdp->sd_aspace;
> -	struct gfs2_rgrpd *rgd = gl->gl_object;
> +	struct gfs2_rgrpd *rgd;
>   
> +	spin_lock(&gl->gl_lockref.lock);
> +	rgd = gl->gl_object;
>   	if (rgd)
>   		gfs2_rgrp_brelse(rgd);
> +	spin_unlock(&gl->gl_lockref.lock);
>   
>   	WARN_ON_ONCE(!(flags & DIO_METADATA));
>   	gfs2_assert_withdraw(sdp, !atomic_read(&gl->gl_ail_count));
>   	truncate_inode_pages_range(mapping, gl->gl_vm.start, gl->gl_vm.end);
>   
> +	spin_lock(&gl->gl_lockref.lock);
> +	rgd = gl->gl_object;
>   	if (rgd)
>   		rgd->rd_flags &= ~GFS2_RDF_UPTODATE;
> +	spin_unlock(&gl->gl_lockref.lock);
>   }
>   
>   static struct gfs2_inode *gfs2_glock2inode(struct gfs2_glock *gl)
> @@ -566,7 +572,7 @@ static int freeze_go_demote_ok(const struct gfs2_glock *gl)
>    */
>   static void iopen_go_callback(struct gfs2_glock *gl, bool remote)
>   {
> -	struct gfs2_inode *ip = (struct gfs2_inode *)gl->gl_object;
> +	struct gfs2_inode *ip = gl->gl_object;
>   	struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
>   
>   	if (!remote || (sdp->sd_vfs->s_flags & MS_RDONLY))
> diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
> index 912d4e6..50108fa 100644
> --- a/fs/gfs2/inode.c
> +++ b/fs/gfs2/inode.c
> @@ -202,14 +202,14 @@ struct inode *gfs2_inode_lookup(struct super_block *sb, unsigned int type,
>   
>   fail_refresh:
>   	ip->i_iopen_gh.gh_flags |= GL_NOCACHE;
> -	ip->i_iopen_gh.gh_gl->gl_object = NULL;
> +	glock_set_object(ip->i_iopen_gh.gh_gl, NULL);
>   	gfs2_glock_dq_uninit(&ip->i_iopen_gh);
>   fail_put:
>   	if (io_gl)
>   		gfs2_glock_put(io_gl);
>   	if (gfs2_holder_initialized(&i_gh))
>   		gfs2_glock_dq_uninit(&i_gh);
> -	ip->i_gl->gl_object = NULL;
> +	glock_set_object(ip->i_gl, NULL);
>   fail:
>   	iget_failed(inode);
>   	return ERR_PTR(error);
> @@ -706,7 +706,7 @@ static int gfs2_create_inode(struct inode *dir, struct dentry *dentry,
>   	if (error)
>   		goto fail_free_inode;
>   
> -	ip->i_gl->gl_object = ip;
> +	glock_set_object(ip->i_gl, ip);
>   	error = gfs2_glock_nq_init(ip->i_gl, LM_ST_EXCLUSIVE, GL_SKIP, ghs + 1);
>   	if (error)
>   		goto fail_free_inode;
> @@ -732,7 +732,7 @@ static int gfs2_create_inode(struct inode *dir, struct dentry *dentry,
>   	if (error)
>   		goto fail_gunlock2;
>   
> -	ip->i_iopen_gh.gh_gl->gl_object = ip;
> +	glock_set_object(ip->i_iopen_gh.gh_gl, ip);
>   	gfs2_glock_put(io_gl);
>   	gfs2_set_iop(inode);
>   	insert_inode_hash(inode);
> diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
> index b1f9144..93c1074 100644
> --- a/fs/gfs2/lops.c
> +++ b/fs/gfs2/lops.c
> @@ -71,10 +71,15 @@ static void maybe_release_space(struct gfs2_bufdata *bd)
>   {
>   	struct gfs2_glock *gl = bd->bd_gl;
>   	struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
> -	struct gfs2_rgrpd *rgd = gl->gl_object;
> +	struct gfs2_rgrpd *rgd;
>   	unsigned int index = bd->bd_bh->b_blocknr - gl->gl_name.ln_number;
> -	struct gfs2_bitmap *bi = rgd->rd_bits + index;
> +	struct gfs2_bitmap *bi;
>   
> +	spin_lock(&gl->gl_lockref.lock);
> +	rgd = gl->gl_object;
> +	spin_unlock(&gl->gl_lockref.lock);
> +
> +	bi = rgd->rd_bits + index;
>   	if (bi->bi_clone == NULL)
>   		return;
>   	if (sdp->sd_args.ar_discard)
> diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
> index 83c9909..836e38b 100644
> --- a/fs/gfs2/rgrp.c
> +++ b/fs/gfs2/rgrp.c
> @@ -705,9 +705,7 @@ void gfs2_clear_rgrpd(struct gfs2_sbd *sdp)
>   		rb_erase(n, &sdp->sd_rindex_tree);
>   
>   		if (gl) {
> -			spin_lock(&gl->gl_lockref.lock);
> -			gl->gl_object = NULL;
> -			spin_unlock(&gl->gl_lockref.lock);
> +			glock_set_object(gl, NULL);
>   			gfs2_glock_add_to_lru(gl);
>   			gfs2_glock_put(gl);
>   		}
> @@ -917,7 +915,7 @@ static int read_rindex_entry(struct gfs2_inode *ip)
>   	error = rgd_insert(rgd);
>   	spin_unlock(&sdp->sd_rindex_spin);
>   	if (!error) {
> -		rgd->rd_gl->gl_object = rgd;
> +		glock_set_object(rgd->rd_gl, rgd);
>   		rgd->rd_gl->gl_vm.start = (rgd->rd_addr * bsize) & PAGE_MASK;
>   		rgd->rd_gl->gl_vm.end = PAGE_ALIGN((rgd->rd_addr +
>   						    rgd->rd_length) * bsize) - 1;
> diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
> index 7d12c12..554bd0c 100644
> --- a/fs/gfs2/super.c
> +++ b/fs/gfs2/super.c
> @@ -1105,9 +1105,14 @@ static int gfs2_statfs_slow(struct gfs2_sbd *sdp, struct gfs2_statfs_change_host
>   					gfs2_holder_uninit(gh);
>   					error = err;
>   				} else {
> +					struct gfs2_glock *gl = gh->gh_gl;
> +					struct gfs2_rgrpd *rgd;
> +
> +					spin_lock(&gl->gl_lockref.lock);
> +					rgd = gl->gl_object;
> +					spin_unlock(&gl->gl_lockref.lock);
>   					if (!error)
> -						error = statfs_slow_fill(
> -							gh->gh_gl->gl_object, sc);
> +						error = statfs_slow_fill(rgd, sc);
>   					gfs2_glock_dq_uninit(gh);
>   				}
>   			}
> @@ -1637,7 +1642,7 @@ static void gfs2_evict_inode(struct inode *inode)
>   	gfs2_glock_put(ip->i_gl);
>   	ip->i_gl = NULL;
>   	if (gfs2_holder_initialized(&ip->i_iopen_gh)) {
> -		ip->i_iopen_gh.gh_gl->gl_object = NULL;
> +		glock_set_object(ip->i_iopen_gh.gh_gl, NULL);
>   		ip->i_iopen_gh.gh_flags |= GL_NOCACHE;
>   		gfs2_glock_dq_uninit(&ip->i_iopen_gh);
>   	}
> diff --git a/fs/gfs2/xattr.c b/fs/gfs2/xattr.c
> index d87721a..4774584 100644
> --- a/fs/gfs2/xattr.c
> +++ b/fs/gfs2/xattr.c
> @@ -1327,9 +1327,13 @@ static int ea_dealloc_indirect(struct gfs2_inode *ip)
>   	gfs2_rlist_alloc(&rlist, LM_ST_EXCLUSIVE);
>   
>   	for (x = 0; x < rlist.rl_rgrps; x++) {
> +		struct gfs2_glock *gl = rlist.rl_ghs[x].gh_gl;
>   		struct gfs2_rgrpd *rgd;
> -		rgd = rlist.rl_ghs[x].gh_gl->gl_object;
> +
> +		spin_lock(&gl->gl_lockref.lock);
> +		rgd = gl->gl_object;
>   		rg_blocks += rgd->rd_length;
> +		spin_unlock(&gl->gl_lockref.lock);
>   	}
>   
>   	error = gfs2_glock_nq_m(rlist.rl_rgrps, rlist.rl_ghs);



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Cluster-devel] [PATCH 3/8] gfs2: Clean up glock work enqueuing
  2017-05-31 15:03 ` [Cluster-devel] [PATCH 3/8] gfs2: Clean up glock work enqueuing Andreas Gruenbacher
@ 2017-06-01 15:21   ` Steven Whitehouse
  2017-06-01 15:43     ` Andreas Gruenbacher
  0 siblings, 1 reply; 20+ messages in thread
From: Steven Whitehouse @ 2017-06-01 15:21 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,


On 31/05/17 16:03, Andreas Gruenbacher wrote:
> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Looks generally like a nice clean up, but...
> ---
>   fs/gfs2/glock.c | 102 +++++++++++++++++++++++++++++++++-----------------------
>   1 file changed, 61 insertions(+), 41 deletions(-)
>
> diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
> index 959a19c..6d7147c8 100644
> --- a/fs/gfs2/glock.c
> +++ b/fs/gfs2/glock.c
> @@ -152,6 +152,29 @@ static void gfs2_glock_remove_from_lru(struct gfs2_glock *gl)
>   	spin_unlock(&lru_lock);
>   }
>   
> +/*
> + * Enqueue the glock on the work queue.  Passes one glock reference on to the
> + * work queue.
> + */
> +static void __gfs2_glock_queue_work(struct gfs2_glock *gl, unsigned long delay) {
> +	if (!queue_delayed_work(glock_workqueue, &gl->gl_work, delay)) {
> +		/*
> +		 * We are holding the lockref spinlock, and the work was still
> +		 * queued above.  The queued work (glock_work_func) takes that
> +		 * spinlock before dropping its glock reference(s), so it
> +		 * cannot have dropped them in the meantime.
> +		 */
> +		GLOCK_BUG_ON(gl, gl->gl_lockref.count < 2);
> +		gl->gl_lockref.count--;
> +	}
> +}
> +
> +static void gfs2_glock_queue_work(struct gfs2_glock *gl, unsigned long delay) {
> +	spin_lock(&gl->gl_lockref.lock);
> +	__gfs2_glock_queue_work(gl, delay);
> +	spin_unlock(&gl->gl_lockref.lock);
> +}
> +
>   /**
>    * gfs2_glock_put() - Decrement reference count on glock
>    * @gl: The glock to put
> @@ -482,8 +505,7 @@ __acquires(&gl->gl_lockref.lock)
>   		    target == LM_ST_UNLOCKED &&
>   		    test_bit(SDF_SKIP_DLM_UNLOCK, &sdp->sd_flags)) {
>   			finish_xmote(gl, target);
> -			if (queue_delayed_work(glock_workqueue, &gl->gl_work, 0) == 0)
> -				gfs2_glock_put(gl);
> +			gfs2_glock_queue_work(gl, 0);
>   		}
>   		else if (ret) {
>   			pr_err("lm_lock ret %d\n", ret);
> @@ -492,8 +514,7 @@ __acquires(&gl->gl_lockref.lock)
>   		}
>   	} else { /* lock_nolock */
>   		finish_xmote(gl, target);
> -		if (queue_delayed_work(glock_workqueue, &gl->gl_work, 0) == 0)
> -			gfs2_glock_put(gl);
> +		gfs2_glock_queue_work(gl, 0);
>   	}
>   
>   	spin_lock(&gl->gl_lockref.lock);
> @@ -565,8 +586,7 @@ __acquires(&gl->gl_lockref.lock)
>   	clear_bit(GLF_LOCK, &gl->gl_flags);
>   	smp_mb__after_atomic();
>   	gl->gl_lockref.count++;
> -	if (queue_delayed_work(glock_workqueue, &gl->gl_work, 0) == 0)
> -		gl->gl_lockref.count--;
> +	__gfs2_glock_queue_work(gl, 0);
>   	return;
>   
>   out_unlock:
> @@ -601,11 +621,11 @@ static void glock_work_func(struct work_struct *work)
>   {
>   	unsigned long delay = 0;
>   	struct gfs2_glock *gl = container_of(work, struct gfs2_glock, gl_work.work);
> -	int drop_ref = 0;
> +	unsigned int drop_refs = 1;
>   
>   	if (test_and_clear_bit(GLF_REPLY_PENDING, &gl->gl_flags)) {
>   		finish_xmote(gl, gl->gl_reply);
> -		drop_ref = 1;
> +		drop_refs++;
>   	}
>   	spin_lock(&gl->gl_lockref.lock);
>   	if (test_bit(GLF_PENDING_DEMOTE, &gl->gl_flags) &&
> @@ -623,16 +643,25 @@ static void glock_work_func(struct work_struct *work)
>   		}
>   	}
>   	run_queue(gl, 0);
> -	spin_unlock(&gl->gl_lockref.lock);
> -	if (!delay)
> -		gfs2_glock_put(gl);
> -	else {
> +	if (delay) {
> +		/* Keep one glock reference for the work we requeue. */
> +		drop_refs--;
>   		if (gl->gl_name.ln_type != LM_TYPE_INODE)
>   			delay = 0;
> -		if (queue_delayed_work(glock_workqueue, &gl->gl_work, delay) == 0)
> -			gfs2_glock_put(gl);
> +		__gfs2_glock_queue_work(gl, delay);
> +	}
> +	if (drop_refs > 1) {
> +		gl->gl_lockref.count -= drop_refs - 1;
> +		drop_refs = 1;
>   	}
This looks very confusing... is there no cleaner way to write this? 
After all the effort of cleaning up the queuing of the work, this could 
do with some tidying up too. Otherwise I think the patch looks good,

Steve.

> -	if (drop_ref)
> +	spin_unlock(&gl->gl_lockref.lock);
> +
> +	/*
> +	 * Drop the remaining glock reference after grabbing (and releasing)
> +	 * the lockref spinlock: this allows tasks to safely drop glock
> +	 * references under that spinlock (see __gfs2_glock_queue_work).
> +	 */
> +	if (drop_refs)
>   		gfs2_glock_put(gl);
>   }
>   
> @@ -986,8 +1015,7 @@ int gfs2_glock_nq(struct gfs2_holder *gh)
>   		     test_and_clear_bit(GLF_FROZEN, &gl->gl_flags))) {
>   		set_bit(GLF_REPLY_PENDING, &gl->gl_flags);
>   		gl->gl_lockref.count++;
> -		if (queue_delayed_work(glock_workqueue, &gl->gl_work, 0) == 0)
> -			gl->gl_lockref.count--;
> +		__gfs2_glock_queue_work(gl, 0);
>   	}
>   	run_queue(gl, 1);
>   	spin_unlock(&gl->gl_lockref.lock);
> @@ -1047,17 +1075,15 @@ void gfs2_glock_dq(struct gfs2_holder *gh)
>   		gfs2_glock_add_to_lru(gl);
>   
>   	trace_gfs2_glock_queue(gh, 0);
> +	if (unlikely(!fast_path)) {
> +		gl->gl_lockref.count++;
> +		if (test_bit(GLF_PENDING_DEMOTE, &gl->gl_flags) &&
> +		    !test_bit(GLF_DEMOTE, &gl->gl_flags) &&
> +		    gl->gl_name.ln_type == LM_TYPE_INODE)
> +			delay = gl->gl_hold_time;
> +		__gfs2_glock_queue_work(gl, delay);
> +	}
>   	spin_unlock(&gl->gl_lockref.lock);
> -	if (likely(fast_path))
> -		return;
> -
> -	gfs2_glock_hold(gl);
> -	if (test_bit(GLF_PENDING_DEMOTE, &gl->gl_flags) &&
> -	    !test_bit(GLF_DEMOTE, &gl->gl_flags) &&
> -	    gl->gl_name.ln_type == LM_TYPE_INODE)
> -		delay = gl->gl_hold_time;
> -	if (queue_delayed_work(glock_workqueue, &gl->gl_work, delay) == 0)
> -		gfs2_glock_put(gl);
>   }
>   
>   void gfs2_glock_dq_wait(struct gfs2_holder *gh)
> @@ -1233,9 +1259,8 @@ void gfs2_glock_cb(struct gfs2_glock *gl, unsigned int state)
>   
>   	spin_lock(&gl->gl_lockref.lock);
>   	handle_callback(gl, state, delay, true);
> +	__gfs2_glock_queue_work(gl, delay);
>   	spin_unlock(&gl->gl_lockref.lock);
> -	if (queue_delayed_work(glock_workqueue, &gl->gl_work, delay) == 0)
> -		gfs2_glock_put(gl);
>   }
>   
>   /**
> @@ -1294,10 +1319,8 @@ void gfs2_glock_complete(struct gfs2_glock *gl, int ret)
>   
>   	gl->gl_lockref.count++;
>   	set_bit(GLF_REPLY_PENDING, &gl->gl_flags);
> +	__gfs2_glock_queue_work(gl, 0);
>   	spin_unlock(&gl->gl_lockref.lock);
> -
> -	if (queue_delayed_work(glock_workqueue, &gl->gl_work, 0) == 0)
> -		gfs2_glock_put(gl);
>   }
>   
>   static int glock_cmp(void *priv, struct list_head *a, struct list_head *b)
> @@ -1355,8 +1378,7 @@ __acquires(&lru_lock)
>   		if (demote_ok(gl))
>   			handle_callback(gl, LM_ST_UNLOCKED, 0, false);
>   		WARN_ON(!test_and_clear_bit(GLF_LOCK, &gl->gl_flags));
> -		if (queue_delayed_work(glock_workqueue, &gl->gl_work, 0) == 0)
> -			gl->gl_lockref.count--;
> +		__gfs2_glock_queue_work(gl, 0);
>   		spin_unlock(&gl->gl_lockref.lock);
>   		cond_resched_lock(&lru_lock);
>   	}
> @@ -1462,13 +1484,12 @@ static void glock_hash_walk(glock_examiner examiner, const struct gfs2_sbd *sdp)
>   
>   static void thaw_glock(struct gfs2_glock *gl)
>   {
> -	if (!test_and_clear_bit(GLF_FROZEN, &gl->gl_flags))
> -		goto out;
> -	set_bit(GLF_REPLY_PENDING, &gl->gl_flags);
> -	if (queue_delayed_work(glock_workqueue, &gl->gl_work, 0) == 0) {
> -out:
> +	if (!test_and_clear_bit(GLF_FROZEN, &gl->gl_flags)) {
>   		gfs2_glock_put(gl);
> +		return;
>   	}
> +	set_bit(GLF_REPLY_PENDING, &gl->gl_flags);
> +	gfs2_glock_queue_work(gl, 0);
>   }
>   
>   /**
> @@ -1484,9 +1505,8 @@ static void clear_glock(struct gfs2_glock *gl)
>   	spin_lock(&gl->gl_lockref.lock);
>   	if (gl->gl_state != LM_ST_UNLOCKED)
>   		handle_callback(gl, LM_ST_UNLOCKED, 0, false);
> +	__gfs2_glock_queue_work(gl, 0);
>   	spin_unlock(&gl->gl_lockref.lock);
> -	if (queue_delayed_work(glock_workqueue, &gl->gl_work, 0) == 0)
> -		gfs2_glock_put(gl);
>   }
>   
>   /**



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Cluster-devel] [PATCH 7/8] gfs2: gfs2_evict_inode: Put glocks asynchronously
  2017-05-31 15:03 ` [Cluster-devel] [PATCH 7/8] gfs2: gfs2_evict_inode: Put glocks asynchronously Andreas Gruenbacher
  2017-06-01 10:51   ` Andreas Gruenbacher
@ 2017-06-01 15:37   ` Steven Whitehouse
  2017-06-02 13:57     ` Andreas Gruenbacher
  1 sibling, 1 reply; 20+ messages in thread
From: Steven Whitehouse @ 2017-06-01 15:37 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,

comments below...


On 31/05/17 16:03, Andreas Gruenbacher wrote:
> gfs2_evict_inode is called to free inodes under memory pressure.  The
> function calls into DLM when an inode's last cluster-wide reference goes
> away (remote unlink) and to release the glock and associated DLM lock
> before finally destroying the inode.  However, if DLM is blocked on
> memory to become available, calling into DLM again will deadlock.
>
> Avoid that by decoupling releasing glocks from destroying inodes in that
> case: with gfs2_glock_queue_put, glocks will be dequeued asynchronously
> in work queue context, when the associated inodes have most likely
> already been destroyed.
>
> With this change, it appears that inodes can end up being unlinked,
> remote-unlink can be triggered, and then the inode can be reallocated
> before all remote-unlink callbacks are processed.  Revalidate the link
> count in gfs2_evict_inode to make sure we're not destroying an
> allocated, referenced inode.
>
> In addition, skip remote unlinks under memory pressure; the next inode
> allocation in the same resource group will take care of destroying
> unlinked inodes.
>
> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
> ---
>   fs/gfs2/glock.c | 10 +++++++++-
>   fs/gfs2/glock.h |  2 ++
>   fs/gfs2/super.c | 30 ++++++++++++++++++++++++++++--
>   3 files changed, 39 insertions(+), 3 deletions(-)
>
> diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
> index e6d32d2..4ba53e9 100644
> --- a/fs/gfs2/glock.c
> +++ b/fs/gfs2/glock.c
> @@ -170,7 +170,7 @@ void gfs2_glock_free(struct gfs2_glock *gl)
>    *
>    */
>   
> -static void gfs2_glock_hold(struct gfs2_glock *gl)
> +void gfs2_glock_hold(struct gfs2_glock *gl)
>   {
>   	GLOCK_BUG_ON(gl, __lockref_is_dead(&gl->gl_lockref));
>   	lockref_get(&gl->gl_lockref);
> @@ -269,6 +269,14 @@ void gfs2_glock_put(struct gfs2_glock *gl)
>   	sdp->sd_lockstruct.ls_ops->lm_put_lock(gl);
>   }
>   
> +/*
> + * Cause the glock to be put in work queue context.
> + */
> +void gfs2_glock_queue_put(struct gfs2_glock *gl)
> +{
> +	gfs2_glock_queue_work(gl, 0);
> +}
> +
This is confusing. If the work item is already queued, then this will 
simply drop the ref count on the glock by one directly and not 
reschedule the work item. That means that if the ref count were to hit 
zero, that state would potentially be missed. Maybe that can't happen 
in this case, but it isn't clear at first glance that it can't.
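
To spell that out against the helpers from patch 3 (paraphrased here, 
not a verbatim quote), the path in question boils down to:

	spin_lock(&gl->gl_lockref.lock);
	if (!queue_delayed_work(glock_workqueue, &gl->gl_work, 0)) {
		/* already queued: the reference handed over by the caller
		 * is dropped here, with no check for the count hitting zero */
		gl->gl_lockref.count--;
	}
	spin_unlock(&gl->gl_lockref.lock);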

>   /**
>    * may_grant - check if its ok to grant a new lock
>    * @gl: The glock
> diff --git a/fs/gfs2/glock.h b/fs/gfs2/glock.h
> index 9ad4a6a..33e0511 100644
> --- a/fs/gfs2/glock.h
> +++ b/fs/gfs2/glock.h
> @@ -181,7 +181,9 @@ static inline struct address_space *gfs2_glock2aspace(struct gfs2_glock *gl)
>   extern int gfs2_glock_get(struct gfs2_sbd *sdp, u64 number,
>   			  const struct gfs2_glock_operations *glops,
>   			  int create, struct gfs2_glock **glp);
> +extern void gfs2_glock_hold(struct gfs2_glock *gl);
>   extern void gfs2_glock_put(struct gfs2_glock *gl);
> +extern void gfs2_glock_queue_put(struct gfs2_glock *gl);
>   extern void gfs2_holder_init(struct gfs2_glock *gl, unsigned int state,
>   			     u16 flags, struct gfs2_holder *gh);
>   extern void gfs2_holder_reinit(unsigned int state, u16 flags,
> diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
> index c651983..ace4814 100644
> --- a/fs/gfs2/super.c
> +++ b/fs/gfs2/super.c
> @@ -1541,6 +1541,16 @@ static void gfs2_evict_inode(struct inode *inode)
>   	if (inode->i_nlink || (sb->s_flags & MS_RDONLY))
>   		goto out;
>   
> +	/*
> +	 * If we are in shrinker context, DLM may depend on us to make
> +	 * progress.  In that case, calling into DLM again would deadlock.  To
> +	 * prevent that from happening, skip deallocating the inode here; it
> +	 * will be deallocated when another inode is allocated in the same
> +	 * resource group.
> +	 */
> +	if (current->flags & PF_MEMALLOC)
> +		goto out;
> +
That is not quick enough, and in fact I'm not sure it is true either. We 
don't search the whole rgrp when we go to allocate new blocks; we only 
look when we are short of space. So the question is: what causes the 
inode to be deallocated in this case? We need to be able to do:

rm foo
df

and see that the space taken up by foo has been returned; otherwise we 
are substituting one bug for another.


>   	/* Must not read inode block until block type has been verified */
>   	error = gfs2_glock_nq_init(ip->i_gl, LM_ST_EXCLUSIVE, GL_SKIP, &gh);
>   	if (unlikely(error)) {
> @@ -1561,6 +1571,12 @@ static void gfs2_evict_inode(struct inode *inode)
>   			goto out_truncate;
>   	}
>   
> +	/*
> +	 * The inode may have been recreated in the meantime.
> +	 */
> +	if (inode->i_nlink)
> +		goto out_truncate;
> +
>   	if (gfs2_holder_initialized(&ip->i_iopen_gh) &&
>   	    test_bit(HIF_HOLDER, &ip->i_iopen_gh.gh_iflags)) {
>   		ip->i_iopen_gh.gh_flags |= GL_NOCACHE;
> @@ -1640,12 +1656,22 @@ static void gfs2_evict_inode(struct inode *inode)
>   	glock_set_object(ip->i_gl, NULL);
>   	wait_on_bit_io(&ip->i_flags, GIF_GLOP_PENDING, TASK_UNINTERRUPTIBLE);
>   	gfs2_glock_add_to_lru(ip->i_gl);
> -	gfs2_glock_put(ip->i_gl);
> +	if (current->flags & PF_MEMALLOC)
> +		gfs2_glock_queue_put(ip->i_gl);
> +	else
> +		gfs2_glock_put(ip->i_gl);
>   	ip->i_gl = NULL;
>   	if (gfs2_holder_initialized(&ip->i_iopen_gh)) {
> -		glock_set_object(ip->i_iopen_gh.gh_gl, NULL);
> +		struct gfs2_glock *gl = ip->i_iopen_gh.gh_gl;
> +
> +		glock_set_object(gl, NULL);
>   		ip->i_iopen_gh.gh_flags |= GL_NOCACHE;
> +		gfs2_glock_hold(gl);
>   		gfs2_glock_dq_uninit(&ip->i_iopen_gh);
> +		if (current->flags & PF_MEMALLOC)
> +			gfs2_glock_queue_put(gl);
> +		else
> +			gfs2_glock_put(ip->i_gl);
>   	}
>   }
>   



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Cluster-devel] [PATCH 8/8] gfs2: Warn when not deleting inodes under memory pressure
  2017-05-31 15:03 ` [Cluster-devel] [PATCH 8/8] gfs2: Warn when not deleting inodes under memory pressure Andreas Gruenbacher
@ 2017-06-01 15:39   ` Steven Whitehouse
  0 siblings, 0 replies; 20+ messages in thread
From: Steven Whitehouse @ 2017-06-01 15:39 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,


On 31/05/17 16:03, Andreas Gruenbacher wrote:
> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
> ---
>   fs/gfs2/super.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
> index ace4814..3637662 100644
> --- a/fs/gfs2/super.c
> +++ b/fs/gfs2/super.c
> @@ -1548,8 +1548,11 @@ static void gfs2_evict_inode(struct inode *inode)
>   	 * will be deallocated when another inode is allocated in the same
>   	 * resource group.
>   	 */
> -	if (current->flags & PF_MEMALLOC)
> +	if (current->flags & PF_MEMALLOC) {
> +		fs_warn(sdp, "not deleting inode %llu under memory pressure\n",
> +			(unsigned long long)ip->i_no_addr);
>   		goto out;
> +	}
>   
>   	/* Must not read inode block until block type has been verified */
>   	error = gfs2_glock_nq_init(ip->i_gl, LM_ST_EXCLUSIVE, GL_SKIP, &gh);

NACK for this one. We shouldn't be printing out messages in this case; 
we just need to handle the situation properly,

Steve.



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Cluster-devel] [PATCH 3/8] gfs2: Clean up glock work enqueuing
  2017-06-01 15:21   ` Steven Whitehouse
@ 2017-06-01 15:43     ` Andreas Gruenbacher
  2017-06-01 15:46       ` Steven Whitehouse
  0 siblings, 1 reply; 20+ messages in thread
From: Andreas Gruenbacher @ 2017-06-01 15:43 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Thu, Jun 1, 2017 at 5:21 PM, Steven Whitehouse <swhiteho@redhat.com> wrote:
> On 31/05/17 16:03, Andreas Gruenbacher wrote:
> Looks generally like a nice clean up, but...
> [...]
>> @@ -623,16 +643,25 @@ static void glock_work_func(struct work_struct *work)
>>                 }
>>         }
>>         run_queue(gl, 0);
>> -       spin_unlock(&gl->gl_lockref.lock);
>> -       if (!delay)
>> -               gfs2_glock_put(gl);
>> -       else {
>> +       if (delay) {
>> +               /* Keep one glock reference for the work we requeue. */
>> +               drop_refs--;
>>                 if (gl->gl_name.ln_type != LM_TYPE_INODE)
>>                         delay = 0;
>> -               if (queue_delayed_work(glock_workqueue, &gl->gl_work, delay) == 0)
>> -                       gfs2_glock_put(gl);
>> +               __gfs2_glock_queue_work(gl, delay);
>> +       }
>> +       if (drop_refs > 1) {
>> +               gl->gl_lockref.count -= drop_refs - 1;
>> +               drop_refs = 1;
>>         }
>
> This looks very confusing... is there no cleaner way to write this? After
> all the effort of cleaning up the queuing of the work, this could also do
> with tidying up too. Otherwise I think the patch looks good,

You mean like below, removing the refcount optimization?

Andreas

---
 fs/gfs2/glock.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 51ec6de..f833f1b 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -732,10 +732,6 @@ static void glock_work_func(struct work_struct *work)
 			delay = 0;
 		__gfs2_glock_queue_work(gl, delay);
 	}
-	if (drop_refs > 1) {
-		gl->gl_lockref.count -= drop_refs - 1;
-		drop_refs = 1;
-	}
 	spin_unlock(&gl->gl_lockref.lock);
 
 	/*
@@ -743,7 +739,7 @@ static void glock_work_func(struct work_struct *work)
 	 * the lockref spinlock: this allows tasks to safely drop glock
 	 * references under that spinlock (see __gfs2_glock_queue_work).
 	 */
-	if (drop_refs)
+	while (drop_refs--)
 		gfs2_glock_put(gl);
 }
 
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Cluster-devel] [PATCH 3/8] gfs2: Clean up glock work enqueuing
  2017-06-01 15:43     ` Andreas Gruenbacher
@ 2017-06-01 15:46       ` Steven Whitehouse
  2017-06-02  9:19         ` Andreas Gruenbacher
  0 siblings, 1 reply; 20+ messages in thread
From: Steven Whitehouse @ 2017-06-01 15:46 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,


On 01/06/17 16:43, Andreas Gruenbacher wrote:
> On Thu, Jun 1, 2017 at 5:21 PM, Steven Whitehouse <swhiteho@redhat.com> wrote:
>> On 31/05/17 16:03, Andreas Gruenbacher wrote:
>> Looks generally like a nice clean up, but...
>> [...]
>>> @@ -623,16 +643,25 @@ static void glock_work_func(struct work_struct *work)
>>>                  }
>>>          }
>>>          run_queue(gl, 0);
>>> -       spin_unlock(&gl->gl_lockref.lock);
>>> -       if (!delay)
>>> -               gfs2_glock_put(gl);
>>> -       else {
>>> +       if (delay) {
>>> +               /* Keep one glock reference for the work we requeue. */
>>> +               drop_refs--;
>>>                  if (gl->gl_name.ln_type != LM_TYPE_INODE)
>>>                          delay = 0;
>>> -               if (queue_delayed_work(glock_workqueue, &gl->gl_work, delay) == 0)
>>> -                       gfs2_glock_put(gl);
>>> +               __gfs2_glock_queue_work(gl, delay);
>>> +       }
>>> +       if (drop_refs > 1) {
>>> +               gl->gl_lockref.count -= drop_refs - 1;
>>> +               drop_refs = 1;
>>>          }
>> This looks very confusing... is there no cleaner way to write this? After
>> all the effort of cleaning up the queuing of the work, this could also do
>> with tidying up too. Otherwise I think the patch looks good,
> You mean like below, removing the refcount optimization?
>
> Andreas
>
> ---
>   fs/gfs2/glock.c | 6 +-----
>   1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
> index 51ec6de..f833f1b 100644
> --- a/fs/gfs2/glock.c
> +++ b/fs/gfs2/glock.c
> @@ -732,10 +732,6 @@ static void glock_work_func(struct work_struct *work)
>   			delay = 0;
>   		__gfs2_glock_queue_work(gl, delay);
>   	}
> -	if (drop_refs > 1) {
> -		gl->gl_lockref.count -= drop_refs - 1;
> -		drop_refs = 1;
> -	}
>   	spin_unlock(&gl->gl_lockref.lock);
>   
>   	/*
> @@ -743,7 +739,7 @@ static void glock_work_func(struct work_struct *work)
>   	 * the lockref spinlock: this allows tasks to safely drop glock
>   	 * references under that spinlock (see __gfs2_glock_queue_work).
>   	 */
> -	if (drop_refs)
> +	while (drop_refs--)
>   		gfs2_glock_put(gl);
>   }
>   
Not entirely, no. Can we not just have a version of gfs2_glock_put() 
that takes the number of refs to drop, if we really have to count them 
up like that?

gfs2_glock_put_n(gl, drop_refs)

kind of a thing,
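
Something along these lines, perhaps (only a rough sketch; it assumes 
the final-reference teardown at the end of gfs2_glock_put() is split 
into a hypothetical helper, here called __gfs2_glock_put_locked(), that 
is entered with the lockref spinlock held):

void gfs2_glock_put_n(struct gfs2_glock *gl, unsigned int n)
{
	spin_lock(&gl->gl_lockref.lock);
	gl->gl_lockref.count -= n;
	if (gl->gl_lockref.count == 0) {
		/* last reference: mark the lockref dead, drop the spinlock
		 * and free the glock, as gfs2_glock_put() does today */
		__gfs2_glock_put_locked(gl);
		return;
	}
	spin_unlock(&gl->gl_lockref.lock);
}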

Steve.



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Cluster-devel] [PATCH 3/8] gfs2: Clean up glock work enqueuing
  2017-06-01 15:46       ` Steven Whitehouse
@ 2017-06-02  9:19         ` Andreas Gruenbacher
  2017-06-02  9:29           ` Steven Whitehouse
  0 siblings, 1 reply; 20+ messages in thread
From: Andreas Gruenbacher @ 2017-06-02  9:19 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Thu, Jun 1, 2017 at 5:46 PM, Steven Whitehouse <swhiteho@redhat.com> wrote:
> [...] Can we not just have a version of gfs2_glock_put that takes
> the number of refs to drop in case that we really have to count them up like
> that?
>
> gfs2_glock_put_n(gl, drop_refs)

Not so easily: there is no function that would drop more than one reference
from a lockref, and adding one doesn't seem very economical.  We could
open-code the entire lockref dropping here though; it's not too horrible.

Thanks,
Andreas

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 1e7b5f2..4733ec2 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -178,20 +178,11 @@ static void gfs2_glock_queue_work(struct gfs2_glock *gl, unsigned long delay) {
 	spin_unlock(&gl->gl_lockref.lock);
 }
 
-/**
- * gfs2_glock_put() - Decrement reference count on glock
- * @gl: The glock to put
- *
- */
-
-void gfs2_glock_put(struct gfs2_glock *gl)
+static void __gfs2_glock_put(struct gfs2_glock *gl)
 {
 	struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
 	struct address_space *mapping = gfs2_glock2aspace(gl);
 
-	if (lockref_put_or_lock(&gl->gl_lockref))
-		return;
-
 	lockref_mark_dead(&gl->gl_lockref);
 
 	spin_lock(&lru_lock);
@@ -206,6 +197,20 @@ void gfs2_glock_put(struct gfs2_glock *gl)
 }
 
 /**
+ * gfs2_glock_put() - Decrement reference count on glock
+ * @gl: The glock to put
+ *
+ */
+
+void gfs2_glock_put(struct gfs2_glock *gl)
+{
+	if (lockref_put_or_lock(&gl->gl_lockref))
+		return;
+
+	__gfs2_glock_put(gl);
+}
+
+/**
  * may_grant - check if its ok to grant a new lock
  * @gl: The glock
  * @gh: The lock request which we wish to grant
@@ -655,19 +660,18 @@ static void glock_work_func(struct work_struct *work)
 			delay = 0;
 		__gfs2_glock_queue_work(gl, delay);
 	}
-	if (drop_refs > 1) {
-		gl->gl_lockref.count -= drop_refs - 1;
-		drop_refs = 1;
-	}
-	spin_unlock(&gl->gl_lockref.lock);
 
 	/*
-	 * Drop the remaining glock reference after grabbing (and releasing)
-	 * the lockref spinlock: this allows tasks to safely drop glock
-	 * references under that spinlock (see __gfs2_glock_queue_work).
+	 * Drop the remaining glock references manually here. (Mind that
+	 * __gfs2_glock_queue_work depends on the lockref spinlock being held
+	 * here as well.)
 	 */
-	if (drop_refs)
-		gfs2_glock_put(gl);
+	gl->gl_lockref.count -= drop_refs;
+	if (!gl->gl_lockref.count) {
+		__gfs2_glock_put(gl);
+		return;
+	}
+	spin_unlock(&gl->gl_lockref.lock);
 }
 
 /**



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Cluster-devel] [PATCH 3/8] gfs2: Clean up glock work enqueuing
  2017-06-02  9:19         ` Andreas Gruenbacher
@ 2017-06-02  9:29           ` Steven Whitehouse
  0 siblings, 0 replies; 20+ messages in thread
From: Steven Whitehouse @ 2017-06-02  9:29 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,


On 02/06/17 10:19, Andreas Gruenbacher wrote:
> On Thu, Jun 1, 2017 at 5:46 PM, Steven Whitehouse <swhiteho@redhat.com> wrote:
>> [...] Can we not just have a version of gfs2_glock_put that takes
>> the number of refs to drop in case that we really have to count them up like
>> that?
>>
>> gfs2_glock_put_n(gl, drop_refs)
> Not so easily: there is no function that would drop more than one reference
> from a lockref, and adding one doesn't seem very economical.  We could
> open-code the entire lockref dropping here though; it's not too horrible.
>
> Thanks,
> Andreas
>
> diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
> index 1e7b5f2..4733ec2 100644
> --- a/fs/gfs2/glock.c
> +++ b/fs/gfs2/glock.c
> @@ -178,20 +178,11 @@ static void gfs2_glock_queue_work(struct gfs2_glock *gl, unsigned long delay) {
>   	spin_unlock(&gl->gl_lockref.lock);
>   }
>   
> -/**
> - * gfs2_glock_put() - Decrement reference count on glock
> - * @gl: The glock to put
> - *
> - */
> -
> -void gfs2_glock_put(struct gfs2_glock *gl)
> +static void __gfs2_glock_put(struct gfs2_glock *gl)
>   {
>   	struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
>   	struct address_space *mapping = gfs2_glock2aspace(gl);
>   
> -	if (lockref_put_or_lock(&gl->gl_lockref))
> -		return;
> -
>   	lockref_mark_dead(&gl->gl_lockref);
>   
>   	spin_lock(&lru_lock);
> @@ -206,6 +197,20 @@ void gfs2_glock_put(struct gfs2_glock *gl)
>   }
>   
>   /**
> + * gfs2_glock_put() - Decrement reference count on glock
> + * @gl: The glock to put
> + *
> + */
> +
> +void gfs2_glock_put(struct gfs2_glock *gl)
> +{
> +	if (lockref_put_or_lock(&gl->gl_lockref))
> +		return;
> +
> +	__gfs2_glock_put(gl);
> +}
> +
> +/**
>    * may_grant - check if its ok to grant a new lock
>    * @gl: The glock
>    * @gh: The lock request which we wish to grant
> @@ -655,19 +660,18 @@ static void glock_work_func(struct work_struct *work)
>   			delay = 0;
>   		__gfs2_glock_queue_work(gl, delay);
>   	}
> -	if (drop_refs > 1) {
> -		gl->gl_lockref.count -= drop_refs - 1;
> -		drop_refs = 1;
> -	}
> -	spin_unlock(&gl->gl_lockref.lock);
>   
>   	/*
> -	 * Drop the remaining glock reference after grabbing (and releasing)
> -	 * the lockref spinlock: this allows tasks to safely drop glock
> -	 * references under that spinlock (see __gfs2_glock_queue_work).
> +	 * Drop the remaining glock references manually here. (Mind that
> +	 * __gfs2_glock_queue_work depends on the lockref spinlock being held
> +	 * here as well.)
>   	 */
> -	if (drop_refs)
> -		gfs2_glock_put(gl);
> +	gl->gl_lockref.count -= drop_refs;
> +	if (!gl->gl_lockref.count) {
> +		__gfs2_glock_put(gl);
> +		return;
> +	}
> +	spin_unlock(&gl->gl_lockref.lock);
>   }
>   
>   /**

Yes, that is definitely an improvement over the original, and looks much 
less confusing I think. That seems like a reasonable solution to me,

Steve.



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Cluster-devel] [PATCH 7/8] gfs2: gfs2_evict_inode: Put glocks asynchronously
  2017-06-01 15:37   ` Steven Whitehouse
@ 2017-06-02 13:57     ` Andreas Gruenbacher
  0 siblings, 0 replies; 20+ messages in thread
From: Andreas Gruenbacher @ 2017-06-02 13:57 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Thu, Jun 1, 2017 at 5:37 PM, Steven Whitehouse <swhiteho@redhat.com> wrote:
> Hi,
>
> comments below...
>
>
>
> On 31/05/17 16:03, Andreas Gruenbacher wrote:
>>
>> gfs2_evict_inode is called to free inodes under memory pressure.  The
>> function calls into DLM when an inode's last cluster-wide reference goes
>> away (remote unlink) and to release the glock and associated DLM lock
>> before finally destroying the inode.  However, if DLM is blocked on
>> memory to become available, calling into DLM again will deadlock.
>>
>> Avoid that by decoupling releasing glocks from destroying inodes in that
>> case: with gfs2_glock_queue_put, glocks will be dequeued asynchronously
>> in work queue context, when the associated inodes have most likely
>> already been destroyed.
>>
>> With this change, it appears that inodes can end up being unlinked,
>> remote-unlink can be triggered, and then the inode can be reallocated
>> before all remote-unlink callbacks are processed.  Revalidate the link
>> count in gfs2_evict_inode to make sure we're not destroying an
>> allocated, referenced inode.
>>
>> In addition, skip remote unlinks under memory pressure; the next inode
>> allocation in the same resource group will take care of destroying
>> unlinked inodes.
>>
>> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
>> ---
>>   fs/gfs2/glock.c | 10 +++++++++-
>>   fs/gfs2/glock.h |  2 ++
>>   fs/gfs2/super.c | 30 ++++++++++++++++++++++++++++--
>>   3 files changed, 39 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
>> index e6d32d2..4ba53e9 100644
>> --- a/fs/gfs2/glock.c
>> +++ b/fs/gfs2/glock.c
>> @@ -170,7 +170,7 @@ void gfs2_glock_free(struct gfs2_glock *gl)
>>    *
>>    */
>>   -static void gfs2_glock_hold(struct gfs2_glock *gl)
>> +void gfs2_glock_hold(struct gfs2_glock *gl)
>>   {
>>         GLOCK_BUG_ON(gl, __lockref_is_dead(&gl->gl_lockref));
>>         lockref_get(&gl->gl_lockref);
>> @@ -269,6 +269,14 @@ void gfs2_glock_put(struct gfs2_glock *gl)
>>         sdp->sd_lockstruct.ls_ops->lm_put_lock(gl);
>>   }
>>   +/*
>> + * Cause the glock to be put in work queue context.
>> + */
>> +void gfs2_glock_queue_put(struct gfs2_glock *gl)
>> +{
>> +       gfs2_glock_queue_work(gl, 0);
>> +}
>> +
>
> This is confusing. If the work queue is already scheduled, then it will
> simply drop the ref count on the glock by one directly and not reschedule
> the work item. That means that if the ref count were to hit zero, then that
> state would potentially by missed, maybe that can't happen in this case,
> but it isn't clear that it can't at first glance.

gfs2_glock_queue_work is meant to make sure glock_work_func runs at
least once in the future, and consumes one glock reference. Always.

In case you're wondering why decrementing the ref count in
__gfs2_glock_queue_work is safe: inside __gfs2_glock_queue_work, we
are holding the glock spin lock. glock_work_func grabs that spin lock
before dropping extra references, and it owns at least one reference.
So when queue_delayed_work tells us that the work is already queued,
we know that glock_work_func cannot get to the point of dropping its
references, and therefore the reference we drop in
__gfs2_glock_queue_work cannot be the last one.

Thanks,
Andreas



^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2017-06-02 13:57 UTC | newest]

Thread overview: 20+ messages
2017-05-31 15:03 [Cluster-devel] [PATCH 0/8] GFS2 shrinker deadlock Andreas Gruenbacher
2017-05-31 15:03 ` [Cluster-devel] [PATCH 1/8] gfs2: Get rid of flush_delayed_work in gfs2_evict_inode Andreas Gruenbacher
2017-06-01 15:13   ` Steven Whitehouse
2017-05-31 15:03 ` [Cluster-devel] [PATCH 2/8] gfs2: Protect gl->gl_object by spin lock Andreas Gruenbacher
2017-06-01 15:16   ` Steven Whitehouse
2017-05-31 15:03 ` [Cluster-devel] [PATCH 3/8] gfs2: Clean up glock work enqueuing Andreas Gruenbacher
2017-06-01 15:21   ` Steven Whitehouse
2017-06-01 15:43     ` Andreas Gruenbacher
2017-06-01 15:46       ` Steven Whitehouse
2017-06-02  9:19         ` Andreas Gruenbacher
2017-06-02  9:29           ` Steven Whitehouse
2017-05-31 15:03 ` [Cluster-devel] [PATCH 4/8] gfs2: Always check block type in gfs2_evict_inode Andreas Gruenbacher
2017-05-31 15:03 ` [Cluster-devel] [PATCH 5/8] gfs2: gfs2_glock_get: Wait on freeing glocks Andreas Gruenbacher
2017-05-31 15:03 ` [Cluster-devel] [PATCH 6/8] gfs2: Get rid of gfs2_set_nlink Andreas Gruenbacher
2017-05-31 15:03 ` [Cluster-devel] [PATCH 7/8] gfs2: gfs2_evict_inode: Put glocks asynchronously Andreas Gruenbacher
2017-06-01 10:51   ` Andreas Gruenbacher
2017-06-01 15:37   ` Steven Whitehouse
2017-06-02 13:57     ` Andreas Gruenbacher
2017-05-31 15:03 ` [Cluster-devel] [PATCH 8/8] gfs2: Warn when not deleting inodes under memory pressure Andreas Gruenbacher
2017-06-01 15:39   ` Steven Whitehouse
