* [RFC PATCH 0/6] ceph: remove excess mutex locking from cap/snap flushing codepaths
@ 2021-06-15 14:57 Jeff Layton
  2021-06-15 14:57 ` [RFC PATCH 1/6] ceph: allow ceph_put_mds_session to take NULL or ERR_PTR Jeff Layton
                   ` (5 more replies)
  0 siblings, 6 replies; 18+ messages in thread
From: Jeff Layton @ 2021-06-15 14:57 UTC (permalink / raw)
  To: ceph-devel; +Cc: pdonnell, ukernel, idryomov, xiubli

tldr: I'm hoping we can greatly simplify the locking in the cap sending
codepath, which should clean up some of the ugliness around iput and
unmounting.

For a long time, ceph has required the s_mutex and (to a lesser degree)
the snap_rwsem in the cap sending codepath. At least, it's been
documented this way, but what these locks are intended to protect has
never been made clear.

I posted a query about requiring the s_mutex during cap/snap messages a
week or so ago, and have spent the last week going over that code. My
conclusion is that it's not actually necessary, and that we can simply
stop requiring it in many codepaths.

The problem with these mutexes is that we often want to call things like
ceph_check_caps after releasing references, but we may already hold them
because we're in a reply handler, which leads to a deadlock.

This has prompted workarounds like the lock-inversion loops in
ceph_check_caps, and ceph_async_iput, which queues the final inode
cleanup to a workqueue. That works reasonably well, but has problems
when we go to unmount: we can end up queueing these jobs after the point
where the workqueue is flushed.
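
For illustration, the workaround in question (which patch 6 removes)
boils down to dropping a reference as long as it isn't the last one, and
otherwise punting the final iput to the inode workqueue. Roughly:

    void ceph_async_iput(struct inode *inode)
    {
            if (!inode)
                    return;
            for (;;) {
                    /* drop a reference as long as it isn't the last one */
                    if (atomic_add_unless(&inode->i_count, -1, 1))
                            break;
                    /* last ref: punt iput_final() to the inode workqueue */
                    if (queue_work(ceph_inode_to_client(inode)->inode_wq,
                                   &ceph_inode(inode)->i_work))
                            break;
                    /* queue_work failed, i_count must be at least 2 */
            }
    }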

I think the better fix for this is to just simplify the locking in these
codepaths, and make it unnecessary to take these mutexes in the first
place. The caps themselves are generally protected by the i_ceph_lock,
and nothing else seems to require that we take and hold these locks.
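
Concretely, with patch 3 the trylock/lock-inversion dance in
ceph_check_caps turns into just pinning the session while we already
hold the i_ceph_lock. A simplified before-and-after sketch:

    /* before (simplified): back off, take s_mutex, then start over */
    if (mutex_trylock(&session->s_mutex) == 0) {
            session = ceph_get_mds_session(session);
            spin_unlock(&ci->i_ceph_lock);
            mutex_lock(&session->s_mutex);  /* may block */
            ceph_put_mds_session(session);
            goto retry;                     /* retake i_ceph_lock */
    }

    /* after: just take a session reference under i_ceph_lock */
    ceph_put_mds_session(session);
    session = ceph_get_mds_session(cap->session);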

There _might_ be an argument to be made that we need to hold
session->s_state constant when marshalling up the message, but it's hard
to see why. A mutex is a very heavyweight way to do that, and the s_state
field isn't consistently protected by it anyway.

I've tested this pretty heavily on my local test rig and it seems to
have done well. What I'd like to do next is put this into the testing
kernel and see if any bugs shake out. If any do, we can just revert the
set and figure out what went wrong.

Thoughts?

Jeff Layton (6):
  ceph: allow ceph_put_mds_session to take NULL or ERR_PTR
  ceph: eliminate session->s_gen_ttl_lock
  ceph: don't take s_mutex or snap_rwsem in ceph_check_caps
  ceph: don't take s_mutex in try_flush_caps
  ceph: don't take s_mutex in ceph_flush_snaps
  ceph: eliminate ceph_async_iput()

 fs/ceph/caps.c       | 106 ++++++++-----------------------------------
 fs/ceph/dir.c        |   7 +--
 fs/ceph/inode.c      |  35 +++-----------
 fs/ceph/mds_client.c |  45 +++++++++---------
 fs/ceph/mds_client.h |   6 +--
 fs/ceph/metric.c     |   3 +-
 fs/ceph/quota.c      |   6 +--
 fs/ceph/snap.c       |  14 +++---
 fs/ceph/super.h      |   2 -
 9 files changed, 61 insertions(+), 163 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [RFC PATCH 1/6] ceph: allow ceph_put_mds_session to take NULL or ERR_PTR
  2021-06-15 14:57 [RFC PATCH 0/6] ceph: remove excess mutex locking from cap/snap flushing codepaths Jeff Layton
@ 2021-06-15 14:57 ` Jeff Layton
  2021-06-16 15:24   ` Luis Henriques
  2021-06-15 14:57 ` [RFC PATCH 2/6] ceph: eliminate session->s_gen_ttl_lock Jeff Layton
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Jeff Layton @ 2021-06-15 14:57 UTC (permalink / raw)
  To: ceph-devel; +Cc: pdonnell, ukernel, idryomov, xiubli

...to simplify some error paths.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/dir.c        | 3 +--
 fs/ceph/inode.c      | 6 ++----
 fs/ceph/mds_client.c | 6 ++++--
 fs/ceph/metric.c     | 3 +--
 4 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index ac431246e0c9..0dc5f8357f58 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -1802,8 +1802,7 @@ static void ceph_d_release(struct dentry *dentry)
 	dentry->d_fsdata = NULL;
 	spin_unlock(&dentry->d_lock);
 
-	if (di->lease_session)
-		ceph_put_mds_session(di->lease_session);
+	ceph_put_mds_session(di->lease_session);
 	kmem_cache_free(ceph_dentry_cachep, di);
 }
 
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index df0c8a724609..6f43542b3344 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -1154,8 +1154,7 @@ static inline void update_dentry_lease(struct inode *dir, struct dentry *dentry,
 	__update_dentry_lease(dir, dentry, lease, session, from_time,
 			      &old_lease_session);
 	spin_unlock(&dentry->d_lock);
-	if (old_lease_session)
-		ceph_put_mds_session(old_lease_session);
+	ceph_put_mds_session(old_lease_session);
 }
 
 /*
@@ -1200,8 +1199,7 @@ static void update_dentry_lease_careful(struct dentry *dentry,
 			      from_time, &old_lease_session);
 out_unlock:
 	spin_unlock(&dentry->d_lock);
-	if (old_lease_session)
-		ceph_put_mds_session(old_lease_session);
+	ceph_put_mds_session(old_lease_session);
 }
 
 /*
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index e5af591d3bd4..ec669634c649 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -664,6 +664,9 @@ struct ceph_mds_session *ceph_get_mds_session(struct ceph_mds_session *s)
 
 void ceph_put_mds_session(struct ceph_mds_session *s)
 {
+	if (IS_ERR_OR_NULL(s))
+		return;
+
 	dout("mdsc put_session %p %d -> %d\n", s,
 	     refcount_read(&s->s_ref), refcount_read(&s->s_ref)-1);
 	if (refcount_dec_and_test(&s->s_ref)) {
@@ -1438,8 +1441,7 @@ static void __open_export_target_sessions(struct ceph_mds_client *mdsc,
 
 	for (i = 0; i < mi->num_export_targets; i++) {
 		ts = __open_export_target_session(mdsc, mi->export_targets[i]);
-		if (!IS_ERR(ts))
-			ceph_put_mds_session(ts);
+		ceph_put_mds_session(ts);
 	}
 }
 
diff --git a/fs/ceph/metric.c b/fs/ceph/metric.c
index 9577c71e645d..5ac151eb0d49 100644
--- a/fs/ceph/metric.c
+++ b/fs/ceph/metric.c
@@ -311,8 +311,7 @@ void ceph_metric_destroy(struct ceph_client_metric *m)
 
 	cancel_delayed_work_sync(&m->delayed_work);
 
-	if (m->session)
-		ceph_put_mds_session(m->session);
+	ceph_put_mds_session(m->session);
 }
 
 #define METRIC_UPDATE_MIN_MAX(min, max, new)	\
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH 2/6] ceph: eliminate session->s_gen_ttl_lock
  2021-06-15 14:57 [RFC PATCH 0/6] ceph: remove excess mutex locking from cap/snap flushing codepaths Jeff Layton
  2021-06-15 14:57 ` [RFC PATCH 1/6] ceph: allow ceph_put_mds_session to take NULL or ERR_PTR Jeff Layton
@ 2021-06-15 14:57 ` Jeff Layton
  2021-06-16 15:24   ` Luis Henriques
  2021-06-15 14:57 ` [RFC PATCH 3/6] ceph: don't take s_mutex or snap_rwsem in ceph_check_caps Jeff Layton
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Jeff Layton @ 2021-06-15 14:57 UTC (permalink / raw)
  To: ceph-devel; +Cc: pdonnell, ukernel, idryomov, xiubli

Turn s_cap_gen field into an atomic_t, and just rely on the fact that we
hold the s_mutex when changing the s_cap_ttl field.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/caps.c       | 15 ++++++---------
 fs/ceph/dir.c        |  4 +---
 fs/ceph/inode.c      |  4 ++--
 fs/ceph/mds_client.c | 17 ++++++-----------
 fs/ceph/mds_client.h |  6 ++----
 5 files changed, 17 insertions(+), 29 deletions(-)

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index a5e93b185515..919eada97a1f 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -645,9 +645,7 @@ void ceph_add_cap(struct inode *inode,
 	dout("add_cap %p mds%d cap %llx %s seq %d\n", inode,
 	     session->s_mds, cap_id, ceph_cap_string(issued), seq);
 
-	spin_lock(&session->s_gen_ttl_lock);
-	gen = session->s_cap_gen;
-	spin_unlock(&session->s_gen_ttl_lock);
+	gen = atomic_read(&session->s_cap_gen);
 
 	cap = __get_cap_for_mds(ci, mds);
 	if (!cap) {
@@ -785,10 +783,8 @@ static int __cap_is_valid(struct ceph_cap *cap)
 	unsigned long ttl;
 	u32 gen;
 
-	spin_lock(&cap->session->s_gen_ttl_lock);
-	gen = cap->session->s_cap_gen;
+	gen = atomic_read(&cap->session->s_cap_gen);
 	ttl = cap->session->s_cap_ttl;
-	spin_unlock(&cap->session->s_gen_ttl_lock);
 
 	if (cap->cap_gen < gen || time_after_eq(jiffies, ttl)) {
 		dout("__cap_is_valid %p cap %p issued %s "
@@ -1182,7 +1178,8 @@ void __ceph_remove_cap(struct ceph_cap *cap, bool queue_release)
 	 * s_cap_gen while session is in the reconnect state.
 	 */
 	if (queue_release &&
-	    (!session->s_cap_reconnect || cap->cap_gen == session->s_cap_gen)) {
+	    (!session->s_cap_reconnect ||
+	     cap->cap_gen == atomic_read(&session->s_cap_gen))) {
 		cap->queue_release = 1;
 		if (removed) {
 			__ceph_queue_cap_release(session, cap);
@@ -3288,7 +3285,7 @@ static void handle_cap_grant(struct inode *inode,
 	u64 size = le64_to_cpu(grant->size);
 	u64 max_size = le64_to_cpu(grant->max_size);
 	unsigned char check_caps = 0;
-	bool was_stale = cap->cap_gen < session->s_cap_gen;
+	bool was_stale = cap->cap_gen < atomic_read(&session->s_cap_gen);
 	bool wake = false;
 	bool writeback = false;
 	bool queue_trunc = false;
@@ -3340,7 +3337,7 @@ static void handle_cap_grant(struct inode *inode,
 	}
 
 	/* side effects now are allowed */
-	cap->cap_gen = session->s_cap_gen;
+	cap->cap_gen = atomic_read(&session->s_cap_gen);
 	cap->seq = seq;
 
 	__check_cap_issue(ci, cap, newcaps);
diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index 0dc5f8357f58..bd508b1aeac2 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -1541,10 +1541,8 @@ static bool __dentry_lease_is_valid(struct ceph_dentry_info *di)
 		u32 gen;
 		unsigned long ttl;
 
-		spin_lock(&session->s_gen_ttl_lock);
-		gen = session->s_cap_gen;
+		gen = atomic_read(&session->s_cap_gen);
 		ttl = session->s_cap_ttl;
-		spin_unlock(&session->s_gen_ttl_lock);
 
 		if (di->lease_gen == gen &&
 		    time_before(jiffies, ttl) &&
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 6f43542b3344..6034821c9d63 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -1124,7 +1124,7 @@ static void __update_dentry_lease(struct inode *dir, struct dentry *dentry,
 		return;
 	}
 
-	if (di->lease_gen == session->s_cap_gen &&
+	if (di->lease_gen == atomic_read(&session->s_cap_gen) &&
 	    time_before(ttl, di->time))
 		return;  /* we already have a newer lease. */
 
@@ -1135,7 +1135,7 @@ static void __update_dentry_lease(struct inode *dir, struct dentry *dentry,
 
 	if (!di->lease_session)
 		di->lease_session = ceph_get_mds_session(session);
-	di->lease_gen = session->s_cap_gen;
+	di->lease_gen = atomic_read(&session->s_cap_gen);
 	di->lease_seq = le32_to_cpu(lease->seq);
 	di->lease_renew_after = half_ttl;
 	di->lease_renew_from = 0;
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index ec669634c649..87d3be10af25 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -749,8 +749,7 @@ static struct ceph_mds_session *register_session(struct ceph_mds_client *mdsc,
 
 	ceph_con_init(&s->s_con, s, &mds_con_ops, &mdsc->fsc->client->msgr);
 
-	spin_lock_init(&s->s_gen_ttl_lock);
-	s->s_cap_gen = 1;
+	atomic_set(&s->s_cap_gen, 1);
 	s->s_cap_ttl = jiffies - 1;
 
 	spin_lock_init(&s->s_cap_lock);
@@ -1763,7 +1762,7 @@ static int wake_up_session_cb(struct inode *inode, struct ceph_cap *cap,
 		ci->i_requested_max_size = 0;
 		spin_unlock(&ci->i_ceph_lock);
 	} else if (ev == RENEWCAPS) {
-		if (cap->cap_gen < cap->session->s_cap_gen) {
+		if (cap->cap_gen < atomic_read(&cap->session->s_cap_gen)) {
 			/* mds did not re-issue stale cap */
 			spin_lock(&ci->i_ceph_lock);
 			cap->issued = cap->implemented = CEPH_CAP_PIN;
@@ -3501,10 +3500,8 @@ static void handle_session(struct ceph_mds_session *session,
 	case CEPH_SESSION_STALE:
 		pr_info("mds%d caps went stale, renewing\n",
 			session->s_mds);
-		spin_lock(&session->s_gen_ttl_lock);
-		session->s_cap_gen++;
+		atomic_inc(&session->s_cap_gen);
 		session->s_cap_ttl = jiffies - 1;
-		spin_unlock(&session->s_gen_ttl_lock);
 		send_renew_caps(mdsc, session);
 		break;
 
@@ -3773,7 +3770,7 @@ static int reconnect_caps_cb(struct inode *inode, struct ceph_cap *cap,
 	cap->seq = 0;        /* reset cap seq */
 	cap->issue_seq = 0;  /* and issue_seq */
 	cap->mseq = 0;       /* and migrate_seq */
-	cap->cap_gen = cap->session->s_cap_gen;
+	cap->cap_gen = atomic_read(&cap->session->s_cap_gen);
 
 	/* These are lost when the session goes away */
 	if (S_ISDIR(inode->i_mode)) {
@@ -4013,9 +4010,7 @@ static void send_mds_reconnect(struct ceph_mds_client *mdsc,
 	dout("session %p state %s\n", session,
 	     ceph_session_state_name(session->s_state));
 
-	spin_lock(&session->s_gen_ttl_lock);
-	session->s_cap_gen++;
-	spin_unlock(&session->s_gen_ttl_lock);
+	atomic_inc(&session->s_cap_gen);
 
 	spin_lock(&session->s_cap_lock);
 	/* don't know if session is readonly */
@@ -4346,7 +4341,7 @@ static void handle_lease(struct ceph_mds_client *mdsc,
 
 	case CEPH_MDS_LEASE_RENEW:
 		if (di->lease_session == session &&
-		    di->lease_gen == session->s_cap_gen &&
+		    di->lease_gen == atomic_read(&session->s_cap_gen) &&
 		    di->lease_renew_from &&
 		    di->lease_renew_after == 0) {
 			unsigned long duration =
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index 15c11a0f2caf..20e42d8b66c6 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -186,10 +186,8 @@ struct ceph_mds_session {
 
 	struct ceph_auth_handshake s_auth;
 
-	/* protected by s_gen_ttl_lock */
-	spinlock_t        s_gen_ttl_lock;
-	u32               s_cap_gen;  /* inc each time we get mds stale msg */
-	unsigned long     s_cap_ttl;  /* when session caps expire */
+	atomic_t          s_cap_gen;  /* inc each time we get mds stale msg */
+	unsigned long     s_cap_ttl;  /* when session caps expire. protected by s_mutex */
 
 	/* protected by s_cap_lock */
 	spinlock_t        s_cap_lock;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH 3/6] ceph: don't take s_mutex or snap_rwsem in ceph_check_caps
  2021-06-15 14:57 [RFC PATCH 0/6] ceph: remove excess mutex locking from cap/snap flushing codepaths Jeff Layton
  2021-06-15 14:57 ` [RFC PATCH 1/6] ceph: allow ceph_put_mds_session to take NULL or ERR_PTR Jeff Layton
  2021-06-15 14:57 ` [RFC PATCH 2/6] ceph: eliminate session->s_gen_ttl_lock Jeff Layton
@ 2021-06-15 14:57 ` Jeff Layton
  2021-06-16 15:25   ` Luis Henriques
  2021-06-15 14:57 ` [RFC PATCH 4/6] ceph: don't take s_mutex in try_flush_caps Jeff Layton
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Jeff Layton @ 2021-06-15 14:57 UTC (permalink / raw)
  To: ceph-devel; +Cc: pdonnell, ukernel, idryomov, xiubli

These locks appear to be completely unnecessary. Almost all of this
function is done under the inode->i_ceph_lock, aside from the actual
sending of the message. Don't take either lock in this function.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/caps.c | 61 ++++++--------------------------------------------
 1 file changed, 7 insertions(+), 54 deletions(-)

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 919eada97a1f..825b1e463ad3 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -1912,7 +1912,6 @@ void ceph_check_caps(struct ceph_inode_info *ci, int flags,
 	struct ceph_cap *cap;
 	u64 flush_tid, oldest_flush_tid;
 	int file_wanted, used, cap_used;
-	int took_snap_rwsem = 0;             /* true if mdsc->snap_rwsem held */
 	int issued, implemented, want, retain, revoking, flushing = 0;
 	int mds = -1;   /* keep track of how far we've gone through i_caps list
 			   to avoid an infinite loop on retry */
@@ -1920,6 +1919,9 @@ void ceph_check_caps(struct ceph_inode_info *ci, int flags,
 	bool queue_invalidate = false;
 	bool tried_invalidate = false;
 
+	if (session)
+		ceph_get_mds_session(session);
+
 	spin_lock(&ci->i_ceph_lock);
 	if (ci->i_ceph_flags & CEPH_I_FLUSH)
 		flags |= CHECK_CAPS_FLUSH;
@@ -2021,8 +2023,6 @@ void ceph_check_caps(struct ceph_inode_info *ci, int flags,
 		    ((flags & CHECK_CAPS_AUTHONLY) && cap != ci->i_auth_cap))
 			continue;
 
-		/* NOTE: no side-effects allowed, until we take s_mutex */
-
 		/*
 		 * If we have an auth cap, we don't need to consider any
 		 * overlapping caps as used.
@@ -2085,37 +2085,8 @@ void ceph_check_caps(struct ceph_inode_info *ci, int flags,
 			continue;     /* nope, all good */
 
 ack:
-		if (session && session != cap->session) {
-			dout("oops, wrong session %p mutex\n", session);
-			mutex_unlock(&session->s_mutex);
-			session = NULL;
-		}
-		if (!session) {
-			session = cap->session;
-			if (mutex_trylock(&session->s_mutex) == 0) {
-				dout("inverting session/ino locks on %p\n",
-				     session);
-				session = ceph_get_mds_session(session);
-				spin_unlock(&ci->i_ceph_lock);
-				if (took_snap_rwsem) {
-					up_read(&mdsc->snap_rwsem);
-					took_snap_rwsem = 0;
-				}
-				if (session) {
-					mutex_lock(&session->s_mutex);
-					ceph_put_mds_session(session);
-				} else {
-					/*
-					 * Because we take the reference while
-					 * holding the i_ceph_lock, it should
-					 * never be NULL. Throw a warning if it
-					 * ever is.
-					 */
-					WARN_ON_ONCE(true);
-				}
-				goto retry;
-			}
-		}
+		ceph_put_mds_session(session);
+		session = ceph_get_mds_session(cap->session);
 
 		/* kick flushing and flush snaps before sending normal
 		 * cap message */
@@ -2130,19 +2101,6 @@ void ceph_check_caps(struct ceph_inode_info *ci, int flags,
 			goto retry_locked;
 		}
 
-		/* take snap_rwsem after session mutex */
-		if (!took_snap_rwsem) {
-			if (down_read_trylock(&mdsc->snap_rwsem) == 0) {
-				dout("inverting snap/in locks on %p\n",
-				     inode);
-				spin_unlock(&ci->i_ceph_lock);
-				down_read(&mdsc->snap_rwsem);
-				took_snap_rwsem = 1;
-				goto retry;
-			}
-			took_snap_rwsem = 1;
-		}
-
 		if (cap == ci->i_auth_cap && ci->i_dirty_caps) {
 			flushing = ci->i_dirty_caps;
 			flush_tid = __mark_caps_flushing(inode, session, false,
@@ -2179,13 +2137,9 @@ void ceph_check_caps(struct ceph_inode_info *ci, int flags,
 
 	spin_unlock(&ci->i_ceph_lock);
 
+	ceph_put_mds_session(session);
 	if (queue_invalidate)
 		ceph_queue_invalidate(inode);
-
-	if (session)
-		mutex_unlock(&session->s_mutex);
-	if (took_snap_rwsem)
-		up_read(&mdsc->snap_rwsem);
 }
 
 /*
@@ -3550,13 +3504,12 @@ static void handle_cap_grant(struct inode *inode,
 	if (wake)
 		wake_up_all(&ci->i_cap_wq);
 
+	mutex_unlock(&session->s_mutex);
 	if (check_caps == 1)
 		ceph_check_caps(ci, CHECK_CAPS_AUTHONLY | CHECK_CAPS_NOINVAL,
 				session);
 	else if (check_caps == 2)
 		ceph_check_caps(ci, CHECK_CAPS_NOINVAL, session);
-	else
-		mutex_unlock(&session->s_mutex);
 }
 
 /*
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH 4/6] ceph: don't take s_mutex in try_flush_caps
  2021-06-15 14:57 [RFC PATCH 0/6] ceph: remove excess mutex locking from cap/snap flushing codepaths Jeff Layton
                   ` (2 preceding siblings ...)
  2021-06-15 14:57 ` [RFC PATCH 3/6] ceph: don't take s_mutex or snap_rwsem in ceph_check_caps Jeff Layton
@ 2021-06-15 14:57 ` Jeff Layton
  2021-06-16 15:25   ` Luis Henriques
  2021-06-15 14:57 ` [RFC PATCH 5/6] ceph: don't take s_mutex in ceph_flush_snaps Jeff Layton
  2021-06-15 14:57 ` [RFC PATCH 6/6] ceph: eliminate ceph_async_iput() Jeff Layton
  5 siblings, 1 reply; 18+ messages in thread
From: Jeff Layton @ 2021-06-15 14:57 UTC (permalink / raw)
  To: ceph-devel; +Cc: pdonnell, ukernel, idryomov, xiubli

The s_mutex doesn't protect anything in this codepath.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/caps.c | 16 ++--------------
 1 file changed, 2 insertions(+), 14 deletions(-)

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 825b1e463ad3..d21b1fa36875 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -2149,26 +2149,17 @@ static int try_flush_caps(struct inode *inode, u64 *ptid)
 {
 	struct ceph_mds_client *mdsc = ceph_sb_to_client(inode->i_sb)->mdsc;
 	struct ceph_inode_info *ci = ceph_inode(inode);
-	struct ceph_mds_session *session = NULL;
 	int flushing = 0;
 	u64 flush_tid = 0, oldest_flush_tid = 0;
 
-retry:
 	spin_lock(&ci->i_ceph_lock);
 retry_locked:
 	if (ci->i_dirty_caps && ci->i_auth_cap) {
 		struct ceph_cap *cap = ci->i_auth_cap;
 		struct cap_msg_args arg;
+		struct ceph_mds_session *session = cap->session;
 
-		if (session != cap->session) {
-			spin_unlock(&ci->i_ceph_lock);
-			if (session)
-				mutex_unlock(&session->s_mutex);
-			session = cap->session;
-			mutex_lock(&session->s_mutex);
-			goto retry;
-		}
-		if (cap->session->s_state < CEPH_MDS_SESSION_OPEN) {
+		if (session->s_state < CEPH_MDS_SESSION_OPEN) {
 			spin_unlock(&ci->i_ceph_lock);
 			goto out;
 		}
@@ -2205,9 +2196,6 @@ static int try_flush_caps(struct inode *inode, u64 *ptid)
 		spin_unlock(&ci->i_ceph_lock);
 	}
 out:
-	if (session)
-		mutex_unlock(&session->s_mutex);
-
 	*ptid = flush_tid;
 	return flushing;
 }
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH 5/6] ceph: don't take s_mutex in ceph_flush_snaps
  2021-06-15 14:57 [RFC PATCH 0/6] ceph: remove excess mutex locking from cap/snap flushing codepaths Jeff Layton
                   ` (3 preceding siblings ...)
  2021-06-15 14:57 ` [RFC PATCH 4/6] ceph: don't take s_mutex in try_flush_caps Jeff Layton
@ 2021-06-15 14:57 ` Jeff Layton
  2021-06-16 15:25   ` Luis Henriques
  2021-06-15 14:57 ` [RFC PATCH 6/6] ceph: eliminate ceph_async_iput() Jeff Layton
  5 siblings, 1 reply; 18+ messages in thread
From: Jeff Layton @ 2021-06-15 14:57 UTC (permalink / raw)
  To: ceph-devel; +Cc: pdonnell, ukernel, idryomov, xiubli

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/caps.c | 8 +-------
 fs/ceph/snap.c | 4 +---
 2 files changed, 2 insertions(+), 10 deletions(-)

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index d21b1fa36875..5864d5088e27 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -1531,7 +1531,7 @@ static inline int __send_flush_snap(struct inode *inode,
  * asynchronously back to the MDS once sync writes complete and dirty
  * data is written out.
  *
- * Called under i_ceph_lock.  Takes s_mutex as needed.
+ * Called under i_ceph_lock.
  */
 static void __ceph_flush_snaps(struct ceph_inode_info *ci,
 			       struct ceph_mds_session *session)
@@ -1653,7 +1653,6 @@ void ceph_flush_snaps(struct ceph_inode_info *ci,
 	mds = ci->i_auth_cap->session->s_mds;
 	if (session && session->s_mds != mds) {
 		dout(" oops, wrong session %p mutex\n", session);
-		mutex_unlock(&session->s_mutex);
 		ceph_put_mds_session(session);
 		session = NULL;
 	}
@@ -1662,10 +1661,6 @@ void ceph_flush_snaps(struct ceph_inode_info *ci,
 		mutex_lock(&mdsc->mutex);
 		session = __ceph_lookup_mds_session(mdsc, mds);
 		mutex_unlock(&mdsc->mutex);
-		if (session) {
-			dout(" inverting session/ino locks on %p\n", session);
-			mutex_lock(&session->s_mutex);
-		}
 		goto retry;
 	}
 
@@ -1680,7 +1675,6 @@ void ceph_flush_snaps(struct ceph_inode_info *ci,
 	if (psession) {
 		*psession = session;
 	} else if (session) {
-		mutex_unlock(&session->s_mutex);
 		ceph_put_mds_session(session);
 	}
 	/* we flushed them all; remove this inode from the queue */
diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
index f8cac2abab3f..afc7f4c32364 100644
--- a/fs/ceph/snap.c
+++ b/fs/ceph/snap.c
@@ -846,10 +846,8 @@ static void flush_snaps(struct ceph_mds_client *mdsc)
 	}
 	spin_unlock(&mdsc->snap_flush_lock);
 
-	if (session) {
-		mutex_unlock(&session->s_mutex);
+	if (session)
 		ceph_put_mds_session(session);
-	}
 	dout("flush_snaps done\n");
 }
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH 6/6] ceph: eliminate ceph_async_iput()
  2021-06-15 14:57 [RFC PATCH 0/6] ceph: remove excess mutex locking from cap/snap flushing codepaths Jeff Layton
                   ` (4 preceding siblings ...)
  2021-06-15 14:57 ` [RFC PATCH 5/6] ceph: don't take s_mutex in ceph_flush_snaps Jeff Layton
@ 2021-06-15 14:57 ` Jeff Layton
  2021-06-16 15:26   ` Luis Henriques
  5 siblings, 1 reply; 18+ messages in thread
From: Jeff Layton @ 2021-06-15 14:57 UTC (permalink / raw)
  To: ceph-devel; +Cc: pdonnell, ukernel, idryomov, xiubli

Now that we don't need to hold session->s_mutex or the snap_rwsem when
calling ceph_check_caps, we can eliminate ceph_async_iput and just use
normal iput calls.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/caps.c       |  6 +++---
 fs/ceph/inode.c      | 25 +++----------------------
 fs/ceph/mds_client.c | 22 +++++++++++-----------
 fs/ceph/quota.c      |  6 +++---
 fs/ceph/snap.c       | 10 +++++-----
 fs/ceph/super.h      |  2 --
 6 files changed, 25 insertions(+), 46 deletions(-)

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 5864d5088e27..fd9243e9a1b2 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -3147,7 +3147,7 @@ void ceph_put_wrbuffer_cap_refs(struct ceph_inode_info *ci, int nr,
 		wake_up_all(&ci->i_cap_wq);
 	while (put-- > 0) {
 		/* avoid calling iput_final() in osd dispatch threads */
-		ceph_async_iput(inode);
+		iput(inode);
 	}
 }
 
@@ -4136,7 +4136,7 @@ void ceph_handle_caps(struct ceph_mds_session *session,
 done_unlocked:
 	ceph_put_string(extra_info.pool_ns);
 	/* avoid calling iput_final() in mds dispatch threads */
-	ceph_async_iput(inode);
+	iput(inode);
 	return;
 
 flush_cap_releases:
@@ -4179,7 +4179,7 @@ void ceph_check_delayed_caps(struct ceph_mds_client *mdsc)
 			dout("check_delayed_caps on %p\n", inode);
 			ceph_check_caps(ci, 0, NULL);
 			/* avoid calling iput_final() in tick thread */
-			ceph_async_iput(inode);
+			iput(inode);
 			spin_lock(&mdsc->cap_delay_lock);
 		}
 	}
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 6034821c9d63..5f1c27caf0b6 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -1567,7 +1567,7 @@ static int readdir_prepopulate_inodes_only(struct ceph_mds_request *req,
 		}
 
 		/* avoid calling iput_final() in mds dispatch threads */
-		ceph_async_iput(in);
+		iput(in);
 	}
 
 	return err;
@@ -1770,7 +1770,7 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req,
 					ihold(in);
 					discard_new_inode(in);
 				}
-				ceph_async_iput(in);
+				iput(in);
 			}
 			d_drop(dn);
 			err = ret;
@@ -1783,7 +1783,7 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req,
 			if (ceph_security_xattr_deadlock(in)) {
 				dout(" skip splicing dn %p to inode %p"
 				     " (security xattr deadlock)\n", dn, in);
-				ceph_async_iput(in);
+				iput(in);
 				skipped++;
 				goto next_item;
 			}
@@ -1834,25 +1834,6 @@ bool ceph_inode_set_size(struct inode *inode, loff_t size)
 	return ret;
 }
 
-/*
- * Put reference to inode, but avoid calling iput_final() in current thread.
- * iput_final() may wait for reahahead pages. The wait can cause deadlock in
- * some contexts.
- */
-void ceph_async_iput(struct inode *inode)
-{
-	if (!inode)
-		return;
-	for (;;) {
-		if (atomic_add_unless(&inode->i_count, -1, 1))
-			break;
-		if (queue_work(ceph_inode_to_client(inode)->inode_wq,
-			       &ceph_inode(inode)->i_work))
-			break;
-		/* queue work failed, i_count must be at least 2 */
-	}
-}
-
 void ceph_queue_inode_work(struct inode *inode, int work_bit)
 {
 	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 87d3be10af25..1f960361b78e 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -825,13 +825,13 @@ void ceph_mdsc_release_request(struct kref *kref)
 	if (req->r_inode) {
 		ceph_put_cap_refs(ceph_inode(req->r_inode), CEPH_CAP_PIN);
 		/* avoid calling iput_final() in mds dispatch threads */
-		ceph_async_iput(req->r_inode);
+		iput(req->r_inode);
 	}
 	if (req->r_parent) {
 		ceph_put_cap_refs(ceph_inode(req->r_parent), CEPH_CAP_PIN);
-		ceph_async_iput(req->r_parent);
+		iput(req->r_parent);
 	}
-	ceph_async_iput(req->r_target_inode);
+	iput(req->r_target_inode);
 	if (req->r_dentry)
 		dput(req->r_dentry);
 	if (req->r_old_dentry)
@@ -845,7 +845,7 @@ void ceph_mdsc_release_request(struct kref *kref)
 		 */
 		ceph_put_cap_refs(ceph_inode(req->r_old_dentry_dir),
 				  CEPH_CAP_PIN);
-		ceph_async_iput(req->r_old_dentry_dir);
+		iput(req->r_old_dentry_dir);
 	}
 	kfree(req->r_path1);
 	kfree(req->r_path2);
@@ -961,7 +961,7 @@ static void __unregister_request(struct ceph_mds_client *mdsc,
 
 	if (req->r_unsafe_dir) {
 		/* avoid calling iput_final() in mds dispatch threads */
-		ceph_async_iput(req->r_unsafe_dir);
+		iput(req->r_unsafe_dir);
 		req->r_unsafe_dir = NULL;
 	}
 
@@ -1132,7 +1132,7 @@ static int __choose_mds(struct ceph_mds_client *mdsc,
 		cap = rb_entry(rb_first(&ci->i_caps), struct ceph_cap, ci_node);
 	if (!cap) {
 		spin_unlock(&ci->i_ceph_lock);
-		ceph_async_iput(inode);
+		iput(inode);
 		goto random;
 	}
 	mds = cap->session->s_mds;
@@ -1143,7 +1143,7 @@ static int __choose_mds(struct ceph_mds_client *mdsc,
 out:
 	/* avoid calling iput_final() while holding mdsc->mutex or
 	 * in mds dispatch threads */
-	ceph_async_iput(inode);
+	iput(inode);
 	return mds;
 
 random:
@@ -1548,7 +1548,7 @@ int ceph_iterate_session_caps(struct ceph_mds_session *session,
 		if (last_inode) {
 			/* avoid calling iput_final() while holding
 			 * s_mutex or in mds dispatch threads */
-			ceph_async_iput(last_inode);
+			iput(last_inode);
 			last_inode = NULL;
 		}
 		if (old_cap) {
@@ -1582,7 +1582,7 @@ int ceph_iterate_session_caps(struct ceph_mds_session *session,
 	session->s_cap_iterator = NULL;
 	spin_unlock(&session->s_cap_lock);
 
-	ceph_async_iput(last_inode);
+	iput(last_inode);
 	if (old_cap)
 		ceph_put_cap(session->s_mdsc, old_cap);
 
@@ -1723,7 +1723,7 @@ static void remove_session_caps(struct ceph_mds_session *session)
 
 			inode = ceph_find_inode(sb, vino);
 			 /* avoid calling iput_final() while holding s_mutex */
-			ceph_async_iput(inode);
+			iput(inode);
 
 			spin_lock(&session->s_cap_lock);
 		}
@@ -4370,7 +4370,7 @@ static void handle_lease(struct ceph_mds_client *mdsc,
 out:
 	mutex_unlock(&session->s_mutex);
 	/* avoid calling iput_final() in mds dispatch threads */
-	ceph_async_iput(inode);
+	iput(inode);
 	return;
 
 bad:
diff --git a/fs/ceph/quota.c b/fs/ceph/quota.c
index 4e32c9600ecc..988c6e04b327 100644
--- a/fs/ceph/quota.c
+++ b/fs/ceph/quota.c
@@ -75,7 +75,7 @@ void ceph_handle_quota(struct ceph_mds_client *mdsc,
 	spin_unlock(&ci->i_ceph_lock);
 
 	/* avoid calling iput_final() in dispatch thread */
-	ceph_async_iput(inode);
+	iput(inode);
 }
 
 static struct ceph_quotarealm_inode *
@@ -248,7 +248,7 @@ static struct ceph_snap_realm *get_quota_realm(struct ceph_mds_client *mdsc,
 		ci = ceph_inode(in);
 		has_quota = __ceph_has_any_quota(ci);
 		/* avoid calling iput_final() while holding mdsc->snap_rwsem */
-		ceph_async_iput(in);
+		iput(in);
 
 		next = realm->parent;
 		if (has_quota || !next)
@@ -384,7 +384,7 @@ static bool check_quota_exceeded(struct inode *inode, enum quota_check_op op,
 			exceeded = true; /* Just break the loop */
 		}
 		/* avoid calling iput_final() while holding mdsc->snap_rwsem */
-		ceph_async_iput(in);
+		iput(in);
 
 		next = realm->parent;
 		if (exceeded || !next)
diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
index afc7f4c32364..f928b02bcd31 100644
--- a/fs/ceph/snap.c
+++ b/fs/ceph/snap.c
@@ -679,13 +679,13 @@ static void queue_realm_cap_snaps(struct ceph_snap_realm *realm)
 		spin_unlock(&realm->inodes_with_caps_lock);
 		/* avoid calling iput_final() while holding
 		 * mdsc->snap_rwsem or in mds dispatch threads */
-		ceph_async_iput(lastinode);
+		iput(lastinode);
 		lastinode = inode;
 		ceph_queue_cap_snap(ci);
 		spin_lock(&realm->inodes_with_caps_lock);
 	}
 	spin_unlock(&realm->inodes_with_caps_lock);
-	ceph_async_iput(lastinode);
+	iput(lastinode);
 
 	dout("queue_realm_cap_snaps %p %llx done\n", realm, realm->ino);
 }
@@ -841,7 +841,7 @@ static void flush_snaps(struct ceph_mds_client *mdsc)
 		ceph_flush_snaps(ci, &session);
 		/* avoid calling iput_final() while holding
 		 * session->s_mutex or in mds dispatch threads */
-		ceph_async_iput(inode);
+		iput(inode);
 		spin_lock(&mdsc->snap_flush_lock);
 	}
 	spin_unlock(&mdsc->snap_flush_lock);
@@ -985,12 +985,12 @@ void ceph_handle_snap(struct ceph_mds_client *mdsc,
 
 			/* avoid calling iput_final() while holding
 			 * mdsc->snap_rwsem or mds in dispatch threads */
-			ceph_async_iput(inode);
+			iput(inode);
 			continue;
 
 skip_inode:
 			spin_unlock(&ci->i_ceph_lock);
-			ceph_async_iput(inode);
+			iput(inode);
 		}
 
 		/* we may have taken some of the old realm's children. */
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 31f0be9120dd..6b6332a5c113 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -988,8 +988,6 @@ extern int ceph_inode_holds_cap(struct inode *inode, int mask);
 extern bool ceph_inode_set_size(struct inode *inode, loff_t size);
 extern void __ceph_do_pending_vmtruncate(struct inode *inode);
 
-extern void ceph_async_iput(struct inode *inode);
-
 void ceph_queue_inode_work(struct inode *inode, int work_bit);
 
 static inline void ceph_queue_vmtruncate(struct inode *inode)
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 1/6] ceph: allow ceph_put_mds_session to take NULL or ERR_PTR
  2021-06-15 14:57 ` [RFC PATCH 1/6] ceph: allow ceph_put_mds_session to take NULL or ERR_PTR Jeff Layton
@ 2021-06-16 15:24   ` Luis Henriques
  0 siblings, 0 replies; 18+ messages in thread
From: Luis Henriques @ 2021-06-16 15:24 UTC (permalink / raw)
  To: Jeff Layton; +Cc: ceph-devel, pdonnell, ukernel, idryomov, xiubli

On Tue, Jun 15, 2021 at 10:57:25AM -0400, Jeff Layton wrote:
> ...to simplify some error paths.
> 
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  fs/ceph/dir.c        | 3 +--
>  fs/ceph/inode.c      | 6 ++----
>  fs/ceph/mds_client.c | 6 ++++--
>  fs/ceph/metric.c     | 3 +--
>  4 files changed, 8 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
> index ac431246e0c9..0dc5f8357f58 100644
> --- a/fs/ceph/dir.c
> +++ b/fs/ceph/dir.c
> @@ -1802,8 +1802,7 @@ static void ceph_d_release(struct dentry *dentry)
>  	dentry->d_fsdata = NULL;
>  	spin_unlock(&dentry->d_lock);
>  
> -	if (di->lease_session)
> -		ceph_put_mds_session(di->lease_session);
> +	ceph_put_mds_session(di->lease_session);
>  	kmem_cache_free(ceph_dentry_cachep, di);
>  }
>  
> diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
> index df0c8a724609..6f43542b3344 100644
> --- a/fs/ceph/inode.c
> +++ b/fs/ceph/inode.c
> @@ -1154,8 +1154,7 @@ static inline void update_dentry_lease(struct inode *dir, struct dentry *dentry,
>  	__update_dentry_lease(dir, dentry, lease, session, from_time,
>  			      &old_lease_session);
>  	spin_unlock(&dentry->d_lock);
> -	if (old_lease_session)
> -		ceph_put_mds_session(old_lease_session);
> +	ceph_put_mds_session(old_lease_session);
>  }
>  
>  /*
> @@ -1200,8 +1199,7 @@ static void update_dentry_lease_careful(struct dentry *dentry,
>  			      from_time, &old_lease_session);
>  out_unlock:
>  	spin_unlock(&dentry->d_lock);
> -	if (old_lease_session)
> -		ceph_put_mds_session(old_lease_session);
> +	ceph_put_mds_session(old_lease_session);
>  }
>  
>  /*
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index e5af591d3bd4..ec669634c649 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -664,6 +664,9 @@ struct ceph_mds_session *ceph_get_mds_session(struct ceph_mds_session *s)
>  
>  void ceph_put_mds_session(struct ceph_mds_session *s)
>  {
> +	if (IS_ERR_OR_NULL(s))
> +		return;
> +
>  	dout("mdsc put_session %p %d -> %d\n", s,
>  	     refcount_read(&s->s_ref), refcount_read(&s->s_ref)-1);
>  	if (refcount_dec_and_test(&s->s_ref)) {
> @@ -1438,8 +1441,7 @@ static void __open_export_target_sessions(struct ceph_mds_client *mdsc,
>  
>  	for (i = 0; i < mi->num_export_targets; i++) {
>  		ts = __open_export_target_session(mdsc, mi->export_targets[i]);
> -		if (!IS_ERR(ts))
> -			ceph_put_mds_session(ts);
> +		ceph_put_mds_session(ts);
>  	}
>  }
>  
> diff --git a/fs/ceph/metric.c b/fs/ceph/metric.c
> index 9577c71e645d..5ac151eb0d49 100644
> --- a/fs/ceph/metric.c
> +++ b/fs/ceph/metric.c
> @@ -311,8 +311,7 @@ void ceph_metric_destroy(struct ceph_client_metric *m)
>  
>  	cancel_delayed_work_sync(&m->delayed_work);
>  
> -	if (m->session)
> -		ceph_put_mds_session(m->session);
> +	ceph_put_mds_session(m->session);
>  }
>  
>  #define METRIC_UPDATE_MIN_MAX(min, max, new)	\
> -- 
> 2.31.1
> 

LGTM, feel free to add my

Reviewed-by: Luis Henriques <lhenriques@suse.de>

Cheers,
--
Luís

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 2/6] ceph: eliminate session->s_gen_ttl_lock
  2021-06-15 14:57 ` [RFC PATCH 2/6] ceph: eliminate session->s_gen_ttl_lock Jeff Layton
@ 2021-06-16 15:24   ` Luis Henriques
  2021-06-18 13:31     ` Jeff Layton
  0 siblings, 1 reply; 18+ messages in thread
From: Luis Henriques @ 2021-06-16 15:24 UTC (permalink / raw)
  To: Jeff Layton; +Cc: ceph-devel, pdonnell, ukernel, idryomov, xiubli

On Tue, Jun 15, 2021 at 10:57:26AM -0400, Jeff Layton wrote:
> Turn s_cap_gen field into an atomic_t, and just rely on the fact that we
> hold the s_mutex when changing the s_cap_ttl field.
> 
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  fs/ceph/caps.c       | 15 ++++++---------
>  fs/ceph/dir.c        |  4 +---
>  fs/ceph/inode.c      |  4 ++--
>  fs/ceph/mds_client.c | 17 ++++++-----------
>  fs/ceph/mds_client.h |  6 ++----
>  5 files changed, 17 insertions(+), 29 deletions(-)
> 
> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> index a5e93b185515..919eada97a1f 100644
> --- a/fs/ceph/caps.c
> +++ b/fs/ceph/caps.c
> @@ -645,9 +645,7 @@ void ceph_add_cap(struct inode *inode,
>  	dout("add_cap %p mds%d cap %llx %s seq %d\n", inode,
>  	     session->s_mds, cap_id, ceph_cap_string(issued), seq);
>  
> -	spin_lock(&session->s_gen_ttl_lock);
> -	gen = session->s_cap_gen;
> -	spin_unlock(&session->s_gen_ttl_lock);
> +	gen = atomic_read(&session->s_cap_gen);
>  
>  	cap = __get_cap_for_mds(ci, mds);
>  	if (!cap) {
> @@ -785,10 +783,8 @@ static int __cap_is_valid(struct ceph_cap *cap)
>  	unsigned long ttl;
>  	u32 gen;
>  
> -	spin_lock(&cap->session->s_gen_ttl_lock);
> -	gen = cap->session->s_cap_gen;
> +	gen = atomic_read(&cap->session->s_cap_gen);
>  	ttl = cap->session->s_cap_ttl;
> -	spin_unlock(&cap->session->s_gen_ttl_lock);
>  
>  	if (cap->cap_gen < gen || time_after_eq(jiffies, ttl)) {
>  		dout("__cap_is_valid %p cap %p issued %s "
> @@ -1182,7 +1178,8 @@ void __ceph_remove_cap(struct ceph_cap *cap, bool queue_release)
>  	 * s_cap_gen while session is in the reconnect state.
>  	 */
>  	if (queue_release &&
> -	    (!session->s_cap_reconnect || cap->cap_gen == session->s_cap_gen)) {
> +	    (!session->s_cap_reconnect ||
> +	     cap->cap_gen == atomic_read(&session->s_cap_gen))) {
>  		cap->queue_release = 1;
>  		if (removed) {
>  			__ceph_queue_cap_release(session, cap);
> @@ -3288,7 +3285,7 @@ static void handle_cap_grant(struct inode *inode,
>  	u64 size = le64_to_cpu(grant->size);
>  	u64 max_size = le64_to_cpu(grant->max_size);
>  	unsigned char check_caps = 0;
> -	bool was_stale = cap->cap_gen < session->s_cap_gen;
> +	bool was_stale = cap->cap_gen < atomic_read(&session->s_cap_gen);
>  	bool wake = false;
>  	bool writeback = false;
>  	bool queue_trunc = false;
> @@ -3340,7 +3337,7 @@ static void handle_cap_grant(struct inode *inode,
>  	}
>  
>  	/* side effects now are allowed */
> -	cap->cap_gen = session->s_cap_gen;
> +	cap->cap_gen = atomic_read(&session->s_cap_gen);
>  	cap->seq = seq;
>  
>  	__check_cap_issue(ci, cap, newcaps);
> diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
> index 0dc5f8357f58..bd508b1aeac2 100644
> --- a/fs/ceph/dir.c
> +++ b/fs/ceph/dir.c
> @@ -1541,10 +1541,8 @@ static bool __dentry_lease_is_valid(struct ceph_dentry_info *di)
>  		u32 gen;
>  		unsigned long ttl;
>  
> -		spin_lock(&session->s_gen_ttl_lock);
> -		gen = session->s_cap_gen;
> +		gen = atomic_read(&session->s_cap_gen);
>  		ttl = session->s_cap_ttl;
> -		spin_unlock(&session->s_gen_ttl_lock);
>  
>  		if (di->lease_gen == gen &&
>  		    time_before(jiffies, ttl) &&
> diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
> index 6f43542b3344..6034821c9d63 100644
> --- a/fs/ceph/inode.c
> +++ b/fs/ceph/inode.c
> @@ -1124,7 +1124,7 @@ static void __update_dentry_lease(struct inode *dir, struct dentry *dentry,
>  		return;
>  	}
>  
> -	if (di->lease_gen == session->s_cap_gen &&
> +	if (di->lease_gen == atomic_read(&session->s_cap_gen) &&
>  	    time_before(ttl, di->time))
>  		return;  /* we already have a newer lease. */
>  
> @@ -1135,7 +1135,7 @@ static void __update_dentry_lease(struct inode *dir, struct dentry *dentry,
>  
>  	if (!di->lease_session)
>  		di->lease_session = ceph_get_mds_session(session);
> -	di->lease_gen = session->s_cap_gen;
> +	di->lease_gen = atomic_read(&session->s_cap_gen);
>  	di->lease_seq = le32_to_cpu(lease->seq);
>  	di->lease_renew_after = half_ttl;
>  	di->lease_renew_from = 0;
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index ec669634c649..87d3be10af25 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -749,8 +749,7 @@ static struct ceph_mds_session *register_session(struct ceph_mds_client *mdsc,
>  
>  	ceph_con_init(&s->s_con, s, &mds_con_ops, &mdsc->fsc->client->msgr);
>  
> -	spin_lock_init(&s->s_gen_ttl_lock);
> -	s->s_cap_gen = 1;
> +	atomic_set(&s->s_cap_gen, 1);
>  	s->s_cap_ttl = jiffies - 1;
>  
>  	spin_lock_init(&s->s_cap_lock);
> @@ -1763,7 +1762,7 @@ static int wake_up_session_cb(struct inode *inode, struct ceph_cap *cap,
>  		ci->i_requested_max_size = 0;
>  		spin_unlock(&ci->i_ceph_lock);
>  	} else if (ev == RENEWCAPS) {
> -		if (cap->cap_gen < cap->session->s_cap_gen) {
> +		if (cap->cap_gen < atomic_read(&cap->session->s_cap_gen)) {
>  			/* mds did not re-issue stale cap */
>  			spin_lock(&ci->i_ceph_lock);
>  			cap->issued = cap->implemented = CEPH_CAP_PIN;
> @@ -3501,10 +3500,8 @@ static void handle_session(struct ceph_mds_session *session,
>  	case CEPH_SESSION_STALE:
>  		pr_info("mds%d caps went stale, renewing\n",
>  			session->s_mds);
> -		spin_lock(&session->s_gen_ttl_lock);
> -		session->s_cap_gen++;
> +		atomic_inc(&session->s_cap_gen);
>  		session->s_cap_ttl = jiffies - 1;
> -		spin_unlock(&session->s_gen_ttl_lock);
>  		send_renew_caps(mdsc, session);
>  		break;
>  
> @@ -3773,7 +3770,7 @@ static int reconnect_caps_cb(struct inode *inode, struct ceph_cap *cap,
>  	cap->seq = 0;        /* reset cap seq */
>  	cap->issue_seq = 0;  /* and issue_seq */
>  	cap->mseq = 0;       /* and migrate_seq */
> -	cap->cap_gen = cap->session->s_cap_gen;
> +	cap->cap_gen = atomic_read(&cap->session->s_cap_gen);
>  
>  	/* These are lost when the session goes away */
>  	if (S_ISDIR(inode->i_mode)) {
> @@ -4013,9 +4010,7 @@ static void send_mds_reconnect(struct ceph_mds_client *mdsc,
>  	dout("session %p state %s\n", session,
>  	     ceph_session_state_name(session->s_state));
>  
> -	spin_lock(&session->s_gen_ttl_lock);
> -	session->s_cap_gen++;
> -	spin_unlock(&session->s_gen_ttl_lock);
> +	atomic_inc(&session->s_cap_gen);
>  
>  	spin_lock(&session->s_cap_lock);
>  	/* don't know if session is readonly */
> @@ -4346,7 +4341,7 @@ static void handle_lease(struct ceph_mds_client *mdsc,
>  
>  	case CEPH_MDS_LEASE_RENEW:
>  		if (di->lease_session == session &&
> -		    di->lease_gen == session->s_cap_gen &&
> +		    di->lease_gen == atomic_read(&session->s_cap_gen) &&
>  		    di->lease_renew_from &&
>  		    di->lease_renew_after == 0) {
>  			unsigned long duration =
> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> index 15c11a0f2caf..20e42d8b66c6 100644
> --- a/fs/ceph/mds_client.h
> +++ b/fs/ceph/mds_client.h
> @@ -186,10 +186,8 @@ struct ceph_mds_session {
>  
>  	struct ceph_auth_handshake s_auth;
>  
> -	/* protected by s_gen_ttl_lock */
> -	spinlock_t        s_gen_ttl_lock;
> -	u32               s_cap_gen;  /* inc each time we get mds stale msg */
> -	unsigned long     s_cap_ttl;  /* when session caps expire */
> +	atomic_t          s_cap_gen;  /* inc each time we get mds stale msg */
> +	unsigned long     s_cap_ttl;  /* when session caps expire. protected by s_mutex */
>  
>  	/* protected by s_cap_lock */
>  	spinlock_t        s_cap_lock;
> -- 
> 2.31.1
> 

Reviewed-by: Luis Henriques <lhenriques@suse.de>

(Since we don't have s_mutex when reading s_cap_ttl, would it make sense
to make it an atomic_t too?)

Cheers,
--
Luís

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 3/6] ceph: don't take s_mutex or snap_rwsem in ceph_check_caps
  2021-06-15 14:57 ` [RFC PATCH 3/6] ceph: don't take s_mutex or snap_rwsem in ceph_check_caps Jeff Layton
@ 2021-06-16 15:25   ` Luis Henriques
  2021-06-18 13:35     ` Jeff Layton
  0 siblings, 1 reply; 18+ messages in thread
From: Luis Henriques @ 2021-06-16 15:25 UTC (permalink / raw)
  To: Jeff Layton; +Cc: ceph-devel, pdonnell, ukernel, idryomov, xiubli

On Tue, Jun 15, 2021 at 10:57:27AM -0400, Jeff Layton wrote:
> These locks appear to be completely unnecessary. Almost all of this
> function is done under the inode->i_ceph_lock, aside from the actual
> sending of the message. Don't take either lock in this function.
> 
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  fs/ceph/caps.c | 61 ++++++--------------------------------------------
>  1 file changed, 7 insertions(+), 54 deletions(-)
> 
> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> index 919eada97a1f..825b1e463ad3 100644
> --- a/fs/ceph/caps.c
> +++ b/fs/ceph/caps.c
> @@ -1912,7 +1912,6 @@ void ceph_check_caps(struct ceph_inode_info *ci, int flags,
>  	struct ceph_cap *cap;
>  	u64 flush_tid, oldest_flush_tid;
>  	int file_wanted, used, cap_used;
> -	int took_snap_rwsem = 0;             /* true if mdsc->snap_rwsem held */
>  	int issued, implemented, want, retain, revoking, flushing = 0;
>  	int mds = -1;   /* keep track of how far we've gone through i_caps list
>  			   to avoid an infinite loop on retry */
> @@ -1920,6 +1919,9 @@ void ceph_check_caps(struct ceph_inode_info *ci, int flags,
>  	bool queue_invalidate = false;
>  	bool tried_invalidate = false;
>  
> +	if (session)
> +		ceph_get_mds_session(session);
> +
>  	spin_lock(&ci->i_ceph_lock);
>  	if (ci->i_ceph_flags & CEPH_I_FLUSH)
>  		flags |= CHECK_CAPS_FLUSH;
> @@ -2021,8 +2023,6 @@ void ceph_check_caps(struct ceph_inode_info *ci, int flags,
>  		    ((flags & CHECK_CAPS_AUTHONLY) && cap != ci->i_auth_cap))
>  			continue;
>  
> -		/* NOTE: no side-effects allowed, until we take s_mutex */
> -
>  		/*
>  		 * If we have an auth cap, we don't need to consider any
>  		 * overlapping caps as used.
> @@ -2085,37 +2085,8 @@ void ceph_check_caps(struct ceph_inode_info *ci, int flags,
>  			continue;     /* nope, all good */
>  
>  ack:
> -		if (session && session != cap->session) {
> -			dout("oops, wrong session %p mutex\n", session);
> -			mutex_unlock(&session->s_mutex);
> -			session = NULL;
> -		}
> -		if (!session) {
> -			session = cap->session;
> -			if (mutex_trylock(&session->s_mutex) == 0) {
> -				dout("inverting session/ino locks on %p\n",
> -				     session);
> -				session = ceph_get_mds_session(session);
> -				spin_unlock(&ci->i_ceph_lock);
> -				if (took_snap_rwsem) {
> -					up_read(&mdsc->snap_rwsem);
> -					took_snap_rwsem = 0;
> -				}
> -				if (session) {
> -					mutex_lock(&session->s_mutex);
> -					ceph_put_mds_session(session);
> -				} else {
> -					/*
> -					 * Because we take the reference while
> -					 * holding the i_ceph_lock, it should
> -					 * never be NULL. Throw a warning if it
> -					 * ever is.
> -					 */
> -					WARN_ON_ONCE(true);
> -				}
> -				goto retry;
> -			}
> -		}
> +		ceph_put_mds_session(session);
> +		session = ceph_get_mds_session(cap->session);
>  
>  		/* kick flushing and flush snaps before sending normal
>  		 * cap message */
> @@ -2130,19 +2101,6 @@ void ceph_check_caps(struct ceph_inode_info *ci, int flags,
>  			goto retry_locked;
>  		}
>  
> -		/* take snap_rwsem after session mutex */
> -		if (!took_snap_rwsem) {
> -			if (down_read_trylock(&mdsc->snap_rwsem) == 0) {
> -				dout("inverting snap/in locks on %p\n",
> -				     inode);
> -				spin_unlock(&ci->i_ceph_lock);
> -				down_read(&mdsc->snap_rwsem);
> -				took_snap_rwsem = 1;
> -				goto retry;
> -			}
> -			took_snap_rwsem = 1;
> -		}
> -
>  		if (cap == ci->i_auth_cap && ci->i_dirty_caps) {
>  			flushing = ci->i_dirty_caps;
>  			flush_tid = __mark_caps_flushing(inode, session, false,
> @@ -2179,13 +2137,9 @@ void ceph_check_caps(struct ceph_inode_info *ci, int flags,
>  
>  	spin_unlock(&ci->i_ceph_lock);
>  
> +	ceph_put_mds_session(session);
>  	if (queue_invalidate)
>  		ceph_queue_invalidate(inode);
> -
> -	if (session)
> -		mutex_unlock(&session->s_mutex);
> -	if (took_snap_rwsem)
> -		up_read(&mdsc->snap_rwsem);
>  }
>  
>  /*
> @@ -3550,13 +3504,12 @@ static void handle_cap_grant(struct inode *inode,
>  	if (wake)
>  		wake_up_all(&ci->i_cap_wq);
>  
> +	mutex_unlock(&session->s_mutex);
>  	if (check_caps == 1)
>  		ceph_check_caps(ci, CHECK_CAPS_AUTHONLY | CHECK_CAPS_NOINVAL,
>  				session);
>  	else if (check_caps == 2)
>  		ceph_check_caps(ci, CHECK_CAPS_NOINVAL, session);
> -	else
> -		mutex_unlock(&session->s_mutex);
>  }
>  
>  /*
> -- 
> 2.31.1
> 

Ugh, this is a tricky one.  I couldn't find anything wrong but... yeah,
here it is:

Reviewed-by: Luis Henriques <lhenriques@suse.de>

(Suggestion: remove the 'retry/retry_locked' goto dance and simply lock
i_ceph_lock in the only 'goto retry' call.)

Cheers,
--
Luís

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 4/6] ceph: don't take s_mutex in try_flush_caps
  2021-06-15 14:57 ` [RFC PATCH 4/6] ceph: don't take s_mutex in try_flush_caps Jeff Layton
@ 2021-06-16 15:25   ` Luis Henriques
  0 siblings, 0 replies; 18+ messages in thread
From: Luis Henriques @ 2021-06-16 15:25 UTC (permalink / raw)
  To: Jeff Layton; +Cc: ceph-devel, pdonnell, ukernel, idryomov, xiubli

On Tue, Jun 15, 2021 at 10:57:28AM -0400, Jeff Layton wrote:
> The s_mutex doesn't protect anything in this codepath.
> 
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  fs/ceph/caps.c | 16 ++--------------
>  1 file changed, 2 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> index 825b1e463ad3..d21b1fa36875 100644
> --- a/fs/ceph/caps.c
> +++ b/fs/ceph/caps.c
> @@ -2149,26 +2149,17 @@ static int try_flush_caps(struct inode *inode, u64 *ptid)
>  {
>  	struct ceph_mds_client *mdsc = ceph_sb_to_client(inode->i_sb)->mdsc;
>  	struct ceph_inode_info *ci = ceph_inode(inode);
> -	struct ceph_mds_session *session = NULL;
>  	int flushing = 0;
>  	u64 flush_tid = 0, oldest_flush_tid = 0;
>  
> -retry:
>  	spin_lock(&ci->i_ceph_lock);
>  retry_locked:
>  	if (ci->i_dirty_caps && ci->i_auth_cap) {
>  		struct ceph_cap *cap = ci->i_auth_cap;
>  		struct cap_msg_args arg;
> +		struct ceph_mds_session *session = cap->session;
>  
> -		if (session != cap->session) {
> -			spin_unlock(&ci->i_ceph_lock);
> -			if (session)
> -				mutex_unlock(&session->s_mutex);
> -			session = cap->session;
> -			mutex_lock(&session->s_mutex);
> -			goto retry;
> -		}
> -		if (cap->session->s_state < CEPH_MDS_SESSION_OPEN) {
> +		if (session->s_state < CEPH_MDS_SESSION_OPEN) {
>  			spin_unlock(&ci->i_ceph_lock);
>  			goto out;
>  		}
> @@ -2205,9 +2196,6 @@ static int try_flush_caps(struct inode *inode, u64 *ptid)
>  		spin_unlock(&ci->i_ceph_lock);
>  	}
>  out:
> -	if (session)
> -		mutex_unlock(&session->s_mutex);
> -
>  	*ptid = flush_tid;
>  	return flushing;
>  }
> -- 
> 2.31.1
> 

Reviewed-by: Luis Henriques <lhenriques@suse.de>

Cheers,
--
Luís

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 5/6] ceph: don't take s_mutex in ceph_flush_snaps
  2021-06-15 14:57 ` [RFC PATCH 5/6] ceph: don't take s_mutex in ceph_flush_snaps Jeff Layton
@ 2021-06-16 15:25   ` Luis Henriques
  0 siblings, 0 replies; 18+ messages in thread
From: Luis Henriques @ 2021-06-16 15:25 UTC (permalink / raw)
  To: Jeff Layton; +Cc: ceph-devel, pdonnell, ukernel, idryomov, xiubli

On Tue, Jun 15, 2021 at 10:57:29AM -0400, Jeff Layton wrote:
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  fs/ceph/caps.c | 8 +-------
>  fs/ceph/snap.c | 4 +---
>  2 files changed, 2 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> index d21b1fa36875..5864d5088e27 100644
> --- a/fs/ceph/caps.c
> +++ b/fs/ceph/caps.c
> @@ -1531,7 +1531,7 @@ static inline int __send_flush_snap(struct inode *inode,
>   * asynchronously back to the MDS once sync writes complete and dirty
>   * data is written out.
>   *
> - * Called under i_ceph_lock.  Takes s_mutex as needed.
> + * Called under i_ceph_lock.
>   */
>  static void __ceph_flush_snaps(struct ceph_inode_info *ci,
>  			       struct ceph_mds_session *session)
> @@ -1653,7 +1653,6 @@ void ceph_flush_snaps(struct ceph_inode_info *ci,
>  	mds = ci->i_auth_cap->session->s_mds;
>  	if (session && session->s_mds != mds) {
>  		dout(" oops, wrong session %p mutex\n", session);
> -		mutex_unlock(&session->s_mutex);
>  		ceph_put_mds_session(session);
>  		session = NULL;
>  	}
> @@ -1662,10 +1661,6 @@ void ceph_flush_snaps(struct ceph_inode_info *ci,
>  		mutex_lock(&mdsc->mutex);
>  		session = __ceph_lookup_mds_session(mdsc, mds);
>  		mutex_unlock(&mdsc->mutex);
> -		if (session) {
> -			dout(" inverting session/ino locks on %p\n", session);
> -			mutex_lock(&session->s_mutex);
> -		}
>  		goto retry;
>  	}
>  
> @@ -1680,7 +1675,6 @@ void ceph_flush_snaps(struct ceph_inode_info *ci,
>  	if (psession) {
>  		*psession = session;
>  	} else if (session) {
> -		mutex_unlock(&session->s_mutex);
>  		ceph_put_mds_session(session);

nit: since ceph_put_mds_session() now checks for NULL, the 'else' doesn't
really need the condition.

>  	}
>  	/* we flushed them all; remove this inode from the queue */
> diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
> index f8cac2abab3f..afc7f4c32364 100644
> --- a/fs/ceph/snap.c
> +++ b/fs/ceph/snap.c
> @@ -846,10 +846,8 @@ static void flush_snaps(struct ceph_mds_client *mdsc)
>  	}
>  	spin_unlock(&mdsc->snap_flush_lock);
>  
> -	if (session) {
> -		mutex_unlock(&session->s_mutex);
> +	if (session)
>  		ceph_put_mds_session(session);
> -	}

Same here: the 'if (session)' can be dropped.

>  	dout("flush_snaps done\n");
>  }
>  
> -- 
> 2.31.1
> 

Reviewed-by: Luis Henriques <lhenriques@suse.de>

Cheers,
--
Luís

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 6/6] ceph: eliminate ceph_async_iput()
  2021-06-15 14:57 ` [RFC PATCH 6/6] ceph: eliminate ceph_async_iput() Jeff Layton
@ 2021-06-16 15:26   ` Luis Henriques
  2021-06-17 16:24     ` Jeff Layton
  0 siblings, 1 reply; 18+ messages in thread
From: Luis Henriques @ 2021-06-16 15:26 UTC (permalink / raw)
  To: Jeff Layton; +Cc: ceph-devel, pdonnell, ukernel, idryomov, xiubli

On Tue, Jun 15, 2021 at 10:57:30AM -0400, Jeff Layton wrote:
> Now that we don't need to hold session->s_mutex or the snap_rwsem when
> calling ceph_check_caps, we can eliminate ceph_async_iput and just use
> normal iput calls.
> 
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  fs/ceph/caps.c       |  6 +++---
>  fs/ceph/inode.c      | 25 +++----------------------
>  fs/ceph/mds_client.c | 22 +++++++++++-----------
>  fs/ceph/quota.c      |  6 +++---
>  fs/ceph/snap.c       | 10 +++++-----
>  fs/ceph/super.h      |  2 --
>  6 files changed, 25 insertions(+), 46 deletions(-)
> 
> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> index 5864d5088e27..fd9243e9a1b2 100644
> --- a/fs/ceph/caps.c
> +++ b/fs/ceph/caps.c
> @@ -3147,7 +3147,7 @@ void ceph_put_wrbuffer_cap_refs(struct ceph_inode_info *ci, int nr,
>  		wake_up_all(&ci->i_cap_wq);
>  	while (put-- > 0) {
>  		/* avoid calling iput_final() in osd dispatch threads */
> -		ceph_async_iput(inode);
> +		iput(inode);
>  	}
>  }
>  
> @@ -4136,7 +4136,7 @@ void ceph_handle_caps(struct ceph_mds_session *session,
>  done_unlocked:
>  	ceph_put_string(extra_info.pool_ns);
>  	/* avoid calling iput_final() in mds dispatch threads */
> -	ceph_async_iput(inode);
> +	iput(inode);

To be honest, I'm not really convinced we can blindly substitute
ceph_async_iput() with iput().  This case specifically can be problematic
because we may have called ceph_queue_vmtruncate() above (or
handle_cap_grant()).  If we did, we have ci->i_work_mask set and the wq
would have invoked __ceph_do_pending_vmtruncate().  Using iput() here
won't have that result.  Am I missing something?

Cheers,
--
Luís

>  	return;
>  
>  flush_cap_releases:
> @@ -4179,7 +4179,7 @@ void ceph_check_delayed_caps(struct ceph_mds_client *mdsc)
>  			dout("check_delayed_caps on %p\n", inode);
>  			ceph_check_caps(ci, 0, NULL);
>  			/* avoid calling iput_final() in tick thread */
> -			ceph_async_iput(inode);
> +			iput(inode);
>  			spin_lock(&mdsc->cap_delay_lock);
>  		}
>  	}
> diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
> index 6034821c9d63..5f1c27caf0b6 100644
> --- a/fs/ceph/inode.c
> +++ b/fs/ceph/inode.c
> @@ -1567,7 +1567,7 @@ static int readdir_prepopulate_inodes_only(struct ceph_mds_request *req,
>  		}
>  
>  		/* avoid calling iput_final() in mds dispatch threads */
> -		ceph_async_iput(in);
> +		iput(in);
>  	}
>  
>  	return err;
> @@ -1770,7 +1770,7 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req,
>  					ihold(in);
>  					discard_new_inode(in);
>  				}
> -				ceph_async_iput(in);
> +				iput(in);
>  			}
>  			d_drop(dn);
>  			err = ret;
> @@ -1783,7 +1783,7 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req,
>  			if (ceph_security_xattr_deadlock(in)) {
>  				dout(" skip splicing dn %p to inode %p"
>  				     " (security xattr deadlock)\n", dn, in);
> -				ceph_async_iput(in);
> +				iput(in);
>  				skipped++;
>  				goto next_item;
>  			}
> @@ -1834,25 +1834,6 @@ bool ceph_inode_set_size(struct inode *inode, loff_t size)
>  	return ret;
>  }
>  
> -/*
> - * Put reference to inode, but avoid calling iput_final() in current thread.
> - * iput_final() may wait for reahahead pages. The wait can cause deadlock in
> - * some contexts.
> - */
> -void ceph_async_iput(struct inode *inode)
> -{
> -	if (!inode)
> -		return;
> -	for (;;) {
> -		if (atomic_add_unless(&inode->i_count, -1, 1))
> -			break;
> -		if (queue_work(ceph_inode_to_client(inode)->inode_wq,
> -			       &ceph_inode(inode)->i_work))
> -			break;
> -		/* queue work failed, i_count must be at least 2 */
> -	}
> -}
> -
>  void ceph_queue_inode_work(struct inode *inode, int work_bit)
>  {
>  	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index 87d3be10af25..1f960361b78e 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -825,13 +825,13 @@ void ceph_mdsc_release_request(struct kref *kref)
>  	if (req->r_inode) {
>  		ceph_put_cap_refs(ceph_inode(req->r_inode), CEPH_CAP_PIN);
>  		/* avoid calling iput_final() in mds dispatch threads */
> -		ceph_async_iput(req->r_inode);
> +		iput(req->r_inode);
>  	}
>  	if (req->r_parent) {
>  		ceph_put_cap_refs(ceph_inode(req->r_parent), CEPH_CAP_PIN);
> -		ceph_async_iput(req->r_parent);
> +		iput(req->r_parent);
>  	}
> -	ceph_async_iput(req->r_target_inode);
> +	iput(req->r_target_inode);
>  	if (req->r_dentry)
>  		dput(req->r_dentry);
>  	if (req->r_old_dentry)
> @@ -845,7 +845,7 @@ void ceph_mdsc_release_request(struct kref *kref)
>  		 */
>  		ceph_put_cap_refs(ceph_inode(req->r_old_dentry_dir),
>  				  CEPH_CAP_PIN);
> -		ceph_async_iput(req->r_old_dentry_dir);
> +		iput(req->r_old_dentry_dir);
>  	}
>  	kfree(req->r_path1);
>  	kfree(req->r_path2);
> @@ -961,7 +961,7 @@ static void __unregister_request(struct ceph_mds_client *mdsc,
>  
>  	if (req->r_unsafe_dir) {
>  		/* avoid calling iput_final() in mds dispatch threads */
> -		ceph_async_iput(req->r_unsafe_dir);
> +		iput(req->r_unsafe_dir);
>  		req->r_unsafe_dir = NULL;
>  	}
>  
> @@ -1132,7 +1132,7 @@ static int __choose_mds(struct ceph_mds_client *mdsc,
>  		cap = rb_entry(rb_first(&ci->i_caps), struct ceph_cap, ci_node);
>  	if (!cap) {
>  		spin_unlock(&ci->i_ceph_lock);
> -		ceph_async_iput(inode);
> +		iput(inode);
>  		goto random;
>  	}
>  	mds = cap->session->s_mds;
> @@ -1143,7 +1143,7 @@ static int __choose_mds(struct ceph_mds_client *mdsc,
>  out:
>  	/* avoid calling iput_final() while holding mdsc->mutex or
>  	 * in mds dispatch threads */
> -	ceph_async_iput(inode);
> +	iput(inode);
>  	return mds;
>  
>  random:
> @@ -1548,7 +1548,7 @@ int ceph_iterate_session_caps(struct ceph_mds_session *session,
>  		if (last_inode) {
>  			/* avoid calling iput_final() while holding
>  			 * s_mutex or in mds dispatch threads */
> -			ceph_async_iput(last_inode);
> +			iput(last_inode);
>  			last_inode = NULL;
>  		}
>  		if (old_cap) {
> @@ -1582,7 +1582,7 @@ int ceph_iterate_session_caps(struct ceph_mds_session *session,
>  	session->s_cap_iterator = NULL;
>  	spin_unlock(&session->s_cap_lock);
>  
> -	ceph_async_iput(last_inode);
> +	iput(last_inode);
>  	if (old_cap)
>  		ceph_put_cap(session->s_mdsc, old_cap);
>  
> @@ -1723,7 +1723,7 @@ static void remove_session_caps(struct ceph_mds_session *session)
>  
>  			inode = ceph_find_inode(sb, vino);
>  			 /* avoid calling iput_final() while holding s_mutex */
> -			ceph_async_iput(inode);
> +			iput(inode);
>  
>  			spin_lock(&session->s_cap_lock);
>  		}
> @@ -4370,7 +4370,7 @@ static void handle_lease(struct ceph_mds_client *mdsc,
>  out:
>  	mutex_unlock(&session->s_mutex);
>  	/* avoid calling iput_final() in mds dispatch threads */
> -	ceph_async_iput(inode);
> +	iput(inode);
>  	return;
>  
>  bad:
> diff --git a/fs/ceph/quota.c b/fs/ceph/quota.c
> index 4e32c9600ecc..988c6e04b327 100644
> --- a/fs/ceph/quota.c
> +++ b/fs/ceph/quota.c
> @@ -75,7 +75,7 @@ void ceph_handle_quota(struct ceph_mds_client *mdsc,
>  	spin_unlock(&ci->i_ceph_lock);
>  
>  	/* avoid calling iput_final() in dispatch thread */
> -	ceph_async_iput(inode);
> +	iput(inode);
>  }
>  
>  static struct ceph_quotarealm_inode *
> @@ -248,7 +248,7 @@ static struct ceph_snap_realm *get_quota_realm(struct ceph_mds_client *mdsc,
>  		ci = ceph_inode(in);
>  		has_quota = __ceph_has_any_quota(ci);
>  		/* avoid calling iput_final() while holding mdsc->snap_rwsem */
> -		ceph_async_iput(in);
> +		iput(in);
>  
>  		next = realm->parent;
>  		if (has_quota || !next)
> @@ -384,7 +384,7 @@ static bool check_quota_exceeded(struct inode *inode, enum quota_check_op op,
>  			exceeded = true; /* Just break the loop */
>  		}
>  		/* avoid calling iput_final() while holding mdsc->snap_rwsem */
> -		ceph_async_iput(in);
> +		iput(in);
>  
>  		next = realm->parent;
>  		if (exceeded || !next)
> diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
> index afc7f4c32364..f928b02bcd31 100644
> --- a/fs/ceph/snap.c
> +++ b/fs/ceph/snap.c
> @@ -679,13 +679,13 @@ static void queue_realm_cap_snaps(struct ceph_snap_realm *realm)
>  		spin_unlock(&realm->inodes_with_caps_lock);
>  		/* avoid calling iput_final() while holding
>  		 * mdsc->snap_rwsem or in mds dispatch threads */
> -		ceph_async_iput(lastinode);
> +		iput(lastinode);
>  		lastinode = inode;
>  		ceph_queue_cap_snap(ci);
>  		spin_lock(&realm->inodes_with_caps_lock);
>  	}
>  	spin_unlock(&realm->inodes_with_caps_lock);
> -	ceph_async_iput(lastinode);
> +	iput(lastinode);
>  
>  	dout("queue_realm_cap_snaps %p %llx done\n", realm, realm->ino);
>  }
> @@ -841,7 +841,7 @@ static void flush_snaps(struct ceph_mds_client *mdsc)
>  		ceph_flush_snaps(ci, &session);
>  		/* avoid calling iput_final() while holding
>  		 * session->s_mutex or in mds dispatch threads */
> -		ceph_async_iput(inode);
> +		iput(inode);
>  		spin_lock(&mdsc->snap_flush_lock);
>  	}
>  	spin_unlock(&mdsc->snap_flush_lock);
> @@ -985,12 +985,12 @@ void ceph_handle_snap(struct ceph_mds_client *mdsc,
>  
>  			/* avoid calling iput_final() while holding
>  			 * mdsc->snap_rwsem or mds in dispatch threads */
> -			ceph_async_iput(inode);
> +			iput(inode);
>  			continue;
>  
>  skip_inode:
>  			spin_unlock(&ci->i_ceph_lock);
> -			ceph_async_iput(inode);
> +			iput(inode);
>  		}
>  
>  		/* we may have taken some of the old realm's children. */
> diff --git a/fs/ceph/super.h b/fs/ceph/super.h
> index 31f0be9120dd..6b6332a5c113 100644
> --- a/fs/ceph/super.h
> +++ b/fs/ceph/super.h
> @@ -988,8 +988,6 @@ extern int ceph_inode_holds_cap(struct inode *inode, int mask);
>  extern bool ceph_inode_set_size(struct inode *inode, loff_t size);
>  extern void __ceph_do_pending_vmtruncate(struct inode *inode);
>  
> -extern void ceph_async_iput(struct inode *inode);
> -
>  void ceph_queue_inode_work(struct inode *inode, int work_bit);
>  
>  static inline void ceph_queue_vmtruncate(struct inode *inode)
> -- 
> 2.31.1
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 6/6] ceph: eliminate ceph_async_iput()
  2021-06-16 15:26   ` Luis Henriques
@ 2021-06-17 16:24     ` Jeff Layton
  2021-06-18 12:56       ` Luis Henriques
  0 siblings, 1 reply; 18+ messages in thread
From: Jeff Layton @ 2021-06-17 16:24 UTC (permalink / raw)
  To: Luis Henriques; +Cc: ceph-devel, pdonnell, ukernel, idryomov, xiubli

On Wed, 2021-06-16 at 16:26 +0100, Luis Henriques wrote:
> On Tue, Jun 15, 2021 at 10:57:30AM -0400, Jeff Layton wrote:
> > Now that we don't need to hold session->s_mutex or the snap_rwsem when
> > calling ceph_check_caps, we can eliminate ceph_async_iput and just use
> > normal iput calls.
> > 
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> >  fs/ceph/caps.c       |  6 +++---
> >  fs/ceph/inode.c      | 25 +++----------------------
> >  fs/ceph/mds_client.c | 22 +++++++++++-----------
> >  fs/ceph/quota.c      |  6 +++---
> >  fs/ceph/snap.c       | 10 +++++-----
> >  fs/ceph/super.h      |  2 --
> >  6 files changed, 25 insertions(+), 46 deletions(-)
> > 
> > diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> > index 5864d5088e27..fd9243e9a1b2 100644
> > --- a/fs/ceph/caps.c
> > +++ b/fs/ceph/caps.c
> > @@ -3147,7 +3147,7 @@ void ceph_put_wrbuffer_cap_refs(struct ceph_inode_info *ci, int nr,
> >  		wake_up_all(&ci->i_cap_wq);
> >  	while (put-- > 0) {
> >  		/* avoid calling iput_final() in osd dispatch threads */
> > -		ceph_async_iput(inode);
> > +		iput(inode);
> >  	}
> >  }
> >  
> > @@ -4136,7 +4136,7 @@ void ceph_handle_caps(struct ceph_mds_session *session,
> >  done_unlocked:
> >  	ceph_put_string(extra_info.pool_ns);
> >  	/* avoid calling iput_final() in mds dispatch threads */
> > -	ceph_async_iput(inode);
> > +	iput(inode);
> 
> To be honest, I'm not really convinced we can blindly substitute
> ceph_async_iput() by iput().  This case specifically can be problematic
> because we may have called ceph_queue_vmtruncate() above (or
> handle_cap_grant()).  If we did, we have ci->i_work_mask set and the wq
> would have invoked __ceph_do_pending_vmtruncate().  Using the iput() here
> won't have that result.  Am I missing something?
> 

The point of this set is to make iput safe to run even when the s_mutex
and/or snap_rwsem is held. When we queue up the ci->i_work, we do take a
reference to the inode and still run iput there. This set just stops
queueing iputs themselves to the workqueue.
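
To spell that out: the queueing side already looks roughly like this
(paraphrased from fs/ceph/inode.c, so treat it as a sketch):

	void ceph_queue_inode_work(struct inode *inode, int work_bit)
	{
		struct ceph_inode_info *ci = ceph_inode(inode);

		set_bit(work_bit, &ci->i_work_mask);
		ihold(inode);	/* pin the inode for the worker */
		if (!queue_work(ceph_inode_to_client(inode)->inode_wq,
				&ci->i_work))
			iput(inode);	/* already queued; drop the extra ref */
	}

...and ceph_inode_work() handles the pending vmtruncate/writeback/
invalidate bits and then drops that reference with a plain iput(). So the
deferred inode work doesn't depend on ceph_async_iput() to get run.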

I really don't see a problem with this call site in ceph_handle_caps in
particular, as it's calling iput after dropping the s_mutex. Probably
that should not have been converted to use ceph_async_iput in the first
place.

> Cheers,
> --
> Luís
> 
> >  	return;
> >  
> >  flush_cap_releases:
> > @@ -4179,7 +4179,7 @@ void ceph_check_delayed_caps(struct ceph_mds_client *mdsc)
> >  			dout("check_delayed_caps on %p\n", inode);
> >  			ceph_check_caps(ci, 0, NULL);
> >  			/* avoid calling iput_final() in tick thread */
> > -			ceph_async_iput(inode);
> > +			iput(inode);
> >  			spin_lock(&mdsc->cap_delay_lock);
> >  		}
> >  	}
> > diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
> > index 6034821c9d63..5f1c27caf0b6 100644
> > --- a/fs/ceph/inode.c
> > +++ b/fs/ceph/inode.c
> > @@ -1567,7 +1567,7 @@ static int readdir_prepopulate_inodes_only(struct ceph_mds_request *req,
> >  		}
> >  
> >  		/* avoid calling iput_final() in mds dispatch threads */
> > -		ceph_async_iput(in);
> > +		iput(in);
> >  	}
> >  
> >  	return err;
> > @@ -1770,7 +1770,7 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req,
> >  					ihold(in);
> >  					discard_new_inode(in);
> >  				}
> > -				ceph_async_iput(in);
> > +				iput(in);
> >  			}
> >  			d_drop(dn);
> >  			err = ret;
> > @@ -1783,7 +1783,7 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req,
> >  			if (ceph_security_xattr_deadlock(in)) {
> >  				dout(" skip splicing dn %p to inode %p"
> >  				     " (security xattr deadlock)\n", dn, in);
> > -				ceph_async_iput(in);
> > +				iput(in);
> >  				skipped++;
> >  				goto next_item;
> >  			}
> > @@ -1834,25 +1834,6 @@ bool ceph_inode_set_size(struct inode *inode, loff_t size)
> >  	return ret;
> >  }
> >  
> > -/*
> > - * Put reference to inode, but avoid calling iput_final() in current thread.
> > - * iput_final() may wait for reahahead pages. The wait can cause deadlock in
> > - * some contexts.
> > - */
> > -void ceph_async_iput(struct inode *inode)
> > -{
> > -	if (!inode)
> > -		return;
> > -	for (;;) {
> > -		if (atomic_add_unless(&inode->i_count, -1, 1))
> > -			break;
> > -		if (queue_work(ceph_inode_to_client(inode)->inode_wq,
> > -			       &ceph_inode(inode)->i_work))
> > -			break;
> > -		/* queue work failed, i_count must be at least 2 */
> > -	}
> > -}
> > -
> >  void ceph_queue_inode_work(struct inode *inode, int work_bit)
> >  {
> >  	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
> > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > index 87d3be10af25..1f960361b78e 100644
> > --- a/fs/ceph/mds_client.c
> > +++ b/fs/ceph/mds_client.c
> > @@ -825,13 +825,13 @@ void ceph_mdsc_release_request(struct kref *kref)
> >  	if (req->r_inode) {
> >  		ceph_put_cap_refs(ceph_inode(req->r_inode), CEPH_CAP_PIN);
> >  		/* avoid calling iput_final() in mds dispatch threads */
> > -		ceph_async_iput(req->r_inode);
> > +		iput(req->r_inode);
> >  	}
> >  	if (req->r_parent) {
> >  		ceph_put_cap_refs(ceph_inode(req->r_parent), CEPH_CAP_PIN);
> > -		ceph_async_iput(req->r_parent);
> > +		iput(req->r_parent);
> >  	}
> > -	ceph_async_iput(req->r_target_inode);
> > +	iput(req->r_target_inode);
> >  	if (req->r_dentry)
> >  		dput(req->r_dentry);
> >  	if (req->r_old_dentry)
> > @@ -845,7 +845,7 @@ void ceph_mdsc_release_request(struct kref *kref)
> >  		 */
> >  		ceph_put_cap_refs(ceph_inode(req->r_old_dentry_dir),
> >  				  CEPH_CAP_PIN);
> > -		ceph_async_iput(req->r_old_dentry_dir);
> > +		iput(req->r_old_dentry_dir);
> >  	}
> >  	kfree(req->r_path1);
> >  	kfree(req->r_path2);
> > @@ -961,7 +961,7 @@ static void __unregister_request(struct ceph_mds_client *mdsc,
> >  
> >  	if (req->r_unsafe_dir) {
> >  		/* avoid calling iput_final() in mds dispatch threads */
> > -		ceph_async_iput(req->r_unsafe_dir);
> > +		iput(req->r_unsafe_dir);
> >  		req->r_unsafe_dir = NULL;
> >  	}
> >  
> > @@ -1132,7 +1132,7 @@ static int __choose_mds(struct ceph_mds_client *mdsc,
> >  		cap = rb_entry(rb_first(&ci->i_caps), struct ceph_cap, ci_node);
> >  	if (!cap) {
> >  		spin_unlock(&ci->i_ceph_lock);
> > -		ceph_async_iput(inode);
> > +		iput(inode);
> >  		goto random;
> >  	}
> >  	mds = cap->session->s_mds;
> > @@ -1143,7 +1143,7 @@ static int __choose_mds(struct ceph_mds_client *mdsc,
> >  out:
> >  	/* avoid calling iput_final() while holding mdsc->mutex or
> >  	 * in mds dispatch threads */
> > -	ceph_async_iput(inode);
> > +	iput(inode);
> >  	return mds;
> >  
> >  random:
> > @@ -1548,7 +1548,7 @@ int ceph_iterate_session_caps(struct ceph_mds_session *session,
> >  		if (last_inode) {
> >  			/* avoid calling iput_final() while holding
> >  			 * s_mutex or in mds dispatch threads */
> > -			ceph_async_iput(last_inode);
> > +			iput(last_inode);
> >  			last_inode = NULL;
> >  		}
> >  		if (old_cap) {
> > @@ -1582,7 +1582,7 @@ int ceph_iterate_session_caps(struct ceph_mds_session *session,
> >  	session->s_cap_iterator = NULL;
> >  	spin_unlock(&session->s_cap_lock);
> >  
> > -	ceph_async_iput(last_inode);
> > +	iput(last_inode);
> >  	if (old_cap)
> >  		ceph_put_cap(session->s_mdsc, old_cap);
> >  
> > @@ -1723,7 +1723,7 @@ static void remove_session_caps(struct ceph_mds_session *session)
> >  
> >  			inode = ceph_find_inode(sb, vino);
> >  			 /* avoid calling iput_final() while holding s_mutex */
> > -			ceph_async_iput(inode);
> > +			iput(inode);
> >  
> >  			spin_lock(&session->s_cap_lock);
> >  		}
> > @@ -4370,7 +4370,7 @@ static void handle_lease(struct ceph_mds_client *mdsc,
> >  out:
> >  	mutex_unlock(&session->s_mutex);
> >  	/* avoid calling iput_final() in mds dispatch threads */
> > -	ceph_async_iput(inode);
> > +	iput(inode);
> >  	return;
> >  
> >  bad:
> > diff --git a/fs/ceph/quota.c b/fs/ceph/quota.c
> > index 4e32c9600ecc..988c6e04b327 100644
> > --- a/fs/ceph/quota.c
> > +++ b/fs/ceph/quota.c
> > @@ -75,7 +75,7 @@ void ceph_handle_quota(struct ceph_mds_client *mdsc,
> >  	spin_unlock(&ci->i_ceph_lock);
> >  
> >  	/* avoid calling iput_final() in dispatch thread */
> > -	ceph_async_iput(inode);
> > +	iput(inode);
> >  }
> >  
> >  static struct ceph_quotarealm_inode *
> > @@ -248,7 +248,7 @@ static struct ceph_snap_realm *get_quota_realm(struct ceph_mds_client *mdsc,
> >  		ci = ceph_inode(in);
> >  		has_quota = __ceph_has_any_quota(ci);
> >  		/* avoid calling iput_final() while holding mdsc->snap_rwsem */
> > -		ceph_async_iput(in);
> > +		iput(in);
> >  
> >  		next = realm->parent;
> >  		if (has_quota || !next)
> > @@ -384,7 +384,7 @@ static bool check_quota_exceeded(struct inode *inode, enum quota_check_op op,
> >  			exceeded = true; /* Just break the loop */
> >  		}
> >  		/* avoid calling iput_final() while holding mdsc->snap_rwsem */
> > -		ceph_async_iput(in);
> > +		iput(in);
> >  
> >  		next = realm->parent;
> >  		if (exceeded || !next)
> > diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
> > index afc7f4c32364..f928b02bcd31 100644
> > --- a/fs/ceph/snap.c
> > +++ b/fs/ceph/snap.c
> > @@ -679,13 +679,13 @@ static void queue_realm_cap_snaps(struct ceph_snap_realm *realm)
> >  		spin_unlock(&realm->inodes_with_caps_lock);
> >  		/* avoid calling iput_final() while holding
> >  		 * mdsc->snap_rwsem or in mds dispatch threads */
> > -		ceph_async_iput(lastinode);
> > +		iput(lastinode);
> >  		lastinode = inode;
> >  		ceph_queue_cap_snap(ci);
> >  		spin_lock(&realm->inodes_with_caps_lock);
> >  	}
> >  	spin_unlock(&realm->inodes_with_caps_lock);
> > -	ceph_async_iput(lastinode);
> > +	iput(lastinode);
> >  
> >  	dout("queue_realm_cap_snaps %p %llx done\n", realm, realm->ino);
> >  }
> > @@ -841,7 +841,7 @@ static void flush_snaps(struct ceph_mds_client *mdsc)
> >  		ceph_flush_snaps(ci, &session);
> >  		/* avoid calling iput_final() while holding
> >  		 * session->s_mutex or in mds dispatch threads */
> > -		ceph_async_iput(inode);
> > +		iput(inode);
> >  		spin_lock(&mdsc->snap_flush_lock);
> >  	}
> >  	spin_unlock(&mdsc->snap_flush_lock);
> > @@ -985,12 +985,12 @@ void ceph_handle_snap(struct ceph_mds_client *mdsc,
> >  
> >  			/* avoid calling iput_final() while holding
> >  			 * mdsc->snap_rwsem or mds in dispatch threads */
> > -			ceph_async_iput(inode);
> > +			iput(inode);
> >  			continue;
> >  
> >  skip_inode:
> >  			spin_unlock(&ci->i_ceph_lock);
> > -			ceph_async_iput(inode);
> > +			iput(inode);
> >  		}
> >  
> >  		/* we may have taken some of the old realm's children. */
> > diff --git a/fs/ceph/super.h b/fs/ceph/super.h
> > index 31f0be9120dd..6b6332a5c113 100644
> > --- a/fs/ceph/super.h
> > +++ b/fs/ceph/super.h
> > @@ -988,8 +988,6 @@ extern int ceph_inode_holds_cap(struct inode *inode, int mask);
> >  extern bool ceph_inode_set_size(struct inode *inode, loff_t size);
> >  extern void __ceph_do_pending_vmtruncate(struct inode *inode);
> >  
> > -extern void ceph_async_iput(struct inode *inode);
> > -
> >  void ceph_queue_inode_work(struct inode *inode, int work_bit);
> >  
> >  static inline void ceph_queue_vmtruncate(struct inode *inode)
> > -- 
> > 2.31.1
> > 

-- 
Jeff Layton <jlayton@kernel.org>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 6/6] ceph: eliminate ceph_async_iput()
  2021-06-17 16:24     ` Jeff Layton
@ 2021-06-18 12:56       ` Luis Henriques
  2021-06-18 13:04         ` Jeff Layton
  0 siblings, 1 reply; 18+ messages in thread
From: Luis Henriques @ 2021-06-18 12:56 UTC (permalink / raw)
  To: Jeff Layton; +Cc: ceph-devel, pdonnell, ukernel, idryomov, xiubli

On Thu, Jun 17, 2021 at 12:24:35PM -0400, Jeff Layton wrote:
> On Wed, 2021-06-16 at 16:26 +0100, Luis Henriques wrote:
> > On Tue, Jun 15, 2021 at 10:57:30AM -0400, Jeff Layton wrote:
> > > Now that we don't need to hold session->s_mutex or the snap_rwsem when
> > > calling ceph_check_caps, we can eliminate ceph_async_iput and just use
> > > normal iput calls.
> > > 
> > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > ---
> > >  fs/ceph/caps.c       |  6 +++---
> > >  fs/ceph/inode.c      | 25 +++----------------------
> > >  fs/ceph/mds_client.c | 22 +++++++++++-----------
> > >  fs/ceph/quota.c      |  6 +++---
> > >  fs/ceph/snap.c       | 10 +++++-----
> > >  fs/ceph/super.h      |  2 --
> > >  6 files changed, 25 insertions(+), 46 deletions(-)
> > > 
> > > diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> > > index 5864d5088e27..fd9243e9a1b2 100644
> > > --- a/fs/ceph/caps.c
> > > +++ b/fs/ceph/caps.c
> > > @@ -3147,7 +3147,7 @@ void ceph_put_wrbuffer_cap_refs(struct ceph_inode_info *ci, int nr,
> > >  		wake_up_all(&ci->i_cap_wq);
> > >  	while (put-- > 0) {
> > >  		/* avoid calling iput_final() in osd dispatch threads */
> > > -		ceph_async_iput(inode);
> > > +		iput(inode);
> > >  	}
> > >  }
> > >  
> > > @@ -4136,7 +4136,7 @@ void ceph_handle_caps(struct ceph_mds_session *session,
> > >  done_unlocked:
> > >  	ceph_put_string(extra_info.pool_ns);
> > >  	/* avoid calling iput_final() in mds dispatch threads */
> > > -	ceph_async_iput(inode);
> > > +	iput(inode);
> > 
> > To be honest, I'm not really convinced we can blindly substitute
> > ceph_async_iput() by iput().  This case specifically can be problematic
> > because we may have called ceph_queue_vmtruncate() above (or
> > handle_cap_grant()).  If we did, we have ci->i_work_mask set and the wq
> > would have invoked __ceph_do_pending_vmtruncate().  Using the iput() here
> > won't have that result.  Am I missing something?
> > 
> 
> The point of this set is to make iput safe to run even when the s_mutex
> and/or snap_rwsem is held. When we queue up the ci->i_work, we do take a
> reference to the inode and still run iput there. This set just stops
> queueing iputs themselves to the workqueue.
> 
> I really don't see a problem with this call site in ceph_handle_caps in
> particular, as it's calling iput after dropping the s_mutex. Probably
> that should not have been converted to use ceph_async_iput in the first
> place.

Obviously, you're right.  I don't know what I was thinking of when I was
reading this code and saw: ci->i_work_mask bits are set in one place (in
ceph_queue_vmtruncate(), for ex.) and then the work is queued only in
ceph_async_iput().  Which is obviously wrong!

Anyway, sorry for the noise.

(Oh, and BTW: this patch should also remove comments such as "avoid
calling iput_final() in mds dispatch threads", and similar that exist in
several places before ceph_async_iput() is (or rather *was*) called.)

Cheers,
--
Luís

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 6/6] ceph: eliminate ceph_async_iput()
  2021-06-18 12:56       ` Luis Henriques
@ 2021-06-18 13:04         ` Jeff Layton
  0 siblings, 0 replies; 18+ messages in thread
From: Jeff Layton @ 2021-06-18 13:04 UTC (permalink / raw)
  To: Luis Henriques; +Cc: ceph-devel, pdonnell, ukernel, idryomov, xiubli

On Fri, 2021-06-18 at 13:56 +0100, Luis Henriques wrote:
> On Thu, Jun 17, 2021 at 12:24:35PM -0400, Jeff Layton wrote:
> > On Wed, 2021-06-16 at 16:26 +0100, Luis Henriques wrote:
> > > On Tue, Jun 15, 2021 at 10:57:30AM -0400, Jeff Layton wrote:
> > > > Now that we don't need to hold session->s_mutex or the snap_rwsem when
> > > > calling ceph_check_caps, we can eliminate ceph_async_iput and just use
> > > > normal iput calls.
> > > > 
> > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > ---
> > > >  fs/ceph/caps.c       |  6 +++---
> > > >  fs/ceph/inode.c      | 25 +++----------------------
> > > >  fs/ceph/mds_client.c | 22 +++++++++++-----------
> > > >  fs/ceph/quota.c      |  6 +++---
> > > >  fs/ceph/snap.c       | 10 +++++-----
> > > >  fs/ceph/super.h      |  2 --
> > > >  6 files changed, 25 insertions(+), 46 deletions(-)
> > > > 
> > > > diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> > > > index 5864d5088e27..fd9243e9a1b2 100644
> > > > --- a/fs/ceph/caps.c
> > > > +++ b/fs/ceph/caps.c
> > > > @@ -3147,7 +3147,7 @@ void ceph_put_wrbuffer_cap_refs(struct ceph_inode_info *ci, int nr,
> > > >  		wake_up_all(&ci->i_cap_wq);
> > > >  	while (put-- > 0) {
> > > >  		/* avoid calling iput_final() in osd dispatch threads */
> > > > -		ceph_async_iput(inode);
> > > > +		iput(inode);
> > > >  	}
> > > >  }
> > > >  
> > > > @@ -4136,7 +4136,7 @@ void ceph_handle_caps(struct ceph_mds_session *session,
> > > >  done_unlocked:
> > > >  	ceph_put_string(extra_info.pool_ns);
> > > >  	/* avoid calling iput_final() in mds dispatch threads */
> > > > -	ceph_async_iput(inode);
> > > > +	iput(inode);
> > > 
> > > To be honest, I'm not really convinced we can blindly substitute
> > > ceph_async_iput() by iput().  This case specifically can be problematic
> > > because we may have called ceph_queue_vmtruncate() above (or
> > > handle_cap_grant()).  If we did, we have ci->i_work_mask set and the wq
> > > would have invoked __ceph_do_pending_vmtruncate().  Using the iput() here
> > > won't have that result.  Am I missing something?
> > > 
> > 
> > The point of this set is to make iput safe to run even when the s_mutex
> > and/or snap_rwsem is held. When we queue up the ci->i_work, we do take a
> > reference to the inode and still run iput there. This set just stops
> > queueing iputs themselves to the workqueue.
> > 
> > I really don't see a problem with this call site in ceph_handle_caps in
> > particular, as it's calling iput after dropping the s_mutex. Probably
> > that should not have been converted to use ceph_async_iput in the first
> > place.
> 
> Obviously, you're right.  I don't know what I was thinking of when I was
> reading this code and saw: ci->i_work_mask bits are set in one place (in
> ceph_queue_vmtruncate(), for ex.) and then the work is queued only in
> ceph_async_iput().  Which is obviously wrong!
> 
> Anyway, sorry for the noise.
> 
> (Oh, and BTW: this patch should also remove comments such as "avoid
> calling iput_final() in mds dispatch threads", and similar that exist in
> several places before ceph_async_iput() is (or rather *was*) called.)
> 

Thanks. I saw that I missed a few of those sorts of comments. I'll plan
to remove those before I merge this.

Cheers,
-- 
Jeff Layton <jlayton@kernel.org>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 2/6] ceph: eliminate session->s_gen_ttl_lock
  2021-06-16 15:24   ` Luis Henriques
@ 2021-06-18 13:31     ` Jeff Layton
  0 siblings, 0 replies; 18+ messages in thread
From: Jeff Layton @ 2021-06-18 13:31 UTC (permalink / raw)
  To: Luis Henriques; +Cc: ceph-devel, pdonnell, ukernel, idryomov, xiubli

On Wed, 2021-06-16 at 16:24 +0100, Luis Henriques wrote:
> On Tue, Jun 15, 2021 at 10:57:26AM -0400, Jeff Layton wrote:
> > Turn s_cap_gen field into an atomic_t, and just rely on the fact that we
> > hold the s_mutex when changing the s_cap_ttl field.
> > 
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> >  fs/ceph/caps.c       | 15 ++++++---------
> >  fs/ceph/dir.c        |  4 +---
> >  fs/ceph/inode.c      |  4 ++--
> >  fs/ceph/mds_client.c | 17 ++++++-----------
> >  fs/ceph/mds_client.h |  6 ++----
> >  5 files changed, 17 insertions(+), 29 deletions(-)
> > 
> > diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> > index a5e93b185515..919eada97a1f 100644
> > --- a/fs/ceph/caps.c
> > +++ b/fs/ceph/caps.c
> > @@ -645,9 +645,7 @@ void ceph_add_cap(struct inode *inode,
> >  	dout("add_cap %p mds%d cap %llx %s seq %d\n", inode,
> >  	     session->s_mds, cap_id, ceph_cap_string(issued), seq);
> >  
> > -	spin_lock(&session->s_gen_ttl_lock);
> > -	gen = session->s_cap_gen;
> > -	spin_unlock(&session->s_gen_ttl_lock);
> > +	gen = atomic_read(&session->s_cap_gen);
> >  
> >  	cap = __get_cap_for_mds(ci, mds);
> >  	if (!cap) {
> > @@ -785,10 +783,8 @@ static int __cap_is_valid(struct ceph_cap *cap)
> >  	unsigned long ttl;
> >  	u32 gen;
> >  
> > -	spin_lock(&cap->session->s_gen_ttl_lock);
> > -	gen = cap->session->s_cap_gen;
> > +	gen = atomic_read(&cap->session->s_cap_gen);
> >  	ttl = cap->session->s_cap_ttl;
> > -	spin_unlock(&cap->session->s_gen_ttl_lock);
> >  
> >  	if (cap->cap_gen < gen || time_after_eq(jiffies, ttl)) {
> >  		dout("__cap_is_valid %p cap %p issued %s "
> > @@ -1182,7 +1178,8 @@ void __ceph_remove_cap(struct ceph_cap *cap, bool queue_release)
> >  	 * s_cap_gen while session is in the reconnect state.
> >  	 */
> >  	if (queue_release &&
> > -	    (!session->s_cap_reconnect || cap->cap_gen == session->s_cap_gen)) {
> > +	    (!session->s_cap_reconnect ||
> > +	     cap->cap_gen == atomic_read(&session->s_cap_gen))) {
> >  		cap->queue_release = 1;
> >  		if (removed) {
> >  			__ceph_queue_cap_release(session, cap);
> > @@ -3288,7 +3285,7 @@ static void handle_cap_grant(struct inode *inode,
> >  	u64 size = le64_to_cpu(grant->size);
> >  	u64 max_size = le64_to_cpu(grant->max_size);
> >  	unsigned char check_caps = 0;
> > -	bool was_stale = cap->cap_gen < session->s_cap_gen;
> > +	bool was_stale = cap->cap_gen < atomic_read(&session->s_cap_gen);
> >  	bool wake = false;
> >  	bool writeback = false;
> >  	bool queue_trunc = false;
> > @@ -3340,7 +3337,7 @@ static void handle_cap_grant(struct inode *inode,
> >  	}
> >  
> >  	/* side effects now are allowed */
> > -	cap->cap_gen = session->s_cap_gen;
> > +	cap->cap_gen = atomic_read(&session->s_cap_gen);
> >  	cap->seq = seq;
> >  
> >  	__check_cap_issue(ci, cap, newcaps);
> > diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
> > index 0dc5f8357f58..bd508b1aeac2 100644
> > --- a/fs/ceph/dir.c
> > +++ b/fs/ceph/dir.c
> > @@ -1541,10 +1541,8 @@ static bool __dentry_lease_is_valid(struct ceph_dentry_info *di)
> >  		u32 gen;
> >  		unsigned long ttl;
> >  
> > -		spin_lock(&session->s_gen_ttl_lock);
> > -		gen = session->s_cap_gen;
> > +		gen = atomic_read(&session->s_cap_gen);
> >  		ttl = session->s_cap_ttl;
> > -		spin_unlock(&session->s_gen_ttl_lock);
> >  
> >  		if (di->lease_gen == gen &&
> >  		    time_before(jiffies, ttl) &&
> > diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
> > index 6f43542b3344..6034821c9d63 100644
> > --- a/fs/ceph/inode.c
> > +++ b/fs/ceph/inode.c
> > @@ -1124,7 +1124,7 @@ static void __update_dentry_lease(struct inode *dir, struct dentry *dentry,
> >  		return;
> >  	}
> >  
> > -	if (di->lease_gen == session->s_cap_gen &&
> > +	if (di->lease_gen == atomic_read(&session->s_cap_gen) &&
> >  	    time_before(ttl, di->time))
> >  		return;  /* we already have a newer lease. */
> >  
> > @@ -1135,7 +1135,7 @@ static void __update_dentry_lease(struct inode *dir, struct dentry *dentry,
> >  
> >  	if (!di->lease_session)
> >  		di->lease_session = ceph_get_mds_session(session);
> > -	di->lease_gen = session->s_cap_gen;
> > +	di->lease_gen = atomic_read(&session->s_cap_gen);
> >  	di->lease_seq = le32_to_cpu(lease->seq);
> >  	di->lease_renew_after = half_ttl;
> >  	di->lease_renew_from = 0;
> > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > index ec669634c649..87d3be10af25 100644
> > --- a/fs/ceph/mds_client.c
> > +++ b/fs/ceph/mds_client.c
> > @@ -749,8 +749,7 @@ static struct ceph_mds_session *register_session(struct ceph_mds_client *mdsc,
> >  
> >  	ceph_con_init(&s->s_con, s, &mds_con_ops, &mdsc->fsc->client->msgr);
> >  
> > -	spin_lock_init(&s->s_gen_ttl_lock);
> > -	s->s_cap_gen = 1;
> > +	atomic_set(&s->s_cap_gen, 1);
> >  	s->s_cap_ttl = jiffies - 1;
> >  
> >  	spin_lock_init(&s->s_cap_lock);
> > @@ -1763,7 +1762,7 @@ static int wake_up_session_cb(struct inode *inode, struct ceph_cap *cap,
> >  		ci->i_requested_max_size = 0;
> >  		spin_unlock(&ci->i_ceph_lock);
> >  	} else if (ev == RENEWCAPS) {
> > -		if (cap->cap_gen < cap->session->s_cap_gen) {
> > +		if (cap->cap_gen < atomic_read(&cap->session->s_cap_gen)) {
> >  			/* mds did not re-issue stale cap */
> >  			spin_lock(&ci->i_ceph_lock);
> >  			cap->issued = cap->implemented = CEPH_CAP_PIN;
> > @@ -3501,10 +3500,8 @@ static void handle_session(struct ceph_mds_session *session,
> >  	case CEPH_SESSION_STALE:
> >  		pr_info("mds%d caps went stale, renewing\n",
> >  			session->s_mds);
> > -		spin_lock(&session->s_gen_ttl_lock);
> > -		session->s_cap_gen++;
> > +		atomic_inc(&session->s_cap_gen);
> >  		session->s_cap_ttl = jiffies - 1;
> > -		spin_unlock(&session->s_gen_ttl_lock);
> >  		send_renew_caps(mdsc, session);
> >  		break;
> >  
> > @@ -3773,7 +3770,7 @@ static int reconnect_caps_cb(struct inode *inode, struct ceph_cap *cap,
> >  	cap->seq = 0;        /* reset cap seq */
> >  	cap->issue_seq = 0;  /* and issue_seq */
> >  	cap->mseq = 0;       /* and migrate_seq */
> > -	cap->cap_gen = cap->session->s_cap_gen;
> > +	cap->cap_gen = atomic_read(&cap->session->s_cap_gen);
> >  
> >  	/* These are lost when the session goes away */
> >  	if (S_ISDIR(inode->i_mode)) {
> > @@ -4013,9 +4010,7 @@ static void send_mds_reconnect(struct ceph_mds_client *mdsc,
> >  	dout("session %p state %s\n", session,
> >  	     ceph_session_state_name(session->s_state));
> >  
> > -	spin_lock(&session->s_gen_ttl_lock);
> > -	session->s_cap_gen++;
> > -	spin_unlock(&session->s_gen_ttl_lock);
> > +	atomic_inc(&session->s_cap_gen);
> >  
> >  	spin_lock(&session->s_cap_lock);
> >  	/* don't know if session is readonly */
> > @@ -4346,7 +4341,7 @@ static void handle_lease(struct ceph_mds_client *mdsc,
> >  
> >  	case CEPH_MDS_LEASE_RENEW:
> >  		if (di->lease_session == session &&
> > -		    di->lease_gen == session->s_cap_gen &&
> > +		    di->lease_gen == atomic_read(&session->s_cap_gen) &&
> >  		    di->lease_renew_from &&
> >  		    di->lease_renew_after == 0) {
> >  			unsigned long duration =
> > diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> > index 15c11a0f2caf..20e42d8b66c6 100644
> > --- a/fs/ceph/mds_client.h
> > +++ b/fs/ceph/mds_client.h
> > @@ -186,10 +186,8 @@ struct ceph_mds_session {
> >  
> >  	struct ceph_auth_handshake s_auth;
> >  
> > -	/* protected by s_gen_ttl_lock */
> > -	spinlock_t        s_gen_ttl_lock;
> > -	u32               s_cap_gen;  /* inc each time we get mds stale msg */
> > -	unsigned long     s_cap_ttl;  /* when session caps expire */
> > +	atomic_t          s_cap_gen;  /* inc each time we get mds stale msg */
> > +	unsigned long     s_cap_ttl;  /* when session caps expire. protected by s_mutex */
> >  
> >  	/* protected by s_cap_lock */
> >  	spinlock_t        s_cap_lock;
> > -- 
> > 2.31.1
> > 
> 
> Reviewed-by: Luis Henriques <lhenriques@suse.de>
> 
> (Since we don't have s_mutex when reading s_cap_ttl, would it make sense
> to make it an atomic_t too?)
> 

We could, but I don't see a huge need to manage consistency on that
variable. We only use it to determine how soon to do cap updates or to
renew dentry leases. If we see problems in that area though, we might
want to do that. 
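
If it ever does become an issue, I suspect READ_ONCE()/WRITE_ONCE()
annotations on s_cap_ttl would be enough rather than a full atomic, e.g.
in __cap_is_valid() -- untested sketch:

 	gen = atomic_read(&cap->session->s_cap_gen);
-	ttl = cap->session->s_cap_ttl;
+	ttl = READ_ONCE(cap->session->s_cap_ttl);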
-- 
Jeff Layton <jlayton@kernel.org>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 3/6] ceph: don't take s_mutex or snap_rwsem in ceph_check_caps
  2021-06-16 15:25   ` Luis Henriques
@ 2021-06-18 13:35     ` Jeff Layton
  0 siblings, 0 replies; 18+ messages in thread
From: Jeff Layton @ 2021-06-18 13:35 UTC (permalink / raw)
  To: Luis Henriques; +Cc: ceph-devel, pdonnell, ukernel, idryomov, xiubli

On Wed, 2021-06-16 at 16:25 +0100, Luis Henriques wrote:
> On Tue, Jun 15, 2021 at 10:57:27AM -0400, Jeff Layton wrote:
> > These locks appear to be completely unnecessary. Almost all of this
> > function is done under the inode->i_ceph_lock, aside from the actual
> > sending of the message. Don't take either lock in this function.
> > 
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> >  fs/ceph/caps.c | 61 ++++++--------------------------------------------
> >  1 file changed, 7 insertions(+), 54 deletions(-)
> > 
> > diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> > index 919eada97a1f..825b1e463ad3 100644
> > --- a/fs/ceph/caps.c
> > +++ b/fs/ceph/caps.c
> > @@ -1912,7 +1912,6 @@ void ceph_check_caps(struct ceph_inode_info *ci, int flags,
> >  	struct ceph_cap *cap;
> >  	u64 flush_tid, oldest_flush_tid;
> >  	int file_wanted, used, cap_used;
> > -	int took_snap_rwsem = 0;             /* true if mdsc->snap_rwsem held */
> >  	int issued, implemented, want, retain, revoking, flushing = 0;
> >  	int mds = -1;   /* keep track of how far we've gone through i_caps list
> >  			   to avoid an infinite loop on retry */
> > @@ -1920,6 +1919,9 @@ void ceph_check_caps(struct ceph_inode_info *ci, int flags,
> >  	bool queue_invalidate = false;
> >  	bool tried_invalidate = false;
> >  
> > +	if (session)
> > +		ceph_get_mds_session(session);
> > +
> >  	spin_lock(&ci->i_ceph_lock);
> >  	if (ci->i_ceph_flags & CEPH_I_FLUSH)
> >  		flags |= CHECK_CAPS_FLUSH;
> > @@ -2021,8 +2023,6 @@ void ceph_check_caps(struct ceph_inode_info *ci, int flags,
> >  		    ((flags & CHECK_CAPS_AUTHONLY) && cap != ci->i_auth_cap))
> >  			continue;
> >  
> > -		/* NOTE: no side-effects allowed, until we take s_mutex */
> > -
> >  		/*
> >  		 * If we have an auth cap, we don't need to consider any
> >  		 * overlapping caps as used.
> > @@ -2085,37 +2085,8 @@ void ceph_check_caps(struct ceph_inode_info *ci, int flags,
> >  			continue;     /* nope, all good */
> >  
> >  ack:
> > -		if (session && session != cap->session) {
> > -			dout("oops, wrong session %p mutex\n", session);
> > -			mutex_unlock(&session->s_mutex);
> > -			session = NULL;
> > -		}
> > -		if (!session) {
> > -			session = cap->session;
> > -			if (mutex_trylock(&session->s_mutex) == 0) {
> > -				dout("inverting session/ino locks on %p\n",
> > -				     session);
> > -				session = ceph_get_mds_session(session);
> > -				spin_unlock(&ci->i_ceph_lock);
> > -				if (took_snap_rwsem) {
> > -					up_read(&mdsc->snap_rwsem);
> > -					took_snap_rwsem = 0;
> > -				}
> > -				if (session) {
> > -					mutex_lock(&session->s_mutex);
> > -					ceph_put_mds_session(session);
> > -				} else {
> > -					/*
> > -					 * Because we take the reference while
> > -					 * holding the i_ceph_lock, it should
> > -					 * never be NULL. Throw a warning if it
> > -					 * ever is.
> > -					 */
> > -					WARN_ON_ONCE(true);
> > -				}
> > -				goto retry;
> > -			}
> > -		}
> > +		ceph_put_mds_session(session);
> > +		session = ceph_get_mds_session(cap->session);
> >  
> >  		/* kick flushing and flush snaps before sending normal
> >  		 * cap message */
> > @@ -2130,19 +2101,6 @@ void ceph_check_caps(struct ceph_inode_info *ci, int flags,
> >  			goto retry_locked;
> >  		}
> >  
> > -		/* take snap_rwsem after session mutex */
> > -		if (!took_snap_rwsem) {
> > -			if (down_read_trylock(&mdsc->snap_rwsem) == 0) {
> > -				dout("inverting snap/in locks on %p\n",
> > -				     inode);
> > -				spin_unlock(&ci->i_ceph_lock);
> > -				down_read(&mdsc->snap_rwsem);
> > -				took_snap_rwsem = 1;
> > -				goto retry;
> > -			}
> > -			took_snap_rwsem = 1;
> > -		}
> > -
> >  		if (cap == ci->i_auth_cap && ci->i_dirty_caps) {
> >  			flushing = ci->i_dirty_caps;
> >  			flush_tid = __mark_caps_flushing(inode, session, false,
> > @@ -2179,13 +2137,9 @@ void ceph_check_caps(struct ceph_inode_info *ci, int flags,
> >  
> >  	spin_unlock(&ci->i_ceph_lock);
> >  
> > +	ceph_put_mds_session(session);
> >  	if (queue_invalidate)
> >  		ceph_queue_invalidate(inode);
> > -
> > -	if (session)
> > -		mutex_unlock(&session->s_mutex);
> > -	if (took_snap_rwsem)
> > -		up_read(&mdsc->snap_rwsem);
> >  }
> >  
> >  /*
> > @@ -3550,13 +3504,12 @@ static void handle_cap_grant(struct inode *inode,
> >  	if (wake)
> >  		wake_up_all(&ci->i_cap_wq);
> >  
> > +	mutex_unlock(&session->s_mutex);
> >  	if (check_caps == 1)
> >  		ceph_check_caps(ci, CHECK_CAPS_AUTHONLY | CHECK_CAPS_NOINVAL,
> >  				session);
> >  	else if (check_caps == 2)
> >  		ceph_check_caps(ci, CHECK_CAPS_NOINVAL, session);
> > -	else
> > -		mutex_unlock(&session->s_mutex);
> >  }
> >  
> >  /*
> > -- 
> > 2.31.1
> > 
> 
> Ugh, this is a tricky one.  I couldn't find anything wrong but... yeah,
> here it is:
> 
> Reviewed-by: Luis Henriques <lhenriques@suse.de>
> 
> (Suggestion: remove the 'retry/retry_locked' goto dance and simply lock
> i_ceph_lock in the only 'goto retry' call.)
> 

Thanks. Good suggestion. I'll incorporate it before I merge, but I won't
plan to resend unless there are bigger changes needed.
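
IOW, drop the retry label and have the one remaining "goto retry" site do
(untested):

		spin_lock(&ci->i_ceph_lock);
		goto retry_locked;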
-- 
Jeff Layton <jlayton@kernel.org>


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread

Thread overview: 18+ messages
2021-06-15 14:57 [RFC PATCH 0/6] ceph: remove excess mutex locking from cap/snap flushing codepaths Jeff Layton
2021-06-15 14:57 ` [RFC PATCH 1/6] ceph: allow ceph_put_mds_session to take NULL or ERR_PTR Jeff Layton
2021-06-16 15:24   ` Luis Henriques
2021-06-15 14:57 ` [RFC PATCH 2/6] ceph: eliminate session->s_gen_ttl_lock Jeff Layton
2021-06-16 15:24   ` Luis Henriques
2021-06-18 13:31     ` Jeff Layton
2021-06-15 14:57 ` [RFC PATCH 3/6] ceph: don't take s_mutex or snap_rwsem in ceph_check_caps Jeff Layton
2021-06-16 15:25   ` Luis Henriques
2021-06-18 13:35     ` Jeff Layton
2021-06-15 14:57 ` [RFC PATCH 4/6] ceph: don't take s_mutex in try_flush_caps Jeff Layton
2021-06-16 15:25   ` Luis Henriques
2021-06-15 14:57 ` [RFC PATCH 5/6] ceph: don't take s_mutex in ceph_flush_snaps Jeff Layton
2021-06-16 15:25   ` Luis Henriques
2021-06-15 14:57 ` [RFC PATCH 6/6] ceph: eliminate ceph_async_iput() Jeff Layton
2021-06-16 15:26   ` Luis Henriques
2021-06-17 16:24     ` Jeff Layton
2021-06-18 12:56       ` Luis Henriques
2021-06-18 13:04         ` Jeff Layton
