* [PATCH v5 00/12] ceph: async directory operations support
@ 2020-02-19 13:25 Jeff Layton
  2020-02-19 13:25 ` [PATCH v5 01/12] ceph: add flag to designate that a request is asynchronous Jeff Layton
                   ` (11 more replies)
  0 siblings, 12 replies; 24+ messages in thread
From: Jeff Layton @ 2020-02-19 13:25 UTC (permalink / raw)
  To: ceph-devel; +Cc: idryomov, sage, zyan, pdonnell, xiubli

A lot of changes in this set -- some highlights:

v5: reorganize patchset for easier review and better bisectability
    rework how dir caps are acquired and tracked in requests
    preemptively release cap refs when reconnecting session
    restore inode number back to the pool when falling back to sync create
    rework unlink cap acquisition to be lighter weight
    new "nowsync" mount opt, patterned after xfs "wsync" mount opt

Performance is on par with earlier sets.

I previously pulled the async unlink patch from ceph-client/testing, so
this set includes a revised version of that as well, and orders it
after some other changes. I also broke that one up into several
patches.

This should (hopefully) address Zheng's concerns about releasing the
caps when the session is lost. Those are preemptively released now
when the session is reconnected. 

This adds a new mount option too. xfs has a "wsync" mount option that
makes it wait for namespace operations to be journalled before
returning. This patchset adds "wsync" and "nowsync" options, so async
dirops can now be enabled or disabled on a per-superblock basis.

The default for xfs is "nowsync". For ceph though, I'm leaving it as
"wsync" for now, so you need to mount with "nowsync" to enable async
dirops.
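
For testing, async dirops are enabled by adding "nowsync" to the mount
options (i.e. "mount -t ceph ... -o nowsync"). For completeness, a rough
C sketch of the equivalent mount(2) call -- the monitor address,
mountpoint and credentials below are just placeholders:

	#include <sys/mount.h>

	int main(void)
	{
		/* "nowsync" in the options string enables async dirops */
		return mount("192.168.0.1:6789:/", "/mnt/cephfs", "ceph", 0,
			     "name=admin,secret=<base64 key>,nowsync");
	}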

We may not actually need patch #6 here. Zheng had that delta in one
of the earlier patches, but I'm not sure it's really needed now. It
may make sense to just take it on its own merits though.

Comments and suggestions welcome.

Jeff Layton (11):
  ceph: add flag to designate that a request is asynchronous
  ceph: track primary dentry link
  ceph: add infrastructure for waiting for async create to complete
  ceph: make __take_cap_refs non-static
  ceph: cap tracking for async directory operations
  ceph: perform asynchronous unlink if we have sufficient caps
  ceph: make ceph_fill_inode non-static
  ceph: decode interval_sets for delegated inos
  ceph: add new MDS req field to hold delegated inode number
  ceph: cache layout in parent dir on first sync create
  ceph: attempt to do async create when possible

Yan, Zheng (1):
  ceph: don't take refs to want mask unless we have all bits

 fs/ceph/caps.c               |  72 +++++++---
 fs/ceph/dir.c                | 106 +++++++++++++-
 fs/ceph/file.c               | 270 +++++++++++++++++++++++++++++++++--
 fs/ceph/inode.c              |  58 ++++----
 fs/ceph/mds_client.c         | 196 ++++++++++++++++++++++---
 fs/ceph/mds_client.h         |  24 +++-
 fs/ceph/super.c              |  20 +++
 fs/ceph/super.h              |  21 ++-
 include/linux/ceph/ceph_fs.h |  17 ++-
 9 files changed, 701 insertions(+), 83 deletions(-)

-- 
2.24.1

* [PATCH v5 01/12] ceph: add flag to designate that a request is asynchronous
  2020-02-19 13:25 [PATCH v5 00/12] ceph: async directory operations support Jeff Layton
@ 2020-02-19 13:25 ` Jeff Layton
  2020-02-19 13:25 ` [PATCH v5 02/12] ceph: track primary dentry link Jeff Layton
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: Jeff Layton @ 2020-02-19 13:25 UTC (permalink / raw)
  To: ceph-devel; +Cc: idryomov, sage, zyan, pdonnell, xiubli

...and ensure that such requests are never queued. The MDS needs to
know that a request is asynchronous, so add flags and the proper
infrastructure for that.

Also, delegated inode numbers and directory caps are associated with the
session, so ensure that async requests are always transmitted on the
first attempt and are never queued to wait for session reestablishment.

If it does end up looking like we'll need to queue the request, then
have it return -EJUKEBOX so the caller can reattempt with a synchronous
request.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/inode.c              |  1 +
 fs/ceph/mds_client.c         | 15 +++++++++++++++
 fs/ceph/mds_client.h         |  1 +
 include/linux/ceph/ceph_fs.h |  5 +++--
 4 files changed, 20 insertions(+), 2 deletions(-)
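
(Not part of the patch, just for context: the async unlink patch later
in this series uses the new flag and the -EJUKEBOX fallback roughly
like this, simplified from that patch:)

	set_bit(CEPH_MDS_R_ASYNC, &req->r_req_flags);
	err = ceph_mdsc_submit_request(mdsc, dir, req);
	if (err == -EJUKEBOX) {
		/* couldn't do it async -- redo the op synchronously */
		ceph_mdsc_put_request(req);
		try_async = false;
		goto retry;
	}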

diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 094b8fc37787..9869ec101e88 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -1311,6 +1311,7 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req)
 		err = fill_inode(in, req->r_locked_page, &rinfo->targeti, NULL,
 				session,
 				(!test_bit(CEPH_MDS_R_ABORTED, &req->r_req_flags) &&
+				 !test_bit(CEPH_MDS_R_ASYNC, &req->r_req_flags) &&
 				 rinfo->head->result == 0) ?  req->r_fmode : -1,
 				&req->r_caps_reservation);
 		if (err < 0) {
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index fab9d6461a65..94d18e643a3d 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2528,6 +2528,8 @@ static int __prepare_send_request(struct ceph_mds_client *mdsc,
 	rhead->oldest_client_tid = cpu_to_le64(__get_oldest_tid(mdsc));
 	if (test_bit(CEPH_MDS_R_GOT_UNSAFE, &req->r_req_flags))
 		flags |= CEPH_MDS_FLAG_REPLAY;
+	if (test_bit(CEPH_MDS_R_ASYNC, &req->r_req_flags))
+		flags |= CEPH_MDS_FLAG_ASYNC;
 	if (req->r_parent)
 		flags |= CEPH_MDS_FLAG_WANT_DENTRY;
 	rhead->flags = cpu_to_le32(flags);
@@ -2611,6 +2613,10 @@ static void __do_request(struct ceph_mds_client *mdsc,
 	mds = __choose_mds(mdsc, req, &random);
 	if (mds < 0 ||
 	    ceph_mdsmap_get_state(mdsc->mdsmap, mds) < CEPH_MDS_STATE_ACTIVE) {
+		if (test_bit(CEPH_MDS_R_ASYNC, &req->r_req_flags)) {
+			err = -EJUKEBOX;
+			goto finish;
+		}
 		dout("do_request no mds or not active, waiting for map\n");
 		list_add(&req->r_wait, &mdsc->waiting_for_map);
 		return;
@@ -2635,6 +2641,15 @@ static void __do_request(struct ceph_mds_client *mdsc,
 			err = -EACCES;
 			goto out_session;
 		}
+		/*
+		 * We cannot queue async requests since the caps and delegated
+		 * inodes are bound to the session. Just return -EJUKEBOX and
+		 * let the caller retry a sync request in that case.
+		 */
+		if (test_bit(CEPH_MDS_R_ASYNC, &req->r_req_flags)) {
+			err = -EJUKEBOX;
+			goto out_session;
+		}
 		if (session->s_state == CEPH_MDS_SESSION_NEW ||
 		    session->s_state == CEPH_MDS_SESSION_CLOSING) {
 			__open_session(mdsc, session);
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index a0918d00117c..95ac00e59e66 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -255,6 +255,7 @@ struct ceph_mds_request {
 #define CEPH_MDS_R_GOT_RESULT		(5) /* got a result */
 #define CEPH_MDS_R_DID_PREPOPULATE	(6) /* prepopulated readdir */
 #define CEPH_MDS_R_PARENT_LOCKED	(7) /* is r_parent->i_rwsem wlocked? */
+#define CEPH_MDS_R_ASYNC		(8) /* async request */
 	unsigned long	r_req_flags;
 
 	struct mutex r_fill_mutex;
diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
index cb21c5cf12c3..9f747a1b8788 100644
--- a/include/linux/ceph/ceph_fs.h
+++ b/include/linux/ceph/ceph_fs.h
@@ -444,8 +444,9 @@ union ceph_mds_request_args {
 	} __attribute__ ((packed)) lookupino;
 } __attribute__ ((packed));
 
-#define CEPH_MDS_FLAG_REPLAY        1  /* this is a replayed op */
-#define CEPH_MDS_FLAG_WANT_DENTRY   2  /* want dentry in reply */
+#define CEPH_MDS_FLAG_REPLAY		1 /* this is a replayed op */
+#define CEPH_MDS_FLAG_WANT_DENTRY	2 /* want dentry in reply */
+#define CEPH_MDS_FLAG_ASYNC		4 /* request is asynchronous */
 
 struct ceph_mds_request_head {
 	__le64 oldest_client_tid;
-- 
2.24.1

* [PATCH v5 02/12] ceph: track primary dentry link
  2020-02-19 13:25 [PATCH v5 00/12] ceph: async directory operations support Jeff Layton
  2020-02-19 13:25 ` [PATCH v5 01/12] ceph: add flag to designate that a request is asynchronous Jeff Layton
@ 2020-02-19 13:25 ` Jeff Layton
  2020-02-19 13:25 ` [PATCH v5 03/12] ceph: add infrastructure for waiting for async create to complete Jeff Layton
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: Jeff Layton @ 2020-02-19 13:25 UTC (permalink / raw)
  To: ceph-devel; +Cc: idryomov, sage, zyan, pdonnell, xiubli

Newer versions of the MDS will flag a dentry as "primary". In later
patches, we'll need to consult this info, so track it in di->flags.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/dir.c                | 1 +
 fs/ceph/inode.c              | 8 +++++++-
 fs/ceph/super.h              | 1 +
 include/linux/ceph/ceph_fs.h | 3 +++
 4 files changed, 12 insertions(+), 1 deletion(-)
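
(For context, the consumer added later in the series --
get_caps_for_async_unlink() in the async unlink patch -- checks the
flag under d_lock, roughly:)

	spin_lock(&dentry->d_lock);
	di = ceph_dentry(dentry);
	if (!(di->flags & CEPH_DENTRY_PRIMARY_LINK))
		want = 0;	/* not the primary link; don't do it async */
	spin_unlock(&dentry->d_lock);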

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index d0cd0aba5843..a87274935a09 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -1411,6 +1411,7 @@ void ceph_invalidate_dentry_lease(struct dentry *dentry)
 	spin_lock(&dentry->d_lock);
 	di->time = jiffies;
 	di->lease_shared_gen = 0;
+	di->flags &= ~CEPH_DENTRY_PRIMARY_LINK;
 	__dentry_lease_unlist(di);
 	spin_unlock(&dentry->d_lock);
 }
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 9869ec101e88..7478bd0283c1 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -1051,6 +1051,7 @@ static void __update_dentry_lease(struct inode *dir, struct dentry *dentry,
 				  struct ceph_mds_session **old_lease_session)
 {
 	struct ceph_dentry_info *di = ceph_dentry(dentry);
+	unsigned mask = le16_to_cpu(lease->mask);
 	long unsigned duration = le32_to_cpu(lease->duration_ms);
 	long unsigned ttl = from_time + (duration * HZ) / 1000;
 	long unsigned half_ttl = from_time + (duration * HZ / 2) / 1000;
@@ -1062,8 +1063,13 @@ static void __update_dentry_lease(struct inode *dir, struct dentry *dentry,
 	if (ceph_snap(dir) != CEPH_NOSNAP)
 		return;
 
+	if (mask & CEPH_LEASE_PRIMARY_LINK)
+		di->flags |= CEPH_DENTRY_PRIMARY_LINK;
+	else
+		di->flags &= ~CEPH_DENTRY_PRIMARY_LINK;
+
 	di->lease_shared_gen = atomic_read(&ceph_inode(dir)->i_shared_gen);
-	if (duration == 0) {
+	if (!(mask & CEPH_LEASE_VALID)) {
 		__ceph_dentry_dir_lease_touch(di);
 		return;
 	}
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 37dc1ac8f6c3..3430d7ffe8f7 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -284,6 +284,7 @@ struct ceph_dentry_info {
 #define CEPH_DENTRY_REFERENCED		1
 #define CEPH_DENTRY_LEASE_LIST		2
 #define CEPH_DENTRY_SHRINK_LIST		4
+#define CEPH_DENTRY_PRIMARY_LINK	8
 
 struct ceph_inode_xattrs_info {
 	/*
diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
index 9f747a1b8788..94cc4b047987 100644
--- a/include/linux/ceph/ceph_fs.h
+++ b/include/linux/ceph/ceph_fs.h
@@ -531,6 +531,9 @@ struct ceph_mds_reply_lease {
 	__le32 seq;
 } __attribute__ ((packed));
 
+#define CEPH_LEASE_VALID        (1 | 2) /* old and new bit values */
+#define CEPH_LEASE_PRIMARY_LINK 4       /* primary linkage */
+
 struct ceph_mds_reply_dirfrag {
 	__le32 frag;            /* fragment */
 	__le32 auth;            /* auth mds, if this is a delegation point */
-- 
2.24.1

* [PATCH v5 03/12] ceph: add infrastructure for waiting for async create to complete
  2020-02-19 13:25 [PATCH v5 00/12] ceph: async directory operations support Jeff Layton
  2020-02-19 13:25 ` [PATCH v5 01/12] ceph: add flag to designate that a request is asynchronous Jeff Layton
  2020-02-19 13:25 ` [PATCH v5 02/12] ceph: track primary dentry link Jeff Layton
@ 2020-02-19 13:25 ` Jeff Layton
  2020-02-20  3:32   ` Yan, Zheng
  2020-02-19 13:25 ` [PATCH v5 04/12] ceph: make __take_cap_refs non-static Jeff Layton
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 24+ messages in thread
From: Jeff Layton @ 2020-02-19 13:25 UTC (permalink / raw)
  To: ceph-devel; +Cc: idryomov, sage, zyan, pdonnell, xiubli

When we issue an async create, we must ensure that any later on-the-wire
requests involving it wait for the create reply.

Expand i_ceph_flags to be an unsigned long, and add a new bit that
MDS requests can wait on. If the bit is set in the inode when sending
caps, then don't send it and just return that it has been delayed.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/caps.c       | 13 ++++++++++++-
 fs/ceph/dir.c        |  2 +-
 fs/ceph/mds_client.c | 20 +++++++++++++++++++-
 fs/ceph/mds_client.h |  7 +++++++
 fs/ceph/super.h      |  4 +++-
 5 files changed, 42 insertions(+), 4 deletions(-)
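
(The waiting side is ceph_wait_on_async_create() below. For context,
the completion side in the async create patch at the end of the series
is expected to clear the bit and wake any waiters along these lines --
sketch only, not part of this patch:)

	spin_lock(&ci->i_ceph_lock);
	ci->i_ceph_flags &= ~CEPH_I_ASYNC_CREATE;
	wake_up_bit(&ci->i_ceph_flags, CEPH_ASYNC_CREATE_BIT);
	spin_unlock(&ci->i_ceph_lock);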

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index d05717397c2a..85e13aa359d2 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -511,7 +511,7 @@ static void __cap_delay_requeue(struct ceph_mds_client *mdsc,
 				struct ceph_inode_info *ci,
 				bool set_timeout)
 {
-	dout("__cap_delay_requeue %p flags %d at %lu\n", &ci->vfs_inode,
+	dout("__cap_delay_requeue %p flags 0x%lx at %lu\n", &ci->vfs_inode,
 	     ci->i_ceph_flags, ci->i_hold_caps_max);
 	if (!mdsc->stopping) {
 		spin_lock(&mdsc->cap_delay_lock);
@@ -1294,6 +1294,13 @@ static int __send_cap(struct ceph_mds_client *mdsc, struct ceph_cap *cap,
 	int delayed = 0;
 	int ret;
 
+	/* Don't send anything if it's still being created. Return delayed */
+	if (ci->i_ceph_flags & CEPH_I_ASYNC_CREATE) {
+		spin_unlock(&ci->i_ceph_lock);
+		dout("%s async create in flight for %p\n", __func__, inode);
+		return 1;
+	}
+
 	held = cap->issued | cap->implemented;
 	revoking = cap->implemented & ~cap->issued;
 	retain &= ~revoking;
@@ -2250,6 +2257,10 @@ int ceph_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 	if (datasync)
 		goto out;
 
+	ret = ceph_wait_on_async_create(inode);
+	if (ret)
+		goto out;
+
 	dirty = try_flush_caps(inode, &flush_tid);
 	dout("fsync dirty caps are %s\n", ceph_cap_string(dirty));
 
diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index a87274935a09..5b83bda57056 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -752,7 +752,7 @@ static struct dentry *ceph_lookup(struct inode *dir, struct dentry *dentry,
 		struct ceph_dentry_info *di = ceph_dentry(dentry);
 
 		spin_lock(&ci->i_ceph_lock);
-		dout(" dir %p flags are %d\n", dir, ci->i_ceph_flags);
+		dout(" dir %p flags are 0x%lx\n", dir, ci->i_ceph_flags);
 		if (strncmp(dentry->d_name.name,
 			    fsc->mount_options->snapdir_name,
 			    dentry->d_name.len) &&
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 94d18e643a3d..38eb9dd5062b 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2730,7 +2730,7 @@ static void kick_requests(struct ceph_mds_client *mdsc, int mds)
 int ceph_mdsc_submit_request(struct ceph_mds_client *mdsc, struct inode *dir,
 			      struct ceph_mds_request *req)
 {
-	int err;
+	int err = 0;
 
 	/* take CAP_PIN refs for r_inode, r_parent, r_old_dentry */
 	if (req->r_inode)
@@ -2743,6 +2743,24 @@ int ceph_mdsc_submit_request(struct ceph_mds_client *mdsc, struct inode *dir,
 		ceph_get_cap_refs(ceph_inode(req->r_old_dentry_dir),
 				  CEPH_CAP_PIN);
 
+	if (req->r_inode) {
+		err = ceph_wait_on_async_create(req->r_inode);
+		if (err) {
+			dout("%s: wait for async create returned: %d\n",
+			     __func__, err);
+			return err;
+		}
+	}
+
+	if (!err && req->r_old_inode) {
+		err = ceph_wait_on_async_create(req->r_old_inode);
+		if (err) {
+			dout("%s: wait for async create returned: %d\n",
+			     __func__, err);
+			return err;
+		}
+	}
+
 	dout("submit_request on %p for inode %p\n", req, dir);
 	mutex_lock(&mdsc->mutex);
 	__register_request(mdsc, req, dir);
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index 95ac00e59e66..8043f2b439b1 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -538,4 +538,11 @@ extern void ceph_mdsc_open_export_target_sessions(struct ceph_mds_client *mdsc,
 extern int ceph_trim_caps(struct ceph_mds_client *mdsc,
 			  struct ceph_mds_session *session,
 			  int max_caps);
+static inline int ceph_wait_on_async_create(struct inode *inode)
+{
+	struct ceph_inode_info *ci = ceph_inode(inode);
+
+	return wait_on_bit(&ci->i_ceph_flags, CEPH_ASYNC_CREATE_BIT,
+			   TASK_INTERRUPTIBLE);
+}
 #endif
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 3430d7ffe8f7..bfb03adb4a08 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -316,7 +316,7 @@ struct ceph_inode_info {
 	u64 i_inline_version;
 	u32 i_time_warp_seq;
 
-	unsigned i_ceph_flags;
+	unsigned long i_ceph_flags;
 	atomic64_t i_release_count;
 	atomic64_t i_ordered_count;
 	atomic64_t i_complete_seq[2];
@@ -524,6 +524,8 @@ static inline struct inode *ceph_find_inode(struct super_block *sb,
 #define CEPH_I_ERROR_WRITE	(1 << 10) /* have seen write errors */
 #define CEPH_I_ERROR_FILELOCK	(1 << 11) /* have seen file lock errors */
 #define CEPH_I_ODIRECT		(1 << 12) /* inode in direct I/O mode */
+#define CEPH_ASYNC_CREATE_BIT	(13)	  /* async create in flight for this */
+#define CEPH_I_ASYNC_CREATE	(1 << CEPH_ASYNC_CREATE_BIT)
 
 /*
  * Masks of ceph inode work.
-- 
2.24.1

* [PATCH v5 04/12] ceph: make __take_cap_refs non-static
  2020-02-19 13:25 [PATCH v5 00/12] ceph: async directory operations support Jeff Layton
                   ` (2 preceding siblings ...)
  2020-02-19 13:25 ` [PATCH v5 03/12] ceph: add infrastructure for waiting for async create to complete Jeff Layton
@ 2020-02-19 13:25 ` Jeff Layton
  2020-02-19 13:25 ` [PATCH v5 05/12] ceph: cap tracking for async directory operations Jeff Layton
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: Jeff Layton @ 2020-02-19 13:25 UTC (permalink / raw)
  To: ceph-devel; +Cc: idryomov, sage, zyan, pdonnell, xiubli

Rename it to ceph_take_cap_refs and make it available to other files.
Also replace a comment with a lockdep assertion.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/caps.c  | 12 ++++++------
 fs/ceph/super.h |  2 ++
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 85e13aa359d2..295837215a3a 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -2516,12 +2516,12 @@ static void kick_flushing_inode_caps(struct ceph_mds_client *mdsc,
 /*
  * Take references to capabilities we hold, so that we don't release
  * them to the MDS prematurely.
- *
- * Protected by i_ceph_lock.
  */
-static void __take_cap_refs(struct ceph_inode_info *ci, int got,
+void ceph_take_cap_refs(struct ceph_inode_info *ci, int got,
 			    bool snap_rwsem_locked)
 {
+	lockdep_assert_held(&ci->i_ceph_lock);
+
 	if (got & CEPH_CAP_PIN)
 		ci->i_pin_ref++;
 	if (got & CEPH_CAP_FILE_RD)
@@ -2542,7 +2542,7 @@ static void __take_cap_refs(struct ceph_inode_info *ci, int got,
 		if (ci->i_wb_ref == 0)
 			ihold(&ci->vfs_inode);
 		ci->i_wb_ref++;
-		dout("__take_cap_refs %p wb %d -> %d (?)\n",
+		dout("%s %p wb %d -> %d (?)\n", __func__,
 		     &ci->vfs_inode, ci->i_wb_ref-1, ci->i_wb_ref);
 	}
 }
@@ -2665,7 +2665,7 @@ static int try_get_cap_refs(struct inode *inode, int need, int want,
 			    (need & CEPH_CAP_FILE_RD) &&
 			    !(*got & CEPH_CAP_FILE_CACHE))
 				ceph_disable_fscache_readpage(ci);
-			__take_cap_refs(ci, *got, true);
+			ceph_take_cap_refs(ci, *got, true);
 			ret = 1;
 		}
 	} else {
@@ -2894,7 +2894,7 @@ int ceph_get_caps(struct file *filp, int need, int want,
 void ceph_get_cap_refs(struct ceph_inode_info *ci, int caps)
 {
 	spin_lock(&ci->i_ceph_lock);
-	__take_cap_refs(ci, caps, false);
+	ceph_take_cap_refs(ci, caps, false);
 	spin_unlock(&ci->i_ceph_lock);
 }
 
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index bfb03adb4a08..2393803c38de 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -1053,6 +1053,8 @@ extern void ceph_kick_flushing_caps(struct ceph_mds_client *mdsc,
 				    struct ceph_mds_session *session);
 extern struct ceph_cap *ceph_get_cap_for_mds(struct ceph_inode_info *ci,
 					     int mds);
+extern void ceph_take_cap_refs(struct ceph_inode_info *ci, int caps,
+				bool snap_rwsem_locked);
 extern void ceph_get_cap_refs(struct ceph_inode_info *ci, int caps);
 extern void ceph_put_cap_refs(struct ceph_inode_info *ci, int had);
 extern void ceph_put_wrbuffer_cap_refs(struct ceph_inode_info *ci, int nr,
-- 
2.24.1

* [PATCH v5 05/12] ceph: cap tracking for async directory operations
  2020-02-19 13:25 [PATCH v5 00/12] ceph: async directory operations support Jeff Layton
                   ` (3 preceding siblings ...)
  2020-02-19 13:25 ` [PATCH v5 04/12] ceph: make __take_cap_refs non-static Jeff Layton
@ 2020-02-19 13:25 ` Jeff Layton
  2020-02-20  6:42   ` Yan, Zheng
  2020-02-19 13:25 ` [PATCH v5 06/12] ceph: don't take refs to want mask unless we have all bits Jeff Layton
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 24+ messages in thread
From: Jeff Layton @ 2020-02-19 13:25 UTC (permalink / raw)
  To: ceph-devel; +Cc: idryomov, sage, zyan, pdonnell, xiubli

Track and correctly handle directory caps for asynchronous operations.
Add aliases for Frc caps that we now designate as Dcu caps (when
dealing with directories).

Unlike file caps, we don't reclaim these when the session goes away;
instead, we preemptively release them. In-flight async dirops are
handled during the reconnect phase. The client needs to redo a
synchronous operation in order to reacquire directory caps.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/caps.c               | 29 ++++++++++++++++++++---------
 fs/ceph/mds_client.c         | 31 ++++++++++++++++++++++++++-----
 fs/ceph/mds_client.h         |  6 +++++-
 include/linux/ceph/ceph_fs.h |  6 ++++++
 4 files changed, 57 insertions(+), 15 deletions(-)
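
(For context, a rough sketch of the intended r_dir_caps lifecycle; the
acquire side lands in the async unlink patch later in the series:)

	/* acquire: before submitting the async request, take refs on the
	 * dir caps and note them in the request */
	req->r_parent = dir;
	req->r_dir_caps = get_caps_for_async_unlink(dir, dentry);

	/* release: the refs are dropped via ceph_mdsc_release_dir_caps(),
	 * either when the request is freed or, preemptively, when unsafe
	 * requests are replayed during session reconnect */
	ceph_mdsc_release_dir_caps(req);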

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 295837215a3a..d6c5ee33f30f 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -992,7 +992,11 @@ int __ceph_caps_file_wanted(struct ceph_inode_info *ci)
 int __ceph_caps_wanted(struct ceph_inode_info *ci)
 {
 	int w = __ceph_caps_file_wanted(ci) | __ceph_caps_used(ci);
-	if (!S_ISDIR(ci->vfs_inode.i_mode)) {
+	if (S_ISDIR(ci->vfs_inode.i_mode)) {
+		/* we want EXCL if holding caps of dir ops */
+		if (w & CEPH_CAP_ANY_DIR_OPS)
+			w |= CEPH_CAP_FILE_EXCL;
+	} else {
 		/* we want EXCL if dirty data */
 		if (w & CEPH_CAP_FILE_BUFFER)
 			w |= CEPH_CAP_FILE_EXCL;
@@ -1890,10 +1894,13 @@ void ceph_check_caps(struct ceph_inode_info *ci, int flags,
 			 * revoking the shared cap on every create/unlink
 			 * operation.
 			 */
-			if (IS_RDONLY(inode))
+			if (IS_RDONLY(inode)) {
 				want = CEPH_CAP_ANY_SHARED;
-			else
-				want = CEPH_CAP_ANY_SHARED | CEPH_CAP_FILE_EXCL;
+			} else {
+				want = CEPH_CAP_ANY_SHARED |
+				       CEPH_CAP_FILE_EXCL |
+				       CEPH_CAP_ANY_DIR_OPS;
+			}
 			retain |= want;
 		} else {
 
@@ -2750,13 +2757,17 @@ int ceph_try_get_caps(struct inode *inode, int need, int want,
 	int ret;
 
 	BUG_ON(need & ~CEPH_CAP_FILE_RD);
-	BUG_ON(want & ~(CEPH_CAP_FILE_CACHE|CEPH_CAP_FILE_LAZYIO|CEPH_CAP_FILE_SHARED));
-	ret = ceph_pool_perm_check(inode, need);
-	if (ret < 0)
-		return ret;
+	if (need) {
+		ret = ceph_pool_perm_check(inode, need);
+		if (ret < 0)
+			return ret;
+	}
 
+	BUG_ON(want & ~(CEPH_CAP_FILE_CACHE | CEPH_CAP_FILE_LAZYIO |
+			CEPH_CAP_FILE_SHARED | CEPH_CAP_FILE_EXCL |
+			CEPH_CAP_ANY_DIR_OPS));
 	ret = try_get_cap_refs(inode, need, want, 0,
-			       (nonblock ? NON_BLOCKING : 0), got);
+			       nonblock ? NON_BLOCKING : 0, got);
 	return ret == -EAGAIN ? 0 : ret;
 }
 
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 38eb9dd5062b..ef3dd6fe2f4d 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -699,6 +699,7 @@ void ceph_mdsc_release_request(struct kref *kref)
 	struct ceph_mds_request *req = container_of(kref,
 						    struct ceph_mds_request,
 						    r_kref);
+	ceph_mdsc_release_dir_caps(req);
 	destroy_reply_info(&req->r_reply_info);
 	if (req->r_request)
 		ceph_msg_put(req->r_request);
@@ -3280,6 +3281,17 @@ static void handle_session(struct ceph_mds_session *session,
 	return;
 }
 
+void ceph_mdsc_release_dir_caps(struct ceph_mds_request *req)
+{
+	int dcaps;
+
+	dcaps = xchg(&req->r_dir_caps, 0);
+	if (dcaps) {
+		dout("releasing r_dir_caps=%s\n", ceph_cap_string(dcaps));
+		ceph_put_cap_refs(ceph_inode(req->r_parent), dcaps);
+	}
+}
+
 /*
  * called under session->mutex.
  */
@@ -3307,9 +3319,14 @@ static void replay_unsafe_requests(struct ceph_mds_client *mdsc,
 			continue;
 		if (req->r_attempts == 0)
 			continue; /* only old requests */
-		if (req->r_session &&
-		    req->r_session->s_mds == session->s_mds)
-			__send_request(mdsc, session, req, true);
+		if (!req->r_session)
+			continue;
+		if (req->r_session->s_mds != session->s_mds)
+			continue;
+
+		ceph_mdsc_release_dir_caps(req);
+
+		__send_request(mdsc, session, req, true);
 	}
 	mutex_unlock(&mdsc->mutex);
 }
@@ -3393,7 +3410,7 @@ static int send_reconnect_partial(struct ceph_reconnect_state *recon_state)
 /*
  * Encode information about a cap for a reconnect with the MDS.
  */
-static int encode_caps_cb(struct inode *inode, struct ceph_cap *cap,
+static int reconnect_caps_cb(struct inode *inode, struct ceph_cap *cap,
 			  void *arg)
 {
 	union {
@@ -3416,6 +3433,10 @@ static int encode_caps_cb(struct inode *inode, struct ceph_cap *cap,
 	cap->mseq = 0;       /* and migrate_seq */
 	cap->cap_gen = cap->session->s_cap_gen;
 
+	/* These are lost when the session goes away */
+	if (S_ISDIR(inode->i_mode))
+		cap->issued &= ~CEPH_CAP_ANY_DIR_OPS;
+
 	if (recon_state->msg_version >= 2) {
 		rec.v2.cap_id = cpu_to_le64(cap->cap_id);
 		rec.v2.wanted = cpu_to_le32(__ceph_caps_wanted(ci));
@@ -3712,7 +3733,7 @@ static void send_mds_reconnect(struct ceph_mds_client *mdsc,
 		recon_state.msg_version = 2;
 	}
 	/* traverse this session's caps */
-	err = ceph_iterate_session_caps(session, encode_caps_cb, &recon_state);
+	err = ceph_iterate_session_caps(session, reconnect_caps_cb, &recon_state);
 
 	spin_lock(&session->s_cap_lock);
 	session->s_cap_reconnect = 0;
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index 8043f2b439b1..f10d342ea585 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -284,8 +284,11 @@ struct ceph_mds_request {
 	struct ceph_msg  *r_request;  /* original request */
 	struct ceph_msg  *r_reply;
 	struct ceph_mds_reply_info_parsed r_reply_info;
-	struct page *r_locked_page;
 	int r_err;
+
+
+	struct page *r_locked_page;
+	int r_dir_caps;
 	int r_num_caps;
 	u32               r_readdir_offset;
 
@@ -489,6 +492,7 @@ extern int ceph_mdsc_submit_request(struct ceph_mds_client *mdsc,
 extern int ceph_mdsc_do_request(struct ceph_mds_client *mdsc,
 				struct inode *dir,
 				struct ceph_mds_request *req);
+extern void ceph_mdsc_release_dir_caps(struct ceph_mds_request *req);
 static inline void ceph_mdsc_get_request(struct ceph_mds_request *req)
 {
 	kref_get(&req->r_kref);
diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
index 94cc4b047987..91d09cf37649 100644
--- a/include/linux/ceph/ceph_fs.h
+++ b/include/linux/ceph/ceph_fs.h
@@ -663,6 +663,12 @@ int ceph_flags_to_mode(int flags);
 #define CEPH_CAP_LOCKS (CEPH_LOCK_IFILE | CEPH_LOCK_IAUTH | CEPH_LOCK_ILINK | \
 			CEPH_LOCK_IXATTR)
 
+/* cap masks async dir operations */
+#define CEPH_CAP_DIR_CREATE	CEPH_CAP_FILE_CACHE
+#define CEPH_CAP_DIR_UNLINK	CEPH_CAP_FILE_RD
+#define CEPH_CAP_ANY_DIR_OPS	(CEPH_CAP_FILE_CACHE | CEPH_CAP_FILE_RD | \
+				 CEPH_CAP_FILE_WREXTEND | CEPH_CAP_FILE_LAZYIO)
+
 int ceph_caps_for_mode(int mode);
 
 enum {
-- 
2.24.1

* [PATCH v5 06/12] ceph: don't take refs to want mask unless we have all bits
  2020-02-19 13:25 [PATCH v5 00/12] ceph: async directory operations support Jeff Layton
                   ` (4 preceding siblings ...)
  2020-02-19 13:25 ` [PATCH v5 05/12] ceph: cap tracking for async directory operations Jeff Layton
@ 2020-02-19 13:25 ` Jeff Layton
  2020-02-19 13:25 ` [PATCH v5 07/12] ceph: perform asynchronous unlink if we have sufficient caps Jeff Layton
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: Jeff Layton @ 2020-02-19 13:25 UTC (permalink / raw)
  To: ceph-devel; +Cc: idryomov, sage, zyan, pdonnell, xiubli

From: "Yan, Zheng" <ukernel@gmail.com>

If we don't have all of the cap bits for the want mask in
try_get_cap_refs, then just take refs on the need bits.
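
For example (illustrative cap values): if need is Fr and want is Fc|Fl,
but only Frc is currently issued, we now take refs on just Fr rather
than on Frc, since the full want mask isn't available.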

Signed-off-by: "Yan, Zheng" <ukernel@gmail.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/caps.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Zheng,

I broke this patch out on its own as I wasn't sure it was still
needed with the latest iteration of the code. We can fold it into
the previous one if we do want it, or just drop it.

Thanks,
Jeff

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index d6c5ee33f30f..c96b18407aef 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -2667,7 +2667,10 @@ static int try_get_cap_refs(struct inode *inode, int need, int want,
 				}
 				snap_rwsem_locked = true;
 			}
-			*got = need | (have & want);
+			if ((have & want) == want)
+				*got = need | want;
+			else
+				*got = need;
 			if (S_ISREG(inode->i_mode) &&
 			    (need & CEPH_CAP_FILE_RD) &&
 			    !(*got & CEPH_CAP_FILE_CACHE))
-- 
2.24.1

* [PATCH v5 07/12] ceph: perform asynchronous unlink if we have sufficient caps
  2020-02-19 13:25 [PATCH v5 00/12] ceph: async directory operations support Jeff Layton
                   ` (5 preceding siblings ...)
  2020-02-19 13:25 ` [PATCH v5 06/12] ceph: don't take refs to want mask unless we have all bits Jeff Layton
@ 2020-02-19 13:25 ` Jeff Layton
  2020-02-20  6:44   ` Yan, Zheng
  2020-02-19 13:25 ` [PATCH v5 08/12] ceph: make ceph_fill_inode non-static Jeff Layton
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 24+ messages in thread
From: Jeff Layton @ 2020-02-19 13:25 UTC (permalink / raw)
  To: ceph-devel; +Cc: idryomov, sage, zyan, pdonnell, xiubli

The MDS is getting a new lock-caching facility that will allow it to
cache the locks necessary for asynchronous directory operations. Since
the CEPH_CAP_FILE_* caps are currently unused on directories, we can
repurpose those bits for this.

When performing an unlink, if we have Fx on the parent directory,
and CEPH_CAP_DIR_UNLINK (aka Fr), and we know that the dentry being
removed is the primary link, then we can fire off an unlink request
immediately and don't need to wait on the reply before returning.

In that situation, just fix up the dcache and link count and return
immediately after issuing the call to the MDS. This does mean that we
need to hold an extra reference to the inode being unlinked, and extra
references to the caps to avoid races. Those references are put and
error handling is done in the r_callback routine.

If the operation ends up failing, then set a writeback error on both
the directory inode and the inode itself; the error can be fetched
later by an fsync on the dir.

The behavior of dir caps is slightly different from caps on normal
files. Because these are just considered an optimization, if the
session is reconnected, we will not automatically reclaim them. They
are instead considered lost until we do another synchronous op in the
parent directory.

Async dirops are enabled via the "nowsync" mount option, which is
patterned after the xfs "wsync" mount option. For now, the default
is "wsync", but eventually we may flip that.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/dir.c   | 103 ++++++++++++++++++++++++++++++++++++++++++++++--
 fs/ceph/super.c |  20 ++++++++++
 fs/ceph/super.h |   5 ++-
 3 files changed, 123 insertions(+), 5 deletions(-)
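
(Not part of the patch: to illustrate the error reporting described
above, a userspace caller that cares about the result of an async
unlink can fetch a deferred failure via fsync on the directory. Rough
sketch, with a placeholder path:)

	#include <fcntl.h>
	#include <stdio.h>
	#include <unistd.h>

	int main(void)
	{
		int dfd = open("/mnt/cephfs/dir", O_RDONLY | O_DIRECTORY);

		/* with nowsync, this may return 0 before the MDS replies */
		unlinkat(dfd, "file", 0);

		/* a later failure shows up as a writeback error on the dir */
		if (fsync(dfd) == -1)
			perror("async unlink failed");
		return close(dfd);
	}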

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index 5b83bda57056..37ab09d223fc 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -1036,6 +1036,73 @@ static int ceph_link(struct dentry *old_dentry, struct inode *dir,
 	return err;
 }
 
+static void ceph_async_unlink_cb(struct ceph_mds_client *mdsc,
+				 struct ceph_mds_request *req)
+{
+	int result = req->r_err ? req->r_err :
+			le32_to_cpu(req->r_reply_info.head->result);
+
+	/* If op failed, mark everyone involved for errors */
+	if (result) {
+		int pathlen;
+		u64 base;
+		char *path = ceph_mdsc_build_path(req->r_dentry, &pathlen,
+						  &base, 0);
+
+		/* mark error on parent + clear complete */
+		mapping_set_error(req->r_parent->i_mapping, result);
+		ceph_dir_clear_complete(req->r_parent);
+
+		/* drop the dentry -- we don't know its status */
+		if (!d_unhashed(req->r_dentry))
+			d_drop(req->r_dentry);
+
+		/* mark inode itself for an error (since metadata is bogus) */
+		mapping_set_error(req->r_old_inode->i_mapping, result);
+
+		pr_warn("ceph: async unlink failure path=(%llx)%s result=%d!\n",
+			base, IS_ERR(path) ? "<<bad>>" : path, result);
+		ceph_mdsc_free_path(path, pathlen);
+	}
+	iput(req->r_old_inode);
+}
+
+static int get_caps_for_async_unlink(struct inode *dir, struct dentry *dentry)
+{
+	struct ceph_inode_info *ci = ceph_inode(dir);
+	struct ceph_dentry_info *di;
+	int got = 0, want = CEPH_CAP_FILE_EXCL | CEPH_CAP_DIR_UNLINK;
+
+	spin_lock(&ci->i_ceph_lock);
+	if ((__ceph_caps_issued(ci, NULL) & want) == want) {
+		ceph_take_cap_refs(ci, want, false);
+		got = want;
+	}
+	spin_unlock(&ci->i_ceph_lock);
+
+	/* If we didn't get anything, return 0 */
+	if (!got)
+		return 0;
+
+	spin_lock(&dentry->d_lock);
+	di = ceph_dentry(dentry);
+	/*
+	 * - We are holding Fx, which implies Fs caps.
+	 * - Only support async unlink for primary linkage
+	 */
+	if (atomic_read(&ci->i_shared_gen) != di->lease_shared_gen ||
+	    !(di->flags & CEPH_DENTRY_PRIMARY_LINK))
+		want = 0;
+	spin_unlock(&dentry->d_lock);
+
+	/* Do we still want what we've got? */
+	if (want == got)
+		return got;
+
+	ceph_put_cap_refs(ci, got);
+	return 0;
+}
+
 /*
  * rmdir and unlink are differ only by the metadata op code
  */
@@ -1045,6 +1112,7 @@ static int ceph_unlink(struct inode *dir, struct dentry *dentry)
 	struct ceph_mds_client *mdsc = fsc->mdsc;
 	struct inode *inode = d_inode(dentry);
 	struct ceph_mds_request *req;
+	bool try_async = ceph_test_mount_opt(fsc, ASYNC_DIROPS);
 	int err = -EROFS;
 	int op;
 
@@ -1059,6 +1127,7 @@ static int ceph_unlink(struct inode *dir, struct dentry *dentry)
 			CEPH_MDS_OP_RMDIR : CEPH_MDS_OP_UNLINK;
 	} else
 		goto out;
+retry:
 	req = ceph_mdsc_create_request(mdsc, op, USE_AUTH_MDS);
 	if (IS_ERR(req)) {
 		err = PTR_ERR(req);
@@ -1067,13 +1136,39 @@ static int ceph_unlink(struct inode *dir, struct dentry *dentry)
 	req->r_dentry = dget(dentry);
 	req->r_num_caps = 2;
 	req->r_parent = dir;
-	set_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags);
 	req->r_dentry_drop = CEPH_CAP_FILE_SHARED;
 	req->r_dentry_unless = CEPH_CAP_FILE_EXCL;
 	req->r_inode_drop = ceph_drop_caps_for_unlink(inode);
-	err = ceph_mdsc_do_request(mdsc, dir, req);
-	if (!err && !req->r_reply_info.head->is_dentry)
-		d_delete(dentry);
+
+	if (try_async && op == CEPH_MDS_OP_UNLINK &&
+	    (req->r_dir_caps = get_caps_for_async_unlink(dir, dentry))) {
+		dout("async unlink on %lu/%.*s caps=%s", dir->i_ino,
+		     dentry->d_name.len, dentry->d_name.name,
+		     ceph_cap_string(req->r_dir_caps));
+		set_bit(CEPH_MDS_R_ASYNC, &req->r_req_flags);
+		req->r_callback = ceph_async_unlink_cb;
+		req->r_old_inode = d_inode(dentry);
+		ihold(req->r_old_inode);
+		err = ceph_mdsc_submit_request(mdsc, dir, req);
+		if (!err) {
+			/*
+			 * We have enough caps, so we assume that the unlink
+			 * will succeed. Fix up the target inode and dcache.
+			 */
+			drop_nlink(inode);
+			d_delete(dentry);
+		} else if (err == -EJUKEBOX) {
+			try_async = false;
+			ceph_mdsc_put_request(req);
+			goto retry;
+		}
+	} else {
+		set_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags);
+		err = ceph_mdsc_do_request(mdsc, dir, req);
+		if (!err && !req->r_reply_info.head->is_dentry)
+			d_delete(dentry);
+	}
+
 	ceph_mdsc_put_request(req);
 out:
 	return err;
diff --git a/fs/ceph/super.c b/fs/ceph/super.c
index b1329cd5388a..c9784eb1159a 100644
--- a/fs/ceph/super.c
+++ b/fs/ceph/super.c
@@ -155,6 +155,7 @@ enum {
 	Opt_acl,
 	Opt_quotadf,
 	Opt_copyfrom,
+	Opt_wsync,
 };
 
 enum ceph_recover_session_mode {
@@ -194,6 +195,7 @@ static const struct fs_parameter_spec ceph_mount_parameters[] = {
 	fsparam_string	("snapdirname",			Opt_snapdirname),
 	fsparam_string	("source",			Opt_source),
 	fsparam_u32	("wsize",			Opt_wsize),
+	fsparam_flag_no	("wsync",			Opt_wsync),
 	{}
 };
 
@@ -444,6 +446,12 @@ static int ceph_parse_mount_param(struct fs_context *fc,
 			fc->sb_flags &= ~SB_POSIXACL;
 		}
 		break;
+	case Opt_wsync:
+		if (!result.negated)
+			fsopt->flags &= ~CEPH_MOUNT_OPT_ASYNC_DIROPS;
+		else
+			fsopt->flags |= CEPH_MOUNT_OPT_ASYNC_DIROPS;
+		break;
 	default:
 		BUG();
 	}
@@ -567,6 +575,9 @@ static int ceph_show_options(struct seq_file *m, struct dentry *root)
 	if (fsopt->flags & CEPH_MOUNT_OPT_CLEANRECOVER)
 		seq_show_option(m, "recover_session", "clean");
 
+	if (fsopt->flags & CEPH_MOUNT_OPT_ASYNC_DIROPS)
+		seq_puts(m, ",nowsync");
+
 	if (fsopt->wsize != CEPH_MAX_WRITE_SIZE)
 		seq_printf(m, ",wsize=%u", fsopt->wsize);
 	if (fsopt->rsize != CEPH_MAX_READ_SIZE)
@@ -1115,6 +1126,15 @@ static void ceph_free_fc(struct fs_context *fc)
 
 static int ceph_reconfigure_fc(struct fs_context *fc)
 {
+	struct ceph_parse_opts_ctx *pctx = fc->fs_private;
+	struct ceph_mount_options *fsopt = pctx->opts;
+	struct ceph_fs_client *fsc = ceph_sb_to_client(fc->root->d_sb);
+
+	if (fsopt->flags & CEPH_MOUNT_OPT_ASYNC_DIROPS)
+		ceph_set_mount_opt(fsc, ASYNC_DIROPS);
+	else
+		ceph_clear_mount_opt(fsc, ASYNC_DIROPS);
+
 	sync_filesystem(fc->root->d_sb);
 	return 0;
 }
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 2393803c38de..1b4996efc111 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -43,13 +43,16 @@
 #define CEPH_MOUNT_OPT_MOUNTWAIT       (1<<12) /* mount waits if no mds is up */
 #define CEPH_MOUNT_OPT_NOQUOTADF       (1<<13) /* no root dir quota in statfs */
 #define CEPH_MOUNT_OPT_NOCOPYFROM      (1<<14) /* don't use RADOS 'copy-from' op */
+#define CEPH_MOUNT_OPT_ASYNC_DIROPS    (1<<15) /* allow async directory ops */
 
 #define CEPH_MOUNT_OPT_DEFAULT			\
 	(CEPH_MOUNT_OPT_DCACHE |		\
 	 CEPH_MOUNT_OPT_NOCOPYFROM)
 
 #define ceph_set_mount_opt(fsc, opt) \
-	(fsc)->mount_options->flags |= CEPH_MOUNT_OPT_##opt;
+	(fsc)->mount_options->flags |= CEPH_MOUNT_OPT_##opt
+#define ceph_clear_mount_opt(fsc, opt) \
+	(fsc)->mount_options->flags &= ~CEPH_MOUNT_OPT_##opt
 #define ceph_test_mount_opt(fsc, opt) \
 	(!!((fsc)->mount_options->flags & CEPH_MOUNT_OPT_##opt))
 
-- 
2.24.1

* [PATCH v5 08/12] ceph: make ceph_fill_inode non-static
  2020-02-19 13:25 [PATCH v5 00/12] ceph: async directory operations support Jeff Layton
                   ` (6 preceding siblings ...)
  2020-02-19 13:25 ` [PATCH v5 07/12] ceph: perform asynchronous unlink if we have sufficient caps Jeff Layton
@ 2020-02-19 13:25 ` Jeff Layton
  2020-02-19 13:25 ` [PATCH v5 09/12] ceph: decode interval_sets for delegated inos Jeff Layton
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: Jeff Layton @ 2020-02-19 13:25 UTC (permalink / raw)
  To: ceph-devel; +Cc: idryomov, sage, zyan, pdonnell, xiubli

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/inode.c | 47 ++++++++++++++++++++++++-----------------------
 fs/ceph/super.h |  8 ++++++++
 2 files changed, 32 insertions(+), 23 deletions(-)

diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 7478bd0283c1..4056c7968b86 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -728,11 +728,11 @@ void ceph_fill_file_time(struct inode *inode, int issued,
  * Populate an inode based on info from mds.  May be called on new or
  * existing inodes.
  */
-static int fill_inode(struct inode *inode, struct page *locked_page,
-		      struct ceph_mds_reply_info_in *iinfo,
-		      struct ceph_mds_reply_dirfrag *dirinfo,
-		      struct ceph_mds_session *session, int cap_fmode,
-		      struct ceph_cap_reservation *caps_reservation)
+int ceph_fill_inode(struct inode *inode, struct page *locked_page,
+		    struct ceph_mds_reply_info_in *iinfo,
+		    struct ceph_mds_reply_dirfrag *dirinfo,
+		    struct ceph_mds_session *session, int cap_fmode,
+		    struct ceph_cap_reservation *caps_reservation)
 {
 	struct ceph_mds_client *mdsc = ceph_inode_to_client(inode)->mdsc;
 	struct ceph_mds_reply_inode *info = iinfo->in;
@@ -749,7 +749,7 @@ static int fill_inode(struct inode *inode, struct page *locked_page,
 	bool new_version = false;
 	bool fill_inline = false;
 
-	dout("fill_inode %p ino %llx.%llx v %llu had %llu\n",
+	dout("%s %p ino %llx.%llx v %llu had %llu\n", __func__,
 	     inode, ceph_vinop(inode), le64_to_cpu(info->version),
 	     ci->i_version);
 
@@ -770,7 +770,7 @@ static int fill_inode(struct inode *inode, struct page *locked_page,
 	if (iinfo->xattr_len > 4) {
 		xattr_blob = ceph_buffer_new(iinfo->xattr_len, GFP_NOFS);
 		if (!xattr_blob)
-			pr_err("fill_inode ENOMEM xattr blob %d bytes\n",
+			pr_err("%s ENOMEM xattr blob %d bytes\n", __func__,
 			       iinfo->xattr_len);
 	}
 
@@ -933,8 +933,9 @@ static int fill_inode(struct inode *inode, struct page *locked_page,
 			spin_unlock(&ci->i_ceph_lock);
 
 			if (symlen != i_size_read(inode)) {
-				pr_err("fill_inode %llx.%llx BAD symlink "
-					"size %lld\n", ceph_vinop(inode),
+				pr_err("%s %llx.%llx BAD symlink "
+					"size %lld\n", __func__,
+					ceph_vinop(inode),
 					i_size_read(inode));
 				i_size_write(inode, symlen);
 				inode->i_blocks = calc_inode_blocks(symlen);
@@ -958,7 +959,7 @@ static int fill_inode(struct inode *inode, struct page *locked_page,
 		inode->i_fop = &ceph_dir_fops;
 		break;
 	default:
-		pr_err("fill_inode %llx.%llx BAD mode 0%o\n",
+		pr_err("%s %llx.%llx BAD mode 0%o\n", __func__,
 		       ceph_vinop(inode), inode->i_mode);
 	}
 
@@ -1246,10 +1247,9 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req)
 		struct inode *dir = req->r_parent;
 
 		if (dir) {
-			err = fill_inode(dir, NULL,
-					 &rinfo->diri, rinfo->dirfrag,
-					 session, -1,
-					 &req->r_caps_reservation);
+			err = ceph_fill_inode(dir, NULL, &rinfo->diri,
+					      rinfo->dirfrag, session, -1,
+					      &req->r_caps_reservation);
 			if (err < 0)
 				goto done;
 		} else {
@@ -1314,14 +1314,14 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req)
 			goto done;
 		}
 
-		err = fill_inode(in, req->r_locked_page, &rinfo->targeti, NULL,
-				session,
+		err = ceph_fill_inode(in, req->r_locked_page, &rinfo->targeti,
+				NULL, session,
 				(!test_bit(CEPH_MDS_R_ABORTED, &req->r_req_flags) &&
 				 !test_bit(CEPH_MDS_R_ASYNC, &req->r_req_flags) &&
 				 rinfo->head->result == 0) ?  req->r_fmode : -1,
 				&req->r_caps_reservation);
 		if (err < 0) {
-			pr_err("fill_inode badness %p %llx.%llx\n",
+			pr_err("ceph_fill_inode badness %p %llx.%llx\n",
 				in, ceph_vinop(in));
 			if (in->i_state & I_NEW)
 				discard_new_inode(in);
@@ -1508,10 +1508,11 @@ static int readdir_prepopulate_inodes_only(struct ceph_mds_request *req,
 			dout("new_inode badness got %d\n", err);
 			continue;
 		}
-		rc = fill_inode(in, NULL, &rde->inode, NULL, session,
-				-1, &req->r_caps_reservation);
+		rc = ceph_fill_inode(in, NULL, &rde->inode, NULL, session,
+				     -1, &req->r_caps_reservation);
 		if (rc < 0) {
-			pr_err("fill_inode badness on %p got %d\n", in, rc);
+			pr_err("ceph_fill_inode badness on %p got %d\n",
+			       in, rc);
 			err = rc;
 			if (in->i_state & I_NEW) {
 				ihold(in);
@@ -1715,10 +1716,10 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req,
 			}
 		}
 
-		ret = fill_inode(in, NULL, &rde->inode, NULL, session,
-				 -1, &req->r_caps_reservation);
+		ret = ceph_fill_inode(in, NULL, &rde->inode, NULL, session,
+				      -1, &req->r_caps_reservation);
 		if (ret < 0) {
-			pr_err("fill_inode badness on %p\n", in);
+			pr_err("ceph_fill_inode badness on %p\n", in);
 			if (d_really_is_negative(dn)) {
 				/* avoid calling iput_final() in mds
 				 * dispatch threads */
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 1b4996efc111..47fb6e022339 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -895,6 +895,9 @@ static inline bool __ceph_have_pending_cap_snap(struct ceph_inode_info *ci)
 }
 
 /* inode.c */
+struct ceph_mds_reply_info_in;
+struct ceph_mds_reply_dirfrag;
+
 extern const struct inode_operations ceph_file_iops;
 
 extern struct inode *ceph_alloc_inode(struct super_block *sb);
@@ -910,6 +913,11 @@ extern void ceph_fill_file_time(struct inode *inode, int issued,
 				u64 time_warp_seq, struct timespec64 *ctime,
 				struct timespec64 *mtime,
 				struct timespec64 *atime);
+extern int ceph_fill_inode(struct inode *inode, struct page *locked_page,
+		    struct ceph_mds_reply_info_in *iinfo,
+		    struct ceph_mds_reply_dirfrag *dirinfo,
+		    struct ceph_mds_session *session, int cap_fmode,
+		    struct ceph_cap_reservation *caps_reservation);
 extern int ceph_fill_trace(struct super_block *sb,
 			   struct ceph_mds_request *req);
 extern int ceph_readdir_prepopulate(struct ceph_mds_request *req,
-- 
2.24.1

* [PATCH v5 09/12] ceph: decode interval_sets for delegated inos
  2020-02-19 13:25 [PATCH v5 00/12] ceph: async directory operations support Jeff Layton
                   ` (7 preceding siblings ...)
  2020-02-19 13:25 ` [PATCH v5 08/12] ceph: make ceph_fill_inode non-static Jeff Layton
@ 2020-02-19 13:25 ` Jeff Layton
  2020-02-19 13:25 ` [PATCH v5 10/12] ceph: add new MDS req field to hold delegated inode number Jeff Layton
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: Jeff Layton @ 2020-02-19 13:25 UTC (permalink / raw)
  To: ceph-devel; +Cc: idryomov, sage, zyan, pdonnell, xiubli

Starting in Octopus, the MDS will hand out caps that allow the client
to do asynchronous file creates under certain conditions. As part of
that, the MDS will delegate ranges of inode numbers to the client.

Add the infrastructure to decode these ranges, and stuff them into an
xarray for later consumption by the async creation code.

Because the xarray code currently only handles unsigned long indexes,
and those are 32 bits on 32-bit arches, we only enable the decoding when
running on a 64-bit arch.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/mds_client.c | 122 +++++++++++++++++++++++++++++++++++++++----
 fs/ceph/mds_client.h |   9 +++-
 2 files changed, 121 insertions(+), 10 deletions(-)
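
(Not part of the patch, just expected usage: the async create code at
the end of the series will consume these helpers along these lines.
Sketch only; the "abandon the async attempt" condition below is a
placeholder:)

	u64 ino = ceph_get_deleg_ino(session);
	if (!ino) {
		/* nothing delegated: fall back to a synchronous create */
	} else {
		req->r_deleg_ino = ino;	/* field added in the next patch */

		/* if the async attempt has to be abandoned, return the
		 * number to the pool */
		if (must_abandon_async_attempt)
			ceph_restore_deleg_ino(session, ino);
	}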

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index ef3dd6fe2f4d..831578798b6e 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -415,21 +415,121 @@ static int parse_reply_info_filelock(void **p, void *end,
 	return -EIO;
 }
 
+
+#if BITS_PER_LONG == 64
+
+#define DELEGATED_INO_AVAILABLE		xa_mk_value(1)
+
+static int ceph_parse_deleg_inos(void **p, void *end,
+				 struct ceph_mds_session *s)
+{
+	u32 sets;
+
+	ceph_decode_32_safe(p, end, sets, bad);
+	dout("got %u sets of delegated inodes\n", sets);
+	while (sets--) {
+		u64 start, len, ino;
+
+		ceph_decode_64_safe(p, end, start, bad);
+		ceph_decode_64_safe(p, end, len, bad);
+		while (len--) {
+			int err = xa_insert(&s->s_delegated_inos, ino = start++,
+					    DELEGATED_INO_AVAILABLE,
+					    GFP_KERNEL);
+			if (!err) {
+				dout("added delegated inode 0x%llx\n",
+				     start - 1);
+			} else if (err == -EBUSY) {
+				pr_warn("ceph: MDS delegated inode 0x%llx more than once.\n",
+					start - 1);
+			} else {
+				return err;
+			}
+		}
+	}
+	return 0;
+bad:
+	return -EIO;
+}
+
+u64 ceph_get_deleg_ino(struct ceph_mds_session *s)
+{
+	unsigned long ino;
+	void *val;
+
+	xa_for_each(&s->s_delegated_inos, ino, val) {
+		val = xa_erase(&s->s_delegated_inos, ino);
+		if (val == DELEGATED_INO_AVAILABLE)
+			return ino;
+	}
+	return 0;
+}
+
+int ceph_restore_deleg_ino(struct ceph_mds_session *s, u64 ino)
+{
+	return xa_insert(&s->s_delegated_inos, ino, DELEGATED_INO_AVAILABLE,
+			 GFP_KERNEL);
+}
+#else /* BITS_PER_LONG == 64 */
+/*
+ * FIXME: xarrays can't handle 64-bit indexes on a 32-bit arch. For now, just
+ * ignore delegated_inos on 32 bit arch. Maybe eventually add xarrays for top
+ * and bottom words?
+ */
+static int ceph_parse_deleg_inos(void **p, void *end,
+				 struct ceph_mds_session *s)
+{
+	u32 sets;
+
+	ceph_decode_32_safe(p, end, sets, bad);
+	if (sets)
+		ceph_decode_skip_n(p, end, sets * 2 * sizeof(__le64), bad);
+	return 0;
+bad:
+	return -EIO;
+}
+
+u64 ceph_get_deleg_ino(struct ceph_mds_session *s)
+{
+	return 0;
+}
+
+int ceph_restore_deleg_ino(struct ceph_mds_session *s, u64 ino)
+{
+	return 0;
+}
+#endif /* BITS_PER_LONG == 64 */
+
 /*
  * parse create results
  */
 static int parse_reply_info_create(void **p, void *end,
 				  struct ceph_mds_reply_info_parsed *info,
-				  u64 features)
+				  u64 features, struct ceph_mds_session *s)
 {
+	int ret;
+
 	if (features == (u64)-1 ||
 	    (features & CEPH_FEATURE_REPLY_CREATE_INODE)) {
-		/* Malformed reply? */
 		if (*p == end) {
+			/* Malformed reply? */
 			info->has_create_ino = false;
-		} else {
+		} else if (test_bit(CEPHFS_FEATURE_DELEG_INO, &s->s_features)) {
+			u8 struct_v, struct_compat;
+			u32 len;
+
 			info->has_create_ino = true;
+			ceph_decode_8_safe(p, end, struct_v, bad);
+			ceph_decode_8_safe(p, end, struct_compat, bad);
+			ceph_decode_32_safe(p, end, len, bad);
+			ceph_decode_64_safe(p, end, info->ino, bad);
+			ret = ceph_parse_deleg_inos(p, end, s);
+			if (ret)
+				return ret;
+		} else {
+			/* legacy */
 			ceph_decode_64_safe(p, end, info->ino, bad);
+			info->has_create_ino = true;
 		}
 	} else {
 		if (*p != end)
@@ -448,7 +548,7 @@ static int parse_reply_info_create(void **p, void *end,
  */
 static int parse_reply_info_extra(void **p, void *end,
 				  struct ceph_mds_reply_info_parsed *info,
-				  u64 features)
+				  u64 features, struct ceph_mds_session *s)
 {
 	u32 op = le32_to_cpu(info->head->op);
 
@@ -457,7 +557,7 @@ static int parse_reply_info_extra(void **p, void *end,
 	else if (op == CEPH_MDS_OP_READDIR || op == CEPH_MDS_OP_LSSNAP)
 		return parse_reply_info_readdir(p, end, info, features);
 	else if (op == CEPH_MDS_OP_CREATE)
-		return parse_reply_info_create(p, end, info, features);
+		return parse_reply_info_create(p, end, info, features, s);
 	else
 		return -EIO;
 }
@@ -465,7 +565,7 @@ static int parse_reply_info_extra(void **p, void *end,
 /*
  * parse entire mds reply
  */
-static int parse_reply_info(struct ceph_msg *msg,
+static int parse_reply_info(struct ceph_mds_session *s, struct ceph_msg *msg,
 			    struct ceph_mds_reply_info_parsed *info,
 			    u64 features)
 {
@@ -490,7 +590,7 @@ static int parse_reply_info(struct ceph_msg *msg,
 	ceph_decode_32_safe(&p, end, len, bad);
 	if (len > 0) {
 		ceph_decode_need(&p, end, len, bad);
-		err = parse_reply_info_extra(&p, p+len, info, features);
+		err = parse_reply_info_extra(&p, p+len, info, features, s);
 		if (err < 0)
 			goto out_bad;
 	}
@@ -558,6 +658,7 @@ void ceph_put_mds_session(struct ceph_mds_session *s)
 	if (refcount_dec_and_test(&s->s_ref)) {
 		if (s->s_auth.authorizer)
 			ceph_auth_destroy_authorizer(s->s_auth.authorizer);
+		xa_destroy(&s->s_delegated_inos);
 		kfree(s);
 	}
 }
@@ -645,6 +746,7 @@ static struct ceph_mds_session *register_session(struct ceph_mds_client *mdsc,
 	refcount_set(&s->s_ref, 1);
 	INIT_LIST_HEAD(&s->s_waiting);
 	INIT_LIST_HEAD(&s->s_unsafe);
+	xa_init(&s->s_delegated_inos);
 	s->s_num_cap_releases = 0;
 	s->s_cap_reconnect = 0;
 	s->s_cap_iterator = NULL;
@@ -2980,9 +3082,9 @@ static void handle_reply(struct ceph_mds_session *session, struct ceph_msg *msg)
 	dout("handle_reply tid %lld result %d\n", tid, result);
 	rinfo = &req->r_reply_info;
 	if (test_bit(CEPHFS_FEATURE_REPLY_ENCODING, &session->s_features))
-		err = parse_reply_info(msg, rinfo, (u64)-1);
+		err = parse_reply_info(session, msg, rinfo, (u64)-1);
 	else
-		err = parse_reply_info(msg, rinfo, session->s_con.peer_features);
+		err = parse_reply_info(session, msg, rinfo, session->s_con.peer_features);
 	mutex_unlock(&mdsc->mutex);
 
 	mutex_lock(&session->s_mutex);
@@ -3678,6 +3780,8 @@ static void send_mds_reconnect(struct ceph_mds_client *mdsc,
 	if (!reply)
 		goto fail_nomsg;
 
+	xa_destroy(&session->s_delegated_inos);
+
 	mutex_lock(&session->s_mutex);
 	session->s_state = CEPH_MDS_SESSION_RECONNECTING;
 	session->s_seq = 0;
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index f10d342ea585..4c3b71707470 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -23,8 +23,9 @@ enum ceph_feature_type {
 	CEPHFS_FEATURE_RECLAIM_CLIENT,
 	CEPHFS_FEATURE_LAZY_CAP_WANTED,
 	CEPHFS_FEATURE_MULTI_RECONNECT,
+	CEPHFS_FEATURE_DELEG_INO,
 
-	CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_MULTI_RECONNECT,
+	CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_DELEG_INO,
 };
 
 /*
@@ -37,6 +38,7 @@ enum ceph_feature_type {
 	CEPHFS_FEATURE_REPLY_ENCODING,		\
 	CEPHFS_FEATURE_LAZY_CAP_WANTED,		\
 	CEPHFS_FEATURE_MULTI_RECONNECT,		\
+	CEPHFS_FEATURE_DELEG_INO,		\
 						\
 	CEPHFS_FEATURE_MAX,			\
 }
@@ -201,6 +203,7 @@ struct ceph_mds_session {
 
 	struct list_head  s_waiting;  /* waiting requests */
 	struct list_head  s_unsafe;   /* unsafe requests */
+	struct xarray	  s_delegated_inos;
 };
 
 /*
@@ -542,6 +545,7 @@ extern void ceph_mdsc_open_export_target_sessions(struct ceph_mds_client *mdsc,
 extern int ceph_trim_caps(struct ceph_mds_client *mdsc,
 			  struct ceph_mds_session *session,
 			  int max_caps);
+
 static inline int ceph_wait_on_async_create(struct inode *inode)
 {
 	struct ceph_inode_info *ci = ceph_inode(inode);
@@ -549,4 +553,7 @@ static inline int ceph_wait_on_async_create(struct inode *inode)
 	return wait_on_bit(&ci->i_ceph_flags, CEPH_ASYNC_CREATE_BIT,
 			   TASK_INTERRUPTIBLE);
 }
+
+extern u64 ceph_get_deleg_ino(struct ceph_mds_session *session);
+extern int ceph_restore_deleg_ino(struct ceph_mds_session *session, u64 ino);
 #endif
-- 
2.24.1

* [PATCH v5 10/12] ceph: add new MDS req field to hold delegated inode number
  2020-02-19 13:25 [PATCH v5 00/12] ceph: async directory operations support Jeff Layton
                   ` (8 preceding siblings ...)
  2020-02-19 13:25 ` [PATCH v5 09/12] ceph: decode interval_sets for delegated inos Jeff Layton
@ 2020-02-19 13:25 ` Jeff Layton
  2020-02-19 13:25 ` [PATCH v5 11/12] ceph: cache layout in parent dir on first sync create Jeff Layton
  2020-02-19 13:25 ` [PATCH v5 12/12] ceph: attempt to do async create when possible Jeff Layton
  11 siblings, 0 replies; 24+ messages in thread
From: Jeff Layton @ 2020-02-19 13:25 UTC (permalink / raw)
  To: ceph-devel; +Cc: idryomov, sage, zyan, pdonnell, xiubli

Add a new request field to hold the delegated inode number, and encode
it into the message when it's set.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/mds_client.c | 3 +--
 fs/ceph/mds_client.h | 1 +
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 831578798b6e..925f6ca334b9 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2477,7 +2477,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_client *mdsc,
 	head->op = cpu_to_le32(req->r_op);
 	head->caller_uid = cpu_to_le32(from_kuid(&init_user_ns, req->r_uid));
 	head->caller_gid = cpu_to_le32(from_kgid(&init_user_ns, req->r_gid));
-	head->ino = 0;
+	head->ino = cpu_to_le64(req->r_deleg_ino);
 	head->args = req->r_args;
 
 	ceph_encode_filepath(&p, end, ino1, path1);
@@ -2638,7 +2638,6 @@ static int __prepare_send_request(struct ceph_mds_client *mdsc,
 	rhead->flags = cpu_to_le32(flags);
 	rhead->num_fwd = req->r_num_fwd;
 	rhead->num_retry = req->r_attempts - 1;
-	rhead->ino = 0;
 
 	dout(" r_parent = %p\n", req->r_parent);
 	return 0;
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index 4c3b71707470..4e5be79bf080 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -313,6 +313,7 @@ struct ceph_mds_request {
 	int               r_num_fwd;    /* number of forward attempts */
 	int               r_resend_mds; /* mds to resend to next, if any*/
 	u32               r_sent_on_mseq; /* cap mseq request was sent at*/
+	u64		  r_deleg_ino;
 
 	struct list_head  r_wait;
 	struct completion r_completion;
-- 
2.24.1

* [PATCH v5 11/12] ceph: cache layout in parent dir on first sync create
  2020-02-19 13:25 [PATCH v5 00/12] ceph: async directory operations support Jeff Layton
                   ` (9 preceding siblings ...)
  2020-02-19 13:25 ` [PATCH v5 10/12] ceph: add new MDS req field to hold delegated inode number Jeff Layton
@ 2020-02-19 13:25 ` Jeff Layton
  2020-02-19 13:25 ` [PATCH v5 12/12] ceph: attempt to do async create when possible Jeff Layton
  11 siblings, 0 replies; 24+ messages in thread
From: Jeff Layton @ 2020-02-19 13:25 UTC (permalink / raw)
  To: ceph-devel; +Cc: idryomov, sage, zyan, pdonnell, xiubli

If a create is done, then typically we'll end up writing to the file
soon afterward. With an async create we don't want to wait for the
reply before doing that, which means we need the layout for the new
file before we've gotten the response from the MDS.

All files created in a directory will initially inherit the same layout,
so copy off the requisite info from the first synchronous create in the
directory, and save it in a new i_cached_layout field. Zero out the
layout when we lose Dc caps in the dir.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/caps.c       | 13 ++++++++++---
 fs/ceph/file.c       | 22 +++++++++++++++++++++-
 fs/ceph/inode.c      |  2 ++
 fs/ceph/mds_client.c |  7 ++++++-
 fs/ceph/super.h      |  1 +
 5 files changed, 40 insertions(+), 5 deletions(-)

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index c96b18407aef..c85dee8b8fcf 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -561,14 +561,14 @@ static void __cap_delay_cancel(struct ceph_mds_client *mdsc,
 	spin_unlock(&mdsc->cap_delay_lock);
 }
 
-/*
- * Common issue checks for add_cap, handle_cap_grant.
- */
+/* Common issue checks for add_cap, handle_cap_grant. */
 static void __check_cap_issue(struct ceph_inode_info *ci, struct ceph_cap *cap,
 			      unsigned issued)
 {
 	unsigned had = __ceph_caps_issued(ci, NULL);
 
+	lockdep_assert_held(&ci->i_ceph_lock);
+
 	/*
 	 * Each time we receive FILE_CACHE anew, we increment
 	 * i_rdcache_gen.
@@ -593,6 +593,13 @@ static void __check_cap_issue(struct ceph_inode_info *ci, struct ceph_cap *cap,
 			__ceph_dir_clear_complete(ci);
 		}
 	}
+
+	/* Wipe saved layout if we're losing DIR_CREATE caps */
+	if (S_ISDIR(ci->vfs_inode.i_mode) && (had & CEPH_CAP_DIR_CREATE) &&
+		!(issued & CEPH_CAP_DIR_CREATE)) {
+	     ceph_put_string(rcu_dereference_raw(ci->i_cached_layout.pool_ns));
+	     memset(&ci->i_cached_layout, 0, sizeof(ci->i_cached_layout));
+	}
 }
 
 /*
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 7e0190b1f821..472d90ccdf44 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -430,6 +430,23 @@ int ceph_open(struct inode *inode, struct file *file)
 	return err;
 }
 
+/* Clone the layout from a synchronous create, if the dir now has Dc caps */
+static void
+cache_file_layout(struct inode *dst, struct inode *src)
+{
+	struct ceph_inode_info *cdst = ceph_inode(dst);
+	struct ceph_inode_info *csrc = ceph_inode(src);
+
+	spin_lock(&cdst->i_ceph_lock);
+	if ((__ceph_caps_issued(cdst, NULL) & CEPH_CAP_DIR_CREATE) &&
+	    !ceph_file_layout_is_valid(&cdst->i_cached_layout)) {
+		memcpy(&cdst->i_cached_layout, &csrc->i_layout,
+			sizeof(cdst->i_cached_layout));
+		rcu_assign_pointer(cdst->i_cached_layout.pool_ns,
+				   ceph_try_get_string(csrc->i_layout.pool_ns));
+	}
+	spin_unlock(&cdst->i_ceph_lock);
+}
 
 /*
  * Do a lookup + open with a single request.  If we get a non-existent
@@ -518,7 +535,10 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
 	} else {
 		dout("atomic_open finish_open on dn %p\n", dn);
 		if (req->r_op == CEPH_MDS_OP_CREATE && req->r_reply_info.has_create_ino) {
-			ceph_init_inode_acls(d_inode(dentry), &as_ctx);
+			struct inode *newino = d_inode(dentry);
+
+			cache_file_layout(dir, newino);
+			ceph_init_inode_acls(newino, &as_ctx);
 			file->f_mode |= FMODE_CREATED;
 		}
 		err = finish_open(file, dentry, ceph_open);
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 4056c7968b86..73f986efb1fd 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -447,6 +447,7 @@ struct inode *ceph_alloc_inode(struct super_block *sb)
 	ci->i_max_files = 0;
 
 	memset(&ci->i_dir_layout, 0, sizeof(ci->i_dir_layout));
+	memset(&ci->i_cached_layout, 0, sizeof(ci->i_cached_layout));
 	RCU_INIT_POINTER(ci->i_layout.pool_ns, NULL);
 
 	ci->i_fragtree = RB_ROOT;
@@ -587,6 +588,7 @@ void ceph_evict_inode(struct inode *inode)
 		ceph_buffer_put(ci->i_xattrs.prealloc_blob);
 
 	ceph_put_string(rcu_dereference_raw(ci->i_layout.pool_ns));
+	ceph_put_string(rcu_dereference_raw(ci->i_cached_layout.pool_ns));
 }
 
 static inline blkcnt_t calc_inode_blocks(u64 size)
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 925f6ca334b9..6352e66d915f 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -3535,8 +3535,13 @@ static int reconnect_caps_cb(struct inode *inode, struct ceph_cap *cap,
 	cap->cap_gen = cap->session->s_cap_gen;
 
 	/* These are lost when the session goes away */
-	if (S_ISDIR(inode->i_mode))
+	if (S_ISDIR(inode->i_mode)) {
+		if (cap->issued & CEPH_CAP_DIR_CREATE) {
+			ceph_put_string(rcu_dereference_raw(ci->i_cached_layout.pool_ns));
+			memset(&ci->i_cached_layout, 0, sizeof(ci->i_cached_layout));
+		}
 		cap->issued &= ~CEPH_CAP_ANY_DIR_OPS;
+	}
 
 	if (recon_state->msg_version >= 2) {
 		rec.v2.cap_id = cpu_to_le64(cap->cap_id);
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 47fb6e022339..60701a2e36b3 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -326,6 +326,7 @@ struct ceph_inode_info {
 
 	struct ceph_dir_layout i_dir_layout;
 	struct ceph_file_layout i_layout;
+	struct ceph_file_layout i_cached_layout;	// for async creates
 	char *i_symlink;
 
 	/* for dirs */
-- 
2.24.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v5 12/12] ceph: attempt to do async create when possible
  2020-02-19 13:25 [PATCH v5 00/12] ceph: async directory operations support Jeff Layton
                   ` (10 preceding siblings ...)
  2020-02-19 13:25 ` [PATCH v5 11/12] ceph: cache layout in parent dir on first sync create Jeff Layton
@ 2020-02-19 13:25 ` Jeff Layton
  11 siblings, 0 replies; 24+ messages in thread
From: Jeff Layton @ 2020-02-19 13:25 UTC (permalink / raw)
  To: ceph-devel; +Cc: idryomov, sage, zyan, pdonnell, xiubli

With the Octopus release, the MDS will hand out directory create caps.

If we have Fxc caps on the directory, and complete directory information
or a known negative dentry, then we can return without waiting on the
reply, allowing the open() call to return very quickly to userland.

We use the normal ceph_fill_inode() routine to fill in the inode, so we
have to gin up some reply inode information with what we'd expect the
newly-created inode to have. The client assumes that it has a full set
of caps on the new inode, and that the MDS will revoke them when there
is conflicting access.

This functionality is gated on the wsync/nowsync mount options.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/file.c               | 248 +++++++++++++++++++++++++++++++++--
 include/linux/ceph/ceph_fs.h |   3 +
 2 files changed, 241 insertions(+), 10 deletions(-)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 472d90ccdf44..d8041638319d 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -448,6 +448,210 @@ cache_file_layout(struct inode *dst, struct inode *src)
 	spin_unlock(&cdst->i_ceph_lock);
 }
 
+/*
+ * Try to set up an async create. We need caps, a file layout, and inode number,
+ * and either a lease on the dentry or complete dir info. If any of those
+ * criteria are not satisfied, then return false and the caller can go
+ * synchronous.
+ */
+static int try_prep_async_create(struct inode *dir, struct dentry *dentry,
+				 struct ceph_file_layout *lo, u64 *pino)
+{
+	struct ceph_inode_info *ci = ceph_inode(dir);
+	struct ceph_dentry_info *di = ceph_dentry(dentry);
+	int got = 0, want = CEPH_CAP_FILE_EXCL | CEPH_CAP_DIR_CREATE;
+	u64 ino;
+
+	spin_lock(&ci->i_ceph_lock);
+	/* No auth cap means no chance for Dc caps */
+	if (!ci->i_auth_cap)
+		goto no_async;
+
+	/* Any delegated inos? */
+	if (xa_empty(&ci->i_auth_cap->session->s_delegated_inos))
+		goto no_async;
+
+	if (!ceph_file_layout_is_valid(&ci->i_cached_layout))
+		goto no_async;
+
+	if ((__ceph_caps_issued(ci, NULL) & want) != want)
+		goto no_async;
+
+	if (d_in_lookup(dentry)) {
+		if (!__ceph_dir_is_complete(ci))
+			goto no_async;
+	} else if (atomic_read(&ci->i_shared_gen) !=
+		   READ_ONCE(di->lease_shared_gen)) {
+		goto no_async;
+	}
+
+	ino = ceph_get_deleg_ino(ci->i_auth_cap->session);
+	if (!ino)
+		goto no_async;
+
+	*pino = ino;
+	ceph_take_cap_refs(ci, want, false);
+	memcpy(lo, &ci->i_cached_layout, sizeof(*lo));
+	rcu_assign_pointer(lo->pool_ns,
+			   ceph_try_get_string(ci->i_cached_layout.pool_ns));
+	got = want;
+no_async:
+	spin_unlock(&ci->i_ceph_lock);
+	return got;
+}
+
+static void restore_deleg_ino(struct inode *dir, u64 ino)
+{
+	struct ceph_inode_info *ci = ceph_inode(dir);
+	struct ceph_mds_session *s = NULL;
+
+	spin_lock(&ci->i_ceph_lock);
+	if (ci->i_auth_cap)
+		s = ceph_get_mds_session(ci->i_auth_cap->session);
+	spin_unlock(&ci->i_ceph_lock);
+	if (s) {
+		int err = ceph_restore_deleg_ino(s, ino);
+		if (err)
+			pr_warn("ceph: unable to restore delegated ino 0x%llx to session: %d\n",
+				ino, err);
+		ceph_put_mds_session(s);
+	}
+}
+
+static void ceph_async_create_cb(struct ceph_mds_client *mdsc,
+                                 struct ceph_mds_request *req)
+{
+	int result = req->r_err ? req->r_err :
+			le32_to_cpu(req->r_reply_info.head->result);
+
+	mapping_set_error(req->r_parent->i_mapping, result);
+
+	if (result) {
+		struct dentry *dentry = req->r_dentry;
+		int pathlen;
+		u64 base;
+		char *path = ceph_mdsc_build_path(req->r_dentry, &pathlen,
+						  &base, 0);
+
+		ceph_dir_clear_complete(req->r_parent);
+		if (!d_unhashed(dentry))
+			d_drop(dentry);
+
+		/* FIXME: start returning I/O errors on all accesses? */
+		pr_warn("ceph: async create failure path=(%llx)%s result=%d!\n",
+			base, IS_ERR(path) ? "<<bad>>" : path, result);
+		ceph_mdsc_free_path(path, pathlen);
+	}
+
+	if (req->r_target_inode) {
+		struct ceph_inode_info *ci = ceph_inode(req->r_target_inode);
+		u64 ino = ceph_vino(req->r_target_inode).ino;
+
+		if (req->r_deleg_ino != ino)
+			pr_warn("%s: inode number mismatch! err=%d deleg_ino=0x%llx target=0x%llx\n",
+				__func__, req->r_err, req->r_deleg_ino, ino);
+		mapping_set_error(req->r_target_inode->i_mapping, result);
+
+		spin_lock(&ci->i_ceph_lock);
+		if (ci->i_ceph_flags & CEPH_I_ASYNC_CREATE) {
+			ci->i_ceph_flags &= ~CEPH_I_ASYNC_CREATE;
+			wake_up_bit(&ci->i_ceph_flags, CEPH_ASYNC_CREATE_BIT);
+		}
+		spin_unlock(&ci->i_ceph_lock);
+	} else {
+		pr_warn("%s: no req->r_target_inode for 0x%llx\n", __func__,
+			req->r_deleg_ino);
+	}
+}
+
+static int ceph_finish_async_create(struct inode *dir, struct dentry *dentry,
+				    struct file *file, umode_t mode,
+				    struct ceph_mds_request *req,
+				    struct ceph_acl_sec_ctx *as_ctx,
+				    struct ceph_file_layout *lo)
+{
+	int ret;
+	char xattr_buf[4];
+	struct ceph_mds_reply_inode in = { };
+	struct ceph_mds_reply_info_in iinfo = { .in = &in };
+	struct ceph_inode_info *ci = ceph_inode(dir);
+	struct inode *inode;
+	struct timespec64 now;
+	struct ceph_vino vino = { .ino = req->r_deleg_ino,
+				  .snap = CEPH_NOSNAP };
+
+	ktime_get_real_ts64(&now);
+
+	inode = ceph_get_inode(dentry->d_sb, vino);
+	if (IS_ERR(inode))
+		return PTR_ERR(inode);
+
+	iinfo.inline_version = CEPH_INLINE_NONE;
+	iinfo.change_attr = 1;
+	ceph_encode_timespec64(&iinfo.btime, &now);
+
+	iinfo.xattr_len = ARRAY_SIZE(xattr_buf);
+	iinfo.xattr_data = xattr_buf;
+	memset(iinfo.xattr_data, 0, iinfo.xattr_len);
+
+	in.ino = cpu_to_le64(vino.ino);
+	in.snapid = cpu_to_le64(CEPH_NOSNAP);
+	in.version = cpu_to_le64(1);	// ???
+	in.cap.caps = in.cap.wanted = cpu_to_le32(CEPH_CAP_ALL_FILE);
+	in.cap.cap_id = cpu_to_le64(1);
+	in.cap.realm = cpu_to_le64(ci->i_snap_realm->ino);
+	in.cap.flags = CEPH_CAP_FLAG_AUTH;
+	in.ctime = in.mtime = in.atime = iinfo.btime;
+	in.mode = cpu_to_le32((u32)mode);
+	in.truncate_seq = cpu_to_le32(1);
+	in.truncate_size = cpu_to_le64(-1ULL);
+	in.xattr_version = cpu_to_le64(1);
+	in.uid = cpu_to_le32(from_kuid(&init_user_ns, current_fsuid()));
+	in.gid = cpu_to_le32(from_kgid(&init_user_ns, dir->i_mode & S_ISGID ?
+				dir->i_gid : current_fsgid()));
+	in.nlink = cpu_to_le32(1);
+	in.max_size = cpu_to_le64(lo->stripe_unit);
+
+	ceph_file_layout_to_legacy(lo, &in.layout);
+
+	ret = ceph_fill_inode(inode, NULL, &iinfo, NULL, req->r_session,
+			      req->r_fmode, NULL);
+	if (ret) {
+		dout("%s failed to fill inode: %d\n", __func__, ret);
+		ceph_dir_clear_complete(dir);
+		if (!d_unhashed(dentry))
+			d_drop(dentry);
+		if (inode->i_state & I_NEW)
+			discard_new_inode(inode);
+	} else {
+		struct dentry *dn;
+
+		dout("%s d_adding new inode 0x%llx to 0x%lx/%s\n", __func__,
+			vino.ino, dir->i_ino, dentry->d_name.name);
+		ceph_dir_clear_ordered(dir);
+		ceph_init_inode_acls(inode, as_ctx);
+		if (inode->i_state & I_NEW) {
+			/*
+			 * If it's not I_NEW, then someone created this before
+			 * we got here. Assume the server is aware of it at
+			 * that point and don't worry about setting
+			 * CEPH_I_ASYNC_CREATE.
+			 */
+			ceph_inode(inode)->i_ceph_flags = CEPH_I_ASYNC_CREATE;
+			unlock_new_inode(inode);
+		}
+		if (d_in_lookup(dentry) || d_really_is_negative(dentry)) {
+			if (!d_unhashed(dentry))
+				d_drop(dentry);
+			dn = d_splice_alias(inode, dentry);
+			WARN_ON_ONCE(dn && dn != dentry);
+		}
+		file->f_mode |= FMODE_CREATED;
+		ret = finish_open(file, dentry, ceph_open);
+	}
+	return ret;
+}
+
 /*
  * Do a lookup + open with a single request.  If we get a non-existent
  * file or symlink, return 1 so the VFS can retry.
@@ -460,6 +664,7 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
 	struct ceph_mds_request *req;
 	struct dentry *dn;
 	struct ceph_acl_sec_ctx as_ctx = {};
+	bool try_async = ceph_test_mount_opt(fsc, ASYNC_DIROPS);
 	int mask;
 	int err;
 
@@ -483,7 +688,7 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
 		/* If it's not being looked up, it's negative */
 		return -ENOENT;
 	}
-
+retry:
 	/* do the open */
 	req = prepare_open_request(dir->i_sb, flags, mode);
 	if (IS_ERR(req)) {
@@ -492,28 +697,50 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
 	}
 	req->r_dentry = dget(dentry);
 	req->r_num_caps = 2;
+	mask = CEPH_STAT_CAP_INODE | CEPH_CAP_AUTH_SHARED;
+	if (ceph_security_xattr_wanted(dir))
+		mask |= CEPH_CAP_XATTR_SHARED;
+	req->r_args.open.mask = cpu_to_le32(mask);
+	req->r_parent = dir;
+
 	if (flags & O_CREAT) {
+		struct ceph_file_layout lo;
+
 		req->r_dentry_drop = CEPH_CAP_FILE_SHARED | CEPH_CAP_AUTH_EXCL;
 		req->r_dentry_unless = CEPH_CAP_FILE_EXCL;
 		if (as_ctx.pagelist) {
 			req->r_pagelist = as_ctx.pagelist;
 			as_ctx.pagelist = NULL;
 		}
+		if (try_async &&
+		    (req->r_dir_caps =
+		      try_prep_async_create(dir, dentry, &lo,
+					    &req->r_deleg_ino))) {
+			set_bit(CEPH_MDS_R_ASYNC, &req->r_req_flags);
+			req->r_args.open.flags |= cpu_to_le32(CEPH_O_EXCL);
+			req->r_callback = ceph_async_create_cb;
+			err = ceph_mdsc_submit_request(mdsc, dir, req);
+			if (!err) {
+				err = ceph_finish_async_create(dir, dentry,
+							file, mode, req,
+							&as_ctx, &lo);
+			} else if (err == -EJUKEBOX) {
+				restore_deleg_ino(dir, req->r_deleg_ino);
+				ceph_mdsc_put_request(req);
+				try_async = false;
+				goto retry;
+			}
+			goto out_req;
+		}
 	}
 
-       mask = CEPH_STAT_CAP_INODE | CEPH_CAP_AUTH_SHARED;
-       if (ceph_security_xattr_wanted(dir))
-               mask |= CEPH_CAP_XATTR_SHARED;
-       req->r_args.open.mask = cpu_to_le32(mask);
-
-	req->r_parent = dir;
 	set_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags);
 	err = ceph_mdsc_do_request(mdsc,
 				   (flags & (O_CREAT|O_TRUNC)) ? dir : NULL,
 				   req);
 	err = ceph_handle_snapdir(req, dentry, err);
 	if (err)
-		goto out_req;
+		goto out_fmode;
 
 	if ((flags & O_CREAT) && !req->r_reply_info.head->is_dentry)
 		err = ceph_handle_notrace_create(dir, dentry);
@@ -527,7 +754,7 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
 		dn = NULL;
 	}
 	if (err)
-		goto out_req;
+		goto out_fmode;
 	if (dn || d_really_is_negative(dentry) || d_is_symlink(dentry)) {
 		/* make vfs retry on splice, ENOENT, or symlink */
 		dout("atomic_open finish_no_open on dn %p\n", dn);
@@ -543,9 +770,10 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
 		}
 		err = finish_open(file, dentry, ceph_open);
 	}
-out_req:
+out_fmode:
 	if (!req->r_err && req->r_target_inode)
 		ceph_put_fmode(ceph_inode(req->r_target_inode), req->r_fmode);
+out_req:
 	ceph_mdsc_put_request(req);
 out_ctx:
 	ceph_release_acl_sec_ctx(&as_ctx);
diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
index 91d09cf37649..e035c5194005 100644
--- a/include/linux/ceph/ceph_fs.h
+++ b/include/linux/ceph/ceph_fs.h
@@ -659,6 +659,9 @@ int ceph_flags_to_mode(int flags);
 #define CEPH_CAP_ANY      (CEPH_CAP_ANY_RD | CEPH_CAP_ANY_EXCL | \
 			   CEPH_CAP_ANY_FILE_WR | CEPH_CAP_FILE_LAZYIO | \
 			   CEPH_CAP_PIN)
+#define CEPH_CAP_ALL_FILE (CEPH_CAP_PIN | CEPH_CAP_ANY_SHARED | \
+			   CEPH_CAP_AUTH_EXCL | CEPH_CAP_XATTR_EXCL | \
+			   CEPH_CAP_ANY_FILE_RD | CEPH_CAP_ANY_FILE_WR)
 
 #define CEPH_CAP_LOCKS (CEPH_LOCK_IFILE | CEPH_LOCK_IAUTH | CEPH_LOCK_ILINK | \
 			CEPH_LOCK_IXATTR)
-- 
2.24.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v5 03/12] ceph: add infrastructure for waiting for async create to complete
  2020-02-19 13:25 ` [PATCH v5 03/12] ceph: add infrastructure for waiting for async create to complete Jeff Layton
@ 2020-02-20  3:32   ` Yan, Zheng
  2020-02-20 13:01     ` Jeff Layton
  0 siblings, 1 reply; 24+ messages in thread
From: Yan, Zheng @ 2020-02-20  3:32 UTC (permalink / raw)
  To: Jeff Layton
  Cc: ceph-devel, Ilya Dryomov, Sage Weil, Zheng Yan, Patrick Donnelly,
	Xiubo Li

On Wed, Feb 19, 2020 at 9:27 PM Jeff Layton <jlayton@kernel.org> wrote:
>
> When we issue an async create, we must ensure that any later on-the-wire
> requests involving it wait for the create reply.
>
> Expand i_ceph_flags to be an unsigned long, and add a new bit that
> MDS requests can wait on. If the bit is set in the inode when sending
> caps, then don't send it and just return that it has been delayed.
>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  fs/ceph/caps.c       | 13 ++++++++++++-
>  fs/ceph/dir.c        |  2 +-
>  fs/ceph/mds_client.c | 20 +++++++++++++++++++-
>  fs/ceph/mds_client.h |  7 +++++++
>  fs/ceph/super.h      |  4 +++-
>  5 files changed, 42 insertions(+), 4 deletions(-)
>
> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> index d05717397c2a..85e13aa359d2 100644
> --- a/fs/ceph/caps.c
> +++ b/fs/ceph/caps.c
> @@ -511,7 +511,7 @@ static void __cap_delay_requeue(struct ceph_mds_client *mdsc,
>                                 struct ceph_inode_info *ci,
>                                 bool set_timeout)
>  {
> -       dout("__cap_delay_requeue %p flags %d at %lu\n", &ci->vfs_inode,
> +       dout("__cap_delay_requeue %p flags 0x%lx at %lu\n", &ci->vfs_inode,
>              ci->i_ceph_flags, ci->i_hold_caps_max);
>         if (!mdsc->stopping) {
>                 spin_lock(&mdsc->cap_delay_lock);
> @@ -1294,6 +1294,13 @@ static int __send_cap(struct ceph_mds_client *mdsc, struct ceph_cap *cap,
>         int delayed = 0;
>         int ret;
>
> +       /* Don't send anything if it's still being created. Return delayed */
> +       if (ci->i_ceph_flags & CEPH_I_ASYNC_CREATE) {
> +               spin_unlock(&ci->i_ceph_lock);
> +               dout("%s async create in flight for %p\n", __func__, inode);
> +               return 1;
> +       }
> +

Maybe it's better to check this in ceph_check_caps(). Other callers
of __send_cap() shouldn't encounter an async-creating inode.
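
A minimal sketch of what that might look like (the placement near the
top of ceph_check_caps() is an assumption for illustration, not code
from this series):

	spin_lock(&ci->i_ceph_lock);
	/* Hypothetical: bail out of ceph_check_caps() entirely while an
	 * async create is still in flight for this inode, so no cap
	 * message is built or sent. */
	if (ci->i_ceph_flags & CEPH_I_ASYNC_CREATE) {
		spin_unlock(&ci->i_ceph_lock);
		dout("%s: async create in flight for %p\n", __func__,
		     &ci->vfs_inode);
		return;
	}
	/* ... rest of ceph_check_caps() continues under i_ceph_lock ... */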

>         held = cap->issued | cap->implemented;
>         revoking = cap->implemented & ~cap->issued;
>         retain &= ~revoking;
> @@ -2250,6 +2257,10 @@ int ceph_fsync(struct file *file, loff_t start, loff_t end, int datasync)
>         if (datasync)
>                 goto out;
>
> +       ret = ceph_wait_on_async_create(inode);
> +       if (ret)
> +               goto out;
> +
>         dirty = try_flush_caps(inode, &flush_tid);
>         dout("fsync dirty caps are %s\n", ceph_cap_string(dirty));
>
> diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
> index a87274935a09..5b83bda57056 100644
> --- a/fs/ceph/dir.c
> +++ b/fs/ceph/dir.c
> @@ -752,7 +752,7 @@ static struct dentry *ceph_lookup(struct inode *dir, struct dentry *dentry,
>                 struct ceph_dentry_info *di = ceph_dentry(dentry);
>
>                 spin_lock(&ci->i_ceph_lock);
> -               dout(" dir %p flags are %d\n", dir, ci->i_ceph_flags);
> +               dout(" dir %p flags are 0x%lx\n", dir, ci->i_ceph_flags);
>                 if (strncmp(dentry->d_name.name,
>                             fsc->mount_options->snapdir_name,
>                             dentry->d_name.len) &&
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index 94d18e643a3d..38eb9dd5062b 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -2730,7 +2730,7 @@ static void kick_requests(struct ceph_mds_client *mdsc, int mds)
>  int ceph_mdsc_submit_request(struct ceph_mds_client *mdsc, struct inode *dir,
>                               struct ceph_mds_request *req)
>  {
> -       int err;
> +       int err = 0;
>
>         /* take CAP_PIN refs for r_inode, r_parent, r_old_dentry */
>         if (req->r_inode)
> @@ -2743,6 +2743,24 @@ int ceph_mdsc_submit_request(struct ceph_mds_client *mdsc, struct inode *dir,
>                 ceph_get_cap_refs(ceph_inode(req->r_old_dentry_dir),
>                                   CEPH_CAP_PIN);
>
> +       if (req->r_inode) {
> +               err = ceph_wait_on_async_create(req->r_inode);
> +               if (err) {
> +                       dout("%s: wait for async create returned: %d\n",
> +                            __func__, err);
> +                       return err;
> +               }
> +       }
> +
> +       if (!err && req->r_old_inode) {
> +               err = ceph_wait_on_async_create(req->r_old_inode);
> +               if (err) {
> +                       dout("%s: wait for async create returned: %d\n",
> +                            __func__, err);
> +                       return err;
> +               }
> +       }
> +
>         dout("submit_request on %p for inode %p\n", req, dir);
>         mutex_lock(&mdsc->mutex);
>         __register_request(mdsc, req, dir);
> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> index 95ac00e59e66..8043f2b439b1 100644
> --- a/fs/ceph/mds_client.h
> +++ b/fs/ceph/mds_client.h
> @@ -538,4 +538,11 @@ extern void ceph_mdsc_open_export_target_sessions(struct ceph_mds_client *mdsc,
>  extern int ceph_trim_caps(struct ceph_mds_client *mdsc,
>                           struct ceph_mds_session *session,
>                           int max_caps);
> +static inline int ceph_wait_on_async_create(struct inode *inode)
> +{
> +       struct ceph_inode_info *ci = ceph_inode(inode);
> +
> +       return wait_on_bit(&ci->i_ceph_flags, CEPH_ASYNC_CREATE_BIT,
> +                          TASK_INTERRUPTIBLE);
> +}
>  #endif
> diff --git a/fs/ceph/super.h b/fs/ceph/super.h
> index 3430d7ffe8f7..bfb03adb4a08 100644
> --- a/fs/ceph/super.h
> +++ b/fs/ceph/super.h
> @@ -316,7 +316,7 @@ struct ceph_inode_info {
>         u64 i_inline_version;
>         u32 i_time_warp_seq;
>
> -       unsigned i_ceph_flags;
> +       unsigned long i_ceph_flags;
>         atomic64_t i_release_count;
>         atomic64_t i_ordered_count;
>         atomic64_t i_complete_seq[2];
> @@ -524,6 +524,8 @@ static inline struct inode *ceph_find_inode(struct super_block *sb,
>  #define CEPH_I_ERROR_WRITE     (1 << 10) /* have seen write errors */
>  #define CEPH_I_ERROR_FILELOCK  (1 << 11) /* have seen file lock errors */
>  #define CEPH_I_ODIRECT         (1 << 12) /* inode in direct I/O mode */
> +#define CEPH_ASYNC_CREATE_BIT  (13)      /* async create in flight for this */
> +#define CEPH_I_ASYNC_CREATE    (1 << CEPH_ASYNC_CREATE_BIT)
>
>  /*
>   * Masks of ceph inode work.
> --
> 2.24.1
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5 05/12] ceph: cap tracking for async directory operations
  2020-02-19 13:25 ` [PATCH v5 05/12] ceph: cap tracking for async directory operations Jeff Layton
@ 2020-02-20  6:42   ` Yan, Zheng
  2020-02-20 11:30     ` Jeff Layton
  0 siblings, 1 reply; 24+ messages in thread
From: Yan, Zheng @ 2020-02-20  6:42 UTC (permalink / raw)
  To: Jeff Layton
  Cc: ceph-devel, Ilya Dryomov, Sage Weil, Zheng Yan, Patrick Donnelly,
	Xiubo Li

On Wed, Feb 19, 2020 at 9:27 PM Jeff Layton <jlayton@kernel.org> wrote:
>
> Track and correctly handle directory caps for asynchronous operations.
> Add aliases for Frc caps that we now designate at Dcu caps (when dealing
> with directories).
>
> Unlike file caps, we don't reclaim these when the session goes away, and
> instead preemptively release them. In-flight async dirops are instead
> handled during reconnect phase. The client needs to re-do a synchronous
> operation in order to re-get directory caps.
>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  fs/ceph/caps.c               | 29 ++++++++++++++++++++---------
>  fs/ceph/mds_client.c         | 31 ++++++++++++++++++++++++++-----
>  fs/ceph/mds_client.h         |  6 +++++-
>  include/linux/ceph/ceph_fs.h |  6 ++++++
>  4 files changed, 57 insertions(+), 15 deletions(-)
>
> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> index 295837215a3a..d6c5ee33f30f 100644
> --- a/fs/ceph/caps.c
> +++ b/fs/ceph/caps.c
> @@ -992,7 +992,11 @@ int __ceph_caps_file_wanted(struct ceph_inode_info *ci)
>  int __ceph_caps_wanted(struct ceph_inode_info *ci)
>  {
>         int w = __ceph_caps_file_wanted(ci) | __ceph_caps_used(ci);
> -       if (!S_ISDIR(ci->vfs_inode.i_mode)) {
> +       if (S_ISDIR(ci->vfs_inode.i_mode)) {
> +               /* we want EXCL if holding caps of dir ops */
> +               if (w & CEPH_CAP_ANY_DIR_OPS)
> +                       w |= CEPH_CAP_FILE_EXCL;
> +       } else {
>                 /* we want EXCL if dirty data */
>                 if (w & CEPH_CAP_FILE_BUFFER)
>                         w |= CEPH_CAP_FILE_EXCL;
> @@ -1890,10 +1894,13 @@ void ceph_check_caps(struct ceph_inode_info *ci, int flags,
>                          * revoking the shared cap on every create/unlink
>                          * operation.
>                          */
> -                       if (IS_RDONLY(inode))
> +                       if (IS_RDONLY(inode)) {
>                                 want = CEPH_CAP_ANY_SHARED;
> -                       else
> -                               want = CEPH_CAP_ANY_SHARED | CEPH_CAP_FILE_EXCL;
> +                       } else {
> +                               want = CEPH_CAP_ANY_SHARED |
> +                                      CEPH_CAP_FILE_EXCL |
> +                                      CEPH_CAP_ANY_DIR_OPS;
> +                       }
>                         retain |= want;
>                 } else {
>
> @@ -2750,13 +2757,17 @@ int ceph_try_get_caps(struct inode *inode, int need, int want,
>         int ret;
>
>         BUG_ON(need & ~CEPH_CAP_FILE_RD);
> -       BUG_ON(want & ~(CEPH_CAP_FILE_CACHE|CEPH_CAP_FILE_LAZYIO|CEPH_CAP_FILE_SHARED));
> -       ret = ceph_pool_perm_check(inode, need);
> -       if (ret < 0)
> -               return ret;
> +       if (need) {
> +               ret = ceph_pool_perm_check(inode, need);
> +               if (ret < 0)
> +                       return ret;
> +       }
>
> +       BUG_ON(want & ~(CEPH_CAP_FILE_CACHE | CEPH_CAP_FILE_LAZYIO |
> +                       CEPH_CAP_FILE_SHARED | CEPH_CAP_FILE_EXCL |
> +                       CEPH_CAP_ANY_DIR_OPS));
>         ret = try_get_cap_refs(inode, need, want, 0,
> -                              (nonblock ? NON_BLOCKING : 0), got);
> +                              nonblock ? NON_BLOCKING : 0, got);
>         return ret == -EAGAIN ? 0 : ret;
>  }
>
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index 38eb9dd5062b..ef3dd6fe2f4d 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -699,6 +699,7 @@ void ceph_mdsc_release_request(struct kref *kref)
>         struct ceph_mds_request *req = container_of(kref,
>                                                     struct ceph_mds_request,
>                                                     r_kref);
> +       ceph_mdsc_release_dir_caps(req);

I think we can do this in complete_request()
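
For reference, a minimal sketch of that variant, assuming
complete_request() still just runs the callback and completes
r_completion; the xchg() in ceph_mdsc_release_dir_caps() would make a
later call from ceph_mdsc_release_request() a no-op:

static void complete_request(struct ceph_mds_client *mdsc,
			     struct ceph_mds_request *req)
{
	/* Hypothetical: drop the dir cap refs as soon as the request
	 * completes rather than waiting for the final request put. */
	ceph_mdsc_release_dir_caps(req);

	if (req->r_callback)
		req->r_callback(mdsc, req);
	complete_all(&req->r_completion);
}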

>         destroy_reply_info(&req->r_reply_info);
>         if (req->r_request)
>                 ceph_msg_put(req->r_request);
> @@ -3280,6 +3281,17 @@ static void handle_session(struct ceph_mds_session *session,
>         return;
>  }
>
> +void ceph_mdsc_release_dir_caps(struct ceph_mds_request *req)
> +{
> +       int dcaps;
> +
> +       dcaps = xchg(&req->r_dir_caps, 0);
> +       if (dcaps) {
> +               dout("releasing r_dir_caps=%s\n", ceph_cap_string(dcaps));
> +               ceph_put_cap_refs(ceph_inode(req->r_parent), dcaps);
> +       }
> +}
> +
>  /*
>   * called under session->mutex.
>   */
> @@ -3307,9 +3319,14 @@ static void replay_unsafe_requests(struct ceph_mds_client *mdsc,
>                         continue;
>                 if (req->r_attempts == 0)
>                         continue; /* only old requests */
> -               if (req->r_session &&
> -                   req->r_session->s_mds == session->s_mds)
> -                       __send_request(mdsc, session, req, true);
> +               if (!req->r_session)
> +                       continue;
> +               if (req->r_session->s_mds != session->s_mds)
> +                       continue;
> +
> +               ceph_mdsc_release_dir_caps(req);
> +
> +               __send_request(mdsc, session, req, true);
>         }
>         mutex_unlock(&mdsc->mutex);
>  }
> @@ -3393,7 +3410,7 @@ static int send_reconnect_partial(struct ceph_reconnect_state *recon_state)
>  /*
>   * Encode information about a cap for a reconnect with the MDS.
>   */
> -static int encode_caps_cb(struct inode *inode, struct ceph_cap *cap,
> +static int reconnect_caps_cb(struct inode *inode, struct ceph_cap *cap,
>                           void *arg)
>  {
>         union {
> @@ -3416,6 +3433,10 @@ static int encode_caps_cb(struct inode *inode, struct ceph_cap *cap,
>         cap->mseq = 0;       /* and migrate_seq */
>         cap->cap_gen = cap->session->s_cap_gen;
>
> +       /* These are lost when the session goes away */
> +       if (S_ISDIR(inode->i_mode))
> +               cap->issued &= ~CEPH_CAP_ANY_DIR_OPS;
> +
>         if (recon_state->msg_version >= 2) {
>                 rec.v2.cap_id = cpu_to_le64(cap->cap_id);
>                 rec.v2.wanted = cpu_to_le32(__ceph_caps_wanted(ci));
> @@ -3712,7 +3733,7 @@ static void send_mds_reconnect(struct ceph_mds_client *mdsc,
>                 recon_state.msg_version = 2;
>         }
>         /* trsaverse this session's caps */
> -       err = ceph_iterate_session_caps(session, encode_caps_cb, &recon_state);
> +       err = ceph_iterate_session_caps(session, reconnect_caps_cb, &recon_state);
>
>         spin_lock(&session->s_cap_lock);
>         session->s_cap_reconnect = 0;
> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> index 8043f2b439b1..f10d342ea585 100644
> --- a/fs/ceph/mds_client.h
> +++ b/fs/ceph/mds_client.h
> @@ -284,8 +284,11 @@ struct ceph_mds_request {
>         struct ceph_msg  *r_request;  /* original request */
>         struct ceph_msg  *r_reply;
>         struct ceph_mds_reply_info_parsed r_reply_info;
> -       struct page *r_locked_page;
>         int r_err;
> +
> +
> +       struct page *r_locked_page;
> +       int r_dir_caps;
>         int r_num_caps;
>         u32               r_readdir_offset;
>
> @@ -489,6 +492,7 @@ extern int ceph_mdsc_submit_request(struct ceph_mds_client *mdsc,
>  extern int ceph_mdsc_do_request(struct ceph_mds_client *mdsc,
>                                 struct inode *dir,
>                                 struct ceph_mds_request *req);
> +extern void ceph_mdsc_release_dir_caps(struct ceph_mds_request *req);
>  static inline void ceph_mdsc_get_request(struct ceph_mds_request *req)
>  {
>         kref_get(&req->r_kref);
> diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
> index 94cc4b047987..91d09cf37649 100644
> --- a/include/linux/ceph/ceph_fs.h
> +++ b/include/linux/ceph/ceph_fs.h
> @@ -663,6 +663,12 @@ int ceph_flags_to_mode(int flags);
>  #define CEPH_CAP_LOCKS (CEPH_LOCK_IFILE | CEPH_LOCK_IAUTH | CEPH_LOCK_ILINK | \
>                         CEPH_LOCK_IXATTR)
>
> +/* cap masks async dir operations */
> +#define CEPH_CAP_DIR_CREATE    CEPH_CAP_FILE_CACHE
> +#define CEPH_CAP_DIR_UNLINK    CEPH_CAP_FILE_RD
> +#define CEPH_CAP_ANY_DIR_OPS   (CEPH_CAP_FILE_CACHE | CEPH_CAP_FILE_RD | \
> +                                CEPH_CAP_FILE_WREXTEND | CEPH_CAP_FILE_LAZYIO)
> +
>  int ceph_caps_for_mode(int mode);
>
>  enum {
> --
> 2.24.1
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5 07/12] ceph: perform asynchronous unlink if we have sufficient caps
  2020-02-19 13:25 ` [PATCH v5 07/12] ceph: perform asynchronous unlink if we have sufficient caps Jeff Layton
@ 2020-02-20  6:44   ` Yan, Zheng
  2020-02-20 11:32     ` Jeff Layton
  0 siblings, 1 reply; 24+ messages in thread
From: Yan, Zheng @ 2020-02-20  6:44 UTC (permalink / raw)
  To: Jeff Layton
  Cc: ceph-devel, Ilya Dryomov, Sage Weil, Zheng Yan, Patrick Donnelly,
	Xiubo Li

On Wed, Feb 19, 2020 at 9:27 PM Jeff Layton <jlayton@kernel.org> wrote:
>
> The MDS is getting a new lock-caching facility that will allow it
> to cache the necessary locks to allow asynchronous directory operations.
> Since the CEPH_CAP_FILE_* caps are currently unused on directories,
> we can repurpose those bits for this purpose.
>
> When performing an unlink, if we have Fx on the parent directory,
> and CEPH_CAP_DIR_UNLINK (aka Fr), and we know that the dentry being
> removed is the primary link, then we can fire off an unlink
> request immediately and don't need to wait on the reply before returning.
>
> In that situation, just fix up the dcache and link count and return
> immediately after issuing the call to the MDS. This does mean that we
> need to hold an extra reference to the inode being unlinked, and extra
> references to the caps to avoid races. Those references are put and
> error handling is done in the r_callback routine.
>
> If the operation ends up failing, then set a writeback error on the
> directory inode, and the inode itself that can be fetched later by
> an fsync on the dir.
>
> The behavior of dir caps is slightly different from caps on normal
> files. Because these are just considered an optimization, if the
> session is reconnected, we will not automatically reclaim them. They
> are instead considered lost until we do another synchronous op in the
> parent directory.
>
> Async dirops are enabled via the "nowsync" mount option, which is
> patterned after the xfs "wsync" mount option. For now, the default
> is "wsync", but eventually we may flip that.
>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  fs/ceph/dir.c   | 103 ++++++++++++++++++++++++++++++++++++++++++++++--
>  fs/ceph/super.c |  20 ++++++++++
>  fs/ceph/super.h |   5 ++-
>  3 files changed, 123 insertions(+), 5 deletions(-)
>
> diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
> index 5b83bda57056..37ab09d223fc 100644
> --- a/fs/ceph/dir.c
> +++ b/fs/ceph/dir.c
> @@ -1036,6 +1036,73 @@ static int ceph_link(struct dentry *old_dentry, struct inode *dir,
>         return err;
>  }
>
> +static void ceph_async_unlink_cb(struct ceph_mds_client *mdsc,
> +                                struct ceph_mds_request *req)
> +{
> +       int result = req->r_err ? req->r_err :
> +                       le32_to_cpu(req->r_reply_info.head->result);
> +
> +       /* If op failed, mark everyone involved for errors */
> +       if (result) {

I think this function will get called for the -EJUKEBOX case.


> +               int pathlen;
> +               u64 base;
> +               char *path = ceph_mdsc_build_path(req->r_dentry, &pathlen,
> +                                                 &base, 0);
> +
> +               /* mark error on parent + clear complete */
> +               mapping_set_error(req->r_parent->i_mapping, result);
> +               ceph_dir_clear_complete(req->r_parent);
> +
> +               /* drop the dentry -- we don't know its status */
> +               if (!d_unhashed(req->r_dentry))
> +                       d_drop(req->r_dentry);
> +
> +               /* mark inode itself for an error (since metadata is bogus) */
> +               mapping_set_error(req->r_old_inode->i_mapping, result);
> +
> +               pr_warn("ceph: async unlink failure path=(%llx)%s result=%d!\n",
> +                       base, IS_ERR(path) ? "<<bad>>" : path, result);
> +               ceph_mdsc_free_path(path, pathlen);
> +       }
> +       iput(req->r_old_inode);
> +}
> +
> +static int get_caps_for_async_unlink(struct inode *dir, struct dentry *dentry)
> +{
> +       struct ceph_inode_info *ci = ceph_inode(dir);
> +       struct ceph_dentry_info *di;
> +       int got = 0, want = CEPH_CAP_FILE_EXCL | CEPH_CAP_DIR_UNLINK;
> +
> +       spin_lock(&ci->i_ceph_lock);
> +       if ((__ceph_caps_issued(ci, NULL) & want) == want) {
> +               ceph_take_cap_refs(ci, want, false);
> +               got = want;
> +       }
> +       spin_unlock(&ci->i_ceph_lock);
> +
> +       /* If we didn't get anything, return 0 */
> +       if (!got)
> +               return 0;
> +
> +        spin_lock(&dentry->d_lock);
> +        di = ceph_dentry(dentry);
> +       /*
> +        * - We are holding Fx, which implies Fs caps.
> +        * - Only support async unlink for primary linkage
> +        */
> +       if (atomic_read(&ci->i_shared_gen) != di->lease_shared_gen ||
> +           !(di->flags & CEPH_DENTRY_PRIMARY_LINK))
> +               want = 0;
> +        spin_unlock(&dentry->d_lock);
> +
> +       /* Do we still want what we've got? */
> +       if (want == got)
> +               return got;
> +
> +       ceph_put_cap_refs(ci, got);
> +       return 0;
> +}
> +
>  /*
>   * rmdir and unlink are differ only by the metadata op code
>   */
> @@ -1045,6 +1112,7 @@ static int ceph_unlink(struct inode *dir, struct dentry *dentry)
>         struct ceph_mds_client *mdsc = fsc->mdsc;
>         struct inode *inode = d_inode(dentry);
>         struct ceph_mds_request *req;
> +       bool try_async = ceph_test_mount_opt(fsc, ASYNC_DIROPS);
>         int err = -EROFS;
>         int op;
>
> @@ -1059,6 +1127,7 @@ static int ceph_unlink(struct inode *dir, struct dentry *dentry)
>                         CEPH_MDS_OP_RMDIR : CEPH_MDS_OP_UNLINK;
>         } else
>                 goto out;
> +retry:
>         req = ceph_mdsc_create_request(mdsc, op, USE_AUTH_MDS);
>         if (IS_ERR(req)) {
>                 err = PTR_ERR(req);
> @@ -1067,13 +1136,39 @@ static int ceph_unlink(struct inode *dir, struct dentry *dentry)
>         req->r_dentry = dget(dentry);
>         req->r_num_caps = 2;
>         req->r_parent = dir;
> -       set_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags);
>         req->r_dentry_drop = CEPH_CAP_FILE_SHARED;
>         req->r_dentry_unless = CEPH_CAP_FILE_EXCL;
>         req->r_inode_drop = ceph_drop_caps_for_unlink(inode);
> -       err = ceph_mdsc_do_request(mdsc, dir, req);
> -       if (!err && !req->r_reply_info.head->is_dentry)
> -               d_delete(dentry);
> +
> +       if (try_async && op == CEPH_MDS_OP_UNLINK &&
> +           (req->r_dir_caps = get_caps_for_async_unlink(dir, dentry))) {
> +               dout("async unlink on %lu/%.*s caps=%s", dir->i_ino,
> +                    dentry->d_name.len, dentry->d_name.name,
> +                    ceph_cap_string(req->r_dir_caps));
> +               set_bit(CEPH_MDS_R_ASYNC, &req->r_req_flags);
> +               req->r_callback = ceph_async_unlink_cb;
> +               req->r_old_inode = d_inode(dentry);
> +               ihold(req->r_old_inode);
> +               err = ceph_mdsc_submit_request(mdsc, dir, req);
> +               if (!err) {
> +                       /*
> +                        * We have enough caps, so we assume that the unlink
> +                        * will succeed. Fix up the target inode and dcache.
> +                        */
> +                       drop_nlink(inode);
> +                       d_delete(dentry);
> +               } else if (err == -EJUKEBOX) {
> +                       try_async = false;
> +                       ceph_mdsc_put_request(req);
> +                       goto retry;
> +               }
> +       } else {
> +               set_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags);
> +               err = ceph_mdsc_do_request(mdsc, dir, req);
> +               if (!err && !req->r_reply_info.head->is_dentry)
> +                       d_delete(dentry);
> +       }
> +
>         ceph_mdsc_put_request(req);
>  out:
>         return err;
> diff --git a/fs/ceph/super.c b/fs/ceph/super.c
> index b1329cd5388a..c9784eb1159a 100644
> --- a/fs/ceph/super.c
> +++ b/fs/ceph/super.c
> @@ -155,6 +155,7 @@ enum {
>         Opt_acl,
>         Opt_quotadf,
>         Opt_copyfrom,
> +       Opt_wsync,
>  };
>
>  enum ceph_recover_session_mode {
> @@ -194,6 +195,7 @@ static const struct fs_parameter_spec ceph_mount_parameters[] = {
>         fsparam_string  ("snapdirname",                 Opt_snapdirname),
>         fsparam_string  ("source",                      Opt_source),
>         fsparam_u32     ("wsize",                       Opt_wsize),
> +       fsparam_flag_no ("wsync",                       Opt_wsync),
>         {}
>  };
>
> @@ -444,6 +446,12 @@ static int ceph_parse_mount_param(struct fs_context *fc,
>                         fc->sb_flags &= ~SB_POSIXACL;
>                 }
>                 break;
> +       case Opt_wsync:
> +               if (!result.negated)
> +                       fsopt->flags &= ~CEPH_MOUNT_OPT_ASYNC_DIROPS;
> +               else
> +                       fsopt->flags |= CEPH_MOUNT_OPT_ASYNC_DIROPS;
> +               break;
>         default:
>                 BUG();
>         }
> @@ -567,6 +575,9 @@ static int ceph_show_options(struct seq_file *m, struct dentry *root)
>         if (fsopt->flags & CEPH_MOUNT_OPT_CLEANRECOVER)
>                 seq_show_option(m, "recover_session", "clean");
>
> +       if (fsopt->flags & CEPH_MOUNT_OPT_ASYNC_DIROPS)
> +               seq_puts(m, ",nowsync");
> +
>         if (fsopt->wsize != CEPH_MAX_WRITE_SIZE)
>                 seq_printf(m, ",wsize=%u", fsopt->wsize);
>         if (fsopt->rsize != CEPH_MAX_READ_SIZE)
> @@ -1115,6 +1126,15 @@ static void ceph_free_fc(struct fs_context *fc)
>
>  static int ceph_reconfigure_fc(struct fs_context *fc)
>  {
> +       struct ceph_parse_opts_ctx *pctx = fc->fs_private;
> +       struct ceph_mount_options *fsopt = pctx->opts;
> +       struct ceph_fs_client *fsc = ceph_sb_to_client(fc->root->d_sb);
> +
> +       if (fsopt->flags & CEPH_MOUNT_OPT_ASYNC_DIROPS)
> +               ceph_set_mount_opt(fsc, ASYNC_DIROPS);
> +       else
> +               ceph_clear_mount_opt(fsc, ASYNC_DIROPS);
> +
>         sync_filesystem(fc->root->d_sb);
>         return 0;
>  }
> diff --git a/fs/ceph/super.h b/fs/ceph/super.h
> index 2393803c38de..1b4996efc111 100644
> --- a/fs/ceph/super.h
> +++ b/fs/ceph/super.h
> @@ -43,13 +43,16 @@
>  #define CEPH_MOUNT_OPT_MOUNTWAIT       (1<<12) /* mount waits if no mds is up */
>  #define CEPH_MOUNT_OPT_NOQUOTADF       (1<<13) /* no root dir quota in statfs */
>  #define CEPH_MOUNT_OPT_NOCOPYFROM      (1<<14) /* don't use RADOS 'copy-from' op */
> +#define CEPH_MOUNT_OPT_ASYNC_DIROPS    (1<<15) /* allow async directory ops */
>
>  #define CEPH_MOUNT_OPT_DEFAULT                 \
>         (CEPH_MOUNT_OPT_DCACHE |                \
>          CEPH_MOUNT_OPT_NOCOPYFROM)
>
>  #define ceph_set_mount_opt(fsc, opt) \
> -       (fsc)->mount_options->flags |= CEPH_MOUNT_OPT_##opt;
> +       (fsc)->mount_options->flags |= CEPH_MOUNT_OPT_##opt
> +#define ceph_clear_mount_opt(fsc, opt) \
> +       (fsc)->mount_options->flags &= ~CEPH_MOUNT_OPT_##opt
>  #define ceph_test_mount_opt(fsc, opt) \
>         (!!((fsc)->mount_options->flags & CEPH_MOUNT_OPT_##opt))
>
> --
> 2.24.1
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5 05/12] ceph: cap tracking for async directory operations
  2020-02-20  6:42   ` Yan, Zheng
@ 2020-02-20 11:30     ` Jeff Layton
  0 siblings, 0 replies; 24+ messages in thread
From: Jeff Layton @ 2020-02-20 11:30 UTC (permalink / raw)
  To: Yan, Zheng
  Cc: ceph-devel, Ilya Dryomov, Sage Weil, Zheng Yan, Patrick Donnelly,
	Xiubo Li

On Thu, 2020-02-20 at 14:42 +0800, Yan, Zheng wrote:
> On Wed, Feb 19, 2020 at 9:27 PM Jeff Layton <jlayton@kernel.org> wrote:
> > Track and correctly handle directory caps for asynchronous operations.
> > Add aliases for Frc caps that we now designate at Dcu caps (when dealing
> > with directories).
> > 
> > Unlike file caps, we don't reclaim these when the session goes away, and
> > instead preemptively release them. In-flight async dirops are instead
> > handled during reconnect phase. The client needs to re-do a synchronous
> > operation in order to re-get directory caps.
> > 
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> >  fs/ceph/caps.c               | 29 ++++++++++++++++++++---------
> >  fs/ceph/mds_client.c         | 31 ++++++++++++++++++++++++++-----
> >  fs/ceph/mds_client.h         |  6 +++++-
> >  include/linux/ceph/ceph_fs.h |  6 ++++++
> >  4 files changed, 57 insertions(+), 15 deletions(-)
> > 
> > diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> > index 295837215a3a..d6c5ee33f30f 100644
> > --- a/fs/ceph/caps.c
> > +++ b/fs/ceph/caps.c
> > @@ -992,7 +992,11 @@ int __ceph_caps_file_wanted(struct ceph_inode_info *ci)
> >  int __ceph_caps_wanted(struct ceph_inode_info *ci)
> >  {
> >         int w = __ceph_caps_file_wanted(ci) | __ceph_caps_used(ci);
> > -       if (!S_ISDIR(ci->vfs_inode.i_mode)) {
> > +       if (S_ISDIR(ci->vfs_inode.i_mode)) {
> > +               /* we want EXCL if holding caps of dir ops */
> > +               if (w & CEPH_CAP_ANY_DIR_OPS)
> > +                       w |= CEPH_CAP_FILE_EXCL;
> > +       } else {
> >                 /* we want EXCL if dirty data */
> >                 if (w & CEPH_CAP_FILE_BUFFER)
> >                         w |= CEPH_CAP_FILE_EXCL;
> > @@ -1890,10 +1894,13 @@ void ceph_check_caps(struct ceph_inode_info *ci, int flags,
> >                          * revoking the shared cap on every create/unlink
> >                          * operation.
> >                          */
> > -                       if (IS_RDONLY(inode))
> > +                       if (IS_RDONLY(inode)) {
> >                                 want = CEPH_CAP_ANY_SHARED;
> > -                       else
> > -                               want = CEPH_CAP_ANY_SHARED | CEPH_CAP_FILE_EXCL;
> > +                       } else {
> > +                               want = CEPH_CAP_ANY_SHARED |
> > +                                      CEPH_CAP_FILE_EXCL |
> > +                                      CEPH_CAP_ANY_DIR_OPS;
> > +                       }
> >                         retain |= want;
> >                 } else {
> > 
> > @@ -2750,13 +2757,17 @@ int ceph_try_get_caps(struct inode *inode, int need, int want,
> >         int ret;
> > 
> >         BUG_ON(need & ~CEPH_CAP_FILE_RD);
> > -       BUG_ON(want & ~(CEPH_CAP_FILE_CACHE|CEPH_CAP_FILE_LAZYIO|CEPH_CAP_FILE_SHARED));
> > -       ret = ceph_pool_perm_check(inode, need);
> > -       if (ret < 0)
> > -               return ret;
> > +       if (need) {
> > +               ret = ceph_pool_perm_check(inode, need);
> > +               if (ret < 0)
> > +                       return ret;
> > +       }
> > 
> > +       BUG_ON(want & ~(CEPH_CAP_FILE_CACHE | CEPH_CAP_FILE_LAZYIO |
> > +                       CEPH_CAP_FILE_SHARED | CEPH_CAP_FILE_EXCL |
> > +                       CEPH_CAP_ANY_DIR_OPS));
> >         ret = try_get_cap_refs(inode, need, want, 0,
> > -                              (nonblock ? NON_BLOCKING : 0), got);
> > +                              nonblock ? NON_BLOCKING : 0, got);
> >         return ret == -EAGAIN ? 0 : ret;
> >  }
> > 
> > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > index 38eb9dd5062b..ef3dd6fe2f4d 100644
> > --- a/fs/ceph/mds_client.c
> > +++ b/fs/ceph/mds_client.c
> > @@ -699,6 +699,7 @@ void ceph_mdsc_release_request(struct kref *kref)
> >         struct ceph_mds_request *req = container_of(kref,
> >                                                     struct ceph_mds_request,
> >                                                     r_kref);
> > +       ceph_mdsc_release_dir_caps(req);
> 
> I think we can do this in complete_request()
> 

That's what I was doing originally, but doing it there complicates the
error handling in the case where we end up tearing down the request
before it's ever submitted for transmission. It's also weird, as
ceph_mdsc_release_request is where other resources for the request are
released.

It's safe to call it more than once though, so I suppose we could do it
in complete_request and here.

> >         destroy_reply_info(&req->r_reply_info);
> >         if (req->r_request)
> >                 ceph_msg_put(req->r_request);
> > @@ -3280,6 +3281,17 @@ static void handle_session(struct ceph_mds_session *session,
> >         return;
> >  }
> > 
> > +void ceph_mdsc_release_dir_caps(struct ceph_mds_request *req)
> > +{
> > +       int dcaps;
> > +
> > +       dcaps = xchg(&req->r_dir_caps, 0);
> > +       if (dcaps) {
> > +               dout("releasing r_dir_caps=%s\n", ceph_cap_string(dcaps));
> > +               ceph_put_cap_refs(ceph_inode(req->r_parent), dcaps);
> > +       }
> > +}
> > +
> >  /*
> >   * called under session->mutex.
> >   */
> > @@ -3307,9 +3319,14 @@ static void replay_unsafe_requests(struct ceph_mds_client *mdsc,
> >                         continue;
> >                 if (req->r_attempts == 0)
> >                         continue; /* only old requests */
> > -               if (req->r_session &&
> > -                   req->r_session->s_mds == session->s_mds)
> > -                       __send_request(mdsc, session, req, true);
> > +               if (!req->r_session)
> > +                       continue;
> > +               if (req->r_session->s_mds != session->s_mds)
> > +                       continue;
> > +
> > +               ceph_mdsc_release_dir_caps(req);
> > +
> > +               __send_request(mdsc, session, req, true);
> >         }
> >         mutex_unlock(&mdsc->mutex);
> >  }
> > @@ -3393,7 +3410,7 @@ static int send_reconnect_partial(struct ceph_reconnect_state *recon_state)
> >  /*
> >   * Encode information about a cap for a reconnect with the MDS.
> >   */
> > -static int encode_caps_cb(struct inode *inode, struct ceph_cap *cap,
> > +static int reconnect_caps_cb(struct inode *inode, struct ceph_cap *cap,
> >                           void *arg)
> >  {
> >         union {
> > @@ -3416,6 +3433,10 @@ static int encode_caps_cb(struct inode *inode, struct ceph_cap *cap,
> >         cap->mseq = 0;       /* and migrate_seq */
> >         cap->cap_gen = cap->session->s_cap_gen;
> > 
> > +       /* These are lost when the session goes away */
> > +       if (S_ISDIR(inode->i_mode))
> > +               cap->issued &= ~CEPH_CAP_ANY_DIR_OPS;
> > +
> >         if (recon_state->msg_version >= 2) {
> >                 rec.v2.cap_id = cpu_to_le64(cap->cap_id);
> >                 rec.v2.wanted = cpu_to_le32(__ceph_caps_wanted(ci));
> > @@ -3712,7 +3733,7 @@ static void send_mds_reconnect(struct ceph_mds_client *mdsc,
> >                 recon_state.msg_version = 2;
> >         }
> >         /* trsaverse this session's caps */
> > -       err = ceph_iterate_session_caps(session, encode_caps_cb, &recon_state);
> > +       err = ceph_iterate_session_caps(session, reconnect_caps_cb, &recon_state);
> > 
> >         spin_lock(&session->s_cap_lock);
> >         session->s_cap_reconnect = 0;
> > diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> > index 8043f2b439b1..f10d342ea585 100644
> > --- a/fs/ceph/mds_client.h
> > +++ b/fs/ceph/mds_client.h
> > @@ -284,8 +284,11 @@ struct ceph_mds_request {
> >         struct ceph_msg  *r_request;  /* original request */
> >         struct ceph_msg  *r_reply;
> >         struct ceph_mds_reply_info_parsed r_reply_info;
> > -       struct page *r_locked_page;
> >         int r_err;
> > +
> > +
> > +       struct page *r_locked_page;
> > +       int r_dir_caps;
> >         int r_num_caps;
> >         u32               r_readdir_offset;
> > 
> > @@ -489,6 +492,7 @@ extern int ceph_mdsc_submit_request(struct ceph_mds_client *mdsc,
> >  extern int ceph_mdsc_do_request(struct ceph_mds_client *mdsc,
> >                                 struct inode *dir,
> >                                 struct ceph_mds_request *req);
> > +extern void ceph_mdsc_release_dir_caps(struct ceph_mds_request *req);
> >  static inline void ceph_mdsc_get_request(struct ceph_mds_request *req)
> >  {
> >         kref_get(&req->r_kref);
> > diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
> > index 94cc4b047987..91d09cf37649 100644
> > --- a/include/linux/ceph/ceph_fs.h
> > +++ b/include/linux/ceph/ceph_fs.h
> > @@ -663,6 +663,12 @@ int ceph_flags_to_mode(int flags);
> >  #define CEPH_CAP_LOCKS (CEPH_LOCK_IFILE | CEPH_LOCK_IAUTH | CEPH_LOCK_ILINK | \
> >                         CEPH_LOCK_IXATTR)
> > 
> > +/* cap masks async dir operations */
> > +#define CEPH_CAP_DIR_CREATE    CEPH_CAP_FILE_CACHE
> > +#define CEPH_CAP_DIR_UNLINK    CEPH_CAP_FILE_RD
> > +#define CEPH_CAP_ANY_DIR_OPS   (CEPH_CAP_FILE_CACHE | CEPH_CAP_FILE_RD | \
> > +                                CEPH_CAP_FILE_WREXTEND | CEPH_CAP_FILE_LAZYIO)
> > +
> >  int ceph_caps_for_mode(int mode);
> > 
> >  enum {
> > --
> > 2.24.1
> > 

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5 07/12] ceph: perform asynchronous unlink if we have sufficient caps
  2020-02-20  6:44   ` Yan, Zheng
@ 2020-02-20 11:32     ` Jeff Layton
  0 siblings, 0 replies; 24+ messages in thread
From: Jeff Layton @ 2020-02-20 11:32 UTC (permalink / raw)
  To: Yan, Zheng
  Cc: ceph-devel, Ilya Dryomov, Sage Weil, Zheng Yan, Patrick Donnelly,
	Xiubo Li

On Thu, 2020-02-20 at 14:44 +0800, Yan, Zheng wrote:
> On Wed, Feb 19, 2020 at 9:27 PM Jeff Layton <jlayton@kernel.org> wrote:
> > The MDS is getting a new lock-caching facility that will allow it
> > to cache the necessary locks to allow asynchronous directory operations.
> > Since the CEPH_CAP_FILE_* caps are currently unused on directories,
> > we can repurpose those bits for this purpose.
> > 
> > When performing an unlink, if we have Fx on the parent directory,
> > and CEPH_CAP_DIR_UNLINK (aka Fr), and we know that the dentry being
> > removed is the primary link, then we can fire off an unlink
> > request immediately and don't need to wait on the reply before returning.
> > 
> > In that situation, just fix up the dcache and link count and return
> > immediately after issuing the call to the MDS. This does mean that we
> > need to hold an extra reference to the inode being unlinked, and extra
> > references to the caps to avoid races. Those references are put and
> > error handling is done in the r_callback routine.
> > 
> > If the operation ends up failing, then set a writeback error on the
> > directory inode, and the inode itself that can be fetched later by
> > an fsync on the dir.
> > 
> > The behavior of dir caps is slightly different from caps on normal
> > files. Because these are just considered an optimization, if the
> > session is reconnected, we will not automatically reclaim them. They
> > are instead considered lost until we do another synchronous op in the
> > parent directory.
> > 
> > Async dirops are enabled via the "nowsync" mount option, which is
> > patterned after the xfs "wsync" mount option. For now, the default
> > is "wsync", but eventually we may flip that.
> > 
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> >  fs/ceph/dir.c   | 103 ++++++++++++++++++++++++++++++++++++++++++++++--
> >  fs/ceph/super.c |  20 ++++++++++
> >  fs/ceph/super.h |   5 ++-
> >  3 files changed, 123 insertions(+), 5 deletions(-)
> > 
> > diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
> > index 5b83bda57056..37ab09d223fc 100644
> > --- a/fs/ceph/dir.c
> > +++ b/fs/ceph/dir.c
> > @@ -1036,6 +1036,73 @@ static int ceph_link(struct dentry *old_dentry, struct inode *dir,
> >         return err;
> >  }
> > 
> > +static void ceph_async_unlink_cb(struct ceph_mds_client *mdsc,
> > +                                struct ceph_mds_request *req)
> > +{
> > +       int result = req->r_err ? req->r_err :
> > +                       le32_to_cpu(req->r_reply_info.head->result);
> > +
> > +       /* If op failed, mark everyone involved for errors */
> > +       if (result) {
> 
> I think this function will get called for -EJUKEBOX case.
> 

Good catch. I'll have another look at how to handle this better.
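
Maybe something along these lines (untested sketch; the synchronous
retry would still have to be driven from the submit path, this just
avoids marking errors for what isn't really a failure):

static void ceph_async_unlink_cb(struct ceph_mds_client *mdsc,
				 struct ceph_mds_request *req)
{
	int result = req->r_err ? req->r_err :
			le32_to_cpu(req->r_reply_info.head->result);

	/* -EJUKEBOX means "redo this synchronously", not a hard failure */
	if (result == -EJUKEBOX)
		goto out;

	if (result) {
		/* ... same error handling as in the patch above ... */
	}
out:
	iput(req->r_old_inode);
}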

> 
> > +               int pathlen;
> > +               u64 base;
> > +               char *path = ceph_mdsc_build_path(req->r_dentry, &pathlen,
> > +                                                 &base, 0);
> > +
> > +               /* mark error on parent + clear complete */
> > +               mapping_set_error(req->r_parent->i_mapping, result);
> > +               ceph_dir_clear_complete(req->r_parent);
> > +
> > +               /* drop the dentry -- we don't know its status */
> > +               if (!d_unhashed(req->r_dentry))
> > +                       d_drop(req->r_dentry);
> > +
> > +               /* mark inode itself for an error (since metadata is bogus) */
> > +               mapping_set_error(req->r_old_inode->i_mapping, result);
> > +
> > +               pr_warn("ceph: async unlink failure path=(%llx)%s result=%d!\n",
> > +                       base, IS_ERR(path) ? "<<bad>>" : path, result);
> > +               ceph_mdsc_free_path(path, pathlen);
> > +       }
> > +       iput(req->r_old_inode);
> > +}
> > +
> > +static int get_caps_for_async_unlink(struct inode *dir, struct dentry *dentry)
> > +{
> > +       struct ceph_inode_info *ci = ceph_inode(dir);
> > +       struct ceph_dentry_info *di;
> > +       int got = 0, want = CEPH_CAP_FILE_EXCL | CEPH_CAP_DIR_UNLINK;
> > +
> > +       spin_lock(&ci->i_ceph_lock);
> > +       if ((__ceph_caps_issued(ci, NULL) & want) == want) {
> > +               ceph_take_cap_refs(ci, want, false);
> > +               got = want;
> > +       }
> > +       spin_unlock(&ci->i_ceph_lock);
> > +
> > +       /* If we didn't get anything, return 0 */
> > +       if (!got)
> > +               return 0;
> > +
> > +        spin_lock(&dentry->d_lock);
> > +        di = ceph_dentry(dentry);
> > +       /*
> > +        * - We are holding Fx, which implies Fs caps.
> > +        * - Only support async unlink for primary linkage
> > +        */
> > +       if (atomic_read(&ci->i_shared_gen) != di->lease_shared_gen ||
> > +           !(di->flags & CEPH_DENTRY_PRIMARY_LINK))
> > +               want = 0;
> > +        spin_unlock(&dentry->d_lock);
> > +
> > +       /* Do we still want what we've got? */
> > +       if (want == got)
> > +               return got;
> > +
> > +       ceph_put_cap_refs(ci, got);
> > +       return 0;
> > +}
> > +
> >  /*
> >   * rmdir and unlink are differ only by the metadata op code
> >   */
> > @@ -1045,6 +1112,7 @@ static int ceph_unlink(struct inode *dir, struct dentry *dentry)
> >         struct ceph_mds_client *mdsc = fsc->mdsc;
> >         struct inode *inode = d_inode(dentry);
> >         struct ceph_mds_request *req;
> > +       bool try_async = ceph_test_mount_opt(fsc, ASYNC_DIROPS);
> >         int err = -EROFS;
> >         int op;
> > 
> > @@ -1059,6 +1127,7 @@ static int ceph_unlink(struct inode *dir, struct dentry *dentry)
> >                         CEPH_MDS_OP_RMDIR : CEPH_MDS_OP_UNLINK;
> >         } else
> >                 goto out;
> > +retry:
> >         req = ceph_mdsc_create_request(mdsc, op, USE_AUTH_MDS);
> >         if (IS_ERR(req)) {
> >                 err = PTR_ERR(req);
> > @@ -1067,13 +1136,39 @@ static int ceph_unlink(struct inode *dir, struct dentry *dentry)
> >         req->r_dentry = dget(dentry);
> >         req->r_num_caps = 2;
> >         req->r_parent = dir;
> > -       set_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags);
> >         req->r_dentry_drop = CEPH_CAP_FILE_SHARED;
> >         req->r_dentry_unless = CEPH_CAP_FILE_EXCL;
> >         req->r_inode_drop = ceph_drop_caps_for_unlink(inode);
> > -       err = ceph_mdsc_do_request(mdsc, dir, req);
> > -       if (!err && !req->r_reply_info.head->is_dentry)
> > -               d_delete(dentry);
> > +
> > +       if (try_async && op == CEPH_MDS_OP_UNLINK &&
> > +           (req->r_dir_caps = get_caps_for_async_unlink(dir, dentry))) {
> > +               dout("async unlink on %lu/%.*s caps=%s", dir->i_ino,
> > +                    dentry->d_name.len, dentry->d_name.name,
> > +                    ceph_cap_string(req->r_dir_caps));
> > +               set_bit(CEPH_MDS_R_ASYNC, &req->r_req_flags);
> > +               req->r_callback = ceph_async_unlink_cb;
> > +               req->r_old_inode = d_inode(dentry);
> > +               ihold(req->r_old_inode);
> > +               err = ceph_mdsc_submit_request(mdsc, dir, req);
> > +               if (!err) {
> > +                       /*
> > +                        * We have enough caps, so we assume that the unlink
> > +                        * will succeed. Fix up the target inode and dcache.
> > +                        */
> > +                       drop_nlink(inode);
> > +                       d_delete(dentry);
> > +               } else if (err == -EJUKEBOX) {
> > +                       try_async = false;
> > +                       ceph_mdsc_put_request(req);
> > +                       goto retry;
> > +               }
> > +       } else {
> > +               set_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags);
> > +               err = ceph_mdsc_do_request(mdsc, dir, req);
> > +               if (!err && !req->r_reply_info.head->is_dentry)
> > +                       d_delete(dentry);
> > +       }
> > +
> >         ceph_mdsc_put_request(req);
> >  out:
> >         return err;
> > diff --git a/fs/ceph/super.c b/fs/ceph/super.c
> > index b1329cd5388a..c9784eb1159a 100644
> > --- a/fs/ceph/super.c
> > +++ b/fs/ceph/super.c
> > @@ -155,6 +155,7 @@ enum {
> >         Opt_acl,
> >         Opt_quotadf,
> >         Opt_copyfrom,
> > +       Opt_wsync,
> >  };
> > 
> >  enum ceph_recover_session_mode {
> > @@ -194,6 +195,7 @@ static const struct fs_parameter_spec ceph_mount_parameters[] = {
> >         fsparam_string  ("snapdirname",                 Opt_snapdirname),
> >         fsparam_string  ("source",                      Opt_source),
> >         fsparam_u32     ("wsize",                       Opt_wsize),
> > +       fsparam_flag_no ("wsync",                       Opt_wsync),
> >         {}
> >  };
> > 
> > @@ -444,6 +446,12 @@ static int ceph_parse_mount_param(struct fs_context *fc,
> >                         fc->sb_flags &= ~SB_POSIXACL;
> >                 }
> >                 break;
> > +       case Opt_wsync:
> > +               if (!result.negated)
> > +                       fsopt->flags &= ~CEPH_MOUNT_OPT_ASYNC_DIROPS;
> > +               else
> > +                       fsopt->flags |= CEPH_MOUNT_OPT_ASYNC_DIROPS;
> > +               break;
> >         default:
> >                 BUG();
> >         }
> > @@ -567,6 +575,9 @@ static int ceph_show_options(struct seq_file *m, struct dentry *root)
> >         if (fsopt->flags & CEPH_MOUNT_OPT_CLEANRECOVER)
> >                 seq_show_option(m, "recover_session", "clean");
> > 
> > +       if (fsopt->flags & CEPH_MOUNT_OPT_ASYNC_DIROPS)
> > +               seq_puts(m, ",nowsync");
> > +
> >         if (fsopt->wsize != CEPH_MAX_WRITE_SIZE)
> >                 seq_printf(m, ",wsize=%u", fsopt->wsize);
> >         if (fsopt->rsize != CEPH_MAX_READ_SIZE)
> > @@ -1115,6 +1126,15 @@ static void ceph_free_fc(struct fs_context *fc)
> > 
> >  static int ceph_reconfigure_fc(struct fs_context *fc)
> >  {
> > +       struct ceph_parse_opts_ctx *pctx = fc->fs_private;
> > +       struct ceph_mount_options *fsopt = pctx->opts;
> > +       struct ceph_fs_client *fsc = ceph_sb_to_client(fc->root->d_sb);
> > +
> > +       if (fsopt->flags & CEPH_MOUNT_OPT_ASYNC_DIROPS)
> > +               ceph_set_mount_opt(fsc, ASYNC_DIROPS);
> > +       else
> > +               ceph_clear_mount_opt(fsc, ASYNC_DIROPS);
> > +
> >         sync_filesystem(fc->root->d_sb);
> >         return 0;
> >  }
> > diff --git a/fs/ceph/super.h b/fs/ceph/super.h
> > index 2393803c38de..1b4996efc111 100644
> > --- a/fs/ceph/super.h
> > +++ b/fs/ceph/super.h
> > @@ -43,13 +43,16 @@
> >  #define CEPH_MOUNT_OPT_MOUNTWAIT       (1<<12) /* mount waits if no mds is up */
> >  #define CEPH_MOUNT_OPT_NOQUOTADF       (1<<13) /* no root dir quota in statfs */
> >  #define CEPH_MOUNT_OPT_NOCOPYFROM      (1<<14) /* don't use RADOS 'copy-from' op */
> > +#define CEPH_MOUNT_OPT_ASYNC_DIROPS    (1<<15) /* allow async directory ops */
> > 
> >  #define CEPH_MOUNT_OPT_DEFAULT                 \
> >         (CEPH_MOUNT_OPT_DCACHE |                \
> >          CEPH_MOUNT_OPT_NOCOPYFROM)
> > 
> >  #define ceph_set_mount_opt(fsc, opt) \
> > -       (fsc)->mount_options->flags |= CEPH_MOUNT_OPT_##opt;
> > +       (fsc)->mount_options->flags |= CEPH_MOUNT_OPT_##opt
> > +#define ceph_clear_mount_opt(fsc, opt) \
> > +       (fsc)->mount_options->flags &= ~CEPH_MOUNT_OPT_##opt
> >  #define ceph_test_mount_opt(fsc, opt) \
> >         (!!((fsc)->mount_options->flags & CEPH_MOUNT_OPT_##opt))
> > 
> > --
> > 2.24.1
> > 

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5 03/12] ceph: add infrastructure for waiting for async create to complete
  2020-02-20  3:32   ` Yan, Zheng
@ 2020-02-20 13:01     ` Jeff Layton
  2020-02-20 13:33       ` Yan, Zheng
  0 siblings, 1 reply; 24+ messages in thread
From: Jeff Layton @ 2020-02-20 13:01 UTC (permalink / raw)
  To: Yan, Zheng
  Cc: ceph-devel, Ilya Dryomov, Sage Weil, Zheng Yan, Patrick Donnelly,
	Xiubo Li

On Thu, 2020-02-20 at 11:32 +0800, Yan, Zheng wrote:
> On Wed, Feb 19, 2020 at 9:27 PM Jeff Layton <jlayton@kernel.org> wrote:
> > When we issue an async create, we must ensure that any later on-the-wire
> > requests involving it wait for the create reply.
> > 
> > Expand i_ceph_flags to be an unsigned long, and add a new bit that
> > MDS requests can wait on. If the bit is set in the inode when sending
> > caps, then don't send it and just return that it has been delayed.
> > 
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> >  fs/ceph/caps.c       | 13 ++++++++++++-
> >  fs/ceph/dir.c        |  2 +-
> >  fs/ceph/mds_client.c | 20 +++++++++++++++++++-
> >  fs/ceph/mds_client.h |  7 +++++++
> >  fs/ceph/super.h      |  4 +++-
> >  5 files changed, 42 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> > index d05717397c2a..85e13aa359d2 100644
> > --- a/fs/ceph/caps.c
> > +++ b/fs/ceph/caps.c
> > @@ -511,7 +511,7 @@ static void __cap_delay_requeue(struct ceph_mds_client *mdsc,
> >                                 struct ceph_inode_info *ci,
> >                                 bool set_timeout)
> >  {
> > -       dout("__cap_delay_requeue %p flags %d at %lu\n", &ci->vfs_inode,
> > +       dout("__cap_delay_requeue %p flags 0x%lx at %lu\n", &ci->vfs_inode,
> >              ci->i_ceph_flags, ci->i_hold_caps_max);
> >         if (!mdsc->stopping) {
> >                 spin_lock(&mdsc->cap_delay_lock);
> > @@ -1294,6 +1294,13 @@ static int __send_cap(struct ceph_mds_client *mdsc, struct ceph_cap *cap,
> >         int delayed = 0;
> >         int ret;
> > 
> > +       /* Don't send anything if it's still being created. Return delayed */
> > +       if (ci->i_ceph_flags & CEPH_I_ASYNC_CREATE) {
> > +               spin_unlock(&ci->i_ceph_lock);
> > +               dout("%s async create in flight for %p\n", __func__, inode);
> > +               return 1;
> > +       }
> > +
> 
> Maybe it's better to check this in ceph_check_caps().  Other callers
> of __send_cap() shouldn't encounter async creating inode
> 

I've been looking, but what actually guarantees that?

Only ceph_check_caps calls it for UPDATE, but the other two callers call
it for FLUSH. I don't see what prevents the kernel from (e.g.) calling
write_inode before the create reply comes in, particularly if we just
create and then close the file.
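
Just to make sure we're talking about the same thing, I read the
suggestion as roughly the below (untested sketch, with the requeue
details glossed over):

void ceph_check_caps(struct ceph_inode_info *ci, int flags,
		     struct ceph_mds_session *session)
{
	struct inode *inode = &ci->vfs_inode;
	struct ceph_mds_client *mdsc = ceph_sb_to_client(inode->i_sb)->mdsc;

	spin_lock(&ci->i_ceph_lock);
	if (ci->i_ceph_flags & CEPH_I_ASYNC_CREATE) {
		spin_unlock(&ci->i_ceph_lock);
		/* look at the caps again once the create reply is in */
		__cap_delay_requeue(mdsc, ci, true);
		return;
	}
	spin_unlock(&ci->i_ceph_lock);

	/* ... rest of ceph_check_caps() unchanged ... */
}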

As a side note, I still struggle with the fact that there seems to be no
coherent overall description of the cap protocol. What distinguishes a
FLUSH from an UPDATE, for instance? The MDS code and comments seem to
treat them somewhat interchangeably.


> >         held = cap->issued | cap->implemented;
> >         revoking = cap->implemented & ~cap->issued;
> >         retain &= ~revoking;
> > @@ -2250,6 +2257,10 @@ int ceph_fsync(struct file *file, loff_t start, loff_t end, int datasync)
> >         if (datasync)
> >                 goto out;
> > 
> > +       ret = ceph_wait_on_async_create(inode);
> > +       if (ret)
> > +               goto out;
> > +
> >         dirty = try_flush_caps(inode, &flush_tid);
> >         dout("fsync dirty caps are %s\n", ceph_cap_string(dirty));
> > 
> > diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
> > index a87274935a09..5b83bda57056 100644
> > --- a/fs/ceph/dir.c
> > +++ b/fs/ceph/dir.c
> > @@ -752,7 +752,7 @@ static struct dentry *ceph_lookup(struct inode *dir, struct dentry *dentry,
> >                 struct ceph_dentry_info *di = ceph_dentry(dentry);
> > 
> >                 spin_lock(&ci->i_ceph_lock);
> > -               dout(" dir %p flags are %d\n", dir, ci->i_ceph_flags);
> > +               dout(" dir %p flags are 0x%lx\n", dir, ci->i_ceph_flags);
> >                 if (strncmp(dentry->d_name.name,
> >                             fsc->mount_options->snapdir_name,
> >                             dentry->d_name.len) &&
> > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > index 94d18e643a3d..38eb9dd5062b 100644
> > --- a/fs/ceph/mds_client.c
> > +++ b/fs/ceph/mds_client.c
> > @@ -2730,7 +2730,7 @@ static void kick_requests(struct ceph_mds_client *mdsc, int mds)
> >  int ceph_mdsc_submit_request(struct ceph_mds_client *mdsc, struct inode *dir,
> >                               struct ceph_mds_request *req)
> >  {
> > -       int err;
> > +       int err = 0;
> > 
> >         /* take CAP_PIN refs for r_inode, r_parent, r_old_dentry */
> >         if (req->r_inode)
> > @@ -2743,6 +2743,24 @@ int ceph_mdsc_submit_request(struct ceph_mds_client *mdsc, struct inode *dir,
> >                 ceph_get_cap_refs(ceph_inode(req->r_old_dentry_dir),
> >                                   CEPH_CAP_PIN);
> > 
> > +       if (req->r_inode) {
> > +               err = ceph_wait_on_async_create(req->r_inode);
> > +               if (err) {
> > +                       dout("%s: wait for async create returned: %d\n",
> > +                            __func__, err);
> > +                       return err;
> > +               }
> > +       }
> > +
> > +       if (!err && req->r_old_inode) {
> > +               err = ceph_wait_on_async_create(req->r_old_inode);
> > +               if (err) {
> > +                       dout("%s: wait for async create returned: %d\n",
> > +                            __func__, err);
> > +                       return err;
> > +               }
> > +       }
> > +
> >         dout("submit_request on %p for inode %p\n", req, dir);
> >         mutex_lock(&mdsc->mutex);
> >         __register_request(mdsc, req, dir);
> > diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> > index 95ac00e59e66..8043f2b439b1 100644
> > --- a/fs/ceph/mds_client.h
> > +++ b/fs/ceph/mds_client.h
> > @@ -538,4 +538,11 @@ extern void ceph_mdsc_open_export_target_sessions(struct ceph_mds_client *mdsc,
> >  extern int ceph_trim_caps(struct ceph_mds_client *mdsc,
> >                           struct ceph_mds_session *session,
> >                           int max_caps);
> > +static inline int ceph_wait_on_async_create(struct inode *inode)
> > +{
> > +       struct ceph_inode_info *ci = ceph_inode(inode);
> > +
> > +       return wait_on_bit(&ci->i_ceph_flags, CEPH_ASYNC_CREATE_BIT,
> > +                          TASK_INTERRUPTIBLE);
> > +}
> >  #endif
> > diff --git a/fs/ceph/super.h b/fs/ceph/super.h
> > index 3430d7ffe8f7..bfb03adb4a08 100644
> > --- a/fs/ceph/super.h
> > +++ b/fs/ceph/super.h
> > @@ -316,7 +316,7 @@ struct ceph_inode_info {
> >         u64 i_inline_version;
> >         u32 i_time_warp_seq;
> > 
> > -       unsigned i_ceph_flags;
> > +       unsigned long i_ceph_flags;
> >         atomic64_t i_release_count;
> >         atomic64_t i_ordered_count;
> >         atomic64_t i_complete_seq[2];
> > @@ -524,6 +524,8 @@ static inline struct inode *ceph_find_inode(struct super_block *sb,
> >  #define CEPH_I_ERROR_WRITE     (1 << 10) /* have seen write errors */
> >  #define CEPH_I_ERROR_FILELOCK  (1 << 11) /* have seen file lock errors */
> >  #define CEPH_I_ODIRECT         (1 << 12) /* inode in direct I/O mode */
> > +#define CEPH_ASYNC_CREATE_BIT  (13)      /* async create in flight for this */
> > +#define CEPH_I_ASYNC_CREATE    (1 << CEPH_ASYNC_CREATE_BIT)
> > 
> >  /*
> >   * Masks of ceph inode work.
> > --
> > 2.24.1
> > 

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5 03/12] ceph: add infrastructure for waiting for async create to complete
  2020-02-20 13:01     ` Jeff Layton
@ 2020-02-20 13:33       ` Yan, Zheng
  2020-02-20 14:53         ` Jeff Layton
  0 siblings, 1 reply; 24+ messages in thread
From: Yan, Zheng @ 2020-02-20 13:33 UTC (permalink / raw)
  To: Jeff Layton
  Cc: ceph-devel, Ilya Dryomov, Sage Weil, Zheng Yan, Patrick Donnelly,
	Xiubo Li

On Thu, Feb 20, 2020 at 9:01 PM Jeff Layton <jlayton@kernel.org> wrote:
>
> On Thu, 2020-02-20 at 11:32 +0800, Yan, Zheng wrote:
> > On Wed, Feb 19, 2020 at 9:27 PM Jeff Layton <jlayton@kernel.org> wrote:
> > > When we issue an async create, we must ensure that any later on-the-wire
> > > requests involving it wait for the create reply.
> > >
> > > Expand i_ceph_flags to be an unsigned long, and add a new bit that
> > > MDS requests can wait on. If the bit is set in the inode when sending
> > > caps, then don't send it and just return that it has been delayed.
> > >
> > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > ---
> > >  fs/ceph/caps.c       | 13 ++++++++++++-
> > >  fs/ceph/dir.c        |  2 +-
> > >  fs/ceph/mds_client.c | 20 +++++++++++++++++++-
> > >  fs/ceph/mds_client.h |  7 +++++++
> > >  fs/ceph/super.h      |  4 +++-
> > >  5 files changed, 42 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> > > index d05717397c2a..85e13aa359d2 100644
> > > --- a/fs/ceph/caps.c
> > > +++ b/fs/ceph/caps.c
> > > @@ -511,7 +511,7 @@ static void __cap_delay_requeue(struct ceph_mds_client *mdsc,
> > >                                 struct ceph_inode_info *ci,
> > >                                 bool set_timeout)
> > >  {
> > > -       dout("__cap_delay_requeue %p flags %d at %lu\n", &ci->vfs_inode,
> > > +       dout("__cap_delay_requeue %p flags 0x%lx at %lu\n", &ci->vfs_inode,
> > >              ci->i_ceph_flags, ci->i_hold_caps_max);
> > >         if (!mdsc->stopping) {
> > >                 spin_lock(&mdsc->cap_delay_lock);
> > > @@ -1294,6 +1294,13 @@ static int __send_cap(struct ceph_mds_client *mdsc, struct ceph_cap *cap,
> > >         int delayed = 0;
> > >         int ret;
> > >
> > > +       /* Don't send anything if it's still being created. Return delayed */
> > > +       if (ci->i_ceph_flags & CEPH_I_ASYNC_CREATE) {
> > > +               spin_unlock(&ci->i_ceph_lock);
> > > +               dout("%s async create in flight for %p\n", __func__, inode);
> > > +               return 1;
> > > +       }
> > > +
> >
> > Maybe it's better to check this in ceph_check_caps().  Other callers
> > of __send_cap() shouldn't encounter async creating inode
> >
>
> I've been looking, but what actually guarantees that?
>
> Only ceph_check_caps calls it for UPDATE, but the other two callers call
> it for FLUSH. I don't see what prevents the kernel from (e.g.) calling
> write_inode before the create reply comes in, particularly if we just
> create and then close the file.
>

I missed the write_inode case. But making __send_cap() skip sending the
message can cause problems. For example, if we skip a message that flushes
dirty caps, calling ceph_check_caps() again may not re-do the flush.

> > As a side note, I still struggle with the fact that there seems to be no
> coherent overall description of the cap protocol. What distinguishes a
> FLUSH from an UPDATE, for instance? The MDS code and comments seem to
> treat them somewhat interchangeably.
>

UPDATE is a superset of FLUSH; UPDATE can always replace FLUSH.

>
> > >         held = cap->issued | cap->implemented;
> > >         revoking = cap->implemented & ~cap->issued;
> > >         retain &= ~revoking;
> > > @@ -2250,6 +2257,10 @@ int ceph_fsync(struct file *file, loff_t start, loff_t end, int datasync)
> > >         if (datasync)
> > >                 goto out;
> > >
> > > +       ret = ceph_wait_on_async_create(inode);
> > > +       if (ret)
> > > +               goto out;
> > > +
> > >         dirty = try_flush_caps(inode, &flush_tid);
> > >         dout("fsync dirty caps are %s\n", ceph_cap_string(dirty));
> > >
> > > diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
> > > index a87274935a09..5b83bda57056 100644
> > > --- a/fs/ceph/dir.c
> > > +++ b/fs/ceph/dir.c
> > > @@ -752,7 +752,7 @@ static struct dentry *ceph_lookup(struct inode *dir, struct dentry *dentry,
> > >                 struct ceph_dentry_info *di = ceph_dentry(dentry);
> > >
> > >                 spin_lock(&ci->i_ceph_lock);
> > > -               dout(" dir %p flags are %d\n", dir, ci->i_ceph_flags);
> > > +               dout(" dir %p flags are 0x%lx\n", dir, ci->i_ceph_flags);
> > >                 if (strncmp(dentry->d_name.name,
> > >                             fsc->mount_options->snapdir_name,
> > >                             dentry->d_name.len) &&
> > > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > > index 94d18e643a3d..38eb9dd5062b 100644
> > > --- a/fs/ceph/mds_client.c
> > > +++ b/fs/ceph/mds_client.c
> > > @@ -2730,7 +2730,7 @@ static void kick_requests(struct ceph_mds_client *mdsc, int mds)
> > >  int ceph_mdsc_submit_request(struct ceph_mds_client *mdsc, struct inode *dir,
> > >                               struct ceph_mds_request *req)
> > >  {
> > > -       int err;
> > > +       int err = 0;
> > >
> > >         /* take CAP_PIN refs for r_inode, r_parent, r_old_dentry */
> > >         if (req->r_inode)
> > > @@ -2743,6 +2743,24 @@ int ceph_mdsc_submit_request(struct ceph_mds_client *mdsc, struct inode *dir,
> > >                 ceph_get_cap_refs(ceph_inode(req->r_old_dentry_dir),
> > >                                   CEPH_CAP_PIN);
> > >
> > > +       if (req->r_inode) {
> > > +               err = ceph_wait_on_async_create(req->r_inode);
> > > +               if (err) {
> > > +                       dout("%s: wait for async create returned: %d\n",
> > > +                            __func__, err);
> > > +                       return err;
> > > +               }
> > > +       }
> > > +
> > > +       if (!err && req->r_old_inode) {
> > > +               err = ceph_wait_on_async_create(req->r_old_inode);
> > > +               if (err) {
> > > +                       dout("%s: wait for async create returned: %d\n",
> > > +                            __func__, err);
> > > +                       return err;
> > > +               }
> > > +       }
> > > +
> > >         dout("submit_request on %p for inode %p\n", req, dir);
> > >         mutex_lock(&mdsc->mutex);
> > >         __register_request(mdsc, req, dir);
> > > diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> > > index 95ac00e59e66..8043f2b439b1 100644
> > > --- a/fs/ceph/mds_client.h
> > > +++ b/fs/ceph/mds_client.h
> > > @@ -538,4 +538,11 @@ extern void ceph_mdsc_open_export_target_sessions(struct ceph_mds_client *mdsc,
> > >  extern int ceph_trim_caps(struct ceph_mds_client *mdsc,
> > >                           struct ceph_mds_session *session,
> > >                           int max_caps);
> > > +static inline int ceph_wait_on_async_create(struct inode *inode)
> > > +{
> > > +       struct ceph_inode_info *ci = ceph_inode(inode);
> > > +
> > > +       return wait_on_bit(&ci->i_ceph_flags, CEPH_ASYNC_CREATE_BIT,
> > > +                          TASK_INTERRUPTIBLE);
> > > +}
> > >  #endif
> > > diff --git a/fs/ceph/super.h b/fs/ceph/super.h
> > > index 3430d7ffe8f7..bfb03adb4a08 100644
> > > --- a/fs/ceph/super.h
> > > +++ b/fs/ceph/super.h
> > > @@ -316,7 +316,7 @@ struct ceph_inode_info {
> > >         u64 i_inline_version;
> > >         u32 i_time_warp_seq;
> > >
> > > -       unsigned i_ceph_flags;
> > > +       unsigned long i_ceph_flags;
> > >         atomic64_t i_release_count;
> > >         atomic64_t i_ordered_count;
> > >         atomic64_t i_complete_seq[2];
> > > @@ -524,6 +524,8 @@ static inline struct inode *ceph_find_inode(struct super_block *sb,
> > >  #define CEPH_I_ERROR_WRITE     (1 << 10) /* have seen write errors */
> > >  #define CEPH_I_ERROR_FILELOCK  (1 << 11) /* have seen file lock errors */
> > >  #define CEPH_I_ODIRECT         (1 << 12) /* inode in direct I/O mode */
> > > +#define CEPH_ASYNC_CREATE_BIT  (13)      /* async create in flight for this */
> > > +#define CEPH_I_ASYNC_CREATE    (1 << CEPH_ASYNC_CREATE_BIT)
> > >
> > >  /*
> > >   * Masks of ceph inode work.
> > > --
> > > 2.24.1
> > >
>
> --
> Jeff Layton <jlayton@kernel.org>
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5 03/12] ceph: add infrastructure for waiting for async create to complete
  2020-02-20 13:33       ` Yan, Zheng
@ 2020-02-20 14:53         ` Jeff Layton
  2020-02-25 19:45           ` Jeff Layton
  0 siblings, 1 reply; 24+ messages in thread
From: Jeff Layton @ 2020-02-20 14:53 UTC (permalink / raw)
  To: Yan, Zheng
  Cc: ceph-devel, Ilya Dryomov, Sage Weil, Zheng Yan, Patrick Donnelly,
	Xiubo Li

On Thu, 2020-02-20 at 21:33 +0800, Yan, Zheng wrote:
> On Thu, Feb 20, 2020 at 9:01 PM Jeff Layton <jlayton@kernel.org> wrote:
> > On Thu, 2020-02-20 at 11:32 +0800, Yan, Zheng wrote:
> > > On Wed, Feb 19, 2020 at 9:27 PM Jeff Layton <jlayton@kernel.org> wrote:
> > > > When we issue an async create, we must ensure that any later on-the-wire
> > > > requests involving it wait for the create reply.
> > > > 
> > > > Expand i_ceph_flags to be an unsigned long, and add a new bit that
> > > > MDS requests can wait on. If the bit is set in the inode when sending
> > > > caps, then don't send it and just return that it has been delayed.
> > > > 
> > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > ---
> > > >  fs/ceph/caps.c       | 13 ++++++++++++-
> > > >  fs/ceph/dir.c        |  2 +-
> > > >  fs/ceph/mds_client.c | 20 +++++++++++++++++++-
> > > >  fs/ceph/mds_client.h |  7 +++++++
> > > >  fs/ceph/super.h      |  4 +++-
> > > >  5 files changed, 42 insertions(+), 4 deletions(-)
> > > > 
> > > > diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> > > > index d05717397c2a..85e13aa359d2 100644
> > > > --- a/fs/ceph/caps.c
> > > > +++ b/fs/ceph/caps.c
> > > > @@ -511,7 +511,7 @@ static void __cap_delay_requeue(struct ceph_mds_client *mdsc,
> > > >                                 struct ceph_inode_info *ci,
> > > >                                 bool set_timeout)
> > > >  {
> > > > -       dout("__cap_delay_requeue %p flags %d at %lu\n", &ci->vfs_inode,
> > > > +       dout("__cap_delay_requeue %p flags 0x%lx at %lu\n", &ci->vfs_inode,
> > > >              ci->i_ceph_flags, ci->i_hold_caps_max);
> > > >         if (!mdsc->stopping) {
> > > >                 spin_lock(&mdsc->cap_delay_lock);
> > > > @@ -1294,6 +1294,13 @@ static int __send_cap(struct ceph_mds_client *mdsc, struct ceph_cap *cap,
> > > >         int delayed = 0;
> > > >         int ret;
> > > > 
> > > > +       /* Don't send anything if it's still being created. Return delayed */
> > > > +       if (ci->i_ceph_flags & CEPH_I_ASYNC_CREATE) {
> > > > +               spin_unlock(&ci->i_ceph_lock);
> > > > +               dout("%s async create in flight for %p\n", __func__, inode);
> > > > +               return 1;
> > > > +       }
> > > > +
> > > 
> > > Maybe it's better to check this in ceph_check_caps().  Other callers
> > > of __send_cap() shouldn't encounter async creating inode
> > > 
> > 
> > I've been looking, but what actually guarantees that?
> > 
> > Only ceph_check_caps calls it for UPDATE, but the other two callers call
> > it for FLUSH. I don't see what prevents the kernel from (e.g.) calling
> > write_inode before the create reply comes in, particularly if we just
> > create and then close the file.
> > 
> 
> I missed write_inode case. but make __send_cap() skip sending message
> can cause problem. For example, if we skip a message that flush dirty
> caps. call ceph_check_caps() again may not re-do the flush.
> 

Ugh. Ok, so I guess we'll need to fix that first. I assume that making
sure the flush is redone after being delayed is the right thing to do
here?

> > As a side note, I still struggle with the fact that there seems to be no
> > coherent overall description of the cap protocol. What distinguishes a
> > FLUSH from an UPDATE, for instance? The MDS code and comments seem to
> > treat them somewhat interchangeably.
> > 
> 
> UPDATE is super set of FLUSH, UPDATE can always replace FLUSH.
> 

I'll toss this note onto my jumble of notes, for my (eventual) planned
document that describes the cap protocol.

> > > >         held = cap->issued | cap->implemented;
> > > >         revoking = cap->implemented & ~cap->issued;
> > > >         retain &= ~revoking;
> > > > @@ -2250,6 +2257,10 @@ int ceph_fsync(struct file *file, loff_t start, loff_t end, int datasync)
> > > >         if (datasync)
> > > >                 goto out;
> > > > 
> > > > +       ret = ceph_wait_on_async_create(inode);
> > > > +       if (ret)
> > > > +               goto out;
> > > > +
> > > >         dirty = try_flush_caps(inode, &flush_tid);
> > > >         dout("fsync dirty caps are %s\n", ceph_cap_string(dirty));
> > > > 
> > > > diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
> > > > index a87274935a09..5b83bda57056 100644
> > > > --- a/fs/ceph/dir.c
> > > > +++ b/fs/ceph/dir.c
> > > > @@ -752,7 +752,7 @@ static struct dentry *ceph_lookup(struct inode *dir, struct dentry *dentry,
> > > >                 struct ceph_dentry_info *di = ceph_dentry(dentry);
> > > > 
> > > >                 spin_lock(&ci->i_ceph_lock);
> > > > -               dout(" dir %p flags are %d\n", dir, ci->i_ceph_flags);
> > > > +               dout(" dir %p flags are 0x%lx\n", dir, ci->i_ceph_flags);
> > > >                 if (strncmp(dentry->d_name.name,
> > > >                             fsc->mount_options->snapdir_name,
> > > >                             dentry->d_name.len) &&
> > > > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > > > index 94d18e643a3d..38eb9dd5062b 100644
> > > > --- a/fs/ceph/mds_client.c
> > > > +++ b/fs/ceph/mds_client.c
> > > > @@ -2730,7 +2730,7 @@ static void kick_requests(struct ceph_mds_client *mdsc, int mds)
> > > >  int ceph_mdsc_submit_request(struct ceph_mds_client *mdsc, struct inode *dir,
> > > >                               struct ceph_mds_request *req)
> > > >  {
> > > > -       int err;
> > > > +       int err = 0;
> > > > 
> > > >         /* take CAP_PIN refs for r_inode, r_parent, r_old_dentry */
> > > >         if (req->r_inode)
> > > > @@ -2743,6 +2743,24 @@ int ceph_mdsc_submit_request(struct ceph_mds_client *mdsc, struct inode *dir,
> > > >                 ceph_get_cap_refs(ceph_inode(req->r_old_dentry_dir),
> > > >                                   CEPH_CAP_PIN);
> > > > 
> > > > +       if (req->r_inode) {
> > > > +               err = ceph_wait_on_async_create(req->r_inode);
> > > > +               if (err) {
> > > > +                       dout("%s: wait for async create returned: %d\n",
> > > > +                            __func__, err);
> > > > +                       return err;
> > > > +               }
> > > > +       }
> > > > +
> > > > +       if (!err && req->r_old_inode) {
> > > > +               err = ceph_wait_on_async_create(req->r_old_inode);
> > > > +               if (err) {
> > > > +                       dout("%s: wait for async create returned: %d\n",
> > > > +                            __func__, err);
> > > > +                       return err;
> > > > +               }
> > > > +       }
> > > > +
> > > >         dout("submit_request on %p for inode %p\n", req, dir);
> > > >         mutex_lock(&mdsc->mutex);
> > > >         __register_request(mdsc, req, dir);
> > > > diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> > > > index 95ac00e59e66..8043f2b439b1 100644
> > > > --- a/fs/ceph/mds_client.h
> > > > +++ b/fs/ceph/mds_client.h
> > > > @@ -538,4 +538,11 @@ extern void ceph_mdsc_open_export_target_sessions(struct ceph_mds_client *mdsc,
> > > >  extern int ceph_trim_caps(struct ceph_mds_client *mdsc,
> > > >                           struct ceph_mds_session *session,
> > > >                           int max_caps);
> > > > +static inline int ceph_wait_on_async_create(struct inode *inode)
> > > > +{
> > > > +       struct ceph_inode_info *ci = ceph_inode(inode);
> > > > +
> > > > +       return wait_on_bit(&ci->i_ceph_flags, CEPH_ASYNC_CREATE_BIT,
> > > > +                          TASK_INTERRUPTIBLE);
> > > > +}
> > > >  #endif
> > > > diff --git a/fs/ceph/super.h b/fs/ceph/super.h
> > > > index 3430d7ffe8f7..bfb03adb4a08 100644
> > > > --- a/fs/ceph/super.h
> > > > +++ b/fs/ceph/super.h
> > > > @@ -316,7 +316,7 @@ struct ceph_inode_info {
> > > >         u64 i_inline_version;
> > > >         u32 i_time_warp_seq;
> > > > 
> > > > -       unsigned i_ceph_flags;
> > > > +       unsigned long i_ceph_flags;
> > > >         atomic64_t i_release_count;
> > > >         atomic64_t i_ordered_count;
> > > >         atomic64_t i_complete_seq[2];
> > > > @@ -524,6 +524,8 @@ static inline struct inode *ceph_find_inode(struct super_block *sb,
> > > >  #define CEPH_I_ERROR_WRITE     (1 << 10) /* have seen write errors */
> > > >  #define CEPH_I_ERROR_FILELOCK  (1 << 11) /* have seen file lock errors */
> > > >  #define CEPH_I_ODIRECT         (1 << 12) /* inode in direct I/O mode */
> > > > +#define CEPH_ASYNC_CREATE_BIT  (13)      /* async create in flight for this */
> > > > +#define CEPH_I_ASYNC_CREATE    (1 << CEPH_ASYNC_CREATE_BIT)
> > > > 
> > > >  /*
> > > >   * Masks of ceph inode work.
> > > > --
> > > > 2.24.1
> > > > 
> > 
> > --
> > Jeff Layton <jlayton@kernel.org>
> > 

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5 03/12] ceph: add infrastructure for waiting for async create to complete
  2020-02-20 14:53         ` Jeff Layton
@ 2020-02-25 19:45           ` Jeff Layton
  2020-02-26 14:10             ` Yan, Zheng
  0 siblings, 1 reply; 24+ messages in thread
From: Jeff Layton @ 2020-02-25 19:45 UTC (permalink / raw)
  To: Yan, Zheng
  Cc: ceph-devel, Ilya Dryomov, Sage Weil, Zheng Yan, Patrick Donnelly,
	Xiubo Li

On Thu, 2020-02-20 at 09:53 -0500, Jeff Layton wrote:
> On Thu, 2020-02-20 at 21:33 +0800, Yan, Zheng wrote:
> > On Thu, Feb 20, 2020 at 9:01 PM Jeff Layton <jlayton@kernel.org> wrote:
> > > On Thu, 2020-02-20 at 11:32 +0800, Yan, Zheng wrote:
> > > > On Wed, Feb 19, 2020 at 9:27 PM Jeff Layton <jlayton@kernel.org> wrote:
> > > > > When we issue an async create, we must ensure that any later on-the-wire
> > > > > requests involving it wait for the create reply.
> > > > > 
> > > > > Expand i_ceph_flags to be an unsigned long, and add a new bit that
> > > > > MDS requests can wait on. If the bit is set in the inode when sending
> > > > > caps, then don't send it and just return that it has been delayed.
> > > > > 
> > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > > ---
> > > > >  fs/ceph/caps.c       | 13 ++++++++++++-
> > > > >  fs/ceph/dir.c        |  2 +-
> > > > >  fs/ceph/mds_client.c | 20 +++++++++++++++++++-
> > > > >  fs/ceph/mds_client.h |  7 +++++++
> > > > >  fs/ceph/super.h      |  4 +++-
> > > > >  5 files changed, 42 insertions(+), 4 deletions(-)
> > > > > 
> > > > > diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> > > > > index d05717397c2a..85e13aa359d2 100644
> > > > > --- a/fs/ceph/caps.c
> > > > > +++ b/fs/ceph/caps.c
> > > > > @@ -511,7 +511,7 @@ static void __cap_delay_requeue(struct ceph_mds_client *mdsc,
> > > > >                                 struct ceph_inode_info *ci,
> > > > >                                 bool set_timeout)
> > > > >  {
> > > > > -       dout("__cap_delay_requeue %p flags %d at %lu\n", &ci->vfs_inode,
> > > > > +       dout("__cap_delay_requeue %p flags 0x%lx at %lu\n", &ci->vfs_inode,
> > > > >              ci->i_ceph_flags, ci->i_hold_caps_max);
> > > > >         if (!mdsc->stopping) {
> > > > >                 spin_lock(&mdsc->cap_delay_lock);
> > > > > @@ -1294,6 +1294,13 @@ static int __send_cap(struct ceph_mds_client *mdsc, struct ceph_cap *cap,
> > > > >         int delayed = 0;
> > > > >         int ret;
> > > > > 
> > > > > +       /* Don't send anything if it's still being created. Return delayed */
> > > > > +       if (ci->i_ceph_flags & CEPH_I_ASYNC_CREATE) {
> > > > > +               spin_unlock(&ci->i_ceph_lock);
> > > > > +               dout("%s async create in flight for %p\n", __func__, inode);
> > > > > +               return 1;
> > > > > +       }
> > > > > +
> > > > 
> > > > Maybe it's better to check this in ceph_check_caps().  Other callers
> > > > of __send_cap() shouldn't encounter async creating inode

I'm not sure that's the case, is it? Suppose we call ceph_check_caps
and it ends up delayed. We requeue the cap and then later someone calls
fsync() and we end up calling try_flush_caps even though we haven't
gotten the async create reply yet.

> > > 
> > > I've been looking, but what actually guarantees that?
> > > 
> > > Only ceph_check_caps calls it for UPDATE, but the other two callers call
> > > it for FLUSH. I don't see what prevents the kernel from (e.g.) calling
> > > write_inode before the create reply comes in, particularly if we just
> > > create and then close the file.
> > > 
> > 
> > I missed write_inode case. but make __send_cap() skip sending message
> > can cause problem. For example, if we skip a message that flush dirty
> > caps. call ceph_check_caps() again may not re-do the flush.
> > 
> 
> Ugh. Ok, so I guess we'll need to fix that first. I assume that making
> sure the flush is redone after being delayed is the right thing to do
> here?
> 

Hmm...looking at this more closely today.

__send_cap calls send_cap_msg, and that function does a number of
allocations which could fail. So if this is a problem, it's a problem
today, and we should fix it. There are 3 callers of __send_cap:

try_flush_caps : requeues the cap (and sets the timeouts) if __send_cap
returns non-zero. I think this one is (probably?) OK.

__kick_flushing_caps : just throws a pr_err if __send_cap returns non-
zero, but since the cap is already queued here, there should be no need
to requeue it.

ceph_check_caps : the cap is requeued iff it's delayed.

So...I'm not sure I fully understand your concern. AFAICT, the cap
should end up being queued if the send failed.

I think that's probably the best we can do here. If we end up trying to
flush caps and we haven't gotten the async reply yet, we don't really
have much of a choice other than to wait to flush.

Perhaps though, we ought to call __kick_flushing_caps when an async
create reply comes in just to ensure that we do flush in a timely
fashion once that does occur.
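
Something like this, perhaps (hypothetical helper; I'm using
ceph_check_caps() with CHECK_CAPS_FLUSH as a stand-in, since
__kick_flushing_caps needs session and flush_tid plumbing):

static void ceph_async_create_done(struct inode *inode)
{
	struct ceph_inode_info *ci = ceph_inode(inode);

	spin_lock(&ci->i_ceph_lock);
	ci->i_ceph_flags &= ~CEPH_I_ASYNC_CREATE;
	wake_up_bit(&ci->i_ceph_flags, CEPH_ASYNC_CREATE_BIT);
	spin_unlock(&ci->i_ceph_lock);

	/* kick the cap state machine so any deferred flush is retried now */
	ceph_check_caps(ci, CHECK_CAPS_FLUSH, NULL);
}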

Thoughts?


> > > > As a side note, I still struggle with the fact that there seems to be no
> > > coherent overall description of the cap protocol. What distinguishes a
> > > FLUSH from an UPDATE, for instance? The MDS code and comments seem to
> > > treat them somewhat interchangeably.
> > > 
> > 
> > UPDATE is super set of FLUSH, UPDATE can always replace FLUSH.
> > 
> 
> I'll toss this note onto my jumble of notes, for my (eventual) planned
> document that describes the cap protocol.
> 
> > > > >         held = cap->issued | cap->implemented;
> > > > >         revoking = cap->implemented & ~cap->issued;
> > > > >         retain &= ~revoking;
> > > > > @@ -2250,6 +2257,10 @@ int ceph_fsync(struct file *file, loff_t start, loff_t end, int datasync)
> > > > >         if (datasync)
> > > > >                 goto out;
> > > > > 
> > > > > +       ret = ceph_wait_on_async_create(inode);
> > > > > +       if (ret)
> > > > > +               goto out;
> > > > > +
> > > > >         dirty = try_flush_caps(inode, &flush_tid);
> > > > >         dout("fsync dirty caps are %s\n", ceph_cap_string(dirty));
> > > > > 
> > > > > diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
> > > > > index a87274935a09..5b83bda57056 100644
> > > > > --- a/fs/ceph/dir.c
> > > > > +++ b/fs/ceph/dir.c
> > > > > @@ -752,7 +752,7 @@ static struct dentry *ceph_lookup(struct inode *dir, struct dentry *dentry,
> > > > >                 struct ceph_dentry_info *di = ceph_dentry(dentry);
> > > > > 
> > > > >                 spin_lock(&ci->i_ceph_lock);
> > > > > -               dout(" dir %p flags are %d\n", dir, ci->i_ceph_flags);
> > > > > +               dout(" dir %p flags are 0x%lx\n", dir, ci->i_ceph_flags);
> > > > >                 if (strncmp(dentry->d_name.name,
> > > > >                             fsc->mount_options->snapdir_name,
> > > > >                             dentry->d_name.len) &&
> > > > > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > > > > index 94d18e643a3d..38eb9dd5062b 100644
> > > > > --- a/fs/ceph/mds_client.c
> > > > > +++ b/fs/ceph/mds_client.c
> > > > > @@ -2730,7 +2730,7 @@ static void kick_requests(struct ceph_mds_client *mdsc, int mds)
> > > > >  int ceph_mdsc_submit_request(struct ceph_mds_client *mdsc, struct inode *dir,
> > > > >                               struct ceph_mds_request *req)
> > > > >  {
> > > > > -       int err;
> > > > > +       int err = 0;
> > > > > 
> > > > >         /* take CAP_PIN refs for r_inode, r_parent, r_old_dentry */
> > > > >         if (req->r_inode)
> > > > > @@ -2743,6 +2743,24 @@ int ceph_mdsc_submit_request(struct ceph_mds_client *mdsc, struct inode *dir,
> > > > >                 ceph_get_cap_refs(ceph_inode(req->r_old_dentry_dir),
> > > > >                                   CEPH_CAP_PIN);
> > > > > 
> > > > > +       if (req->r_inode) {
> > > > > +               err = ceph_wait_on_async_create(req->r_inode);
> > > > > +               if (err) {
> > > > > +                       dout("%s: wait for async create returned: %d\n",
> > > > > +                            __func__, err);
> > > > > +                       return err;
> > > > > +               }
> > > > > +       }
> > > > > +
> > > > > +       if (!err && req->r_old_inode) {
> > > > > +               err = ceph_wait_on_async_create(req->r_old_inode);
> > > > > +               if (err) {
> > > > > +                       dout("%s: wait for async create returned: %d\n",
> > > > > +                            __func__, err);
> > > > > +                       return err;
> > > > > +               }
> > > > > +       }
> > > > > +
> > > > >         dout("submit_request on %p for inode %p\n", req, dir);
> > > > >         mutex_lock(&mdsc->mutex);
> > > > >         __register_request(mdsc, req, dir);
> > > > > diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> > > > > index 95ac00e59e66..8043f2b439b1 100644
> > > > > --- a/fs/ceph/mds_client.h
> > > > > +++ b/fs/ceph/mds_client.h
> > > > > @@ -538,4 +538,11 @@ extern void ceph_mdsc_open_export_target_sessions(struct ceph_mds_client *mdsc,
> > > > >  extern int ceph_trim_caps(struct ceph_mds_client *mdsc,
> > > > >                           struct ceph_mds_session *session,
> > > > >                           int max_caps);
> > > > > +static inline int ceph_wait_on_async_create(struct inode *inode)
> > > > > +{
> > > > > +       struct ceph_inode_info *ci = ceph_inode(inode);
> > > > > +
> > > > > +       return wait_on_bit(&ci->i_ceph_flags, CEPH_ASYNC_CREATE_BIT,
> > > > > +                          TASK_INTERRUPTIBLE);
> > > > > +}
> > > > >  #endif
> > > > > diff --git a/fs/ceph/super.h b/fs/ceph/super.h
> > > > > index 3430d7ffe8f7..bfb03adb4a08 100644
> > > > > --- a/fs/ceph/super.h
> > > > > +++ b/fs/ceph/super.h
> > > > > @@ -316,7 +316,7 @@ struct ceph_inode_info {
> > > > >         u64 i_inline_version;
> > > > >         u32 i_time_warp_seq;
> > > > > 
> > > > > -       unsigned i_ceph_flags;
> > > > > +       unsigned long i_ceph_flags;
> > > > >         atomic64_t i_release_count;
> > > > >         atomic64_t i_ordered_count;
> > > > >         atomic64_t i_complete_seq[2];
> > > > > @@ -524,6 +524,8 @@ static inline struct inode *ceph_find_inode(struct super_block *sb,
> > > > >  #define CEPH_I_ERROR_WRITE     (1 << 10) /* have seen write errors */
> > > > >  #define CEPH_I_ERROR_FILELOCK  (1 << 11) /* have seen file lock errors */
> > > > >  #define CEPH_I_ODIRECT         (1 << 12) /* inode in direct I/O mode */
> > > > > +#define CEPH_ASYNC_CREATE_BIT  (13)      /* async create in flight for this */
> > > > > +#define CEPH_I_ASYNC_CREATE    (1 << CEPH_ASYNC_CREATE_BIT)
> > > > > 
> > > > >  /*
> > > > >   * Masks of ceph inode work.
> > > > > --
> > > > > 2.24.1
> > > > > 
> > > 
> > > --
> > > Jeff Layton <jlayton@kernel.org>
> > > 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5 03/12] ceph: add infrastructure for waiting for async create to complete
  2020-02-25 19:45           ` Jeff Layton
@ 2020-02-26 14:10             ` Yan, Zheng
  2020-02-27 20:06               ` Jeff Layton
  0 siblings, 1 reply; 24+ messages in thread
From: Yan, Zheng @ 2020-02-26 14:10 UTC (permalink / raw)
  To: Jeff Layton, Yan, Zheng
  Cc: ceph-devel, Ilya Dryomov, Sage Weil, Patrick Donnelly, Xiubo Li

On 2/26/20 3:45 AM, Jeff Layton wrote:
> On Thu, 2020-02-20 at 09:53 -0500, Jeff Layton wrote:
>> On Thu, 2020-02-20 at 21:33 +0800, Yan, Zheng wrote:
>>> On Thu, Feb 20, 2020 at 9:01 PM Jeff Layton <jlayton@kernel.org> wrote:
>>>> On Thu, 2020-02-20 at 11:32 +0800, Yan, Zheng wrote:
>>>>> On Wed, Feb 19, 2020 at 9:27 PM Jeff Layton <jlayton@kernel.org> wrote:
>>>>>> When we issue an async create, we must ensure that any later on-the-wire
>>>>>> requests involving it wait for the create reply.
>>>>>>
>>>>>> Expand i_ceph_flags to be an unsigned long, and add a new bit that
>>>>>> MDS requests can wait on. If the bit is set in the inode when sending
>>>>>> caps, then don't send it and just return that it has been delayed.
>>>>>>
>>>>>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
>>>>>> ---
>>>>>>   fs/ceph/caps.c       | 13 ++++++++++++-
>>>>>>   fs/ceph/dir.c        |  2 +-
>>>>>>   fs/ceph/mds_client.c | 20 +++++++++++++++++++-
>>>>>>   fs/ceph/mds_client.h |  7 +++++++
>>>>>>   fs/ceph/super.h      |  4 +++-
>>>>>>   5 files changed, 42 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
>>>>>> index d05717397c2a..85e13aa359d2 100644
>>>>>> --- a/fs/ceph/caps.c
>>>>>> +++ b/fs/ceph/caps.c
>>>>>> @@ -511,7 +511,7 @@ static void __cap_delay_requeue(struct ceph_mds_client *mdsc,
>>>>>>                                  struct ceph_inode_info *ci,
>>>>>>                                  bool set_timeout)
>>>>>>   {
>>>>>> -       dout("__cap_delay_requeue %p flags %d at %lu\n", &ci->vfs_inode,
>>>>>> +       dout("__cap_delay_requeue %p flags 0x%lx at %lu\n", &ci->vfs_inode,
>>>>>>               ci->i_ceph_flags, ci->i_hold_caps_max);
>>>>>>          if (!mdsc->stopping) {
>>>>>>                  spin_lock(&mdsc->cap_delay_lock);
>>>>>> @@ -1294,6 +1294,13 @@ static int __send_cap(struct ceph_mds_client *mdsc, struct ceph_cap *cap,
>>>>>>          int delayed = 0;
>>>>>>          int ret;
>>>>>>
>>>>>> +       /* Don't send anything if it's still being created. Return delayed */
>>>>>> +       if (ci->i_ceph_flags & CEPH_I_ASYNC_CREATE) {
>>>>>> +               spin_unlock(&ci->i_ceph_lock);
>>>>>> +               dout("%s async create in flight for %p\n", __func__, inode);
>>>>>> +               return 1;
>>>>>> +       }
>>>>>> +
>>>>>
>>>>> Maybe it's better to check this in ceph_check_caps().  Other callers
>>>>> of __send_cap() shouldn't encounter async creating inode
> 
> I'm not sure that's the case, is it? Suppose we call ceph_check_caps
> and it ends up delayed. We requeue the cap and then later someone calls
> fsync() and we end up calling try_flush_caps even though we haven't
> gotten the async create reply yet.

Your patch adds a wait_on_async_create() for the fsync case.

> 
>>>>
>>>> I've been looking, but what actually guarantees that?
>>>>
>>>> Only ceph_check_caps calls it for UPDATE, but the other two callers call
>>>> it for FLUSH. I don't see what prevents the kernel from (e.g.) calling
>>>> write_inode before the create reply comes in, particularly if we just
>>>> create and then close the file.
>>>>
>>>
>>> I missed write_inode case. but make __send_cap() skip sending message
>>> can cause problem. For example, if we skip a message that flush dirty
>>> caps. call ceph_check_caps() again may not re-do the flush.
>>>
>>
>> Ugh. Ok, so I guess we'll need to fix that first. I assume that making
>> sure the flush is redone after being delayed is the right thing to do
>> here?
>>
> 
> Hmm...looking at this more closely today.
> 
> __send_cap calls send_cap_msg, and that function does a number of
> allocations which could fail. So if this is a problem, it's a problem
> today, and we should fix it. There are 3 callers of __send_cap:
> 
> try_flush_caps : requeues the cap (and sets the timeouts) if __send_cap
> returns non-zero. I think this one is (probably?) OK.
> 
I think we can return the error back to fsync() for this case.
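
e.g. something like this in ceph_fsync(), assuming a (made up)
try_flush_caps() variant that returns a negative errno when
__send_cap() can't send the message:

	ret = try_flush_caps(inode, &flush_tid, &dirty);
	if (ret < 0)
		goto out;	/* surface the failed flush to the caller */

	dout("fsync dirty caps are %s\n", ceph_cap_string(dirty));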

> __kick_flushing_caps : just throws a pr_err if __send_cap returns non-
> zero, but since the cap is already queued here, there should be no need
> to requeue it.
>

This one is really problematic. ceph_early_kick_flushing_caps() needs to
re-send flushes when the recovering MDS is in the reconnect state. Otherwise,
the flush may overwrite another client's newer change.


> ceph_check_caps : the cap is requeued iff it's delayed.
> 
> So...I'm not sure I fully understand your concern. AFAICT, the cap
> should end up being queued if the send failed.

If ceph_check_caps() flushed a dirty cap and failed to send the msg, it needs
to undo what __mark_caps_flushing() did.

> 
> I think that's probably the best we can do here. If we end up trying to
> flush caps and we haven't gotten the async reply yet, we don't really
> have much of a choice other than to wait to flush.
> 

I think the best approach is to make send_cap_msg() never fail. If freeing
memory is slow, make the memory allocation wait.
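
Something like this in send_cap_msg(), maybe ("msg_size" stands in for
whatever size calculation is already there; sketch only):

	/* let the allocator sleep for memory instead of returning NULL */
	msg = ceph_msg_new(CEPH_MSG_CLIENT_CAPS, msg_size,
			   GFP_NOFS | __GFP_NOFAIL, false);

With __GFP_NOFAIL the allocation waits rather than failing, so the
-ENOMEM path out of send_cap_msg() goes away.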

> Perhaps though, we ought to call __kick_flushing_caps when an async
> create reply comes in just to ensure that we do flush in a timely
> fashion once that does occur.
> 
> Thoughts?
> 
> 
>>>> As a side note, I still struggle with the fact that there seems to be no
>>>> coherent overall description of the cap protocol. What distinguishes a
>>>> FLUSH from an UPDATE, for instance? The MDS code and comments seem to
>>>> treat them somewhat interchangeably.
>>>>
>>>
>>> UPDATE is super set of FLUSH, UPDATE can always replace FLUSH.
>>>
>>
>> I'll toss this note onto my jumble of notes, for my (eventual) planned
>> document that describes the cap protocol.
>>
>>>>>>          held = cap->issued | cap->implemented;
>>>>>>          revoking = cap->implemented & ~cap->issued;
>>>>>>          retain &= ~revoking;
>>>>>> @@ -2250,6 +2257,10 @@ int ceph_fsync(struct file *file, loff_t start, loff_t end, int datasync)
>>>>>>          if (datasync)
>>>>>>                  goto out;
>>>>>>
>>>>>> +       ret = ceph_wait_on_async_create(inode);
>>>>>> +       if (ret)
>>>>>> +               goto out;
>>>>>> +
>>>>>>          dirty = try_flush_caps(inode, &flush_tid);
>>>>>>          dout("fsync dirty caps are %s\n", ceph_cap_string(dirty));
>>>>>>
>>>>>> diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
>>>>>> index a87274935a09..5b83bda57056 100644
>>>>>> --- a/fs/ceph/dir.c
>>>>>> +++ b/fs/ceph/dir.c
>>>>>> @@ -752,7 +752,7 @@ static struct dentry *ceph_lookup(struct inode *dir, struct dentry *dentry,
>>>>>>                  struct ceph_dentry_info *di = ceph_dentry(dentry);
>>>>>>
>>>>>>                  spin_lock(&ci->i_ceph_lock);
>>>>>> -               dout(" dir %p flags are %d\n", dir, ci->i_ceph_flags);
>>>>>> +               dout(" dir %p flags are 0x%lx\n", dir, ci->i_ceph_flags);
>>>>>>                  if (strncmp(dentry->d_name.name,
>>>>>>                              fsc->mount_options->snapdir_name,
>>>>>>                              dentry->d_name.len) &&
>>>>>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
>>>>>> index 94d18e643a3d..38eb9dd5062b 100644
>>>>>> --- a/fs/ceph/mds_client.c
>>>>>> +++ b/fs/ceph/mds_client.c
>>>>>> @@ -2730,7 +2730,7 @@ static void kick_requests(struct ceph_mds_client *mdsc, int mds)
>>>>>>   int ceph_mdsc_submit_request(struct ceph_mds_client *mdsc, struct inode *dir,
>>>>>>                                struct ceph_mds_request *req)
>>>>>>   {
>>>>>> -       int err;
>>>>>> +       int err = 0;
>>>>>>
>>>>>>          /* take CAP_PIN refs for r_inode, r_parent, r_old_dentry */
>>>>>>          if (req->r_inode)
>>>>>> @@ -2743,6 +2743,24 @@ int ceph_mdsc_submit_request(struct ceph_mds_client *mdsc, struct inode *dir,
>>>>>>                  ceph_get_cap_refs(ceph_inode(req->r_old_dentry_dir),
>>>>>>                                    CEPH_CAP_PIN);
>>>>>>
>>>>>> +       if (req->r_inode) {
>>>>>> +               err = ceph_wait_on_async_create(req->r_inode);
>>>>>> +               if (err) {
>>>>>> +                       dout("%s: wait for async create returned: %d\n",
>>>>>> +                            __func__, err);
>>>>>> +                       return err;
>>>>>> +               }
>>>>>> +       }
>>>>>> +
>>>>>> +       if (!err && req->r_old_inode) {
>>>>>> +               err = ceph_wait_on_async_create(req->r_old_inode);
>>>>>> +               if (err) {
>>>>>> +                       dout("%s: wait for async create returned: %d\n",
>>>>>> +                            __func__, err);
>>>>>> +                       return err;
>>>>>> +               }
>>>>>> +       }
>>>>>> +
>>>>>>          dout("submit_request on %p for inode %p\n", req, dir);
>>>>>>          mutex_lock(&mdsc->mutex);
>>>>>>          __register_request(mdsc, req, dir);
>>>>>> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
>>>>>> index 95ac00e59e66..8043f2b439b1 100644
>>>>>> --- a/fs/ceph/mds_client.h
>>>>>> +++ b/fs/ceph/mds_client.h
>>>>>> @@ -538,4 +538,11 @@ extern void ceph_mdsc_open_export_target_sessions(struct ceph_mds_client *mdsc,
>>>>>>   extern int ceph_trim_caps(struct ceph_mds_client *mdsc,
>>>>>>                            struct ceph_mds_session *session,
>>>>>>                            int max_caps);
>>>>>> +static inline int ceph_wait_on_async_create(struct inode *inode)
>>>>>> +{
>>>>>> +       struct ceph_inode_info *ci = ceph_inode(inode);
>>>>>> +
>>>>>> +       return wait_on_bit(&ci->i_ceph_flags, CEPH_ASYNC_CREATE_BIT,
>>>>>> +                          TASK_INTERRUPTIBLE);
>>>>>> +}
>>>>>>   #endif
>>>>>> diff --git a/fs/ceph/super.h b/fs/ceph/super.h
>>>>>> index 3430d7ffe8f7..bfb03adb4a08 100644
>>>>>> --- a/fs/ceph/super.h
>>>>>> +++ b/fs/ceph/super.h
>>>>>> @@ -316,7 +316,7 @@ struct ceph_inode_info {
>>>>>>          u64 i_inline_version;
>>>>>>          u32 i_time_warp_seq;
>>>>>>
>>>>>> -       unsigned i_ceph_flags;
>>>>>> +       unsigned long i_ceph_flags;
>>>>>>          atomic64_t i_release_count;
>>>>>>          atomic64_t i_ordered_count;
>>>>>>          atomic64_t i_complete_seq[2];
>>>>>> @@ -524,6 +524,8 @@ static inline struct inode *ceph_find_inode(struct super_block *sb,
>>>>>>   #define CEPH_I_ERROR_WRITE     (1 << 10) /* have seen write errors */
>>>>>>   #define CEPH_I_ERROR_FILELOCK  (1 << 11) /* have seen file lock errors */
>>>>>>   #define CEPH_I_ODIRECT         (1 << 12) /* inode in direct I/O mode */
>>>>>> +#define CEPH_ASYNC_CREATE_BIT  (13)      /* async create in flight for this */
>>>>>> +#define CEPH_I_ASYNC_CREATE    (1 << CEPH_ASYNC_CREATE_BIT)
>>>>>>
>>>>>>   /*
>>>>>>    * Masks of ceph inode work.
>>>>>> --
>>>>>> 2.24.1
>>>>>>
>>>>
>>>> --
>>>> Jeff Layton <jlayton@kernel.org>
>>>>
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5 03/12] ceph: add infrastructure for waiting for async create to complete
  2020-02-26 14:10             ` Yan, Zheng
@ 2020-02-27 20:06               ` Jeff Layton
  0 siblings, 0 replies; 24+ messages in thread
From: Jeff Layton @ 2020-02-27 20:06 UTC (permalink / raw)
  To: Yan, Zheng, Yan, Zheng
  Cc: ceph-devel, Ilya Dryomov, Sage Weil, Patrick Donnelly, Xiubo Li

On Wed, 2020-02-26 at 22:10 +0800, Yan, Zheng wrote:
> On 2/26/20 3:45 AM, Jeff Layton wrote:
> > On Thu, 2020-02-20 at 09:53 -0500, Jeff Layton wrote:
> > > On Thu, 2020-02-20 at 21:33 +0800, Yan, Zheng wrote:
> > > > On Thu, Feb 20, 2020 at 9:01 PM Jeff Layton <jlayton@kernel.org> wrote:
> > > > > On Thu, 2020-02-20 at 11:32 +0800, Yan, Zheng wrote:
> > > > > > On Wed, Feb 19, 2020 at 9:27 PM Jeff Layton <jlayton@kernel.org> wrote:
> > > > > > > When we issue an async create, we must ensure that any later on-the-wire
> > > > > > > requests involving it wait for the create reply.
> > > > > > > 
> > > > > > > Expand i_ceph_flags to be an unsigned long, and add a new bit that
> > > > > > > MDS requests can wait on. If the bit is set in the inode when sending
> > > > > > > caps, then don't send it and just return that it has been delayed.
> > > > > > > 
> > > > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > > > > ---
> > > > > > >   fs/ceph/caps.c       | 13 ++++++++++++-
> > > > > > >   fs/ceph/dir.c        |  2 +-
> > > > > > >   fs/ceph/mds_client.c | 20 +++++++++++++++++++-
> > > > > > >   fs/ceph/mds_client.h |  7 +++++++
> > > > > > >   fs/ceph/super.h      |  4 +++-
> > > > > > >   5 files changed, 42 insertions(+), 4 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> > > > > > > index d05717397c2a..85e13aa359d2 100644
> > > > > > > --- a/fs/ceph/caps.c
> > > > > > > +++ b/fs/ceph/caps.c
> > > > > > > @@ -511,7 +511,7 @@ static void __cap_delay_requeue(struct ceph_mds_client *mdsc,
> > > > > > >                                  struct ceph_inode_info *ci,
> > > > > > >                                  bool set_timeout)
> > > > > > >   {
> > > > > > > -       dout("__cap_delay_requeue %p flags %d at %lu\n", &ci->vfs_inode,
> > > > > > > +       dout("__cap_delay_requeue %p flags 0x%lx at %lu\n", &ci->vfs_inode,
> > > > > > >               ci->i_ceph_flags, ci->i_hold_caps_max);
> > > > > > >          if (!mdsc->stopping) {
> > > > > > >                  spin_lock(&mdsc->cap_delay_lock);
> > > > > > > @@ -1294,6 +1294,13 @@ static int __send_cap(struct ceph_mds_client *mdsc, struct ceph_cap *cap,
> > > > > > >          int delayed = 0;
> > > > > > >          int ret;
> > > > > > > 
> > > > > > > +       /* Don't send anything if it's still being created. Return delayed */
> > > > > > > +       if (ci->i_ceph_flags & CEPH_I_ASYNC_CREATE) {
> > > > > > > +               spin_unlock(&ci->i_ceph_lock);
> > > > > > > +               dout("%s async create in flight for %p\n", __func__, inode);
> > > > > > > +               return 1;
> > > > > > > +       }
> > > > > > > +
> > > > > > 
> > > > > > Maybe it's better to check this in ceph_check_caps().  Other callers
> > > > > > of __send_cap() shouldn't encounter an async-creating inode.
> > 
> > I'm not sure that's the case, is it? Suppose we call ceph_check_caps
> > and it ends up delayed. We requeue the cap and then later someone calls
> > fsync() and we end up calling try_flush_caps even though we haven't
> > gotten the async create reply yet.
> 
> Your patch adds a wait_on_async_create for the fsync case.
> 
> > > > > I've been looking, but what actually guarantees that?
> > > > > 
> > > > > Only ceph_check_caps calls it for UPDATE, but the other two callers call
> > > > > it for FLUSH. I don't see what prevents the kernel from (e.g.) calling
> > > > > write_inode before the create reply comes in, particularly if we just
> > > > > create and then close the file.
> > > > > 
> > > > 
> > > > I missed the write_inode case, but making __send_cap() skip sending the
> > > > message can cause problems. For example, if we skip a message that
> > > > flushes dirty caps, calling ceph_check_caps() again may not re-do the flush.
> > > > 
> > > 
> > > Ugh. Ok, so I guess we'll need to fix that first. I assume that making
> > > sure the flush is redone after being delayed is the right thing to do
> > > here?
> > > 
> > 
> > Hmm...looking at this more closely today.
> > 
> > __send_cap calls send_cap_msg, and that function does a number of
> > allocations which could fail. So if this is a problem, it's a problem
> > today, and we should fix it. There are 3 callers of __send_cap:
> > 
> > try_flush_caps : requeues the cap (and sets the timeouts) if __send_cap
> > returns non-zero. I think this one is (probably?) OK.
> > 
> I think we can return the error back to fsync() in this case.
> 

Yeah. For write_inode too, I suppose.
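
Something like this in ceph_fsync(), for instance -- hand-waving a
try_flush_caps() rework here, since today it only returns the dirty mask
and can't report a send failure, so the extra argument below is
hypothetical:

	/* hypothetical: try_flush_caps() reworked to return an error */
	ret = try_flush_caps(inode, &flush_tid, &dirty);
	if (ret < 0)
		goto out;

	dout("fsync dirty caps are %s\n", ceph_cap_string(dirty));

ceph_write_inode() could then do the same and hand the error back to
writeback.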

> > __kick_flushing_caps : just throws a pr_err if __send_cap returns non-
> > zero, but since the cap is already queued here, there should be no need
> > to requeue it.
> > 
> 
> This one is really problematic. ceph_early_kick_flushing_caps() needs to
> re-send flushes while the recovering MDS is in the reconnect state. Otherwise,
> a flush may overwrite another client's newer changes.
> 
> 

Ok.


> > ceph_check_caps : the cap is requeued iff it's delayed.
> > 
> > So...I'm not sure I fully understand your concern. AFAICT, the cap
> > should end up being queued if the send failed.
> 
> If ceph_check_caps() flushes dirty caps and fails to send the msg, it needs
> to undo what __mark_caps_flushing() did.
> 

Nasty

> > I think that's probably the best we can do here. If we end up trying to
> > flush caps and we haven't gotten the async reply yet, we don't really
> > have much of a choice other than to wait to flush.
> > 
> 
> I think the best approach is to make send_cap_msg() never fail. If reclaiming
> memory is slow, just make the allocation wait.
> 

Ugh.

Looking...I think we're probably ok on the xattr blob already. AFAICT,
that gets preallocated at the time that the setxattr is done.

The main problem is all of the allocations under the ceph_msg_new call
in send_cap_msg. Maybe we ought to be doing those at the time that the
cap is dirtied?

In fact, we already do some preallocation at what appear to be the right
points in the ceph_alloc_cap_flush() calls. Maybe we should do something
similar with ceph_msg_new()?
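
To make that more concrete, here's the sort of thing I'm imagining
(rough, untested sketch -- the cf->msg field and the CEPH_CAP_MSG_MAX_FRONT
constant don't exist today and are just placeholders for however we'd
actually size the front):

struct ceph_cap_flush *ceph_alloc_cap_flush(void)
{
	struct ceph_cap_flush *cf;

	cf = kmem_cache_alloc(ceph_cap_flush_cachep, GFP_KERNEL);
	if (!cf)
		return NULL;

	/*
	 * Preallocate the cap message now, so that __send_cap() never
	 * has to allocate (and so can never fail) at flush time.
	 */
	cf->msg = ceph_msg_new(CEPH_MSG_CLIENT_CAPS, CEPH_CAP_MSG_MAX_FRONT,
			       GFP_KERNEL, false);
	if (!cf->msg) {
		kmem_cache_free(ceph_cap_flush_cachep, cf);
		return NULL;
	}
	return cf;
}

__send_cap() would then consume cf->msg for flushes instead of calling
ceph_msg_new() itself, and ceph_free_cap_flush() would need to drop the
ref if the message was never sent. The plain UPDATE case would still need
some thought, though.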

> > Perhaps though, we ought to call __kick_flushing_caps when an async
> > create reply comes in just to ensure that we do flush in a timely
> > fashion once that does occur.
> > 
> > Thoughts?
> > 
> > 
> > > > > As a side note, I still struggle with the fact that there seems to be no
> > > > > coherent overall description of the cap protocol. What distinguishes a
> > > > > FLUSH from an UPDATE, for instance? The MDS code and comments seem to
> > > > > treat them somewhat interchangeably.
> > > > > 
> > > > 
> > > > UPDATE is a superset of FLUSH; UPDATE can always replace FLUSH.
> > > > 
> > > 
> > > I'll toss this note onto my jumble of notes, for my (eventual) planned
> > > document that describes the cap protocol.
> > > 
> > > > > > >          held = cap->issued | cap->implemented;
> > > > > > >          revoking = cap->implemented & ~cap->issued;
> > > > > > >          retain &= ~revoking;
> > > > > > > @@ -2250,6 +2257,10 @@ int ceph_fsync(struct file *file, loff_t start, loff_t end, int datasync)
> > > > > > >          if (datasync)
> > > > > > >                  goto out;
> > > > > > > 
> > > > > > > +       ret = ceph_wait_on_async_create(inode);
> > > > > > > +       if (ret)
> > > > > > > +               goto out;
> > > > > > > +
> > > > > > >          dirty = try_flush_caps(inode, &flush_tid);
> > > > > > >          dout("fsync dirty caps are %s\n", ceph_cap_string(dirty));
> > > > > > > 
> > > > > > > diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
> > > > > > > index a87274935a09..5b83bda57056 100644
> > > > > > > --- a/fs/ceph/dir.c
> > > > > > > +++ b/fs/ceph/dir.c
> > > > > > > @@ -752,7 +752,7 @@ static struct dentry *ceph_lookup(struct inode *dir, struct dentry *dentry,
> > > > > > >                  struct ceph_dentry_info *di = ceph_dentry(dentry);
> > > > > > > 
> > > > > > >                  spin_lock(&ci->i_ceph_lock);
> > > > > > > -               dout(" dir %p flags are %d\n", dir, ci->i_ceph_flags);
> > > > > > > +               dout(" dir %p flags are 0x%lx\n", dir, ci->i_ceph_flags);
> > > > > > >                  if (strncmp(dentry->d_name.name,
> > > > > > >                              fsc->mount_options->snapdir_name,
> > > > > > >                              dentry->d_name.len) &&
> > > > > > > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > > > > > > index 94d18e643a3d..38eb9dd5062b 100644
> > > > > > > --- a/fs/ceph/mds_client.c
> > > > > > > +++ b/fs/ceph/mds_client.c
> > > > > > > @@ -2730,7 +2730,7 @@ static void kick_requests(struct ceph_mds_client *mdsc, int mds)
> > > > > > >   int ceph_mdsc_submit_request(struct ceph_mds_client *mdsc, struct inode *dir,
> > > > > > >                                struct ceph_mds_request *req)
> > > > > > >   {
> > > > > > > -       int err;
> > > > > > > +       int err = 0;
> > > > > > > 
> > > > > > >          /* take CAP_PIN refs for r_inode, r_parent, r_old_dentry */
> > > > > > >          if (req->r_inode)
> > > > > > > @@ -2743,6 +2743,24 @@ int ceph_mdsc_submit_request(struct ceph_mds_client *mdsc, struct inode *dir,
> > > > > > >                  ceph_get_cap_refs(ceph_inode(req->r_old_dentry_dir),
> > > > > > >                                    CEPH_CAP_PIN);
> > > > > > > 
> > > > > > > +       if (req->r_inode) {
> > > > > > > +               err = ceph_wait_on_async_create(req->r_inode);
> > > > > > > +               if (err) {
> > > > > > > +                       dout("%s: wait for async create returned: %d\n",
> > > > > > > +                            __func__, err);
> > > > > > > +                       return err;
> > > > > > > +               }
> > > > > > > +       }
> > > > > > > +
> > > > > > > +       if (!err && req->r_old_inode) {
> > > > > > > +               err = ceph_wait_on_async_create(req->r_old_inode);
> > > > > > > +               if (err) {
> > > > > > > +                       dout("%s: wait for async create returned: %d\n",
> > > > > > > +                            __func__, err);
> > > > > > > +                       return err;
> > > > > > > +               }
> > > > > > > +       }
> > > > > > > +
> > > > > > >          dout("submit_request on %p for inode %p\n", req, dir);
> > > > > > >          mutex_lock(&mdsc->mutex);
> > > > > > >          __register_request(mdsc, req, dir);
> > > > > > > diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> > > > > > > index 95ac00e59e66..8043f2b439b1 100644
> > > > > > > --- a/fs/ceph/mds_client.h
> > > > > > > +++ b/fs/ceph/mds_client.h
> > > > > > > @@ -538,4 +538,11 @@ extern void ceph_mdsc_open_export_target_sessions(struct ceph_mds_client *mdsc,
> > > > > > >   extern int ceph_trim_caps(struct ceph_mds_client *mdsc,
> > > > > > >                            struct ceph_mds_session *session,
> > > > > > >                            int max_caps);
> > > > > > > +static inline int ceph_wait_on_async_create(struct inode *inode)
> > > > > > > +{
> > > > > > > +       struct ceph_inode_info *ci = ceph_inode(inode);
> > > > > > > +
> > > > > > > +       return wait_on_bit(&ci->i_ceph_flags, CEPH_ASYNC_CREATE_BIT,
> > > > > > > +                          TASK_INTERRUPTIBLE);
> > > > > > > +}
> > > > > > >   #endif
> > > > > > > diff --git a/fs/ceph/super.h b/fs/ceph/super.h
> > > > > > > index 3430d7ffe8f7..bfb03adb4a08 100644
> > > > > > > --- a/fs/ceph/super.h
> > > > > > > +++ b/fs/ceph/super.h
> > > > > > > @@ -316,7 +316,7 @@ struct ceph_inode_info {
> > > > > > >          u64 i_inline_version;
> > > > > > >          u32 i_time_warp_seq;
> > > > > > > 
> > > > > > > -       unsigned i_ceph_flags;
> > > > > > > +       unsigned long i_ceph_flags;
> > > > > > >          atomic64_t i_release_count;
> > > > > > >          atomic64_t i_ordered_count;
> > > > > > >          atomic64_t i_complete_seq[2];
> > > > > > > @@ -524,6 +524,8 @@ static inline struct inode *ceph_find_inode(struct super_block *sb,
> > > > > > >   #define CEPH_I_ERROR_WRITE     (1 << 10) /* have seen write errors */
> > > > > > >   #define CEPH_I_ERROR_FILELOCK  (1 << 11) /* have seen file lock errors */
> > > > > > >   #define CEPH_I_ODIRECT         (1 << 12) /* inode in direct I/O mode */
> > > > > > > +#define CEPH_ASYNC_CREATE_BIT  (13)      /* async create in flight for this */
> > > > > > > +#define CEPH_I_ASYNC_CREATE    (1 << CEPH_ASYNC_CREATE_BIT)
> > > > > > > 
> > > > > > >   /*
> > > > > > >    * Masks of ceph inode work.
> > > > > > > --
> > > > > > > 2.24.1
> > > > > > > 
> > > > > 
> > > > > --
> > > > > Jeff Layton <jlayton@kernel.org>
> > > > > 

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2020-02-27 20:06 UTC | newest]

Thread overview: 24+ messages
2020-02-19 13:25 [PATCH v5 00/12] ceph: async directory operations support Jeff Layton
2020-02-19 13:25 ` [PATCH v5 01/12] ceph: add flag to designate that a request is asynchronous Jeff Layton
2020-02-19 13:25 ` [PATCH v5 02/12] ceph: track primary dentry link Jeff Layton
2020-02-19 13:25 ` [PATCH v5 03/12] ceph: add infrastructure for waiting for async create to complete Jeff Layton
2020-02-20  3:32   ` Yan, Zheng
2020-02-20 13:01     ` Jeff Layton
2020-02-20 13:33       ` Yan, Zheng
2020-02-20 14:53         ` Jeff Layton
2020-02-25 19:45           ` Jeff Layton
2020-02-26 14:10             ` Yan, Zheng
2020-02-27 20:06               ` Jeff Layton
2020-02-19 13:25 ` [PATCH v5 04/12] ceph: make __take_cap_refs non-static Jeff Layton
2020-02-19 13:25 ` [PATCH v5 05/12] ceph: cap tracking for async directory operations Jeff Layton
2020-02-20  6:42   ` Yan, Zheng
2020-02-20 11:30     ` Jeff Layton
2020-02-19 13:25 ` [PATCH v5 06/12] ceph: don't take refs to want mask unless we have all bits Jeff Layton
2020-02-19 13:25 ` [PATCH v5 07/12] ceph: perform asynchronous unlink if we have sufficient caps Jeff Layton
2020-02-20  6:44   ` Yan, Zheng
2020-02-20 11:32     ` Jeff Layton
2020-02-19 13:25 ` [PATCH v5 08/12] ceph: make ceph_fill_inode non-static Jeff Layton
2020-02-19 13:25 ` [PATCH v5 09/12] ceph: decode interval_sets for delegated inos Jeff Layton
2020-02-19 13:25 ` [PATCH v5 10/12] ceph: add new MDS req field to hold delegated inode number Jeff Layton
2020-02-19 13:25 ` [PATCH v5 11/12] ceph: cache layout in parent dir on first sync create Jeff Layton
2020-02-19 13:25 ` [PATCH v5 12/12] ceph: attempt to do async create when possible Jeff Layton
