From: Sage Weil <sage@inktank.com>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org, ceph-devel@vger.kernel.org
Subject: Re: [ceph] locking fun with d_materialise_unique()
Date: Tue, 29 Jan 2013 13:03:23 -0800 (PST)
Message-ID: <alpine.DEB.2.00.1301291301260.6657@cobra.newdream.net>
In-Reply-To: <20130129080036.GI4503@ZenIV.linux.org.uk>

Hi Al,

On Tue, 29 Jan 2013, Al Viro wrote:

> On Mon, Jan 28, 2013 at 09:20:34PM -0800, Sage Weil wrote:
> 
> > Yep, that is indeed a problem.  I think we just need to do the r_aborted 
> > and/or r_locked_dir check in the else if condition...
> > 
> > > I'm not sure if we are guaranteed that ceph_readdir_prepopulate() won't
> > > get to its splice_dentry() and d_delete() calls in similar situations -
> > > I hadn't checked that one yet.  If it isn't guaranteed, we have a problem
> > > there as well.
> > 
> > ...and the condition guarding readdir_prepopulate().  :)
> 
> > I think you're reading it correctly.  The main thing to keep in mind here 
> > is that we *do* need to call fill_inode() for the inode metadata on these 
> > requests to keep the mds and client state in sync.  The dentry state is 
> > safe to ignore.
> 
> You mean the parts under
>         if (rinfo->head->is_dentry) {
> and
>         if (rinfo->head->is_target) {
> in there?  Because there's fill_inode() called from readdir_prepopulate()
> and it's a lot more problematic than those two...

Yep, patch below.
 
> > It would be great to have the dir i_mutex rules summarized somewhere, even 
> > if it is just a copy of the below.  It took a fair bit of trial and error 
> > to infer what was going on when writing this code.  :)
> 
> Directory ->i_mutex rules are in part documented - "what VFS guarantees
> to hold" side is in Documentation/filesystems/directory-locking.  It's
> the other side ("what locks are expected to be held by callers of dcache.c
> functions") that is badly missing...
> 
> > Ping me when you've pushed that branch and I'll take a look...
> 
> To gitolite@ra.kernel.org:/pub/scm/linux/kernel/git/viro/vfs.git
>    01a88fa..4056362  master -> master
> 
> with tentative ceph patch in the very end.  Should be on git.kernel.org
> shortly...

We should drop the mds_client.c hunk from your patch, and then do 
something like the below.  I'll put it in the ceph tree so we can do some 
basic testing.  Unfortunately, the abort case is a bit annoying to 
trigger, so we'll need to write a new test for it; a rough idea for one 
way to provoke it is sketched below, after the patch.
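
Also, to make the else-if point from earlier concrete: the idea is that
any dcache-touching work (splice_dentry(), d_delete(), and friends) only
runs when we hold the dir i_mutex and the request was not aborted, while
the fill_inode() calls for the inode metadata still happen either way.
As an illustration only (this is not the actual hunk from your branch):

	if (rinfo->head->is_dentry && req->r_locked_dir && !req->r_aborted) {
		/* safe to touch the dcache: lookup/splice/delete dentries */
	}
	/* inode metadata updates via fill_inode() are not skipped */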

Thanks-
sage



From 80d70ef02d5277af8e1355db74fffe71f7217e76 Mon Sep 17 00:00:00 2001
From: Sage Weil <sage@inktank.com>
Date: Tue, 29 Jan 2013 13:01:15 -0800
Subject: [PATCH] ceph: prepopulate inodes only when request is aborted

If r_aborted is true, we do not hold the dir i_mutex, and cannot touch
the dcache.  However, we still need to update the inodes with the state
returned by the MDS.

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sage Weil <sage@inktank.com>
---
 fs/ceph/inode.c |   36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 77b1b18..b51653e 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -1195,6 +1195,39 @@ done:
 /*
  * Prepopulate our cache with readdir results, leases, etc.
  */
+static int readdir_prepopulate_inodes_only(struct ceph_mds_request *req,
+					   struct ceph_mds_session *session)
+{
+	struct ceph_mds_reply_info_parsed *rinfo = &req->r_reply_info;
+	int i, err = 0;
+
+	for (i = 0; i < rinfo->dir_nr; i++) {
+		struct ceph_vino vino;
+		struct inode *in;
+		int rc;
+
+		vino.ino = le64_to_cpu(rinfo->dir_in[i].in->ino);
+		vino.snap = le64_to_cpu(rinfo->dir_in[i].in->snapid);
+
+		in = ceph_get_inode(req->r_dentry->d_sb, vino);
+		if (IS_ERR(in)) {
+			err = PTR_ERR(in);
+			dout("new_inode badness got %d\n", err);
+			continue;
+		}
+		rc = fill_inode(in, &rinfo->dir_in[i], NULL, session,
+				req->r_request_started, -1,
+				&req->r_caps_reservation);
+		if (rc < 0) {
+			pr_err("fill_inode badness on %p got %d\n", in, rc);
+			err = rc;
+			continue;
+		}
+	}
+
+	return err;
+}
+
 int ceph_readdir_prepopulate(struct ceph_mds_request *req,
 			     struct ceph_mds_session *session)
 {
@@ -1209,6 +1242,9 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req,
 	u64 frag = le32_to_cpu(rhead->args.readdir.frag);
 	struct ceph_dentry_info *di;
 
+	if (req->r_aborted)
+		return readdir_prepopulate_inodes_only(req, session);
+
 	if (le32_to_cpu(rinfo->head->op) == CEPH_MDS_OP_LSSNAP) {
 		snapdir = ceph_get_snapdir(parent->d_inode);
 		parent = d_find_alias(snapdir);
-- 
1.7.9.5
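
The reproducer for the abort case would have to interrupt an in-flight
readdir so that the MDS reply is handled with r_aborted set.  A rough
user-space sketch of the idea follows; it is purely illustrative (the
mount path and timing are made up, and in practice the MDS would need to
be stalled so the SIGKILL reliably beats the reply):

	/* abort_readdir.c: hypothetical reproducer sketch, not a real test */
	#include <dirent.h>
	#include <signal.h>
	#include <sys/types.h>
	#include <sys/wait.h>
	#include <unistd.h>

	int main(int argc, char **argv)
	{
		const char *path = argc > 1 ? argv[1] : "/mnt/ceph/some-big-dir";
		pid_t pid = fork();

		if (pid < 0)
			return 1;
		if (pid == 0) {
			/* child: this readdir should block on the MDS reply */
			DIR *d = opendir(path);

			if (d) {
				while (readdir(d))
					;
				closedir(d);
			}
			_exit(0);
		}

		/* parent: kill the child while the request is (hopefully)
		 * still in flight, so the reply is later processed with
		 * r_aborted set */
		usleep(100 * 1000);
		kill(pid, SIGKILL);
		waitpid(pid, NULL, 0);
		return 0;
	}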

