* [nfs-ganesha][PATCH] FSAL_CEPH: do an inode lookup vs. MDS when the Inode is not in cache
From: Jeff Layton @ 2017-11-09 16:03 UTC
  To: ceph-devel; +Cc: nfs-ganesha-devel

From: Jeff Layton <jlayton@redhat.com>

FSAL_CEPH just does a ceph_ll_get_inode when asked to create a
filehandle today, which only queries the local Ceph client cache. This
means it can only find Inodes that were previously found via name
lookup.

When ganesha is restarted, we have a blank-slate Client, and the Inode
will not be in the cache. The NFS client then ends up getting back an
ESTALE.

We must be able to find Inodes that may not be in the cache. When
ceph_ll_get_inode can't find an Inode, try a lookup vs. the MDS for it.

The only problem is that we currently have no way to look up a snapped
inode from scratch. I think we'll probably need to add that ability to
libcephfs, but for now the best we can do is just bail out with an
ESTALE in that case.

While we're in there, remove the comment about ceph_ll_connectable_m.
There is no such function in libcephfs and never has been. The ds
code does seem to refer to such a call, but it's not clear to me
what it's supposed to do.

Change-Id: I2d0e362ef8f28ba78575b60f3fb2890096d98fc6
Signed-off-by: Jeff Layton <jlayton@redhat.com>
---

I sent this up through git-review to ganesha's gerrit, but I'd
like some cephfs folks to have a look too. For now, this patch is
basically a workaround in the FSAL_CEPH code.

Do we need to worry about inodes that are not CEPH_NOSNAP? It
seems like libcephfs might need to grow an interface that can
look up snapped inodes as well.

Either that, or teach ceph_ll_get_inode how to look up inodes
that are not in its cache. Would that be a better fix here?
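
For illustration, a snapid-aware lookup might look something like the
sketch below. This is purely hypothetical: no such call exists in
libcephfs today, and the name, signature, and semantics are made up
for discussion only.

    /*
     * HYPOTHETICAL sketch, not a real libcephfs call: look up an inode
     * by full vinodeno (inode number + snapid), falling back to the MDS
     * when it is not in the local client cache. On success, return 0
     * and set *inode to a referenced Inode; on failure, return a
     * negative error (e.g. -ESTALE if the snapshot no longer exists).
     */
    int ceph_ll_lookup_vino(struct ceph_mount_info *cmount,
                            vinodeno_t vino, struct Inode **inode);

With something like that, create_handle could drop its CEPH_NOSNAP
check and use a single call for both cases.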

 src/FSAL/FSAL_CEPH/export.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/src/FSAL/FSAL_CEPH/export.c b/src/FSAL/FSAL_CEPH/export.c
index 3de25fdfaf41..a31e3bb3ebef 100644
--- a/src/FSAL/FSAL_CEPH/export.c
+++ b/src/FSAL/FSAL_CEPH/export.c
@@ -234,12 +234,23 @@ static fsal_status_t create_handle(struct fsal_export *export_pub,
 		return status;
 	}
 
+	/* Check our local cache first */
 	i = ceph_ll_get_inode(export->cmount, *vi);
-	if (!i)
-		return ceph2fsal_error(-ESTALE);
+	if (!i) {
+		/*
+		 * Try the slow way, may not be in cache now.
+		 *
+		 * Currently, there is no interface for looking up a snapped
+		 * inode, so we just bail here in that case.
+		 */
+		if (vi->snapid.val != CEPH_NOSNAP)
+			return ceph2fsal_error(-ESTALE);
+
+		rc = ceph_ll_lookup_inode(export->cmount, vi->ino, &i);
+		if (rc)
+			return ceph2fsal_error(rc);
+	}
 
-	/* The ceph_ll_connectable_m should have populated libceph's
-	   cache with all this anyway */
 	rc = fsal_ceph_ll_getattr(export->cmount, i, &stx,
 		attrs_out ? CEPH_STATX_ATTR_MASK : CEPH_STATX_HANDLE_MASK,
 		op_ctx->creds);
-- 
2.13.6



* Re: [nfs-ganesha][PATCH] FSAL_CEPH: do an inode lookup vs. MDS when the Inode is not in cache
From: Yan, Zheng @ 2017-11-10  2:31 UTC
  To: Jeff Layton; +Cc: ceph-devel, nfs-ganesha-devel

On Fri, Nov 10, 2017 at 12:03 AM, Jeff Layton <jlayton@kernel.org> wrote:
> [... full patch snipped ...]
>
Looks good to me,

Tested-by: "Yan, Zheng" <zyan@redhat.com>


* Re: [nfs-ganesha][PATCH] FSAL_CEPH: do an inode lookup vs. MDS when the Inode is not in cache
From: Jeff Layton @ 2017-11-10 11:24 UTC
  To: Yan, Zheng; +Cc: ceph-devel, nfs-ganesha-devel

On Fri, 2017-11-10 at 10:31 +0800, Yan, Zheng wrote:
> On Fri, Nov 10, 2017 at 12:03 AM, Jeff Layton <jlayton@kernel.org> wrote:
> > [... full patch snipped ...]
> 
> Looks good to me,
> 
> Tested-by: "Yan, Zheng" <zyan@redhat.com>
> 

Thanks. I still wonder whether this is a sufficient fix.

ceph_ll_lookup_inode does not take the snapid into account. Do we need a
version of ceph_ll_lookup_inode that can find snapshotted inodes after
ganesha has restarted?

It does seem like we could allow an NFS client to do work in a
snapshotted directory, and then have ganesha restart. That client may be
very confused when the server comes back and all of its existing
filehandles go stale.
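
To make the failure mode concrete, the sequence I'm worried about looks
roughly like this (paths and snapshot name are illustrative; CephFS
exposes snapshots under a .snap directory by default):

    $ cd /mnt/ganesha/share/.snap/mysnap   # client working in a snapshot
    $ tail -f somefile                     # holds an open filehandle
      ... ganesha restarts ...
    tail: error reading 'somefile': Stale file handle

Once ganesha has restarted, there is no way to reattach that filehandle
to the snapped inode, so the client is stuck.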
-- 
Jeff Layton <jlayton@kernel.org>


* Re: [nfs-ganesha][PATCH] FSAL_CEPH: do an inode lookup vs. MDS when the Inode is not in cache
From: Yan, Zheng @ 2017-11-10 13:39 UTC
  To: Jeff Layton; +Cc: ceph-devel, nfs-ganesha-devel

On Fri, Nov 10, 2017 at 7:24 PM, Jeff Layton <jlayton@kernel.org> wrote:
> [... snipped ...]
>
> Thanks. I still wonder whether this is a sufficient fix.
>
> ceph_ll_lookup_inode does not take the snapid into account. Do we need a
> version of ceph_ll_lookup_inode that can find snapshotted inodes after
> ganesha has restarted?
>

Looking up a snapped inode by number hasn't been implemented in the MDS.
I'm not sure whether we can extend the backtrace to support snapped
inodes.

Regards
Yan, Zheng


* Re: [nfs-ganesha][PATCH] FSAL_CEPH: do an inode lookup vs. MDS when the Inode is not in cache
From: Jeff Layton @ 2017-11-10 13:56 UTC
  To: Yan, Zheng; +Cc: ceph-devel, nfs-ganesha-devel

On Fri, 2017-11-10 at 21:39 +0800, Yan, Zheng wrote:
> On Fri, Nov 10, 2017 at 7:24 PM, Jeff Layton <jlayton@kernel.org> wrote:
> > [... snipped ...]
> >
> > ceph_ll_lookup_inode does not take the snapid into account. Do we need a
> > version of ceph_ll_lookup_inode that can find snapshotted inodes after
> > ganesha has restarted?
>
> Looking up a snapped inode by number hasn't been implemented in the MDS.
> I'm not sure whether we can extend the backtrace to support snapped
> inodes.
>
> Regards
> Yan, Zheng

Huh, ok... I looked over the cephfs code a bit yesterday and thought
that CEPH_MDS_OP_LOOKUPSNAP might do the right thing? I don't really
understand the cephfs snapshot model at all though, so I'll take your
word for it. :)

In that case, I guess this is probably the best we can do for now, along
with a recommendation to avoid snapshots when working with ganesha.

If we were worried about it, we could probably prevent ganesha from
using non-CEPH_NOSNAP inodes at all, and have attempts to use one just
return a well-known error (NFS4ERR_NXIO or something).
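
A minimal sketch of what that guard might look like at the top of
create_handle (illustrative only; really preventing snapped-inode use
would also need checks in the name-lookup paths, and the error mapping
below is just a placeholder):

    /*
     * Hypothetical guard: reject snapped inodes up front, even when
     * they happen to be in the local cache. A distinct well-known
     * error (NFS4ERR_NXIO or similar) would need its own FSAL error
     * mapping; ESTALE is used here only as a placeholder.
     */
    if (vi->snapid.val != CEPH_NOSNAP)
        return ceph2fsal_error(-ESTALE);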

-- 
Jeff Layton <jlayton@kernel.org>


* Re: [nfs-ganesha][PATCH] FSAL_CEPH: do an inode lookup vs. MDS when the Inode is not in cache
From: Sage Weil @ 2017-11-10 14:19 UTC
  To: Jeff Layton; +Cc: Yan, Zheng, ceph-devel, nfs-ganesha-devel

On Fri, 10 Nov 2017, Jeff Layton wrote:
> On Fri, 2017-11-10 at 21:39 +0800, Yan, Zheng wrote:
> > [... snipped ...]
> >
> > Looking up a snapped inode by number hasn't been implemented in the MDS.
> > I'm not sure whether we can extend the backtrace to support snapped
> > inodes.
>
> Huh, ok... I looked over the cephfs code a bit yesterday and thought
> that CEPH_MDS_OP_LOOKUPSNAP might do the right thing? I don't really
> understand the cephfs snapshot model at all though, so I'll take your
> word for it. :)

I can't remember what LOOKUPSNAP does, but... doesn't the NFS client
normally do a re-lookup from the parent if it gets ESTALE? The thing
about snapshots is that they don't change, so if the client retries the
lookups from the ancestors it will eventually get back to the
non-snapshotted root it started from, and then lookups from there will
succeed (unless the snapshot has been deleted, in which case ESTALE is
just fine).

?
sage

* Re: [nfs-ganesha][PATCH] FSAL_CEPH: do an inode lookup vs. MDS when the Inode is not in cache
From: Jeff Layton @ 2017-11-10 14:49 UTC
  To: Sage Weil; +Cc: Yan, Zheng, ceph-devel, nfs-ganesha-devel

On Fri, 2017-11-10 at 14:19 +0000, Sage Weil wrote:
> [... snipped ...]
>
> I can't remember what LOOKUPSNAP does, but... doesn't the NFS client
> normally do a re-lookup from the parent if it gets ESTALE? The thing
> about snapshots is that they don't change, so if the client retries the
> lookups from the ancestors it will eventually get back to the
> non-snapshotted root it started from, and then lookups from there will
> succeed (unless the snapshot has been deleted, in which case ESTALE is
> just fine).
> 

It'll do that on a pathwalk, but when you already have an open file, any
path information you have may very well not be valid anymore. All you
have is the filehandle at that point.

In fact, in the case where I found this bug, I just had the client doing
this:

    $ tail -f /path/to/file/on/ganesha/cephfs/share

After the client reconnected to the restarted server, the server would
just throw back ESTALE, since there's no path involved once the file
has been opened.

I'm pretty sure we're going to need this ability eventually. People will
expect to be able to access their snapshots via an NFS client. To that
end, I went ahead and opened this tracker bug:

    http://tracker.ceph.com/issues/22105



-- 
Jeff Layton <jlayton@kernel.org>

