* [PATCH 1/3] libceph: call r_unsafe_callback when unsafe reply is received
@ 2013-06-24  6:41 Yan, Zheng
  2013-06-24  6:41 ` [PATCH 2/3] mds: fix cap revoke race Yan, Zheng
                   ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: Yan, Zheng @ 2013-06-24  6:41 UTC (permalink / raw)
  To: ceph-devel; +Cc: sage, alex.elder, Yan, Zheng

From: "Yan, Zheng" <zheng.z.yan@intel.com>

We can't use !req->r_sent to check whether an OSD request is being
sent for the first time, because __cancel_request() zeros req->r_sent
when the OSD map changes. Rather than adding a new variable to struct
ceph_osd_request to indicate whether it is being sent for the first
time, we can call the unsafe callback only when an unsafe OSD reply is
received. If the OSD's first reply is safe, just skip calling the
unsafe callback.

The purpose of the unsafe callback is to add the unsafe request to a
list so that fsync(2) can wait for the safe reply. fsync(2) doesn't
need to wait for a write(2) that hasn't returned yet, so it's OK to
add the request to the unsafe list when the first OSD reply is
received. (ceph_sync_write() returns after receiving the first OSD
reply.)

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
---
 net/ceph/osd_client.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 540dd29..dd47889 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -1337,10 +1337,6 @@ static void __send_request(struct ceph_osd_client *osdc,
 
 	ceph_msg_get(req->r_request); /* send consumes a ref */
 
-	/* Mark the request unsafe if this is the first timet's being sent. */
-
-	if (!req->r_sent && req->r_unsafe_callback)
-		req->r_unsafe_callback(req, true);
 	req->r_sent = req->r_osd->o_incarnation;
 
 	ceph_con_send(&req->r_osd->o_con, req->r_request);
@@ -1431,8 +1427,6 @@ static void handle_osds_timeout(struct work_struct *work)
 
 static void complete_request(struct ceph_osd_request *req)
 {
-	if (req->r_unsafe_callback)
-		req->r_unsafe_callback(req, false);
 	complete_all(&req->r_safe_completion);  /* fsync waiter */
 }
 
@@ -1559,14 +1553,20 @@ static void handle_reply(struct ceph_osd_client *osdc, struct ceph_msg *msg,
 	mutex_unlock(&osdc->request_mutex);
 
 	if (!already_completed) {
+		if (req->r_unsafe_callback &&
+		    result >= 0 && !(flags & CEPH_OSD_FLAG_ONDISK))
+			req->r_unsafe_callback(req, true);
 		if (req->r_callback)
 			req->r_callback(req, msg);
 		else
 			complete_all(&req->r_completion);
 	}
 
-	if (flags & CEPH_OSD_FLAG_ONDISK)
+	if (flags & CEPH_OSD_FLAG_ONDISK) {
+		if (req->r_unsafe_callback && already_completed)
+			req->r_unsafe_callback(req, false);
 		complete_request(req);
+	}
 
 done:
 	dout("req=%p req->r_linger=%d\n", req, req->r_linger);
-- 
1.8.1.4
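
For context, the fsync(2) bookkeeping the commit message refers to works
roughly as in the sketch below: the "true" invocation of the callback puts
the request on a per-inode unsafe list, and the "false" invocation (on the
commit reply) takes it back off.  This is a minimal sketch only; the list,
lock, and function names (i_unsafe_writes, i_unsafe_lock, sync_write_unsafe)
are illustrative and may not match the exact symbols in fs/ceph/file.c.

	/* Illustrative filesystem-side r_unsafe_callback (names hypothetical). */
	static void sync_write_unsafe(struct ceph_osd_request *req, bool unsafe)
	{
		struct ceph_inode_info *ci = ceph_inode(req->r_inode);

		if (unsafe) {
			/* unsafe (ack) reply: remember the request so that
			 * fsync(2) can wait for its safe reply later */
			ceph_osdc_get_request(req);
			spin_lock(&ci->i_unsafe_lock);
			list_add_tail(&req->r_unsafe_item, &ci->i_unsafe_writes);
			spin_unlock(&ci->i_unsafe_lock);
		} else {
			/* safe (on-disk) reply: drop the request from the list */
			spin_lock(&ci->i_unsafe_lock);
			list_del_init(&req->r_unsafe_item);
			spin_unlock(&ci->i_unsafe_lock);
			ceph_osdc_put_request(req);
		}
	}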


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 2/3] mds: fix cap revoke race
  2013-06-24  6:41 [PATCH 1/3] libceph: call r_unsafe_callback when unsafe reply is received Yan, Zheng
@ 2013-06-24  6:41 ` Yan, Zheng
  2013-06-24  8:00   ` Yan, Zheng
  2013-06-24  8:19   ` Yan, Zheng
  2013-06-24  6:41 ` [PATCH 3/3] mds: fix race between cap issue and revoke Yan, Zheng
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 25+ messages in thread
From: Yan, Zheng @ 2013-06-24  6:41 UTC (permalink / raw)
  To: ceph-devel; +Cc: sage, alex.elder, Yan, Zheng

From: "Yan, Zheng" <zheng.z.yan@intel.com>

If caps are being revoked by the auth MDS, don't consider them issued
even if they are still issued by a non-auth MDS. The non-auth MDS
should also be revoking/exporting these caps; the client just hasn't
received the cap revoke/export message yet.

The race I encountered is: when caps are being exported to a new MDS,
the client receives the cap import message and the cap revoke message
from the new MDS, then receives the cap export message from the old
MDS. When the client receives the cap revoke message from the new MDS,
the caps being revoked are still issued by the old MDS, so the client
does nothing. Later, when the cap export message is received, the
client removes the caps issued by the old MDS. (Another way to fix the
race is to call ceph_check_caps() in handle_cap_export().)

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
---
 fs/ceph/caps.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 9a5ccc9..a8c616b 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -697,6 +697,15 @@ int __ceph_caps_issued(struct ceph_inode_info *ci, int *implemented)
 		if (implemented)
 			*implemented |= cap->implemented;
 	}
+	/*
+	 * exclude caps issued by a non-auth MDS but being revoked by
+	 * the auth MDS. The non-auth MDS should be revoking/exporting
+	 * these caps too, but the message is delayed.
+	 */
+	if (ci->i_auth_cap) {
+		cap = ci->i_auth_cap;
+		have &= ~cap->implemented | cap->issued;
+	}
 	return have;
 }
 
-- 
1.8.1.4
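
The masking expression above, have &= ~cap->implemented | cap->issued, is
compact, so here is a small stand-alone illustration of the arithmetic.
The bit values are made up for the example (the real masks are the
CEPH_CAP_* constants in the ceph headers); only the arithmetic is the point.

	#include <stdio.h>

	/* Illustrative bit values only. */
	#define FR 0x1	/* "Fr": file read cap  */
	#define FC 0x2	/* "Fc": file cache cap */

	int main(void)
	{
		int have             = FR | FC; /* Fc still issued by a non-auth MDS */
		int auth_issued      = FR;      /* auth MDS currently issues only Fr */
		int auth_implemented = FR | FC; /* ...but still implements Fc (revoking) */

		/* same expression as in __ceph_caps_issued() above */
		have &= ~auth_implemented | auth_issued;

		printf("have = %#x\n", have);   /* prints 0x1: Fc is no longer counted */
		return 0;
	}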


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 3/3] mds: fix race between cap issue and revoke
  2013-06-24  6:41 [PATCH 1/3] libceph: call r_unsafe_callback when unsafe reply is received Yan, Zheng
  2013-06-24  6:41 ` [PATCH 2/3] mds: fix cap revoke race Yan, Zheng
@ 2013-06-24  6:41 ` Yan, Zheng
  2013-06-24  8:16   ` Yan, Zheng
  2013-07-01  7:28 ` [PATCH 1/3] libceph: call r_unsafe_callback when unsafe reply is received Yan, Zheng
  2013-07-02 13:07 ` Alex Elder
  3 siblings, 1 reply; 25+ messages in thread
From: Yan, Zheng @ 2013-06-24  6:41 UTC (permalink / raw)
  To: ceph-devel; +Cc: sage, alex.elder, Yan, Zheng

From: "Yan, Zheng" <zheng.z.yan@intel.com>

If we receive new caps from the auth MDS while the non-auth MDS is
revoking the newly issued caps, we should release the caps from the
non-auth MDS. The scenario is that the filelock's state changes from
SYNC to LOCK: the non-auth MDS revokes the Fc cap while the client
gets the Fc cap from the auth MDS at the same time.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
---
 fs/ceph/caps.c | 29 ++++++++++++++++++++---------
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index a8c616b..9b8b1aa 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -813,22 +813,28 @@ int __ceph_caps_issued_mask(struct ceph_inode_info *ci, int mask, int touch)
 /*
  * Return true if mask caps are currently being revoked by an MDS.
  */
-int ceph_caps_revoking(struct ceph_inode_info *ci, int mask)
+int __ceph_caps_revoking_other(struct ceph_inode_info *ci,
+			       struct ceph_cap *ocap, int mask)
 {
-	struct inode *inode = &ci->vfs_inode;
 	struct ceph_cap *cap;
 	struct rb_node *p;
-	int ret = 0;
 
-	spin_lock(&ci->i_ceph_lock);
 	for (p = rb_first(&ci->i_caps); p; p = rb_next(p)) {
 		cap = rb_entry(p, struct ceph_cap, ci_node);
-		if (__cap_is_valid(cap) &&
-		    (cap->implemented & ~cap->issued & mask)) {
-			ret = 1;
-			break;
-		}
+		if (cap != ocap && __cap_is_valid(cap) &&
+		    (cap->implemented & ~cap->issued & mask))
+			return 1;
 	}
+	return 0;
+}
+
+int ceph_caps_revoking(struct ceph_inode_info *ci, int mask)
+{
+	struct inode *inode = &ci->vfs_inode;
+	int ret;
+
+	spin_lock(&ci->i_ceph_lock);
+	ret = __ceph_caps_revoking_other(ci, NULL, mask);
 	spin_unlock(&ci->i_ceph_lock);
 	dout("ceph_caps_revoking %p %s = %d\n", inode,
 	     ceph_cap_string(mask), ret);
@@ -2491,6 +2497,11 @@ static void handle_cap_grant(struct inode *inode, struct ceph_mds_caps *grant,
 	} else {
 		dout("grant: %s -> %s\n", ceph_cap_string(cap->issued),
 		     ceph_cap_string(newcaps));
+		/* is the non-auth MDS revoking the newly granted caps? */
+		if (cap == ci->i_auth_cap &&
+		    __ceph_caps_revoking_other(ci, cap, newcaps))
+		    check_caps = 2;
+
 		cap->issued = newcaps;
 		cap->implemented |= newcaps; /* add bits only, to
 					      * avoid stepping on a
-- 
1.8.1.4
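
The new helper boils down to the test "implemented but no longer issued",
restricted to the given mask and skipping one cap.  A minimal sketch of
that predicate, with the Fc scenario from the commit message as a usage
note (the cap bit values are illustrative):

	/* A cap is "revoking" a bit when it implements the bit but no longer
	 * issues it; this mirrors the loop body of __ceph_caps_revoking_other(). */
	static inline int cap_revoking_bits(int implemented, int issued, int mask)
	{
		return implemented & ~issued & mask;
	}

	/*
	 * Example matching the commit message:
	 *   non-auth cap: implemented = Fr|Fc, issued = Fr   (Fc being revoked)
	 *   newly granted mask from the auth MDS: Fc
	 *   cap_revoking_bits(Fr|Fc, Fr, Fc) == Fc (non-zero), so the hunk above
	 *   sets check_caps = 2 and a later ceph_check_caps() pass can release
	 *   the caps held from the non-auth MDS.
	 */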


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/3] mds: fix cap revoke race
  2013-06-24  6:41 ` [PATCH 2/3] mds: fix cap revoke race Yan, Zheng
@ 2013-06-24  8:00   ` Yan, Zheng
  2013-06-24  8:19   ` Yan, Zheng
  1 sibling, 0 replies; 25+ messages in thread
From: Yan, Zheng @ 2013-06-24  8:00 UTC (permalink / raw)
  To: ceph-devel; +Cc: sage, alex.elder

From d3954de9c471c996c807b61fc37acbb84726da37 Mon Sep 17 00:00:00 2001
From: "Yan, Zheng" <zheng.z.yan@intel.com>
Date: Mon, 17 Jun 2013 09:15:33 +0800
Subject: [PATCH 2/3] ceph: fix cap revoke race

If caps are being revoked by the auth MDS, don't consider them issued
even if they are still issued by a non-auth MDS. The non-auth MDS
should also be revoking/exporting these caps; the client just hasn't
received the cap revoke/export message yet.

The race I encountered is: when caps are being exported to a new MDS,
the client receives the cap import message and the cap revoke message
from the new MDS, then receives the cap export message from the old
MDS. When the client receives the cap revoke message from the new MDS,
the caps being revoked are still issued by the old MDS, so the client
does nothing. Later, when the cap export message is received, the
client removes the caps issued by the old MDS. (Another way to fix the
race is to call ceph_check_caps() in handle_cap_export().)

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
---
 fs/ceph/caps.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 9a5ccc9..a8c616b 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -697,6 +697,15 @@ int __ceph_caps_issued(struct ceph_inode_info *ci, int *implemented)
 		if (implemented)
 			*implemented |= cap->implemented;
 	}
+	/*
+	 * exclude caps issued by a non-auth MDS but being revoked by
+	 * the auth MDS. The non-auth MDS should be revoking/exporting
+	 * these caps too, but the message is delayed.
+	 */
+	if (ci->i_auth_cap) {
+		cap = ci->i_auth_cap;
+		have &= ~cap->implemented | cap->issued;
+	}
 	return have;
 }
 
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/3] mds: fix race between cap issue and revoke
  2013-06-24  6:41 ` [PATCH 3/3] mds: fix race between cap issue and revoke Yan, Zheng
@ 2013-06-24  8:16   ` Yan, Zheng
  0 siblings, 0 replies; 25+ messages in thread
From: Yan, Zheng @ 2013-06-24  8:16 UTC (permalink / raw)
  To: ceph-devel; +Cc: sage, alex.elder

This patch's title is wrong; please ignore it.

Thanks
Yan, Zheng

On 06/24/2013 02:41 PM, Yan, Zheng wrote:
> From: "Yan, Zheng" <zheng.z.yan@intel.com>
> 
> If we receive new caps from the auth MDS and the non-auth MDS is
> revoking the newly issued caps, we should release the caps from
> the non-auth MDS. The scenario is filelock's state changes from
> SYNC to LOCK. Non-auth MDS revokes Fc cap, the client gets Fc
> cap from the auth MDS at the same time.
> 
> Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
> ---
>  fs/ceph/caps.c | 29 ++++++++++++++++++++---------
>  1 file changed, 20 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> index a8c616b..9b8b1aa 100644
> --- a/fs/ceph/caps.c
> +++ b/fs/ceph/caps.c
> @@ -813,22 +813,28 @@ int __ceph_caps_issued_mask(struct ceph_inode_info *ci, int mask, int touch)
>  /*
>   * Return true if mask caps are currently being revoked by an MDS.
>   */
> -int ceph_caps_revoking(struct ceph_inode_info *ci, int mask)
> +int __ceph_caps_revoking_other(struct ceph_inode_info *ci,
> +			       struct ceph_cap *ocap, int mask)
>  {
> -	struct inode *inode = &ci->vfs_inode;
>  	struct ceph_cap *cap;
>  	struct rb_node *p;
> -	int ret = 0;
>  
> -	spin_lock(&ci->i_ceph_lock);
>  	for (p = rb_first(&ci->i_caps); p; p = rb_next(p)) {
>  		cap = rb_entry(p, struct ceph_cap, ci_node);
> -		if (__cap_is_valid(cap) &&
> -		    (cap->implemented & ~cap->issued & mask)) {
> -			ret = 1;
> -			break;
> -		}
> +		if (cap != ocap && __cap_is_valid(cap) &&
> +		    (cap->implemented & ~cap->issued & mask))
> +			return 1;
>  	}
> +	return 0;
> +}
> +
> +int ceph_caps_revoking(struct ceph_inode_info *ci, int mask)
> +{
> +	struct inode *inode = &ci->vfs_inode;
> +	int ret;
> +
> +	spin_lock(&ci->i_ceph_lock);
> +	ret = __ceph_caps_revoking_other(ci, NULL, mask);
>  	spin_unlock(&ci->i_ceph_lock);
>  	dout("ceph_caps_revoking %p %s = %d\n", inode,
>  	     ceph_cap_string(mask), ret);
> @@ -2491,6 +2497,11 @@ static void handle_cap_grant(struct inode *inode, struct ceph_mds_caps *grant,
>  	} else {
>  		dout("grant: %s -> %s\n", ceph_cap_string(cap->issued),
>  		     ceph_cap_string(newcaps));
> +		/* non-auth MDS is revoking the newly grant caps ? */
> +		if (cap == ci->i_auth_cap &&
> +		    __ceph_caps_revoking_other(ci, cap, newcaps))
> +		    check_caps = 2;
> +
>  		cap->issued = newcaps;
>  		cap->implemented |= newcaps; /* add bits only, to
>  					      * avoid stepping on a
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/3] mds: fix cap revoke race
  2013-06-24  6:41 ` [PATCH 2/3] mds: fix cap revoke race Yan, Zheng
  2013-06-24  8:00   ` Yan, Zheng
@ 2013-06-24  8:19   ` Yan, Zheng
  1 sibling, 0 replies; 25+ messages in thread
From: Yan, Zheng @ 2013-06-24  8:19 UTC (permalink / raw)
  To: ceph-devel; +Cc: sage, alex.elder

This patch's title is wrong; please ignore it.

Thanks
Yan, Zheng

On 06/24/2013 02:41 PM, Yan, Zheng wrote:
> From: "Yan, Zheng" <zheng.z.yan@intel.com>
> 
> If caps are been revoking by the auth MDS, don't consider them as
> issued even they are still issued by non-auth MDS. The non-auth
> MDS should also be revoking/exporting these caps, the client just
> hasn't received the cap revoke/export message.
> 
> The race I encountered is: When caps are exporting to new MDS, the
> client receives cap import message and cap revoke message from the
> new MDS, then receives cap export message from the old MDS. When
> the client receives cap revoke message from the new MDS, the revoking
> caps are still issued by the old MDS, so the client does nothing.
> Later when the cap export message is received, the client removes
> the caps issued by the old MDS. (Another way to fix the race is
> calling ceph_check_caps() in handle_cap_export())
> 
> Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
> ---
>  fs/ceph/caps.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> index 9a5ccc9..a8c616b 100644
> --- a/fs/ceph/caps.c
> +++ b/fs/ceph/caps.c
> @@ -697,6 +697,15 @@ int __ceph_caps_issued(struct ceph_inode_info *ci, int *implemented)
>  		if (implemented)
>  			*implemented |= cap->implemented;
>  	}
> +	/*
> +	 * exclude caps issued by non-auth MDS, but are been revoking
> +	 * by the auth MDS. The non-auth MDS should be revoking/exporting
> +	 * these caps, but the message is delayed.
> +	 */
> +	if (ci->i_auth_cap) {
> +		cap = ci->i_auth_cap;
> +		have &= ~cap->implemented | cap->issued;
> +	}
>  	return have;
>  }
>  
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/3] libceph: call r_unsafe_callback when unsafe reply is received
  2013-06-24  6:41 [PATCH 1/3] libceph: call r_unsafe_callback when unsafe reply is received Yan, Zheng
  2013-06-24  6:41 ` [PATCH 2/3] mds: fix cap revoke race Yan, Zheng
  2013-06-24  6:41 ` [PATCH 3/3] mds: fix race between cap issue and revoke Yan, Zheng
@ 2013-07-01  7:28 ` Yan, Zheng
  2013-07-01 19:46   ` Sage Weil
  2013-07-02 13:07 ` Alex Elder
  3 siblings, 1 reply; 25+ messages in thread
From: Yan, Zheng @ 2013-07-01  7:28 UTC (permalink / raw)
  To: sage; +Cc: ceph-devel, alex.elder

ping

I think this patch should go into 3.11, or the issue should be fixed by other means


On 06/24/2013 02:41 PM, Yan, Zheng wrote:
> From: "Yan, Zheng" <zheng.z.yan@intel.com>
> 
> We can't use !req->r_sent to check if OSD request is sent for the
> first time, this is because __cancel_request() zeros req->r_sent
> when OSD map changes. Rather than adding a new variable to struct
> ceph_osd_request to indicate if it's sent for the first time, We
> can call the unsafe callback only when unsafe OSD reply is received.
> If OSD's first reply is safe, just skip calling the unsafe callback.
> 
> The purpose of unsafe callback is adding unsafe request to a list,
> so that fsync(2) can wait for the safe reply. fsync(2) doesn't need
> to wait for a write(2) that hasn't returned yet. So it's OK to add
> request to the unsafe list when the first OSD reply is received.
> (ceph_sync_write() returns after receiving the first OSD reply)
> 
> Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
> ---
>  net/ceph/osd_client.c | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
> index 540dd29..dd47889 100644
> --- a/net/ceph/osd_client.c
> +++ b/net/ceph/osd_client.c
> @@ -1337,10 +1337,6 @@ static void __send_request(struct ceph_osd_client *osdc,
>  
>  	ceph_msg_get(req->r_request); /* send consumes a ref */
>  
> -	/* Mark the request unsafe if this is the first timet's being sent. */
> -
> -	if (!req->r_sent && req->r_unsafe_callback)
> -		req->r_unsafe_callback(req, true);
>  	req->r_sent = req->r_osd->o_incarnation;
>  
>  	ceph_con_send(&req->r_osd->o_con, req->r_request);
> @@ -1431,8 +1427,6 @@ static void handle_osds_timeout(struct work_struct *work)
>  
>  static void complete_request(struct ceph_osd_request *req)
>  {
> -	if (req->r_unsafe_callback)
> -		req->r_unsafe_callback(req, false);
>  	complete_all(&req->r_safe_completion);  /* fsync waiter */
>  }
>  
> @@ -1559,14 +1553,20 @@ static void handle_reply(struct ceph_osd_client *osdc, struct ceph_msg *msg,
>  	mutex_unlock(&osdc->request_mutex);
>  
>  	if (!already_completed) {
> +		if (req->r_unsafe_callback &&
> +		    result >= 0 && !(flags & CEPH_OSD_FLAG_ONDISK))
> +			req->r_unsafe_callback(req, true);
>  		if (req->r_callback)
>  			req->r_callback(req, msg);
>  		else
>  			complete_all(&req->r_completion);
>  	}
>  
> -	if (flags & CEPH_OSD_FLAG_ONDISK)
> +	if (flags & CEPH_OSD_FLAG_ONDISK) {
> +		if (req->r_unsafe_callback && already_completed)
> +			req->r_unsafe_callback(req, false);
>  		complete_request(req);
> +	}
>  
>  done:
>  	dout("req=%p req->r_linger=%d\n", req, req->r_linger);
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/3] libceph: call r_unsafe_callback when unsafe reply is received
  2013-07-01  7:28 ` [PATCH 1/3] libceph: call r_unsafe_callback when unsafe reply is received Yan, Zheng
@ 2013-07-01 19:46   ` Sage Weil
  2013-07-03 21:57     ` Sage Weil
  0 siblings, 1 reply; 25+ messages in thread
From: Sage Weil @ 2013-07-01 19:46 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: ceph-devel, alex.elder

On Mon, 1 Jul 2013, Yan, Zheng wrote:
> ping
> 
> I think this patch should goes into 3.11 or fix the issue by other means

Applied this to the testing branch, thanks.  Let me know if there are any 
others I missed!

sage

> 
> 
> On 06/24/2013 02:41 PM, Yan, Zheng wrote:
> > From: "Yan, Zheng" <zheng.z.yan@intel.com>
> > 
> > We can't use !req->r_sent to check if OSD request is sent for the
> > first time, this is because __cancel_request() zeros req->r_sent
> > when OSD map changes. Rather than adding a new variable to struct
> > ceph_osd_request to indicate if it's sent for the first time, We
> > can call the unsafe callback only when unsafe OSD reply is received.
> > If OSD's first reply is safe, just skip calling the unsafe callback.
> > 
> > The purpose of unsafe callback is adding unsafe request to a list,
> > so that fsync(2) can wait for the safe reply. fsync(2) doesn't need
> > to wait for a write(2) that hasn't returned yet. So it's OK to add
> > request to the unsafe list when the first OSD reply is received.
> > (ceph_sync_write() returns after receiving the first OSD reply)
> > 
> > Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
> > ---
> >  net/ceph/osd_client.c | 14 +++++++-------
> >  1 file changed, 7 insertions(+), 7 deletions(-)
> > 
> > diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
> > index 540dd29..dd47889 100644
> > --- a/net/ceph/osd_client.c
> > +++ b/net/ceph/osd_client.c
> > @@ -1337,10 +1337,6 @@ static void __send_request(struct ceph_osd_client *osdc,
> >  
> >  	ceph_msg_get(req->r_request); /* send consumes a ref */
> >  
> > -	/* Mark the request unsafe if this is the first timet's being sent. */
> > -
> > -	if (!req->r_sent && req->r_unsafe_callback)
> > -		req->r_unsafe_callback(req, true);
> >  	req->r_sent = req->r_osd->o_incarnation;
> >  
> >  	ceph_con_send(&req->r_osd->o_con, req->r_request);
> > @@ -1431,8 +1427,6 @@ static void handle_osds_timeout(struct work_struct *work)
> >  
> >  static void complete_request(struct ceph_osd_request *req)
> >  {
> > -	if (req->r_unsafe_callback)
> > -		req->r_unsafe_callback(req, false);
> >  	complete_all(&req->r_safe_completion);  /* fsync waiter */
> >  }
> >  
> > @@ -1559,14 +1553,20 @@ static void handle_reply(struct ceph_osd_client *osdc, struct ceph_msg *msg,
> >  	mutex_unlock(&osdc->request_mutex);
> >  
> >  	if (!already_completed) {
> > +		if (req->r_unsafe_callback &&
> > +		    result >= 0 && !(flags & CEPH_OSD_FLAG_ONDISK))
> > +			req->r_unsafe_callback(req, true);
> >  		if (req->r_callback)
> >  			req->r_callback(req, msg);
> >  		else
> >  			complete_all(&req->r_completion);
> >  	}
> >  
> > -	if (flags & CEPH_OSD_FLAG_ONDISK)
> > +	if (flags & CEPH_OSD_FLAG_ONDISK) {
> > +		if (req->r_unsafe_callback && already_completed)
> > +			req->r_unsafe_callback(req, false);
> >  		complete_request(req);
> > +	}
> >  
> >  done:
> >  	dout("req=%p req->r_linger=%d\n", req, req->r_linger);
> > 
> 
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/3] libceph: call r_unsafe_callback when unsafe reply is received
  2013-06-24  6:41 [PATCH 1/3] libceph: call r_unsafe_callback when unsafe reply is received Yan, Zheng
                   ` (2 preceding siblings ...)
  2013-07-01  7:28 ` [PATCH 1/3] libceph: call r_unsafe_callback when unsafe reply is received Yan, Zheng
@ 2013-07-02 13:07 ` Alex Elder
  2013-07-02 14:27   ` Yan, Zheng
  2013-07-02 18:10   ` Sage Weil
  3 siblings, 2 replies; 25+ messages in thread
From: Alex Elder @ 2013-07-02 13:07 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: ceph-devel, sage

On 06/24/2013 01:41 AM, Yan, Zheng wrote:
> From: "Yan, Zheng" <zheng.z.yan@intel.com>

Sorry it took so long; I intended to take a look at this
for you sooner.

I would also like to thank you for this nice, clear
description.  It made it very easy to understand
why you were proposing the change, and to focus
on exactly which parts of the design it affects.

> We can't use !req->r_sent to check if OSD request is sent for the
> first time, this is because __cancel_request() zeros req->r_sent
> when OSD map changes. Rather than adding a new variable to struct

You're right.

> ceph_osd_request to indicate if it's sent for the first time, We
> can call the unsafe callback only when unsafe OSD reply is received.
> If OSD's first reply is safe, just skip calling the unsafe callback.

This seems reasonable, but it's different from the way I
thought about what constituted "unsafe."  But I may be
wrong, and the way this is used by the file system might
do something that addresses my concern.

The way I interpreted "unsafe" was simply that it was possible
a write *could* have been made persistent, even if the client
doesn't know about it.  A request could have made it to its
target OSD, been written, and the response could be in flight
at the point where something (maybe a router?) crashes and the response
gets lost.  During that time window, the stored data may not be
in a state that's consistent with the client's view of it.

So I thought of "unsafe" as meaning that a write is in flight,
and until we get a successful response, the storage might
contain the old data or it might contain the new data; the
client has no way of knowing which.

With that interpretation, a request becomes unsafe the
instant it leaves the client, and becomes safe again
the instant a response arrives.

If my interpretation is correct, this change is wrong.

But I may be wrong, and there may really be no need to
worry about a possible modification of data until after
an acknowledgement response is received.  In that case,
I've looked at your patch and it looks good.

Can you explain why I'm wrong about what is "unsafe?"

					-Alex

> The purpose of unsafe callback is adding unsafe request to a list,
> so that fsync(2) can wait for the safe reply. fsync(2) doesn't need
> to wait for a write(2) that hasn't returned yet. So it's OK to add
> request to the unsafe list when the first OSD reply is received.
> (ceph_sync_write() returns after receiving the first OSD reply)
> 
> Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
> ---
>  net/ceph/osd_client.c | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
> index 540dd29..dd47889 100644
> --- a/net/ceph/osd_client.c
> +++ b/net/ceph/osd_client.c
> @@ -1337,10 +1337,6 @@ static void __send_request(struct ceph_osd_client *osdc,
>  
>  	ceph_msg_get(req->r_request); /* send consumes a ref */
>  
> -	/* Mark the request unsafe if this is the first timet's being sent. */
> -
> -	if (!req->r_sent && req->r_unsafe_callback)
> -		req->r_unsafe_callback(req, true);
>  	req->r_sent = req->r_osd->o_incarnation;
>  
>  	ceph_con_send(&req->r_osd->o_con, req->r_request);
> @@ -1431,8 +1427,6 @@ static void handle_osds_timeout(struct work_struct *work)
>  
>  static void complete_request(struct ceph_osd_request *req)
>  {
> -	if (req->r_unsafe_callback)
> -		req->r_unsafe_callback(req, false);
>  	complete_all(&req->r_safe_completion);  /* fsync waiter */
>  }
>  
> @@ -1559,14 +1553,20 @@ static void handle_reply(struct ceph_osd_client *osdc, struct ceph_msg *msg,
>  	mutex_unlock(&osdc->request_mutex);
>  
>  	if (!already_completed) {
> +		if (req->r_unsafe_callback &&
> +		    result >= 0 && !(flags & CEPH_OSD_FLAG_ONDISK))
> +			req->r_unsafe_callback(req, true);
>  		if (req->r_callback)
>  			req->r_callback(req, msg);
>  		else
>  			complete_all(&req->r_completion);
>  	}
>  
> -	if (flags & CEPH_OSD_FLAG_ONDISK)
> +	if (flags & CEPH_OSD_FLAG_ONDISK) {
> +		if (req->r_unsafe_callback && already_completed)
> +			req->r_unsafe_callback(req, false);
>  		complete_request(req);
> +	}
>  
>  done:
>  	dout("req=%p req->r_linger=%d\n", req, req->r_linger);
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/3] libceph: call r_unsafe_callback when unsafe reply is received
  2013-07-02 13:07 ` Alex Elder
@ 2013-07-02 14:27   ` Yan, Zheng
  2013-07-02 18:10   ` Sage Weil
  1 sibling, 0 replies; 25+ messages in thread
From: Yan, Zheng @ 2013-07-02 14:27 UTC (permalink / raw)
  To: Alex Elder; +Cc: Yan, Zheng, ceph-devel, Sage Weil

On Tue, Jul 2, 2013 at 9:07 PM, Alex Elder <alex.elder@linaro.org> wrote:
> On 06/24/2013 01:41 AM, Yan, Zheng wrote:
>> From: "Yan, Zheng" <zheng.z.yan@intel.com>
>
> Sorry it took so long, I intended to take a look at this
> for you sooner.
>
> I would also like to thank you for this nice clear
> description.  It made it very easy to understand
> why you were proposing the change, and to focus in
> on exactly which parts of the design it's affecting.
>
>> We can't use !req->r_sent to check if OSD request is sent for the
>> first time, this is because __cancel_request() zeros req->r_sent
>> when OSD map changes. Rather than adding a new variable to struct
>
> You're right.
>
>> ceph_osd_request to indicate if it's sent for the first time, We
>> can call the unsafe callback only when unsafe OSD reply is received.
>> If OSD's first reply is safe, just skip calling the unsafe callback.
>
> This seems reasonable, but it's different from the way I
> thought about what constituted "unsafe."  But I may be
> wrong, and the way this is used by the file system might
> do something that addresses my concern.
>
> The way I interpreted "unsafe" was simply that it was possible
> a write *could* have been made persistent, even if the client
> doesn't know about it.  A request could have made it to its
> target osd, been written, and the response could be in flight
> at the point something (maybe a router?) crashes and the response
> gets lost.  During that time window, the stored data may not be
> in a state that's consistent with the client's view of it.
>
> So I thought of "unsafe" as meaning that a write is in flight,
> and until we get a successful response, the storage might
> contain the old data or it might contain the new data; the
> client has no way of knowing which.
>
> With that interpretation, a request becomes unsafe the
> instant it leaves the client, and becomes safe again
> the instant a response arrives.
>
> If my interpretation is correct, this change is wrong.
>
> But I may be wrong, and there may really be no need to
> worry about a possible modification of data until after
> an acknowledgement response is received.  In that case,
> I've looked at your patch and it looks good.
>
> Can you explain why I'm wrong about what is "unsafe?"

I didn't say you are wrong.  The reason I changed the meaning of the
unsafe callback is that the "unsafe" callback is only used for fsync(2).
I think it's OK to change it as long as the change does not break
fsync(2).

regards
yan, zheng

>
>                                         -Alex
>
>> The purpose of unsafe callback is adding unsafe request to a list,
>> so that fsync(2) can wait for the safe reply. fsync(2) doesn't need
>> to wait for a write(2) that hasn't returned yet. So it's OK to add
>> request to the unsafe list when the first OSD reply is received.
>> (ceph_sync_write() returns after receiving the first OSD reply)
>>
>> Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
>> ---
>>  net/ceph/osd_client.c | 14 +++++++-------
>>  1 file changed, 7 insertions(+), 7 deletions(-)
>>
>> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
>> index 540dd29..dd47889 100644
>> --- a/net/ceph/osd_client.c
>> +++ b/net/ceph/osd_client.c
>> @@ -1337,10 +1337,6 @@ static void __send_request(struct ceph_osd_client *osdc,
>>
>>       ceph_msg_get(req->r_request); /* send consumes a ref */
>>
>> -     /* Mark the request unsafe if this is the first timet's being sent. */
>> -
>> -     if (!req->r_sent && req->r_unsafe_callback)
>> -             req->r_unsafe_callback(req, true);
>>       req->r_sent = req->r_osd->o_incarnation;
>>
>>       ceph_con_send(&req->r_osd->o_con, req->r_request);
>> @@ -1431,8 +1427,6 @@ static void handle_osds_timeout(struct work_struct *work)
>>
>>  static void complete_request(struct ceph_osd_request *req)
>>  {
>> -     if (req->r_unsafe_callback)
>> -             req->r_unsafe_callback(req, false);
>>       complete_all(&req->r_safe_completion);  /* fsync waiter */
>>  }
>>
>> @@ -1559,14 +1553,20 @@ static void handle_reply(struct ceph_osd_client *osdc, struct ceph_msg *msg,
>>       mutex_unlock(&osdc->request_mutex);
>>
>>       if (!already_completed) {
>> +             if (req->r_unsafe_callback &&
>> +                 result >= 0 && !(flags & CEPH_OSD_FLAG_ONDISK))
>> +                     req->r_unsafe_callback(req, true);
>>               if (req->r_callback)
>>                       req->r_callback(req, msg);
>>               else
>>                       complete_all(&req->r_completion);
>>       }
>>
>> -     if (flags & CEPH_OSD_FLAG_ONDISK)
>> +     if (flags & CEPH_OSD_FLAG_ONDISK) {
>> +             if (req->r_unsafe_callback && already_completed)
>> +                     req->r_unsafe_callback(req, false);
>>               complete_request(req);
>> +     }
>>
>>  done:
>>       dout("req=%p req->r_linger=%d\n", req, req->r_linger);
>>
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/3] libceph: call r_unsafe_callback when unsafe reply is received
  2013-07-02 13:07 ` Alex Elder
  2013-07-02 14:27   ` Yan, Zheng
@ 2013-07-02 18:10   ` Sage Weil
  2013-07-02 18:11     ` Alex Elder
  1 sibling, 1 reply; 25+ messages in thread
From: Sage Weil @ 2013-07-02 18:10 UTC (permalink / raw)
  To: Alex Elder; +Cc: Yan, Zheng, ceph-devel

On Tue, 2 Jul 2013, Alex Elder wrote:
> On 06/24/2013 01:41 AM, Yan, Zheng wrote:
> > From: "Yan, Zheng" <zheng.z.yan@intel.com>
> 
> Sorry it took so long, I intended to take a look at this
> for you sooner.
> 
> I would also like to thank you for this nice clear
> description.  It made it very easy to understand
> why you were proposing the change, and to focus in
> on exactly which parts of the design it's affecting.
> 
> > We can't use !req->r_sent to check if OSD request is sent for the
> > first time, this is because __cancel_request() zeros req->r_sent
> > when OSD map changes. Rather than adding a new variable to struct
> 
> You're right.
> 
> > ceph_osd_request to indicate if it's sent for the first time, We
> > can call the unsafe callback only when unsafe OSD reply is received.
> > If OSD's first reply is safe, just skip calling the unsafe callback.
> 
> This seems reasonable, but it's different from the way I
> thought about what constituted "unsafe."  But I may be
> wrong, and the way this is used by the file system might
> do something that addresses my concern.
> 
> The way I interpreted "unsafe" was simply that it was possible
> a write *could* have been made persistent, even if the client
> doesn't know about it.  A request could have made it to its
> target osd, been written, and the response could be in flight
> at the point something (maybe a router?) crashes and the response
> gets lost.  During that time window, the stored data may not be
> in a state that's consistent with the client's view of it.
> 
> So I thought of "unsafe" as meaning that a write is in flight,
> and until we get a successful response, the storage might
> contain the old data or it might contain the new data; the
> client has no way of knowing which.
> 
> With that interpretation, a request becomes unsafe the
> instant it leaves the client, and becomes safe again
> the instant a response arrives.
> 
> If my interpretation is correct, this change is wrong.

The interpretation is correct, but in this case it doesn't matter.  There 
are two intervals:

 - write(2) starts
 - request is sent
  <interval 1>
 - got ack reply, write(2) returns
  <interval 2>
 - got commit reply

The important end result is that we need to wait for requests in interval 
2 if we fsync().  With your 'unsafe' definition, we *also* wait for 
syscalls that haven't returned yet, but this isn't necessary... fsync() 
need only wait for completed but uncommitted writes, not racing ones.  We 
could quibble about better naming, but the end result is correct.
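
In code terms, the interval-2 wait amounts to something like the sketch
below (a rough illustration; the list and lock names are hypothetical, not
necessarily the exact fs/ceph symbols): fsync(2) walks the unsafe list that
the r_unsafe_callback maintains and waits on each request's
r_safe_completion, which complete_request() fires on the commit reply.

	/* Illustrative fsync(2)-side wait for interval 2 (names hypothetical). */
	static void wait_unsafe_writes(struct ceph_inode_info *ci)
	{
		struct ceph_osd_request *req;

		spin_lock(&ci->i_unsafe_lock);
		while (!list_empty(&ci->i_unsafe_writes)) {
			req = list_first_entry(&ci->i_unsafe_writes,
					       struct ceph_osd_request, r_unsafe_item);
			ceph_osdc_get_request(req);
			spin_unlock(&ci->i_unsafe_lock);
			/* fired by complete_request() when the commit reply arrives */
			wait_for_completion(&req->r_safe_completion);
			ceph_osdc_put_request(req);
			spin_lock(&ci->i_unsafe_lock);
		}
		spin_unlock(&ci->i_unsafe_lock);
	}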

sage


> 
> But I may be wrong, and there may really be no need to
> worry about a possible modification of data until after
> an acknowledgement response is received.  In that case,
> I've looked at your patch and it looks good.
> 
> Can you explain why I'm wrong about what is "unsafe?"
> 
> 					-Alex
> 
> > The purpose of unsafe callback is adding unsafe request to a list,
> > so that fsync(2) can wait for the safe reply. fsync(2) doesn't need
> > to wait for a write(2) that hasn't returned yet. So it's OK to add
> > request to the unsafe list when the first OSD reply is received.
> > (ceph_sync_write() returns after receiving the first OSD reply)
> > 
> > Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
> > ---
> >  net/ceph/osd_client.c | 14 +++++++-------
> >  1 file changed, 7 insertions(+), 7 deletions(-)
> > 
> > diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
> > index 540dd29..dd47889 100644
> > --- a/net/ceph/osd_client.c
> > +++ b/net/ceph/osd_client.c
> > @@ -1337,10 +1337,6 @@ static void __send_request(struct ceph_osd_client *osdc,
> >  
> >  	ceph_msg_get(req->r_request); /* send consumes a ref */
> >  
> > -	/* Mark the request unsafe if this is the first timet's being sent. */
> > -
> > -	if (!req->r_sent && req->r_unsafe_callback)
> > -		req->r_unsafe_callback(req, true);
> >  	req->r_sent = req->r_osd->o_incarnation;
> >  
> >  	ceph_con_send(&req->r_osd->o_con, req->r_request);
> > @@ -1431,8 +1427,6 @@ static void handle_osds_timeout(struct work_struct *work)
> >  
> >  static void complete_request(struct ceph_osd_request *req)
> >  {
> > -	if (req->r_unsafe_callback)
> > -		req->r_unsafe_callback(req, false);
> >  	complete_all(&req->r_safe_completion);  /* fsync waiter */
> >  }
> >  
> > @@ -1559,14 +1553,20 @@ static void handle_reply(struct ceph_osd_client *osdc, struct ceph_msg *msg,
> >  	mutex_unlock(&osdc->request_mutex);
> >  
> >  	if (!already_completed) {
> > +		if (req->r_unsafe_callback &&
> > +		    result >= 0 && !(flags & CEPH_OSD_FLAG_ONDISK))
> > +			req->r_unsafe_callback(req, true);
> >  		if (req->r_callback)
> >  			req->r_callback(req, msg);
> >  		else
> >  			complete_all(&req->r_completion);
> >  	}
> >  
> > -	if (flags & CEPH_OSD_FLAG_ONDISK)
> > +	if (flags & CEPH_OSD_FLAG_ONDISK) {
> > +		if (req->r_unsafe_callback && already_completed)
> > +			req->r_unsafe_callback(req, false);
> >  		complete_request(req);
> > +	}
> >  
> >  done:
> >  	dout("req=%p req->r_linger=%d\n", req, req->r_linger);
> > 
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/3] libceph: call r_unsafe_callback when unsafe reply is received
  2013-07-02 18:10   ` Sage Weil
@ 2013-07-02 18:11     ` Alex Elder
  0 siblings, 0 replies; 25+ messages in thread
From: Alex Elder @ 2013-07-02 18:11 UTC (permalink / raw)
  To: Sage Weil; +Cc: Yan, Zheng, ceph-devel

On 07/02/2013 01:10 PM, Sage Weil wrote:
> On Tue, 2 Jul 2013, Alex Elder wrote:
>> On 06/24/2013 01:41 AM, Yan, Zheng wrote:
>>> From: "Yan, Zheng" <zheng.z.yan@intel.com>
>>
>> Sorry it took so long, I intended to take a look at this
>> for you sooner.
>>
>> I would also like to thank you for this nice clear
>> description.  It made it very easy to understand
>> why you were proposing the change, and to focus in
>> on exactly which parts of the design it's affecting.
>>
>>> We can't use !req->r_sent to check if OSD request is sent for the
>>> first time, this is because __cancel_request() zeros req->r_sent
>>> when OSD map changes. Rather than adding a new variable to struct
>>
>> You're right.
>>
>>> ceph_osd_request to indicate if it's sent for the first time, We
>>> can call the unsafe callback only when unsafe OSD reply is received.
>>> If OSD's first reply is safe, just skip calling the unsafe callback.
>>
>> This seems reasonable, but it's different from the way I
>> thought about what constituted "unsafe."  But I may be
>> wrong, and the way this is used by the file system might
>> do something that addresses my concern.
>>
>> The way I interpreted "unsafe" was simply that it was possible
>> a write *could* have been made persistent, even if the client
>> doesn't know about it.  A request could have made it to its
>> target osd, been written, and the response could be in flight
>> at the point something (maybe a router?) crashes and the response
>> gets lost.  During that time window, the stored data may not be
>> in a state that's consistent with the client's view of it.
>>
>> So I thought of "unsafe" as meaning that a write is in flight,
>> and until we get a successful response, the storage might
>> contain the old data or it might contain the new data; the
>> client has no way of knowing which.
>>
>> With that interpretation, a request becomes unsafe the
>> instant it leaves the client, and becomes safe again
>> the instant a response arrives.
>>
>> If my interpretation is correct, this change is wrong.
> 
> The interpretation is correct, but in this case it doesn't matter.  There 
> are two intervals:
> 
>  - write(2) starts
>  - request is sent
>   <interval 1>
>  - got ack reply, write(2) returns
>   <interval 2>
>  - got commit reply
> 
> The important end result is that we need to wait for requests in interval 
> 2 if we fsync().  With your 'unsafe' definition, we *also* wait for 
> syscalls that haven't returned yet, but this isn't necessary... fsync() 
> need only wait for completed but uncommitted writes, not racing ones.  We 
> could quibble about better naming, but the end result is correct.

OK, sounds good to me.  In that case you can include this if you like:

Reviewed-by: Alex Elder <elder@linaro.org>

> sage
> 
> 
>>
>> But I may be wrong, and there may really be no need to
>> worry about a possible modification of data until after
>> an acknowledgement response is received.  In that case,
>> I've looked at your patch and it looks good.
>>
>> Can you explain why I'm wrong about what is "unsafe?"
>>
>> 					-Alex
>>
>>> The purpose of unsafe callback is adding unsafe request to a list,
>>> so that fsync(2) can wait for the safe reply. fsync(2) doesn't need
>>> to wait for a write(2) that hasn't returned yet. So it's OK to add
>>> request to the unsafe list when the first OSD reply is received.
>>> (ceph_sync_write() returns after receiving the first OSD reply)
>>>
>>> Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
>>> ---
>>>  net/ceph/osd_client.c | 14 +++++++-------
>>>  1 file changed, 7 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
>>> index 540dd29..dd47889 100644
>>> --- a/net/ceph/osd_client.c
>>> +++ b/net/ceph/osd_client.c
>>> @@ -1337,10 +1337,6 @@ static void __send_request(struct ceph_osd_client *osdc,
>>>  
>>>  	ceph_msg_get(req->r_request); /* send consumes a ref */
>>>  
>>> -	/* Mark the request unsafe if this is the first timet's being sent. */
>>> -
>>> -	if (!req->r_sent && req->r_unsafe_callback)
>>> -		req->r_unsafe_callback(req, true);
>>>  	req->r_sent = req->r_osd->o_incarnation;
>>>  
>>>  	ceph_con_send(&req->r_osd->o_con, req->r_request);
>>> @@ -1431,8 +1427,6 @@ static void handle_osds_timeout(struct work_struct *work)
>>>  
>>>  static void complete_request(struct ceph_osd_request *req)
>>>  {
>>> -	if (req->r_unsafe_callback)
>>> -		req->r_unsafe_callback(req, false);
>>>  	complete_all(&req->r_safe_completion);  /* fsync waiter */
>>>  }
>>>  
>>> @@ -1559,14 +1553,20 @@ static void handle_reply(struct ceph_osd_client *osdc, struct ceph_msg *msg,
>>>  	mutex_unlock(&osdc->request_mutex);
>>>  
>>>  	if (!already_completed) {
>>> +		if (req->r_unsafe_callback &&
>>> +		    result >= 0 && !(flags & CEPH_OSD_FLAG_ONDISK))
>>> +			req->r_unsafe_callback(req, true);
>>>  		if (req->r_callback)
>>>  			req->r_callback(req, msg);
>>>  		else
>>>  			complete_all(&req->r_completion);
>>>  	}
>>>  
>>> -	if (flags & CEPH_OSD_FLAG_ONDISK)
>>> +	if (flags & CEPH_OSD_FLAG_ONDISK) {
>>> +		if (req->r_unsafe_callback && already_completed)
>>> +			req->r_unsafe_callback(req, false);
>>>  		complete_request(req);
>>> +	}
>>>  
>>>  done:
>>>  	dout("req=%p req->r_linger=%d\n", req, req->r_linger);
>>>
>>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/3] libceph: call r_unsafe_callback when unsafe reply is received
  2013-07-01 19:46   ` Sage Weil
@ 2013-07-03 21:57     ` Sage Weil
  2013-07-03 22:07       ` Milosz Tanski
                         ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Sage Weil @ 2013-07-03 21:57 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: ceph-devel, alex.elder

Hi Yan-

On Mon, 1 Jul 2013, Sage Weil wrote:
> On Mon, 1 Jul 2013, Yan, Zheng wrote:
> > ping
> > 
> > I think this patch should goes into 3.11 or fix the issue by other means
> 
> Applied this to the testing branch, thanks.  Let me know if there are any 
> others I missed!

This broke rbd, which was using the unsafe callback. I pushed a patch to 
simplify that (testing-next^); care to take a look?

Thanks!
sage


> 
> sage
> 
> > 
> > 
> > On 06/24/2013 02:41 PM, Yan, Zheng wrote:
> > > From: "Yan, Zheng" <zheng.z.yan@intel.com>
> > > 
> > > We can't use !req->r_sent to check if OSD request is sent for the
> > > first time, this is because __cancel_request() zeros req->r_sent
> > > when OSD map changes. Rather than adding a new variable to struct
> > > ceph_osd_request to indicate if it's sent for the first time, We
> > > can call the unsafe callback only when unsafe OSD reply is received.
> > > If OSD's first reply is safe, just skip calling the unsafe callback.
> > > 
> > > The purpose of unsafe callback is adding unsafe request to a list,
> > > so that fsync(2) can wait for the safe reply. fsync(2) doesn't need
> > > to wait for a write(2) that hasn't returned yet. So it's OK to add
> > > request to the unsafe list when the first OSD reply is received.
> > > (ceph_sync_write() returns after receiving the first OSD reply)
> > > 
> > > Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
> > > ---
> > >  net/ceph/osd_client.c | 14 +++++++-------
> > >  1 file changed, 7 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
> > > index 540dd29..dd47889 100644
> > > --- a/net/ceph/osd_client.c
> > > +++ b/net/ceph/osd_client.c
> > > @@ -1337,10 +1337,6 @@ static void __send_request(struct ceph_osd_client *osdc,
> > >  
> > >  	ceph_msg_get(req->r_request); /* send consumes a ref */
> > >  
> > > -	/* Mark the request unsafe if this is the first timet's being sent. */
> > > -
> > > -	if (!req->r_sent && req->r_unsafe_callback)
> > > -		req->r_unsafe_callback(req, true);
> > >  	req->r_sent = req->r_osd->o_incarnation;
> > >  
> > >  	ceph_con_send(&req->r_osd->o_con, req->r_request);
> > > @@ -1431,8 +1427,6 @@ static void handle_osds_timeout(struct work_struct *work)
> > >  
> > >  static void complete_request(struct ceph_osd_request *req)
> > >  {
> > > -	if (req->r_unsafe_callback)
> > > -		req->r_unsafe_callback(req, false);
> > >  	complete_all(&req->r_safe_completion);  /* fsync waiter */
> > >  }
> > >  
> > > @@ -1559,14 +1553,20 @@ static void handle_reply(struct ceph_osd_client *osdc, struct ceph_msg *msg,
> > >  	mutex_unlock(&osdc->request_mutex);
> > >  
> > >  	if (!already_completed) {
> > > +		if (req->r_unsafe_callback &&
> > > +		    result >= 0 && !(flags & CEPH_OSD_FLAG_ONDISK))
> > > +			req->r_unsafe_callback(req, true);
> > >  		if (req->r_callback)
> > >  			req->r_callback(req, msg);
> > >  		else
> > >  			complete_all(&req->r_completion);
> > >  	}
> > >  
> > > -	if (flags & CEPH_OSD_FLAG_ONDISK)
> > > +	if (flags & CEPH_OSD_FLAG_ONDISK) {
> > > +		if (req->r_unsafe_callback && already_completed)
> > > +			req->r_unsafe_callback(req, false);
> > >  		complete_request(req);
> > > +	}
> > >  
> > >  done:
> > >  	dout("req=%p req->r_linger=%d\n", req, req->r_linger);
> > > 
> > 
> > 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/3] libceph: call r_unsafe_callback when unsafe reply is received
  2013-07-03 21:57     ` Sage Weil
@ 2013-07-03 22:07       ` Milosz Tanski
  2013-07-03 22:10         ` Sage Weil
  2013-07-03 22:43         ` Yan, Zheng
  2013-07-03 22:18       ` Alex Elder
  2013-07-03 22:22       ` Yan, Zheng
  2 siblings, 2 replies; 25+ messages in thread
From: Milosz Tanski @ 2013-07-03 22:07 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: ceph-devel, alex.elder, Sage Weil

Yan,

Can you help me understand how this change fixes
http://tracker.ceph.com/issues/2019? The symptom on the client is
that processes get stuck waiting in ceph_mdsc_do_request according
to /proc/PID/stack.

Thanks in advance,
- Milosz

On Wed, Jul 3, 2013 at 5:57 PM, Sage Weil <sage@inktank.com> wrote:
> Hi Yan-
>
> On Mon, 1 Jul 2013, Sage Weil wrote:
>> On Mon, 1 Jul 2013, Yan, Zheng wrote:
>> > ping
>> >
>> > I think this patch should goes into 3.11 or fix the issue by other means
>>
>> Applied this to the testing branch, thanks.  Let me know if there are any
>> others I missed!
>
> This broke rbd, which was using the unsafe callback. I pushed a patch to
> simplify that (testing-next^); care to take a look?
>
> Thanks!
> sage
>
>
>>
>> sage
>>
>> >
>> >
>> > On 06/24/2013 02:41 PM, Yan, Zheng wrote:
>> > > From: "Yan, Zheng" <zheng.z.yan@intel.com>
>> > >
>> > > We can't use !req->r_sent to check if OSD request is sent for the
>> > > first time, this is because __cancel_request() zeros req->r_sent
>> > > when OSD map changes. Rather than adding a new variable to struct
>> > > ceph_osd_request to indicate if it's sent for the first time, We
>> > > can call the unsafe callback only when unsafe OSD reply is received.
>> > > If OSD's first reply is safe, just skip calling the unsafe callback.
>> > >
>> > > The purpose of unsafe callback is adding unsafe request to a list,
>> > > so that fsync(2) can wait for the safe reply. fsync(2) doesn't need
>> > > to wait for a write(2) that hasn't returned yet. So it's OK to add
>> > > request to the unsafe list when the first OSD reply is received.
>> > > (ceph_sync_write() returns after receiving the first OSD reply)
>> > >
>> > > Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
>> > > ---
>> > >  net/ceph/osd_client.c | 14 +++++++-------
>> > >  1 file changed, 7 insertions(+), 7 deletions(-)
>> > >
>> > > diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
>> > > index 540dd29..dd47889 100644
>> > > --- a/net/ceph/osd_client.c
>> > > +++ b/net/ceph/osd_client.c
>> > > @@ -1337,10 +1337,6 @@ static void __send_request(struct ceph_osd_client *osdc,
>> > >
>> > >   ceph_msg_get(req->r_request); /* send consumes a ref */
>> > >
>> > > - /* Mark the request unsafe if this is the first timet's being sent. */
>> > > -
>> > > - if (!req->r_sent && req->r_unsafe_callback)
>> > > -         req->r_unsafe_callback(req, true);
>> > >   req->r_sent = req->r_osd->o_incarnation;
>> > >
>> > >   ceph_con_send(&req->r_osd->o_con, req->r_request);
>> > > @@ -1431,8 +1427,6 @@ static void handle_osds_timeout(struct work_struct *work)
>> > >
>> > >  static void complete_request(struct ceph_osd_request *req)
>> > >  {
>> > > - if (req->r_unsafe_callback)
>> > > -         req->r_unsafe_callback(req, false);
>> > >   complete_all(&req->r_safe_completion);  /* fsync waiter */
>> > >  }
>> > >
>> > > @@ -1559,14 +1553,20 @@ static void handle_reply(struct ceph_osd_client *osdc, struct ceph_msg *msg,
>> > >   mutex_unlock(&osdc->request_mutex);
>> > >
>> > >   if (!already_completed) {
>> > > +         if (req->r_unsafe_callback &&
>> > > +             result >= 0 && !(flags & CEPH_OSD_FLAG_ONDISK))
>> > > +                 req->r_unsafe_callback(req, true);
>> > >           if (req->r_callback)
>> > >                   req->r_callback(req, msg);
>> > >           else
>> > >                   complete_all(&req->r_completion);
>> > >   }
>> > >
>> > > - if (flags & CEPH_OSD_FLAG_ONDISK)
>> > > + if (flags & CEPH_OSD_FLAG_ONDISK) {
>> > > +         if (req->r_unsafe_callback && already_completed)
>> > > +                 req->r_unsafe_callback(req, false);
>> > >           complete_request(req);
>> > > + }
>> > >
>> > >  done:
>> > >   dout("req=%p req->r_linger=%d\n", req, req->r_linger);
>> > >
>> >
>> >

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/3] libceph: call r_unsafe_callback when unsafe reply is received
  2013-07-03 22:07       ` Milosz Tanski
@ 2013-07-03 22:10         ` Sage Weil
  2013-07-03 22:43         ` Yan, Zheng
  1 sibling, 0 replies; 25+ messages in thread
From: Sage Weil @ 2013-07-03 22:10 UTC (permalink / raw)
  To: Milosz Tanski; +Cc: Yan, Zheng, ceph-devel, alex.elder

On Wed, 3 Jul 2013, Milosz Tanski wrote:
> Yan,
> 
> Can you help me understand how this change fixes:
> http://tracker.ceph.com/issues/2019 ? The symptom on the client is
> that the processes get stuck waiting in ceph_mdsc_do_request according
> to /proc/PID/stack.

Note that the blocked request is a secondary effect; the MDS is trying to
revoke caps (Fcb, I think?) on that inode.

It's not clear to me how that is related to this patch either, though.  :)

sage

> 
> Thanks in advance,
> - Milosz
> 
> On Wed, Jul 3, 2013 at 5:57 PM, Sage Weil <sage@inktank.com> wrote:
> > Hi Yan-
> >
> > On Mon, 1 Jul 2013, Sage Weil wrote:
> >> On Mon, 1 Jul 2013, Yan, Zheng wrote:
> >> > ping
> >> >
> >> > I think this patch should goes into 3.11 or fix the issue by other means
> >>
> >> Applied this to the testing branch, thanks.  Let me know if there are any
> >> others I missed!
> >
> > This broke rbd, which was using the unsafe callback. I pushed a patch to
> > simplify that (testing-next^); care to take a look?
> >
> > Thanks!
> > sage
> >
> >
> >>
> >> sage
> >>
> >> >
> >> >
> >> > On 06/24/2013 02:41 PM, Yan, Zheng wrote:
> >> > > From: "Yan, Zheng" <zheng.z.yan@intel.com>
> >> > >
> >> > > We can't use !req->r_sent to check if OSD request is sent for the
> >> > > first time, this is because __cancel_request() zeros req->r_sent
> >> > > when OSD map changes. Rather than adding a new variable to struct
> >> > > ceph_osd_request to indicate if it's sent for the first time, We
> >> > > can call the unsafe callback only when unsafe OSD reply is received.
> >> > > If OSD's first reply is safe, just skip calling the unsafe callback.
> >> > >
> >> > > The purpose of unsafe callback is adding unsafe request to a list,
> >> > > so that fsync(2) can wait for the safe reply. fsync(2) doesn't need
> >> > > to wait for a write(2) that hasn't returned yet. So it's OK to add
> >> > > request to the unsafe list when the first OSD reply is received.
> >> > > (ceph_sync_write() returns after receiving the first OSD reply)
> >> > >
> >> > > Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
> >> > > ---
> >> > >  net/ceph/osd_client.c | 14 +++++++-------
> >> > >  1 file changed, 7 insertions(+), 7 deletions(-)
> >> > >
> >> > > diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
> >> > > index 540dd29..dd47889 100644
> >> > > --- a/net/ceph/osd_client.c
> >> > > +++ b/net/ceph/osd_client.c
> >> > > @@ -1337,10 +1337,6 @@ static void __send_request(struct ceph_osd_client *osdc,
> >> > >
> >> > >   ceph_msg_get(req->r_request); /* send consumes a ref */
> >> > >
> >> > > - /* Mark the request unsafe if this is the first timet's being sent. */
> >> > > -
> >> > > - if (!req->r_sent && req->r_unsafe_callback)
> >> > > -         req->r_unsafe_callback(req, true);
> >> > >   req->r_sent = req->r_osd->o_incarnation;
> >> > >
> >> > >   ceph_con_send(&req->r_osd->o_con, req->r_request);
> >> > > @@ -1431,8 +1427,6 @@ static void handle_osds_timeout(struct work_struct *work)
> >> > >
> >> > >  static void complete_request(struct ceph_osd_request *req)
> >> > >  {
> >> > > - if (req->r_unsafe_callback)
> >> > > -         req->r_unsafe_callback(req, false);
> >> > >   complete_all(&req->r_safe_completion);  /* fsync waiter */
> >> > >  }
> >> > >
> >> > > @@ -1559,14 +1553,20 @@ static void handle_reply(struct ceph_osd_client *osdc, struct ceph_msg *msg,
> >> > >   mutex_unlock(&osdc->request_mutex);
> >> > >
> >> > >   if (!already_completed) {
> >> > > +         if (req->r_unsafe_callback &&
> >> > > +             result >= 0 && !(flags & CEPH_OSD_FLAG_ONDISK))
> >> > > +                 req->r_unsafe_callback(req, true);
> >> > >           if (req->r_callback)
> >> > >                   req->r_callback(req, msg);
> >> > >           else
> >> > >                   complete_all(&req->r_completion);
> >> > >   }
> >> > >
> >> > > - if (flags & CEPH_OSD_FLAG_ONDISK)
> >> > > + if (flags & CEPH_OSD_FLAG_ONDISK) {
> >> > > +         if (req->r_unsafe_callback && already_completed)
> >> > > +                 req->r_unsafe_callback(req, false);
> >> > >           complete_request(req);
> >> > > + }
> >> > >
> >> > >  done:
> >> > >   dout("req=%p req->r_linger=%d\n", req, req->r_linger);
> >> > >
> >> >
> >> >

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/3] libceph: call r_unsafe_callback when unsafe reply is received
  2013-07-03 21:57     ` Sage Weil
  2013-07-03 22:07       ` Milosz Tanski
@ 2013-07-03 22:18       ` Alex Elder
  2013-07-03 22:22       ` Yan, Zheng
  2 siblings, 0 replies; 25+ messages in thread
From: Alex Elder @ 2013-07-03 22:18 UTC (permalink / raw)
  To: Sage Weil; +Cc: Yan, Zheng, ceph-devel

On 07/03/2013 04:57 PM, Sage Weil wrote:
> Hi Yan-
> 
> On Mon, 1 Jul 2013, Sage Weil wrote:
>> On Mon, 1 Jul 2013, Yan, Zheng wrote:
>>> ping
>>>
>>> I think this patch should go into 3.11, or the issue should be fixed by other means
>>
>> Applied this to the testing branch, thanks.  Let me know if there are any 
>> others I missed!
> 
> This broke rbd, which was using the unsafe callback. I pushed a patch to 
> simplify that (testing-next^); care to take a look?

Sorry, I should have checked that when I reviewed it but I
was paying attention to the explanation of how it fixed a
problem in the file system code.  I guess I assumed you'd
verified the change didn't break anything else that used
the code (I know, don't assume).

The rbd code does use the callback for write requests, but
only to know when they're safely on disk (it ignores the
initial "request is unsafe" callback).

					-Alex


> Thanks!
> sage
> 
> 
>>
>> sage
>>
>>>
>>>
>>> On 06/24/2013 02:41 PM, Yan, Zheng wrote:
>>>> From: "Yan, Zheng" <zheng.z.yan@intel.com>
>>>>
>>>> We can't use !req->r_sent to check if OSD request is sent for the
>>>> first time, this is because __cancel_request() zeros req->r_sent
>>>> when OSD map changes. Rather than adding a new variable to struct
>>>> ceph_osd_request to indicate if it's sent for the first time, We
>>>> can call the unsafe callback only when unsafe OSD reply is received.
>>>> If OSD's first reply is safe, just skip calling the unsafe callback.
>>>>
>>>> The purpose of unsafe callback is adding unsafe request to a list,
>>>> so that fsync(2) can wait for the safe reply. fsync(2) doesn't need
>>>> to wait for a write(2) that hasn't returned yet. So it's OK to add
>>>> request to the unsafe list when the first OSD reply is received.
>>>> (ceph_sync_write() returns after receiving the first OSD reply)
>>>>
>>>> Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
>>>> ---
>>>>  net/ceph/osd_client.c | 14 +++++++-------
>>>>  1 file changed, 7 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
>>>> index 540dd29..dd47889 100644
>>>> --- a/net/ceph/osd_client.c
>>>> +++ b/net/ceph/osd_client.c
>>>> @@ -1337,10 +1337,6 @@ static void __send_request(struct ceph_osd_client *osdc,
>>>>  
>>>>  	ceph_msg_get(req->r_request); /* send consumes a ref */
>>>>  
>>>> -	/* Mark the request unsafe if this is the first timet's being sent. */
>>>> -
>>>> -	if (!req->r_sent && req->r_unsafe_callback)
>>>> -		req->r_unsafe_callback(req, true);
>>>>  	req->r_sent = req->r_osd->o_incarnation;
>>>>  
>>>>  	ceph_con_send(&req->r_osd->o_con, req->r_request);
>>>> @@ -1431,8 +1427,6 @@ static void handle_osds_timeout(struct work_struct *work)
>>>>  
>>>>  static void complete_request(struct ceph_osd_request *req)
>>>>  {
>>>> -	if (req->r_unsafe_callback)
>>>> -		req->r_unsafe_callback(req, false);
>>>>  	complete_all(&req->r_safe_completion);  /* fsync waiter */
>>>>  }
>>>>  
>>>> @@ -1559,14 +1553,20 @@ static void handle_reply(struct ceph_osd_client *osdc, struct ceph_msg *msg,
>>>>  	mutex_unlock(&osdc->request_mutex);
>>>>  
>>>>  	if (!already_completed) {
>>>> +		if (req->r_unsafe_callback &&
>>>> +		    result >= 0 && !(flags & CEPH_OSD_FLAG_ONDISK))
>>>> +			req->r_unsafe_callback(req, true);
>>>>  		if (req->r_callback)
>>>>  			req->r_callback(req, msg);
>>>>  		else
>>>>  			complete_all(&req->r_completion);
>>>>  	}
>>>>  
>>>> -	if (flags & CEPH_OSD_FLAG_ONDISK)
>>>> +	if (flags & CEPH_OSD_FLAG_ONDISK) {
>>>> +		if (req->r_unsafe_callback && already_completed)
>>>> +			req->r_unsafe_callback(req, false);
>>>>  		complete_request(req);
>>>> +	}
>>>>  
>>>>  done:
>>>>  	dout("req=%p req->r_linger=%d\n", req, req->r_linger);
>>>>
>>>
>>>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/3] libceph: call r_unsafe_callback when unsafe reply is received
  2013-07-03 21:57     ` Sage Weil
  2013-07-03 22:07       ` Milosz Tanski
  2013-07-03 22:18       ` Alex Elder
@ 2013-07-03 22:22       ` Yan, Zheng
  2013-07-03 22:26         ` Sage Weil
  2 siblings, 1 reply; 25+ messages in thread
From: Yan, Zheng @ 2013-07-03 22:22 UTC (permalink / raw)
  To: Sage Weil, Alex Elder; +Cc: Yan, Zheng, ceph-devel

On Thu, Jul 4, 2013 at 5:57 AM, Sage Weil <sage@inktank.com> wrote:
> Hi Yan-
>
> On Mon, 1 Jul 2013, Sage Weil wrote:
>> On Mon, 1 Jul 2013, Yan, Zheng wrote:
>> > ping
>> >
>> > I think this patch should go into 3.11, or the issue should be fixed by other means
>>
>> Applied this to the testing branch, thanks.  Let me know if there are any
>> others I missed!
>
> This broke rbd, which was using the unsafe callback. I pushed a patch to
> simplify that (testing-next^); care to take a look?
>

The patch looks good. It looks like issue #5146 actually does not exist.
Alex, could you take a look?

Thanks
 Yan, Zheng

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/3] libceph: call r_unsafe_callback when unsafe reply is received
  2013-07-03 22:22       ` Yan, Zheng
@ 2013-07-03 22:26         ` Sage Weil
  2013-07-03 22:32           ` Sage Weil
  0 siblings, 1 reply; 25+ messages in thread
From: Sage Weil @ 2013-07-03 22:26 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: Alex Elder, Yan, Zheng, ceph-devel

On Thu, 4 Jul 2013, Yan, Zheng wrote:
> On Thu, Jul 4, 2013 at 5:57 AM, Sage Weil <sage@inktank.com> wrote:
> > Hi Yan-
> >
> > On Mon, 1 Jul 2013, Sage Weil wrote:
> >> On Mon, 1 Jul 2013, Yan, Zheng wrote:
> >> > ping
> >> >
> >> > I think this patch should go into 3.11, or the issue should be fixed by other means
> >>
> >> Applied this to the testing branch, thanks.  Let me know if there are any
> >> others I missed!
> >
> > This broke rbd, which was using the unsafe callback. I pushed a patch to
> > simplify that (testing-next^); care to take a look?
> >
> 
> > The patch looks good. It looks like issue #5146 actually does not exist.
> > Alex, could you take a look?

Yeah, I'll close.

sage

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/3] libceph: call r_unsafe_callback when unsafe reply is received
  2013-07-03 22:26         ` Sage Weil
@ 2013-07-03 22:32           ` Sage Weil
  0 siblings, 0 replies; 25+ messages in thread
From: Sage Weil @ 2013-07-03 22:32 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: Alex Elder, Yan, Zheng, ceph-devel

On Wed, 3 Jul 2013, Sage Weil wrote:
> On Thu, 4 Jul 2013, Yan, Zheng wrote:
> > On Thu, Jul 4, 2013 at 5:57 AM, Sage Weil <sage@inktank.com> wrote:
> > > Hi Yan-
> > >
> > > On Mon, 1 Jul 2013, Sage Weil wrote:
> > >> On Mon, 1 Jul 2013, Yan, Zheng wrote:
> > >> > ping
> > >> >
> > >> > I think this patch should go into 3.11, or the issue should be fixed by other means
> > >>
> > >> Applied this to the testing branch, thanks.  Let me know if there are any
> > >> others I missed!
> > >
> > > This broke rbd, which was using the unsafe callback. I pushed a patch to
> > > simplify that (testing-next^); care to take a look?
> > >
> > 
> > The patch looks good. It looks like issue #5146 actually does not exist.
> > Alex, could you take a look?
> 
> Yeah, I'll close.

Ok, I just realized that my patch is essentially reverting 12166906, which 
AFAICS was based on the assumption that a write would get an ACK reply before 
the ONDISK reply from the OSD, but in reality we get only the ONDISK.  We 
should just drop them both...
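
To sketch what that means (assumed semantics, not a patch): since only the
ONDISK reply ever arrives, that single reply has to serve both as the first
reply that lets write(2) return and as the safe reply that fsync(2) waits
for, roughly:

	if (!already_completed)
		complete_all(&req->r_completion);	/* write(2) returns */

	if (flags & CEPH_OSD_FLAG_ONDISK)
		complete_all(&req->r_safe_completion);	/* fsync(2) waiters */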

sage

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/3] libceph: call r_unsafe_callback when unsafe reply is received
  2013-07-03 22:07       ` Milosz Tanski
  2013-07-03 22:10         ` Sage Weil
@ 2013-07-03 22:43         ` Yan, Zheng
  2013-07-08 14:42           ` Milosz Tanski
  1 sibling, 1 reply; 25+ messages in thread
From: Yan, Zheng @ 2013-07-03 22:43 UTC (permalink / raw)
  To: Milosz Tanski; +Cc: Yan, Zheng, ceph-devel, Alex Elder, Sage Weil

On Thu, Jul 4, 2013 at 6:07 AM, Milosz Tanski <milosz@adfin.com> wrote:
> Yan,
>
> Can you help me understand how this change fixes:
> http://tracker.ceph.com/issues/2019 ? The symptom on the client is
> that the processes get stuck waiting in ceph_mdsc_do_request according
> to /proc/PID/stack.

The bug this patch fixes is that ceph_sync_write_unsafe can be called
multiple times with parameter unsafe=true. That prevents the kclient
from releasing the Fw cap, which in turn leaves the filelock stuck in
an unstable state forever and leaves requests hanging.
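
A minimal sketch of the failure mode (hypothetical names, not the actual
fs/ceph code): when the callback fires with unsafe=true on every resend,
the in-flight accounting is bumped more than once but dropped only once
when the safe reply arrives, so it never reaches zero and the Fw cap is
never released.

static void example_sync_write_unsafe(struct ceph_osd_request *req, bool unsafe)
{
	struct example_inode_info *ci = req->r_priv;	/* hypothetical per-inode state */

	if (unsafe)
		atomic_inc(&ci->unsafe_writes);		/* may run twice per request: the bug */
	else if (atomic_dec_and_test(&ci->unsafe_writes))
		wake_up_fw_cap_waiters(ci);		/* hypothetical: lets the Fw cap go */
}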

Yan, Zheng

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/3] libceph: call r_unsafe_callback when unsafe reply is received
  2013-07-03 22:43         ` Yan, Zheng
@ 2013-07-08 14:42           ` Milosz Tanski
  2013-07-08 19:58             ` Milosz Tanski
  0 siblings, 1 reply; 25+ messages in thread
From: Milosz Tanski @ 2013-07-08 14:42 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: Yan, Zheng, ceph-devel, Alex Elder, Sage Weil

Yan,

So it looks like it fixes the issue. I had to update all my clients
and restart MDS and things got back to normal.

- Milosz

On Wed, Jul 3, 2013 at 6:43 PM, Yan, Zheng <ukernel@gmail.com> wrote:
> On Thu, Jul 4, 2013 at 6:07 AM, Milosz Tanski <milosz@adfin.com> wrote:
>> Yan,
>>
>> Can you help me understand how this change fixes:
>> http://tracker.ceph.com/issues/2019 ? The symptom on the client is
>> that the processes get stuck waiting in ceph_mdsc_do_request according
>> to /proc/PID/stack.
>
> The bug this patch fixes is that ceph_sync_write_unsafe can be called
> multiple times with parameter unsafe=true. That prevents the kclient
> from releasing the Fw cap, which in turn leaves the filelock stuck in
> an unstable state forever and leaves requests hanging.
>
> Yan, Zheng

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/3] libceph: call r_unsafe_callback when unsafe reply is received
  2013-07-08 14:42           ` Milosz Tanski
@ 2013-07-08 19:58             ` Milosz Tanski
  2013-07-08 20:30               ` Yan, Zheng
  0 siblings, 1 reply; 25+ messages in thread
From: Milosz Tanski @ 2013-07-08 19:58 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: Yan, Zheng, ceph-devel, Alex Elder, Sage Weil

Yan,

Actually after playing some more today I have another one of my
clients stuck in this spot. When I look at the kernel stacks this is
what I see for all the threads:

[<ffffffffa02d2bab>] ceph_mdsc_do_request+0xcb/0x1a0 [ceph]
[<ffffffffa02c018f>] ceph_do_getattr+0xdf/0x120 [ceph]
[<ffffffffa02c01f4>] ceph_getattr+0x24/0x100 [ceph]
[<ffffffff811775fd>] vfs_getattr+0x4d/0x80
[<ffffffff8117784d>] vfs_fstat+0x3d/0x70
[<ffffffff81177895>] SYSC_newfstat+0x15/0x30
[<ffffffff8117794e>] SyS_newfstat+0xe/0x10
[<ffffffff8155dd59>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff


Anything I can do on my end to debug this issue?

- Milosz

P.S.: Sorry for the second email if you got it; Gmail keeps switching me
to non-plain-text mode. Sigh.

On Mon, Jul 8, 2013 at 10:42 AM, Milosz Tanski <milosz@adfin.com> wrote:
> Yan,
>
> So it looks like it fixes the issue. I had to update all my clients
> and restart MDS and things got back to normal.
>
> - Milosz
>
> On Wed, Jul 3, 2013 at 6:43 PM, Yan, Zheng <ukernel@gmail.com> wrote:
>> On Thu, Jul 4, 2013 at 6:07 AM, Milosz Tanski <milosz@adfin.com> wrote:
>>> Yan,
>>>
>>> Can you help me understand how this change fixes:
>>> http://tracker.ceph.com/issues/2019 ? The symptom on the client is
>>> that the processes get stuck waiting in ceph_mdsc_do_request according
>>> to /proc/PID/stack.
>>
>> The bug this patch fixes is that ceph_sync_write_unsafe can be called
>> multiple times with parameter unsafe=true. That prevents the kclient
>> from releasing the Fw cap, which in turn leaves the filelock stuck in
>> an unstable state forever and leaves requests hanging.
>>
>> Yan, Zheng

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/3] libceph: call r_unsafe_callback when unsafe reply is received
  2013-07-08 19:58             ` Milosz Tanski
@ 2013-07-08 20:30               ` Yan, Zheng
  2013-07-08 21:16                 ` Milosz Tanski
  0 siblings, 1 reply; 25+ messages in thread
From: Yan, Zheng @ 2013-07-08 20:30 UTC (permalink / raw)
  To: Milosz Tanski; +Cc: Yan, Zheng, ceph-devel, Alex Elder, Sage Weil

On Tue, Jul 9, 2013 at 3:58 AM, Milosz Tanski <milosz@adfin.com> wrote:
> Yan,
>
> Actually after playing some more today I have another one of my
> clients stuck in this spot. When I look at the kernel stacks this is
> what I see for all the threads:
>
> [<ffffffffa02d2bab>] ceph_mdsc_do_request+0xcb/0x1a0 [ceph]
> [<ffffffffa02c018f>] ceph_do_getattr+0xdf/0x120 [ceph]
> [<ffffffffa02c01f4>] ceph_getattr+0x24/0x100 [ceph]
> [<ffffffff811775fd>] vfs_getattr+0x4d/0x80
> [<ffffffff8117784d>] vfs_fstat+0x3d/0x70
> [<ffffffff81177895>] SYSC_newfstat+0x15/0x30
> [<ffffffff8117794e>] SyS_newfstat+0xe/0x10
> [<ffffffff8155dd59>] system_call_fastpath+0x16/0x1b
> [<ffffffffffffffff>] 0xffffffffffffffff
>
>
> Anything I can do on my end to debug this issue?
>

Find the hung request (and its inode) through /sys/kernel/debug/ceph/xxx/mdsc,
use 'ceph mds tell \* dumpcache' to dump the MDS cache, then open
/cachedump.xxx and check the inode's state.

Does your kernel include all the fixes in the testing branch of ceph-client?
Does restarting the MDS resolve the hang?

yan, zheng

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/3] libceph: call r_unsafe_callback when unsafe reply is received
  2013-07-08 20:30               ` Yan, Zheng
@ 2013-07-08 21:16                 ` Milosz Tanski
  2013-07-25 15:43                   ` Milosz Tanski
  0 siblings, 1 reply; 25+ messages in thread
From: Milosz Tanski @ 2013-07-08 21:16 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: Yan, Zheng, ceph-devel, Alex Elder, Sage Weil

In this case (unlike last week) the restart did unlock my clients.

- M

On Mon, Jul 8, 2013 at 4:30 PM, Yan, Zheng <ukernel@gmail.com> wrote:
> On Tue, Jul 9, 2013 at 3:58 AM, Milosz Tanski <milosz@adfin.com> wrote:
>> Yan,
>>
>> Actually after playing some more today I have another one of my
>> clients stuck in this spot. When I look at the kernel stacks this is
>> what I see for all the threads:
>>
>> [<ffffffffa02d2bab>] ceph_mdsc_do_request+0xcb/0x1a0 [ceph]
>> [<ffffffffa02c018f>] ceph_do_getattr+0xdf/0x120 [ceph]
>> [<ffffffffa02c01f4>] ceph_getattr+0x24/0x100 [ceph]
>> [<ffffffff811775fd>] vfs_getattr+0x4d/0x80
>> [<ffffffff8117784d>] vfs_fstat+0x3d/0x70
>> [<ffffffff81177895>] SYSC_newfstat+0x15/0x30
>> [<ffffffff8117794e>] SyS_newfstat+0xe/0x10
>> [<ffffffff8155dd59>] system_call_fastpath+0x16/0x1b
>> [<ffffffffffffffff>] 0xffffffffffffffff
>>
>>
>> Anything I can do on my end to debug this issue?
>>
>
> Find the hung request (and its inode) through /sys/kernel/debug/ceph/xxx/mdsc,
> use 'ceph mds tell \* dumpcache' to dump the MDS cache, then open
> /cachedump.xxx and check the inode's state.
>
> Does your kernel include all the fixes in the testing branch of ceph-client?
> Does restarting the MDS resolve the hang?
>
> yan, zheng

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/3] libceph: call r_unsafe_callback when unsafe reply is received
  2013-07-08 21:16                 ` Milosz Tanski
@ 2013-07-25 15:43                   ` Milosz Tanski
  0 siblings, 0 replies; 25+ messages in thread
From: Milosz Tanski @ 2013-07-25 15:43 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: Yan, Zheng, ceph-devel, Alex Elder, Sage Weil

I just wanted to follow up to say that after applying these patches
and running them for a few weeks, I haven't seen another lockup
under load.

- Milosz

On Mon, Jul 8, 2013 at 5:16 PM, Milosz Tanski <milosz@adfin.com> wrote:
> In this case (unlike last week) the restart did unlock my clients.
>
> - M
>
> On Mon, Jul 8, 2013 at 4:30 PM, Yan, Zheng <ukernel@gmail.com> wrote:
>> On Tue, Jul 9, 2013 at 3:58 AM, Milosz Tanski <milosz@adfin.com> wrote:
>>> Yan,
>>>
>>> Actually after playing some more today I have another one of my
>>> clients stuck in this spot. When I look at the kernel stacks this is
>>> what I see for all the threads:
>>>
>>> [<ffffffffa02d2bab>] ceph_mdsc_do_request+0xcb/0x1a0 [ceph]
>>> [<ffffffffa02c018f>] ceph_do_getattr+0xdf/0x120 [ceph]
>>> [<ffffffffa02c01f4>] ceph_getattr+0x24/0x100 [ceph]
>>> [<ffffffff811775fd>] vfs_getattr+0x4d/0x80
>>> [<ffffffff8117784d>] vfs_fstat+0x3d/0x70
>>> [<ffffffff81177895>] SYSC_newfstat+0x15/0x30
>>> [<ffffffff8117794e>] SyS_newfstat+0xe/0x10
>>> [<ffffffff8155dd59>] system_call_fastpath+0x16/0x1b
>>> [<ffffffffffffffff>] 0xffffffffffffffff
>>>
>>>
>>> Anything I can do on my end to debug this issue?
>>>
>>
>> Find the hung request (and its inode) through /sys/kernel/debug/ceph/xxx/mdsc,
>> use 'ceph mds tell \* dumpcache' to dump the MDS cache, then open
>> /cachedump.xxx and check the inode's state.
>>
>> Does your kernel include all the fixes in the testing branch of ceph-client?
>> Does restarting the MDS resolve the hang?
>>
>> yan, zheng

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2013-07-25 15:43 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-06-24  6:41 [PATCH 1/3] libceph: call r_unsafe_callback when unsafe reply is received Yan, Zheng
2013-06-24  6:41 ` [PATCH 2/3] mds: fix cap revoke race Yan, Zheng
2013-06-24  8:00   ` Yan, Zheng
2013-06-24  8:19   ` Yan, Zheng
2013-06-24  6:41 ` [PATCH 3/3] mds: fix race between cap issue and revoke Yan, Zheng
2013-06-24  8:16   ` Yan, Zheng
2013-07-01  7:28 ` [PATCH 1/3] libceph: call r_unsafe_callback when unsafe reply is received Yan, Zheng
2013-07-01 19:46   ` Sage Weil
2013-07-03 21:57     ` Sage Weil
2013-07-03 22:07       ` Milosz Tanski
2013-07-03 22:10         ` Sage Weil
2013-07-03 22:43         ` Yan, Zheng
2013-07-08 14:42           ` Milosz Tanski
2013-07-08 19:58             ` Milosz Tanski
2013-07-08 20:30               ` Yan, Zheng
2013-07-08 21:16                 ` Milosz Tanski
2013-07-25 15:43                   ` Milosz Tanski
2013-07-03 22:18       ` Alex Elder
2013-07-03 22:22       ` Yan, Zheng
2013-07-03 22:26         ` Sage Weil
2013-07-03 22:32           ` Sage Weil
2013-07-02 13:07 ` Alex Elder
2013-07-02 14:27   ` Yan, Zheng
2013-07-02 18:10   ` Sage Weil
2013-07-02 18:11     ` Alex Elder

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.