* Lock contention in do_rule
@ 2015-10-24  0:41 Somnath Roy
  2015-10-24  1:09 ` Sage Weil
  0 siblings, 1 reply; 6+ messages in thread
From: Somnath Roy @ 2015-10-24  0:41 UTC (permalink / raw)
  To: ceph-devel

Hi Sage,
We are seeing that the mapper_lock below is heavily contended, and commenting out this lock improves performance by ~10% (in the short-circuit path).
do_rule() is called for every IO from osd_is_valid_op_target().
I looked into the code but couldn't understand the purpose of the lock; it seems redundant to me. Could you please confirm?


void do_rule(int rule, int x, vector<int>& out, int maxout,
               const vector<__u32>& weight) const {
    Mutex::Locker l(mapper_lock);
    int rawout[maxout];
    int scratch[maxout * 3];
    int numrep = crush_do_rule(crush, rule, x, rawout, maxout, &weight[0], weight.size(), scratch);
    if (numrep < 0)
      numrep = 0;
    out.resize(numrep);
    for (int i=0; i<numrep; i++)
      out[i] = rawout[i];
  }


Thanks & Regards
Somnath


* Re: Lock contention in do_rule
  2015-10-24  0:41 Lock contention in do_rule Somnath Roy
@ 2015-10-24  1:09 ` Sage Weil
  2015-10-24  2:02   ` Somnath Roy
  0 siblings, 1 reply; 6+ messages in thread
From: Sage Weil @ 2015-10-24  1:09 UTC (permalink / raw)
  To: Somnath Roy; +Cc: ceph-devel

On Sat, 24 Oct 2015, Somnath Roy wrote:
> Hi Sage,
> We are seeing that the mapper_lock below is heavily contended, and commenting out this lock improves performance by ~10% (in the short-circuit path).
> do_rule() is called for every IO from osd_is_valid_op_target().
> I looked into the code but couldn't understand the purpose of the lock; it seems redundant to me. Could you please confirm?
> 
> 
> void do_rule(int rule, int x, vector<int>& out, int maxout,
>                const vector<__u32>& weight) const {
>     Mutex::Locker l(mapper_lock);
>     int rawout[maxout];
>     int scratch[maxout * 3];
>     int numrep = crush_do_rule(crush, rule, x, rawout, maxout, &weight[0], weight.size(), scratch);
>     if (numrep < 0)
>       numrep = 0;
>     out.resize(numrep);
>     for (int i=0; i<numrep; i++)
>       out[i] = rawout[i];
>   }

It's needed because of this:

https://github.com/ceph/ceph/blob/master/src/crush/crush.h#L137
https://github.com/ceph/ceph/blob/master/src/crush/mapper.c#L88
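
(Roughly, those point at the per-bucket permutation cache that
bucket_perm_choose() rebuilds in place.  A simplified view -- the perm field
names are real, everything else is elided; see the links above for the
actual definitions:)

struct crush_bucket {
        /* ... id, type, alg, hash, weight, size, items ... */

        /* cached, lazily built permutation of this bucket's items */
        __u32  perm_x;   /* the input x for which *perm is currently valid */
        __u32  perm_n;   /* how many entries of *perm have been filled in */
        __u32 *perm;
};

/* bucket_perm_choose() in mapper.c rewrites perm_x/perm_n/perm whenever it
 * sees a new x, so two unsynchronized mappings through the same bucket can
 * race on this cache -- that is what mapper_lock is serializing. */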

This is clearly not the greatest approach.  I think what we need is either 
a cache that is provided by the caller (which would be annoying and awkward 
because it's not linked directly to the bucket in question, and would not 
be shared between threads) or crush upcalls that take the lock only when 
we're in the perm path (which is relatively rare).  I'd lean toward the 
latter, but we need to be careful about it since this code is shared with 
the kernel and it needs to work there as well.  Probably we just need to 
define two callbacks for lock and unlock on the struct crush_map?
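
A minimal sketch of that callback idea -- entirely hypothetical, neither
the fields nor the helpers below exist in crush.h/mapper.c today; the
kernel build could point the hooks at a spinlock (or leave them NULL), and
CrushWrapper could install hooks that take mapper_lock:

struct crush_map {
        /* ... existing fields (buckets, rules, ...) ... */
        void (*perm_lock)(void *ctx);    /* hypothetical: taken around perm updates */
        void (*perm_unlock)(void *ctx);
        void *perm_lock_ctx;             /* e.g. a Mutex* in userspace */
};

static void crush_perm_lock(struct crush_map *map)
{
        if (map->perm_lock)
                map->perm_lock(map->perm_lock_ctx);
}

static void crush_perm_unlock(struct crush_map *map)
{
        if (map->perm_unlock)
                map->perm_unlock(map->perm_lock_ctx);
}

CrushWrapper::do_rule() could then drop the outer mapper_lock entirely, so
the mutex is only touched on the (relatively rare) perm path.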

sage


* RE: Lock contention in do_rule
  2015-10-24  1:09 ` Sage Weil
@ 2015-10-24  2:02   ` Somnath Roy
  2015-10-24  2:09     ` Somnath Roy
  0 siblings, 1 reply; 6+ messages in thread
From: Somnath Roy @ 2015-10-24  2:02 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

Thanks for the clarification, Sage.
I don't have much knowledge of this part of the code, but here is what I understood by going through it:

1. The client calculates the pg->osd mapping by executing these same functions.

2. When a request arrives at an OSD, the OSD executes the same functions again to check whether the mapping is still the same. If something has changed and it is not the right OSD to execute the request, it errors out.

3. So the lock is there to protect the bucket perm attributes (the cache sketched in Sage's reply above) from being written by multiple threads at once.

Some ideas:

1. The lock itself may not be expensive, but it is held for the whole of do_rule(). If we took the lock at a much more granular level, e.g. inside bucket_perm_choose(), it could be a gain (see the sketch after this list). If that is a possibility, we can test it out.

2. Maybe doing this check from a messenger thread is having an effect; what if we moved the check into the OSD worker thread?
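
A rough sketch of idea 1, reusing the hypothetical crush_perm_lock()/
crush_perm_unlock() hooks from the sketch in Sage's reply above (the real
bucket_perm_choose() in mapper.c takes no map argument and no lock today):

static int bucket_perm_choose(struct crush_map *map,
                              struct crush_bucket *bucket, int x, int r)
{
        int s;

        crush_perm_lock(map);             /* hypothetical hook */
        /* ... existing body: if bucket->perm_x != x, reset perm_x/perm_n
         * and lazily (re)build bucket->perm, then pick the r'th entry ... */
        s = bucket->perm[r % bucket->size];
        crush_perm_unlock(map);           /* hypothetical hook */

        return bucket->items[s];          /* items[] itself is immutable */
}

Bucket types that never touch the perm cache would then stay completely
lock-free; only the callers that actually hit it would pay for the mutex.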

Thanks & Regards
Somnath


* RE: Lock contention in do_rule
  2015-10-24  2:02   ` Somnath Roy
@ 2015-10-24  2:09     ` Somnath Roy
  2015-10-24 22:04       ` Sage Weil
  0 siblings, 1 reply; 6+ messages in thread
From: Somnath Roy @ 2015-10-24  2:09 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

Also, why bother doing this expensive recheck on the OSD side?
The OSDMap can change between this check and the OSD actually carrying out the transaction, am I right?
If so, we are not able to protect against every scenario anyway.

Thanks & Regards
Somnath


* RE: Lock contention in do_rule
  2015-10-24  2:09     ` Somnath Roy
@ 2015-10-24 22:04       ` Sage Weil
  2015-10-24 22:22         ` Somnath Roy
  0 siblings, 1 reply; 6+ messages in thread
From: Sage Weil @ 2015-10-24 22:04 UTC (permalink / raw)
  To: Somnath Roy; +Cc: ceph-devel

On Sat, 24 Oct 2015, Somnath Roy wrote:
> Also, why bother doing this expensive recheck on the OSD side?

Yeah.. the current code is overly pessimistic.  If the PG exists on the OSD 
we already have the mapping we need; only if it doesn't do we need to 
calculate the mapping to ensure it is valid.  In OSD::handle_op(), I think 
that the

  if (!send_map->osd_is_valid_op_target(pgid.pgid, whoami)) {

check (and the one that follows) should move into the else branch, for the 
case where we don't have the PG.  And if we do have the PG, the existing 
can_discard_op() checks in PG.cc are already doing the right thing.

https://github.com/ceph/ceph/pull/6371
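
A paraphrased sketch of that restructuring -- not the actual diff in the
pull request above, and the helper names are only approximate:

void OSD::handle_op(OpRequestRef op)      // heavily abridged
{
  // ... decode the message, compute pgid, grab send_map ...

  PG *pg = _lookup_lock_pg(pgid);         // name approximate
  if (pg) {
    // We already have the PG: PG::can_discard_op() will reject the op if
    // the mapping has since changed, so no CRUSH calculation is needed.
    enqueue_op(pg, op);                   // name approximate
  } else {
    // Only when we don't have the PG do we recompute the mapping to see
    // whether the op was misdirected.
    if (!send_map->osd_is_valid_op_target(pgid.pgid, whoami)) {
      // misdirected op: drop it (the follow-up checks move here as well)
      return;
    }
    // ... existing handling for ops that target a PG we don't have ...
  }
}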

sage



* RE: Lock contention in do_rule
  2015-10-24 22:04       ` Sage Weil
@ 2015-10-24 22:22         ` Somnath Roy
  0 siblings, 0 replies; 6+ messages in thread
From: Somnath Roy @ 2015-10-24 22:22 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

Thanks Sage, I will test with this patch.

Regards
Somnath



Thread overview: 6+ messages
2015-10-24  0:41 Lock contention in do_rule Somnath Roy
2015-10-24  1:09 ` Sage Weil
2015-10-24  2:02   ` Somnath Roy
2015-10-24  2:09     ` Somnath Roy
2015-10-24 22:04       ` Sage Weil
2015-10-24 22:22         ` Somnath Roy
