* endless flying slow requests
@ 2012-11-14  9:12 Stefan Priebe - Profihost AG
  2012-11-14 14:59 ` Sage Weil
  0 siblings, 1 reply; 15+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-11-14  9:12 UTC (permalink / raw)
  To: ceph-devel

Hello list,

I see this several times: endless flying slow requests that never stop 
until I restart the OSD named in the warnings.

2012-11-14 10:11:57.513395 osd.24 [WRN] 1 slow requests, 1 included 
below; oldest blocked for > 31789.858457 secs
2012-11-14 10:11:57.513399 osd.24 [WRN] slow request 31789.858457 
seconds old, received at 2012-11-14 01:22:07.654922: 
osd_op(client.30286.0:6719 rbd_data.75c55bf2fdd7.0000000000001399 [write 
282624~4096] 3.3f6d2373) v4 currently delayed
2012-11-14 10:11:58.513584 osd.24 [WRN] 1 slow requests, 1 included 
below; oldest blocked for > 31790.858646 secs
2012-11-14 10:11:58.513586 osd.24 [WRN] slow request 31790.858646 
seconds old, received at 2012-11-14 01:22:07.654922: 
osd_op(client.30286.0:6719 rbd_data.75c55bf2fdd7.0000000000001399 [write 
282624~4096] 3.3f6d2373) v4 currently delayed
2012-11-14 10:11:59.513766 osd.24 [WRN] 1 slow requests, 1 included 
below; oldest blocked for > 31791.858827 secs
2012-11-14 10:11:59.513768 osd.24 [WRN] slow request 31791.858827 
seconds old, received at 2012-11-14 01:22:07.654922: 
osd_op(client.30286.0:6719 rbd_data.75c55bf2fdd7.0000000000001399 [write 
282624~4096] 3.3f6d2373) v4 currently delayed
2012-11-14 10:12:00.513909 osd.24 [WRN] 1 slow requests, 1 included 
below; oldest blocked for > 31792.858971 secs
2012-11-14 10:12:00.513916 osd.24 [WRN] slow request 31792.858971 
seconds old, received at 2012-11-14 01:22:07.654922: 
osd_op(client.30286.0:6719 rbd_data.75c55bf2fdd7.0000000000001399 [write 
282624~4096] 3.3f6d2373) v4 currently delayed
2012-11-14 10:12:01.514061 osd.24 [WRN] 1 slow requests, 1 included 
below; oldest blocked for > 31793.859124 secs
2012-11-14 10:12:01.514063 osd.24 [WRN] slow request 31793.859124 
seconds old, received at 2012-11-14 01:22:07.654922: 
osd_op(client.30286.0:6719 rbd_data.75c55bf2fdd7.0000000000001399 [write 
282624~4096] 3.3f6d2373) v4 currently delayed

When I then restart osd.24, they go away and everything is fine again.

Stefan


* Re: endless flying slow requests
  2012-11-14  9:12 endless flying slow requests Stefan Priebe - Profihost AG
@ 2012-11-14 14:59 ` Sage Weil
  2012-11-15  8:09   ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 15+ messages in thread
From: Sage Weil @ 2012-11-14 14:59 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: ceph-devel

Hi Stefan,

It would be nice to confirm that no clients are waiting on replies for 
these requests; currently we suspect that the OSD request tracking is the 
buggy part.  If you query the OSD admin socket you should be able to dump 
requests and see the client IP, and then query the client.

Is it librbd?  In that case you likely need to change the config so that 
it is listening on an admin socket ('admin socket = path').
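
For example, something along these lines should work on the OSD side 
(the socket path is illustrative; 'help' lists the commands your 
version actually supports):

  # list available admin socket commands
  ceph --admin-daemon /var/run/ceph/ceph-osd.24.asok help
  # dump the ops the OSD is currently tracking, including the client id
  ceph --admin-daemon /var/run/ceph/ceph-osd.24.asok dump_ops_in_flight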

Thanks!
sage


On Wed, 14 Nov 2012, Stefan Priebe - Profihost AG wrote:

> Hello list,
> 
> I see this several times: endless flying slow requests that never stop
> until I restart the OSD named in the warnings.
> 
> [log excerpt trimmed]
> 
> When I then restart osd.24, they go away and everything is fine again.
> 
> Stefan


* Re: endless flying slow requests
  2012-11-14 14:59 ` Sage Weil
@ 2012-11-15  8:09   ` Stefan Priebe - Profihost AG
  2012-11-15  8:18     ` Josh Durgin
  2012-11-15 14:58     ` Sage Weil
  0 siblings, 2 replies; 15+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-11-15  8:09 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

On 14.11.2012 15:59, Sage Weil wrote:
> Hi Stefan,
>
> It would be nice to confirm that no clients are waiting on replies for
> these requests; currently we suspect that the OSD request tracking is the
> buggy part.  If you query the OSD admin socket you should be able to dump
> requests and see the client IP, and then query the client.
>
> Is it librbd?  In that case you likely need to change the config so that
> it is listening on an admin socket ('admin socket = path').

Yes, it is. So I have to specify the admin socket on the KVM host? How do 
I query the admin socket for requests?

Stefan


> On Wed, 14 Nov 2012, Stefan Priebe - Profihost AG wrote:
>
>> [original report trimmed]


* Re: endless flying slow requests
  2012-11-15  8:09   ` Stefan Priebe - Profihost AG
@ 2012-11-15  8:18     ` Josh Durgin
  2012-11-15 14:58     ` Sage Weil
  1 sibling, 0 replies; 15+ messages in thread
From: Josh Durgin @ 2012-11-15  8:18 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: Sage Weil, ceph-devel

On 11/15/2012 12:09 AM, Stefan Priebe - Profihost AG wrote:
> On 14.11.2012 15:59, Sage Weil wrote:
>> Hi Stefan,
>>
>> It would be nice to confirm that no clients are waiting on replies for
>> these requests; currently we suspect that the OSD request tracking is the
>> buggy part.  If you query the OSD admin socket you should be able to dump
>> requests and see the client IP, and then query the client.
>>
>> Is it librbd?  In that case you likely need to change the config so that
>> it is listening on an admin socket ('admin socket = path').
>
> Yes, it is. So I have to specify the admin socket on the KVM host? How do
> I query the admin socket for requests?

Yes, add 'admin socket = /path/to/admin/socket' to the [client] section
in ceph.conf, and when a guest is running, show outstanding requests
with:

ceph --admin-daemon /path/to/admin/socket objecter_requests
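
For example, a minimal [client] section might look like this (the path 
is illustrative, and this assumes your version supports the $name and 
$pid metavariables, which keep the socket unique per guest):

  [client]
      admin socket = /var/run/ceph/$name.$pid.asok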

Josh


* Re: endless flying slow requests
  2012-11-15  8:09   ` Stefan Priebe - Profihost AG
  2012-11-15  8:18     ` Josh Durgin
@ 2012-11-15 14:58     ` Sage Weil
  2012-11-28  1:51       ` Sage Weil
  1 sibling, 1 reply; 15+ messages in thread
From: Sage Weil @ 2012-11-15 14:58 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: ceph-devel

On Thu, 15 Nov 2012, Stefan Priebe - Profihost AG wrote:
> > On 14.11.2012 15:59, Sage Weil wrote:
> > Hi Stefan,
> > 
> > It would be nice to confirm that no clients are waiting on replies for
> > these requests; currently we suspect that the OSD request tracking is the
> > buggy part.  If you query the OSD admin socket you should be able to dump
> > requests and see the client IP, and then query the client.
> > 
> > Is it librbd?  In that case you likely need to change the config so that
> > it is listening on an admin socket ('admin socket = path').
> 
> Yes, it is. So I have to specify the admin socket on the KVM host?

Right.  IIRC the disk line is a ; (or \;) separated list of key/value 
pairs.

> How do I query the admin socket for requests?

ceph --admin-daemon /path/to/socket help
ceph --admin-daemon /path/to/socket objecter_dump     (I think)

sage



* Re: endless flying slow requests
  2012-11-15 14:58     ` Sage Weil
@ 2012-11-28  1:51       ` Sage Weil
  2012-11-28  7:23         ` Stefan Priebe - Profihost AG
  2012-11-28  7:47         ` Andrey Korolyov
  0 siblings, 2 replies; 15+ messages in thread
From: Sage Weil @ 2012-11-28  1:51 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: ceph-devel

Hi Stefan,

On Thu, 15 Nov 2012, Sage Weil wrote:
> [earlier exchange trimmed]
>
> ceph --admin-daemon /path/to/socket help
> ceph --admin-daemon /path/to/socket objecter_dump     (I think)

Were you able to reproduce this?

Thanks!
sage




* Re: endless flying slow requests
  2012-11-28  1:51       ` Sage Weil
@ 2012-11-28  7:23         ` Stefan Priebe - Profihost AG
  2012-11-28  7:47         ` Andrey Korolyov
  1 sibling, 0 replies; 15+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-11-28  7:23 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

On 28.11.2012 02:51, Sage Weil wrote:
> Hi Stefan,
>
>>> Yes, it is. So I have to specify the admin socket on the KVM host?
>>
>> Right.  IIRC the disk line is a ; (or \;) separated list of key/value
>> pairs.
>>
>>> How do I query the admin socket for requests?
>>
>> ceph --admin-daemon /path/to/socket help
>> ceph --admin-daemon /path/to/socket objecter_dump     (I think)
>
> Were you able to reproduce this?

No, not right now. But I've saved your answers and will run those tests 
when I see this again.

Right now I'm just seeing extremely slow OSD recovery.

Stefan


* Re: endless flying slow requests
  2012-11-28  1:51       ` Sage Weil
  2012-11-28  7:23         ` Stefan Priebe - Profihost AG
@ 2012-11-28  7:47         ` Andrey Korolyov
  2012-11-28 21:11           ` Samuel Just
  1 sibling, 1 reply; 15+ messages in thread
From: Andrey Korolyov @ 2012-11-28  7:47 UTC (permalink / raw)
  To: Sage Weil; +Cc: Stefan Priebe - Profihost AG, ceph-devel

On Wed, Nov 28, 2012 at 5:51 AM, Sage Weil <sage@inktank.com> wrote:
> Hi Stefan,
>
> [earlier exchange trimmed]
>
> Were you able to reproduce this?
>
> Thanks!
> sage

Meanwhile, I did. :)
Such requests are always created if you have restarted an OSD, or marked
it out and then back in, and no scrub has happened in the meantime
(between that operation and the request's arrival).
What is more interesting, the hangup happens not at the time of the
operation, but tens of minutes later.
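
For example (osd id illustrative; this is just the sequence described
above, not a guaranteed reproducer):

  ceph osd out 24     # mark the osd out...
  ceph osd in 24      # ...and bring it back in, before any scrub runs

One of the ops stuck this way looks like: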

        { "description": "osd_sub_op(client.1292013.0:45422 4.731
a384cf31\/rbd_data.1415fb1075f187.00000000000000a7\/head\/\/4 [] v
16444'21693 snapset=0=[]:[] snapc=0=[])",
          "received_at": "2012-11-28 03:54:43.094151",
          "age": "27812.942680",
          "duration": "2.676641",
          "flag_point": "started",
          "events": [
                { "time": "2012-11-28 03:54:43.094222",
                  "event": "waiting_for_osdmap"},
                { "time": "2012-11-28 03:54:43.386890",
                  "event": "reached_pg"},
                { "time": "2012-11-28 03:54:43.386894",
                  "event": "started"},
                { "time": "2012-11-28 03:54:43.386973",
                  "event": "commit_queued_for_journal_write"},
                { "time": "2012-11-28 03:54:45.360049",
                  "event": "write_thread_in_journal_buffer"},
                { "time": "2012-11-28 03:54:45.586183",
                  "event": "journaled_completion_queued"},
                { "time": "2012-11-28 03:54:45.586262",
                  "event": "sub_op_commit"},
                { "time": "2012-11-28 03:54:45.770792",
                  "event": "sub_op_applied"}]}]}




* Re: endless flying slow requests
  2012-11-28  7:47         ` Andrey Korolyov
@ 2012-11-28 21:11           ` Samuel Just
  2012-11-28 21:12             ` Samuel Just
  0 siblings, 1 reply; 15+ messages in thread
From: Samuel Just @ 2012-11-28 21:11 UTC (permalink / raw)
  To: Andrey Korolyov; +Cc: Sage Weil, Stefan Priebe - Profihost AG, ceph-devel

Did you observe hung I/O along with that error?  Both sub_op_commit and
sub_op_applied have happened, so the sub_op_reply should have been
sent back to the primary.  This looks more like a leak.  If you also
observed hung I/O, then it's possible that the problem is occurring
between the sub_op_applied event and the response.
-Sam

On Tue, Nov 27, 2012 at 11:47 PM, Andrey Korolyov <andrey@xdel.ru> wrote:
> [earlier exchange trimmed]
>
> Meanwhile, I did. :)
> Such requests are always created if you have restarted an OSD, or marked
> it out and then back in, and no scrub has happened in the meantime
> (between that operation and the request's arrival).
> What is more interesting, the hangup happens not at the time of the
> operation, but tens of minutes later.
>
> [op dump trimmed; see the previous message]


* Re: endless flying slow requests
  2012-11-28 21:11           ` Samuel Just
@ 2012-11-28 21:12             ` Samuel Just
  2012-11-28 21:52               ` Moore, Shawn M
  2012-11-28 22:04               ` Andrey Korolyov
  0 siblings, 2 replies; 15+ messages in thread
From: Samuel Just @ 2012-11-28 21:12 UTC (permalink / raw)
  To: Andrey Korolyov; +Cc: Sage Weil, Stefan Priebe - Profihost AG, ceph-devel

Also, these clusters aren't mixed argonaut and next, are they?  (Not
that that shouldn't work, but it would be a useful data point.)
-Sam

On Wed, Nov 28, 2012 at 1:11 PM, Samuel Just <sam.just@inktank.com> wrote:
> Did you observe hung I/O along with that error?  Both sub_op_commit and
> sub_op_applied have happened, so the sub_op_reply should have been
> sent back to the primary.  This looks more like a leak.  If you also
> observed hung I/O, then it's possible that the problem is occurring
> between the sub_op_applied event and the response.
> -Sam


* RE: endless flying slow requests
  2012-11-28 21:12             ` Samuel Just
@ 2012-11-28 21:52               ` Moore, Shawn M
  2012-11-28 22:04               ` Andrey Korolyov
  1 sibling, 0 replies; 15+ messages in thread
From: Moore, Shawn M @ 2012-11-28 21:52 UTC (permalink / raw)
  To: Samuel Just, Andrey Korolyov
  Cc: Sage Weil, Stefan Priebe - Profihost AG, ceph-devel

I actually have this exact issue: hung I/O during these slow requests.  
The odd thing is that when I restarted the OSD being mentioned via 
"ceph -w", I/O didn't resume.  A bit later (unsure of the exact time) 
another OSD started being listed in "ceph -w", and when I restarted that 
different OSD, I/O came back.  I believe these occur when I reboot nodes.  
I never reboot more than one node at a time, and I have 3 replicas in the 
cluster.

I have a pure "ceph version 0.53 (commit:2528b5ee105b16352c91af064af5c0b5a7d45d7c)" cluster.

Regards,
Shawn

________________________________________
From: ceph-devel-owner@vger.kernel.org [ceph-devel-owner@vger.kernel.org] on behalf of Samuel Just [sam.just@inktank.com]
Sent: Wednesday, November 28, 2012 16:12
To: Andrey Korolyov
Cc: Sage Weil; Stefan Priebe - Profihost AG; ceph-devel@vger.kernel.org
Subject: Re: endless flying slow requests

Also, these clusters aren't mixed argonaut and next, are they?  (Not
that that shouldn't work, but it would be a useful data point.)
-Sam

On Wed, Nov 28, 2012 at 1:11 PM, Samuel Just <sam.just@inktank.com> wrote:
> [earlier messages trimmed]




* Re: endless flying slow requests
  2012-11-28 21:12             ` Samuel Just
  2012-11-28 21:52               ` Moore, Shawn M
@ 2012-11-28 22:04               ` Andrey Korolyov
  2012-12-01  5:06                 ` Samuel Just
  1 sibling, 1 reply; 15+ messages in thread
From: Andrey Korolyov @ 2012-11-28 22:04 UTC (permalink / raw)
  To: Samuel Just; +Cc: Sage Weil, Stefan Priebe - Profihost AG, ceph-devel

On Thu, Nov 29, 2012 at 1:12 AM, Samuel Just <sam.just@inktank.com> wrote:
> Also, these clusters aren't mixed argonaut and next, are they?  (Not
> that that shouldn't work, but it would be a useful data point.)
> -Sam
>
> On Wed, Nov 28, 2012 at 1:11 PM, Samuel Just <sam.just@inktank.com> wrote:
>> Did you observe hung I/O along with that error?  Both sub_op_commit and
>> sub_op_applied have happened, so the sub_op_reply should have been
>> sent back to the primary.  This looks more like a leak.  If you also
>> observed hung I/O, then it's possible that the problem is occurring
>> between the sub_op_applied event and the response.
>> -Sam
>>

It is relatively easy to check whether one of the client VMs has locked
one or more cores in iowait or simply hangs, so yes, these ops correspond
to real commit operations, and they are hung.
I'm using an all-new 0.54 cluster, without any mixing of course.  Has
everyone who hit this bug made changes to the cluster before the bug
showed itself (say, within a day)?


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: endless flying slow requests
  2012-11-28 22:04               ` Andrey Korolyov
@ 2012-12-01  5:06                 ` Samuel Just
  2012-12-02  3:18                   ` Sage Weil
  0 siblings, 1 reply; 15+ messages in thread
From: Samuel Just @ 2012-12-01  5:06 UTC (permalink / raw)
  To: Andrey Korolyov; +Cc: Sage Weil, Stefan Priebe - Profihost AG, ceph-devel

I've pushed a fix to next, 49f32cee647c5bd09f36ba7c9fd4f481a697b9d7.
Let me know if the problem persists with this patch.
-Sam
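If you want to test against that commit directly, a sketch assuming a
checkout of the public ceph.git:

    git fetch origin
    git branch -r --contains 49f32cee647c5bd09f36ba7c9fd4f481a697b9d7
    git checkout 49f32cee647c5bd09f36ba7c9fd4f481a697b9d7

The branch listing should include origin/next if the fix has landed there.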

On Wed, Nov 28, 2012 at 2:04 PM, Andrey Korolyov <andrey@xdel.ru> wrote:
> On Thu, Nov 29, 2012 at 1:12 AM, Samuel Just <sam.just@inktank.com> wrote:
>> Also, these clusters aren't mixed argonaut and next, are they?  (Not
>> that that shouldn't work, but it would be a useful data point.)
>> -Sam
>>
>> On Wed, Nov 28, 2012 at 1:11 PM, Samuel Just <sam.just@inktank.com> wrote:
>>> Did you observe hung I/O along with that error?  Both sub_op_commit and
>>> sub_op_applied have happened, so the sub_op_reply should have been
>>> sent back to the primary.  This looks more like a leak.  If you also
>>> observed hung I/O, then it's possible that the problem is occurring
>>> between the sub_op_applied event and the response.
>>> -Sam
>>>
>
> It is relatively easy to check whether one of the client VMs has locked
> one or more cores in iowait or simply hangs, so yes, these ops correspond
> to real commit operations, and they are hung.
> I'm using an all-new 0.54 cluster, with no version mixing of course. Did
> everyone who hit this bug readjust the cluster (osd restarts or in/out
> changes) before the bug showed itself (say, within the previous day)?
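A quick way to confirm such a hang from inside a guest, as a sketch
(device name is an example):

    iostat -x 1 vda                           # await/%util pegged, zero throughput
    ps -eo state,pid,comm | awk '$1 == "D"'   # tasks in uninterruptible sleep

Sustained D-state tasks plus a pegged %util with no completed I/O is the
signature of a write stuck in the cluster.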

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: endless flying slow requests
  2012-12-01  5:06                 ` Samuel Just
@ 2012-12-02  3:18                   ` Sage Weil
  2012-12-02 22:02                     ` Samuel Just
  0 siblings, 1 reply; 15+ messages in thread
From: Sage Weil @ 2012-12-02  3:18 UTC (permalink / raw)
  To: Samuel Just; +Cc: Andrey Korolyov, Stefan Priebe - Profihost AG, ceph-devel

I pushed another fix that plugs a leak when requests race with peering;
it fixed the 'slow request' warnings for osd_sub_op messages for me.
It's in wip-osd-leak.

Sam, when you get a chance, can you take a look and push it to next if it 
looks okay?

Thanks!
sage
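For anyone who wants to try it before it reaches next, a sketch assuming
the branch is published in the usual ceph.git:

    git fetch origin
    git checkout -b wip-osd-leak origin/wip-osd-leak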


On Fri, 30 Nov 2012, Samuel Just wrote:

> I've pushed a fix to next, 49f32cee647c5bd09f36ba7c9fd4f481a697b9d7.
> Let me know if the problem persists with this patch.
> -Sam

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: endless flying slow requests
  2012-12-02  3:18                   ` Sage Weil
@ 2012-12-02 22:02                     ` Samuel Just
  0 siblings, 0 replies; 15+ messages in thread
From: Samuel Just @ 2012-12-02 22:02 UTC (permalink / raw)
  To: Sage Weil; +Cc: Andrey Korolyov, Stefan Priebe - Profihost AG, ceph-devel

Looks good, pushed.
-Sam

On Sat, Dec 1, 2012 at 7:18 PM, Sage Weil <sage@inktank.com> wrote:
> I pushed another fix that plugs a leak when requests race with peering;
> it fixed the 'slow request' warnings for osd_sub_op messages for me.
> It's in wip-osd-leak.
>
> Sam, when you get a chance, can you take a look and push it to next if it
> looks okay?
>
> Thanks!
> sage

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2012-12-02 22:02 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-11-14  9:12 endless flying slow requests Stefan Priebe - Profihost AG
2012-11-14 14:59 ` Sage Weil
2012-11-15  8:09   ` Stefan Priebe - Profihost AG
2012-11-15  8:18     ` Josh Durgin
2012-11-15 14:58     ` Sage Weil
2012-11-28  1:51       ` Sage Weil
2012-11-28  7:23         ` Stefan Priebe - Profihost AG
2012-11-28  7:47         ` Andrey Korolyov
2012-11-28 21:11           ` Samuel Just
2012-11-28 21:12             ` Samuel Just
2012-11-28 21:52               ` Moore, Shawn M
2012-11-28 22:04               ` Andrey Korolyov
2012-12-01  5:06                 ` Samuel Just
2012-12-02  3:18                   ` Sage Weil
2012-12-02 22:02                     ` Samuel Just
