* rbd map command hangs for 15 minutes during system start up
@ 2012-11-08 22:10 Mandell Degerness
  2012-11-09  1:43 ` Josh Durgin
From: Mandell Degerness @ 2012-11-08 22:10 UTC (permalink / raw)
  To: ceph-devel

[-- Attachment #1: Type: text/plain, Size: 2084 bytes --]

We are seeing a somewhat random, but frequent hang on our systems
during startup.  The hang happens at the point where an "rbd map
<rbdvol>" command is run.

I've attached the ceph logs from the cluster.  The map command happens
at Nov  8 18:41:09 on server 172.18.0.15.  The process which hung can
be seen in the log as 172.18.0.15:0/1143980479.

It appears that the TCP socket to the OSD is opened but then times
out 15 minutes later; the process only receives data once the socket
is closed on the client side, at which point it retries.

Please help.
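In the meantime, we could bound each attempt instead of letting one
hang for the full 15 minutes. A minimal sketch of such a boot-time
guard (assuming coreutils `timeout`; the deadline, retry count, and
volume name are illustrative placeholders, not our actual scripts):

```shell
#!/bin/sh
# Retry 'rbd map' with a per-attempt deadline instead of letting a
# single attempt block boot for the full 15-minute TCP timeout.
map_with_deadline() {
    # $1 = command to run, $2 = per-attempt deadline (seconds),
    # $3 = maximum number of attempts
    _cmd=$1; _secs=$2; _tries=$3
    while [ "$_tries" -gt 0 ]; do
        if timeout "$_secs" sh -c "$_cmd"; then
            return 0
        fi
        _tries=$((_tries - 1))
        echo "attempt failed or timed out; $_tries attempts left" >&2
    done
    return 1
}

# Illustrative invocation (placeholder volume name):
# map_with_deadline 'rbd map rbdvol' 60 3
```

If every attempt fails, the kernel client's state under
/sys/kernel/debug/ceph/ (the osdc and monc files) would be worth
capturing before giving up.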

We are using ceph version 0.48.2argonaut
(commit:3e02b2fad88c2a95d9c0c86878f10d1beb780bfe).

We are using a 3.5.7 kernel with the following list of patches applied:

1-libceph-encapsulate-out-message-data-setup.patch
2-libceph-dont-mark-footer-complete-before-it-is.patch
3-libceph-move-init-of-bio_iter.patch
4-libceph-dont-use-bio_iter-as-a-flag.patch
5-libceph-resubmit-linger-ops-when-pg-mapping-changes.patch
6-libceph-re-initialize-bio_iter-on-start-of-message-receive.patch
7-ceph-close-old-con-before-reopening-on-mds-reconnect.patch
8-libceph-protect-ceph_con_open-with-mutex.patch
9-libceph-reset-connection-retry-on-successfully-negotiation.patch
10-rbd-only-reset-capacity-when-pointing-to-head.patch
11-rbd-set-image-size-when-header-is-updated.patch
12-libceph-fix-crypto-key-null-deref-memory-leak.patch
13-ceph-tolerate-and-warn-on-extraneous-dentry-from-mds.patch
14-ceph-avoid-divide-by-zero-in-__validate_layout.patch
15-rbd-drop-dev-reference-on-error-in-rbd_open.patch
16-ceph-Fix-oops-when-handling-mdsmap-that-decreases-max_mds.patch
17-libceph-check-for-invalid-mapping.patch
18-ceph-propagate-layout-error-on-osd-request-creation.patch
19-rbd-BUG-on-invalid-layout.patch
20-ceph-return-EIO-on-invalid-layout-on-GET_DATALOC-ioctl.patch
21-ceph-avoid-32-bit-page-index-overflow.patch
23-ceph-fix-dentry-reference-leak-in-encode_fh.patch

Any suggestions?

One thought is that the following patch (which we could not apply) is
what is required:

22-rbd-reset-BACKOFF-if-unable-to-re-queue.patch

Regards,
Mandell Degerness

[-- Attachment #2: hanglog_ceph.log.gz --]
[-- Type: application/x-gzip, Size: 21632 bytes --]


* Re: rbd map command hangs for 15 minutes during system start up
  2012-11-08 22:10 rbd map command hangs for 15 minutes during system start up Mandell Degerness
@ 2012-11-09  1:43 ` Josh Durgin
  2012-11-12 22:19   ` Nick Bartos
From: Josh Durgin @ 2012-11-09  1:43 UTC (permalink / raw)
  To: Mandell Degerness; +Cc: ceph-devel

On 11/08/2012 02:10 PM, Mandell Degerness wrote:
> We are seeing a somewhat random, but frequent hang on our systems
> during startup.  The hang happens at the point where an "rbd map
> <rbdvol>" command is run.
>
> I've attached the ceph logs from the cluster.  The map command happens
> at Nov  8 18:41:09 on server 172.18.0.15.  The process which hung can
> be seen in the log as 172.18.0.15:0/1143980479.
>
> It appears that the TCP socket to the OSD is opened but then times
> out 15 minutes later; the process only receives data once the socket
> is closed on the client side, at which point it retries.
>
> Please help.
>
> We are using ceph version 0.48.2argonaut
> (commit:3e02b2fad88c2a95d9c0c86878f10d1beb780bfe).
>
> We are using a 3.5.7 kernel with the following list of patches applied:
>
> 1-libceph-encapsulate-out-message-data-setup.patch
> 2-libceph-dont-mark-footer-complete-before-it-is.patch
> 3-libceph-move-init-of-bio_iter.patch
> 4-libceph-dont-use-bio_iter-as-a-flag.patch
> 5-libceph-resubmit-linger-ops-when-pg-mapping-changes.patch
> 6-libceph-re-initialize-bio_iter-on-start-of-message-receive.patch
> 7-ceph-close-old-con-before-reopening-on-mds-reconnect.patch
> 8-libceph-protect-ceph_con_open-with-mutex.patch
> 9-libceph-reset-connection-retry-on-successfully-negotiation.patch
> 10-rbd-only-reset-capacity-when-pointing-to-head.patch
> 11-rbd-set-image-size-when-header-is-updated.patch
> 12-libceph-fix-crypto-key-null-deref-memory-leak.patch
> 13-ceph-tolerate-and-warn-on-extraneous-dentry-from-mds.patch
> 14-ceph-avoid-divide-by-zero-in-__validate_layout.patch
> 15-rbd-drop-dev-reference-on-error-in-rbd_open.patch
> 16-ceph-Fix-oops-when-handling-mdsmap-that-decreases-max_mds.patch
> 17-libceph-check-for-invalid-mapping.patch
> 18-ceph-propagate-layout-error-on-osd-request-creation.patch
> 19-rbd-BUG-on-invalid-layout.patch
> 20-ceph-return-EIO-on-invalid-layout-on-GET_DATALOC-ioctl.patch
> 21-ceph-avoid-32-bit-page-index-overflow.patch
> 23-ceph-fix-dentry-reference-leak-in-encode_fh.patch
>
> Any suggestions?

The log shows your monitors' clocks aren't synchronized closely
enough for them to make much progress (including authenticating new
connections). That's probably the real issue; 0.2s is a pretty large
clock drift.
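A quick way to quantify that skew is to sample each monitor's clock
and compare. A sketch (hypothetical helper; the 0.05s threshold
matches the default 'mon clock drift allowed' setting, and real
samples would come from the monitor hosts, e.g. via
`ssh $host date +%s.%N`):

```shell
#!/bin/sh
# Read "<mon-name> <unix-timestamp>" lines on stdin and report any
# monitor more than ALLOWED seconds away from the first sample.
ALLOWED=0.05   # ceph's default 'mon clock drift allowed'

check_drift() {
    # Exits non-zero if any monitor drifts beyond the threshold.
    awk -v allowed="$ALLOWED" '
        NR == 1 { ref = $2 }            # first monitor as reference
        {
            d = $2 - ref
            if (d < 0) d = -d
            if (d > allowed) { printf "%s drifts %.3fs\n", $1, d; bad = 1 }
        }
        END { exit bad }
    '
}

# Illustrative input; mon.b is 0.2s ahead, like the drift in the log:
# printf "mon.a 100.00\nmon.b 100.20\nmon.c 100.01\n" | check_drift
```

Running ntpd on all three monitors is the real fix; this only makes
the drift visible.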

> One thought is that the following patch (which we could not apply) is
> what is required:
>
> 22-rbd-reset-BACKOFF-if-unable-to-re-queue.patch

This is certainly useful too, but I don't think it's the cause of
the delay in this case.

Josh


* Re: rbd map command hangs for 15 minutes during system start up
  2012-11-09  1:43 ` Josh Durgin
@ 2012-11-12 22:19   ` Nick Bartos
  2012-11-12 23:16     ` Sage Weil
From: Nick Bartos @ 2012-11-12 22:19 UTC (permalink / raw)
  To: Josh Durgin; +Cc: Mandell Degerness, ceph-devel

After removing 8-libceph-protect-ceph_con_open-with-mutex.patch, it
seems we no longer have this hang.

On Thu, Nov 8, 2012 at 5:43 PM, Josh Durgin <josh.durgin@inktank.com> wrote:
> On 11/08/2012 02:10 PM, Mandell Degerness wrote:
>>
>> We are seeing a somewhat random, but frequent hang on our systems
>> during startup.  The hang happens at the point where an "rbd map
>> <rbdvol>" command is run.
>>
>> I've attached the ceph logs from the cluster.  The map command happens
>> at Nov  8 18:41:09 on server 172.18.0.15.  The process which hung can
>> be seen in the log as 172.18.0.15:0/1143980479.
>>
>> It appears that the TCP socket to the OSD is opened but then times
>> out 15 minutes later; the process only receives data once the socket
>> is closed on the client side, at which point it retries.
>>
>> Please help.
>>
>> We are using ceph version 0.48.2argonaut
>> (commit:3e02b2fad88c2a95d9c0c86878f10d1beb780bfe).
>>
>> We are using a 3.5.7 kernel with the following list of patches applied:
>>
>> 1-libceph-encapsulate-out-message-data-setup.patch
>> 2-libceph-dont-mark-footer-complete-before-it-is.patch
>> 3-libceph-move-init-of-bio_iter.patch
>> 4-libceph-dont-use-bio_iter-as-a-flag.patch
>> 5-libceph-resubmit-linger-ops-when-pg-mapping-changes.patch
>> 6-libceph-re-initialize-bio_iter-on-start-of-message-receive.patch
>> 7-ceph-close-old-con-before-reopening-on-mds-reconnect.patch
>> 8-libceph-protect-ceph_con_open-with-mutex.patch
>> 9-libceph-reset-connection-retry-on-successfully-negotiation.patch
>> 10-rbd-only-reset-capacity-when-pointing-to-head.patch
>> 11-rbd-set-image-size-when-header-is-updated.patch
>> 12-libceph-fix-crypto-key-null-deref-memory-leak.patch
>> 13-ceph-tolerate-and-warn-on-extraneous-dentry-from-mds.patch
>> 14-ceph-avoid-divide-by-zero-in-__validate_layout.patch
>> 15-rbd-drop-dev-reference-on-error-in-rbd_open.patch
>> 16-ceph-Fix-oops-when-handling-mdsmap-that-decreases-max_mds.patch
>> 17-libceph-check-for-invalid-mapping.patch
>> 18-ceph-propagate-layout-error-on-osd-request-creation.patch
>> 19-rbd-BUG-on-invalid-layout.patch
>> 20-ceph-return-EIO-on-invalid-layout-on-GET_DATALOC-ioctl.patch
>> 21-ceph-avoid-32-bit-page-index-overflow.patch
>> 23-ceph-fix-dentry-reference-leak-in-encode_fh.patch
>>
>> Any suggestions?
>
>
> The log shows your monitors' clocks aren't synchronized closely
> enough for them to make much progress (including authenticating new
> connections). That's probably the real issue; 0.2s is a pretty large
> clock drift.
>
>
>> One thought is that the following patch (which we could not apply) is
>> what is required:
>>
>> 22-rbd-reset-BACKOFF-if-unable-to-re-queue.patch
>
>
> This is certainly useful too, but I don't think it's the cause of
> the delay in this case.
>
> Josh
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: rbd map command hangs for 15 minutes during system start up
  2012-11-12 22:19   ` Nick Bartos
@ 2012-11-12 23:16     ` Sage Weil
  2012-11-16  0:21       ` Nick Bartos
From: Sage Weil @ 2012-11-12 23:16 UTC (permalink / raw)
  To: Nick Bartos; +Cc: Josh Durgin, Mandell Degerness, ceph-devel

On Mon, 12 Nov 2012, Nick Bartos wrote:
> After removing 8-libceph-protect-ceph_con_open-with-mutex.patch, it
> seems we no longer have this hang.

Hmm, that's a bit disconcerting.  Did this series come from our old 3.5 
stable series?  I recently prepared a new one that backports *all* of the 
fixes from 3.6 to 3.5 (and 3.4); see wip-3.5 in ceph-client.git.  I would 
be curious if you see problems with that.

So far, with these fixes in place, we have not seen any unexplained kernel 
crashes in this code.

I take it you're going back to a 3.5 kernel because you weren't able to 
get rid of the sync problem with 3.6?

sage



> 
> On Thu, Nov 8, 2012 at 5:43 PM, Josh Durgin <josh.durgin@inktank.com> wrote:
> > On 11/08/2012 02:10 PM, Mandell Degerness wrote:
> >>
> >> We are seeing a somewhat random, but frequent hang on our systems
> >> during startup.  The hang happens at the point where an "rbd map
> >> <rbdvol>" command is run.
> >>
> >> I've attached the ceph logs from the cluster.  The map command happens
> >> at Nov  8 18:41:09 on server 172.18.0.15.  The process which hung can
> >> be seen in the log as 172.18.0.15:0/1143980479.
> >>
> >> It appears that the TCP socket to the OSD is opened but then times
> >> out 15 minutes later; the process only receives data once the socket
> >> is closed on the client side, at which point it retries.
> >>
> >> Please help.
> >>
> >> We are using ceph version 0.48.2argonaut
> >> (commit:3e02b2fad88c2a95d9c0c86878f10d1beb780bfe).
> >>
> >> We are using a 3.5.7 kernel with the following list of patches applied:
> >>
> >> 1-libceph-encapsulate-out-message-data-setup.patch
> >> 2-libceph-dont-mark-footer-complete-before-it-is.patch
> >> 3-libceph-move-init-of-bio_iter.patch
> >> 4-libceph-dont-use-bio_iter-as-a-flag.patch
> >> 5-libceph-resubmit-linger-ops-when-pg-mapping-changes.patch
> >> 6-libceph-re-initialize-bio_iter-on-start-of-message-receive.patch
> >> 7-ceph-close-old-con-before-reopening-on-mds-reconnect.patch
> >> 8-libceph-protect-ceph_con_open-with-mutex.patch
> >> 9-libceph-reset-connection-retry-on-successfully-negotiation.patch
> >> 10-rbd-only-reset-capacity-when-pointing-to-head.patch
> >> 11-rbd-set-image-size-when-header-is-updated.patch
> >> 12-libceph-fix-crypto-key-null-deref-memory-leak.patch
> >> 13-ceph-tolerate-and-warn-on-extraneous-dentry-from-mds.patch
> >> 14-ceph-avoid-divide-by-zero-in-__validate_layout.patch
> >> 15-rbd-drop-dev-reference-on-error-in-rbd_open.patch
> >> 16-ceph-Fix-oops-when-handling-mdsmap-that-decreases-max_mds.patch
> >> 17-libceph-check-for-invalid-mapping.patch
> >> 18-ceph-propagate-layout-error-on-osd-request-creation.patch
> >> 19-rbd-BUG-on-invalid-layout.patch
> >> 20-ceph-return-EIO-on-invalid-layout-on-GET_DATALOC-ioctl.patch
> >> 21-ceph-avoid-32-bit-page-index-overflow.patch
> >> 23-ceph-fix-dentry-reference-leak-in-encode_fh.patch
> >>
> >> Any suggestions?
> >
> >
> > The log shows your monitors' clocks aren't synchronized closely
> > enough for them to make much progress (including authenticating new
> > connections). That's probably the real issue; 0.2s is a pretty large
> > clock drift.
> >
> >
> >> One thought is that the following patch (which we could not apply) is
> >> what is required:
> >>
> >> 22-rbd-reset-BACKOFF-if-unable-to-re-queue.patch
> >
> >
> > This is certainly useful too, but I don't think it's the cause of
> > the delay in this case.
> >
> > Josh
> 
> 


* Re: rbd map command hangs for 15 minutes during system start up
  2012-11-12 23:16     ` Sage Weil
@ 2012-11-16  0:21       ` Nick Bartos
  2012-11-16  0:25         ` Sage Weil
From: Nick Bartos @ 2012-11-16  0:21 UTC (permalink / raw)
  To: Sage Weil; +Cc: Josh Durgin, Mandell Degerness, ceph-devel

Sorry, I guess this e-mail got missed.  I believe those patches came
from the ceph/linux-3.5.5-ceph branch.  I'm now using the wip-3.5
branch patches, which all seem to be fine.  We'll stick with 3.5 and
this backport for now until we can figure out what's wrong with 3.6.

I typically ignore the wip branches just due to the naming when I'm
looking for updates.  Where should I typically look for updates that
aren't in released kernels?  Also, is there anything else in the wip*
branches that you think we may find particularly useful?


On Mon, Nov 12, 2012 at 3:16 PM, Sage Weil <sage@inktank.com> wrote:
> On Mon, 12 Nov 2012, Nick Bartos wrote:
>> After removing 8-libceph-protect-ceph_con_open-with-mutex.patch, it
>> seems we no longer have this hang.
>
> Hmm, that's a bit disconcerting.  Did this series come from our old 3.5
> stable series?  I recently prepared a new one that backports *all* of the
> fixes from 3.6 to 3.5 (and 3.4); see wip-3.5 in ceph-client.git.  I would
> be curious if you see problems with that.
>
> So far, with these fixes in place, we have not seen any unexplained kernel
> crashes in this code.
>
> I take it you're going back to a 3.5 kernel because you weren't able to
> get rid of the sync problem with 3.6?
>
> sage
>
>
>
>>
>> On Thu, Nov 8, 2012 at 5:43 PM, Josh Durgin <josh.durgin@inktank.com> wrote:
>> > On 11/08/2012 02:10 PM, Mandell Degerness wrote:
>> >>
>> >> We are seeing a somewhat random, but frequent hang on our systems
>> >> during startup.  The hang happens at the point where an "rbd map
>> >> <rbdvol>" command is run.
>> >>
>> >> I've attached the ceph logs from the cluster.  The map command happens
>> >> at Nov  8 18:41:09 on server 172.18.0.15.  The process which hung can
>> >> be seen in the log as 172.18.0.15:0/1143980479.
>> >>
>> >> It appears that the TCP socket to the OSD is opened but then times
>> >> out 15 minutes later; the process only receives data once the socket
>> >> is closed on the client side, at which point it retries.
>> >>
>> >> Please help.
>> >>
>> >> We are using ceph version 0.48.2argonaut
>> >> (commit:3e02b2fad88c2a95d9c0c86878f10d1beb780bfe).
>> >>
>> >> We are using a 3.5.7 kernel with the following list of patches applied:
>> >>
>> >> 1-libceph-encapsulate-out-message-data-setup.patch
>> >> 2-libceph-dont-mark-footer-complete-before-it-is.patch
>> >> 3-libceph-move-init-of-bio_iter.patch
>> >> 4-libceph-dont-use-bio_iter-as-a-flag.patch
>> >> 5-libceph-resubmit-linger-ops-when-pg-mapping-changes.patch
>> >> 6-libceph-re-initialize-bio_iter-on-start-of-message-receive.patch
>> >> 7-ceph-close-old-con-before-reopening-on-mds-reconnect.patch
>> >> 8-libceph-protect-ceph_con_open-with-mutex.patch
>> >> 9-libceph-reset-connection-retry-on-successfully-negotiation.patch
>> >> 10-rbd-only-reset-capacity-when-pointing-to-head.patch
>> >> 11-rbd-set-image-size-when-header-is-updated.patch
>> >> 12-libceph-fix-crypto-key-null-deref-memory-leak.patch
>> >> 13-ceph-tolerate-and-warn-on-extraneous-dentry-from-mds.patch
>> >> 14-ceph-avoid-divide-by-zero-in-__validate_layout.patch
>> >> 15-rbd-drop-dev-reference-on-error-in-rbd_open.patch
>> >> 16-ceph-Fix-oops-when-handling-mdsmap-that-decreases-max_mds.patch
>> >> 17-libceph-check-for-invalid-mapping.patch
>> >> 18-ceph-propagate-layout-error-on-osd-request-creation.patch
>> >> 19-rbd-BUG-on-invalid-layout.patch
>> >> 20-ceph-return-EIO-on-invalid-layout-on-GET_DATALOC-ioctl.patch
>> >> 21-ceph-avoid-32-bit-page-index-overflow.patch
>> >> 23-ceph-fix-dentry-reference-leak-in-encode_fh.patch
>> >>
>> >> Any suggestions?
>> >
>> >
>> > The log shows your monitors' clocks aren't synchronized closely
>> > enough for them to make much progress (including authenticating new
>> > connections). That's probably the real issue; 0.2s is a pretty large
>> > clock drift.
>> >
>> >
>> >> One thought is that the following patch (which we could not apply) is
>> >> what is required:
>> >>
>> >> 22-rbd-reset-BACKOFF-if-unable-to-re-queue.patch
>> >
>> >
>> > This is certainly useful too, but I don't think it's the cause of
>> > the delay in this case.
>> >
>> > Josh
>>
>>


* Re: rbd map command hangs for 15 minutes during system start up
  2012-11-16  0:21       ` Nick Bartos
@ 2012-11-16  0:25         ` Sage Weil
  2012-11-16 18:36           ` Nick Bartos
From: Sage Weil @ 2012-11-16  0:25 UTC (permalink / raw)
  To: Nick Bartos; +Cc: Josh Durgin, Mandell Degerness, ceph-devel

On Thu, 15 Nov 2012, Nick Bartos wrote:
> Sorry, I guess this e-mail got missed.  I believe those patches came
> from the ceph/linux-3.5.5-ceph branch.  I'm now using the wip-3.5
> branch patches, which all seem to be fine.  We'll stick with 3.5 and
> this backport for now until we can figure out what's wrong with 3.6.
> 
> I typically ignore the wip branches just due to the naming when I'm
> looking for updates.  Where should I typically look for updates that
> aren't in released kernels?  Also, is there anything else in the wip*
> branches that you think we may find particularly useful?

You were looking in the right place.  The problem was we weren't super 
organized with our stable patches, and changed our minds about what to 
send upstream.  These are 'wip' in the sense that they were in preparation 
for going upstream.  The goal is to push them to the mainline stable 
kernels and ideally not keep them in our tree at all.

wip-3.5 is an oddity because the mainline stable kernel is EOL'd, but 
we're keeping it so that ubuntu can pick it up for quantal.

I'll make sure these are more clearly marked as stable.

sage


> 
> 
> On Mon, Nov 12, 2012 at 3:16 PM, Sage Weil <sage@inktank.com> wrote:
> > On Mon, 12 Nov 2012, Nick Bartos wrote:
> >> After removing 8-libceph-protect-ceph_con_open-with-mutex.patch, it
> >> seems we no longer have this hang.
> >
> > Hmm, that's a bit disconcerting.  Did this series come from our old 3.5
> > stable series?  I recently prepared a new one that backports *all* of the
> > fixes from 3.6 to 3.5 (and 3.4); see wip-3.5 in ceph-client.git.  I would
> > be curious if you see problems with that.
> >
> > So far, with these fixes in place, we have not seen any unexplained kernel
> > crashes in this code.
> >
> > I take it you're going back to a 3.5 kernel because you weren't able to
> > get rid of the sync problem with 3.6?
> >
> > sage
> >
> >
> >
> >>
> >> On Thu, Nov 8, 2012 at 5:43 PM, Josh Durgin <josh.durgin@inktank.com> wrote:
> >> > On 11/08/2012 02:10 PM, Mandell Degerness wrote:
> >> >>
> >> >> We are seeing a somewhat random, but frequent hang on our systems
> >> >> during startup.  The hang happens at the point where an "rbd map
> >> >> <rbdvol>" command is run.
> >> >>
> >> >> I've attached the ceph logs from the cluster.  The map command happens
> >> >> at Nov  8 18:41:09 on server 172.18.0.15.  The process which hung can
> >> >> be seen in the log as 172.18.0.15:0/1143980479.
> >> >>
> >> >> It appears that the TCP socket to the OSD is opened but then times
> >> >> out 15 minutes later; the process only receives data once the socket
> >> >> is closed on the client side, at which point it retries.
> >> >>
> >> >> Please help.
> >> >>
> >> >> We are using ceph version 0.48.2argonaut
> >> >> (commit:3e02b2fad88c2a95d9c0c86878f10d1beb780bfe).
> >> >>
> >> >> We are using a 3.5.7 kernel with the following list of patches applied:
> >> >>
> >> >> 1-libceph-encapsulate-out-message-data-setup.patch
> >> >> 2-libceph-dont-mark-footer-complete-before-it-is.patch
> >> >> 3-libceph-move-init-of-bio_iter.patch
> >> >> 4-libceph-dont-use-bio_iter-as-a-flag.patch
> >> >> 5-libceph-resubmit-linger-ops-when-pg-mapping-changes.patch
> >> >> 6-libceph-re-initialize-bio_iter-on-start-of-message-receive.patch
> >> >> 7-ceph-close-old-con-before-reopening-on-mds-reconnect.patch
> >> >> 8-libceph-protect-ceph_con_open-with-mutex.patch
> >> >> 9-libceph-reset-connection-retry-on-successfully-negotiation.patch
> >> >> 10-rbd-only-reset-capacity-when-pointing-to-head.patch
> >> >> 11-rbd-set-image-size-when-header-is-updated.patch
> >> >> 12-libceph-fix-crypto-key-null-deref-memory-leak.patch
> >> >> 13-ceph-tolerate-and-warn-on-extraneous-dentry-from-mds.patch
> >> >> 14-ceph-avoid-divide-by-zero-in-__validate_layout.patch
> >> >> 15-rbd-drop-dev-reference-on-error-in-rbd_open.patch
> >> >> 16-ceph-Fix-oops-when-handling-mdsmap-that-decreases-max_mds.patch
> >> >> 17-libceph-check-for-invalid-mapping.patch
> >> >> 18-ceph-propagate-layout-error-on-osd-request-creation.patch
> >> >> 19-rbd-BUG-on-invalid-layout.patch
> >> >> 20-ceph-return-EIO-on-invalid-layout-on-GET_DATALOC-ioctl.patch
> >> >> 21-ceph-avoid-32-bit-page-index-overflow.patch
> >> >> 23-ceph-fix-dentry-reference-leak-in-encode_fh.patch
> >> >>
> >> >> Any suggestions?
> >> >
> >> >
> >> > The log shows your monitors' clocks aren't synchronized closely
> >> > enough for them to make much progress (including authenticating new
> >> > connections). That's probably the real issue; 0.2s is a pretty large
> >> > clock drift.
> >> >
> >> >
> >> >> One thought is that the following patch (which we could not apply) is
> >> >> what is required:
> >> >>
> >> >> 22-rbd-reset-BACKOFF-if-unable-to-re-queue.patch
> >> >
> >> >
> >> > This is certainly useful too, but I don't think it's the cause of
> >> > the delay in this case.
> >> >
> >> > Josh
> 
> 


* Re: rbd map command hangs for 15 minutes during system start up
  2012-11-16  0:25         ` Sage Weil
@ 2012-11-16 18:36           ` Nick Bartos
  2012-11-16 19:16             ` Sage Weil
From: Nick Bartos @ 2012-11-16 18:36 UTC (permalink / raw)
  To: Sage Weil; +Cc: Josh Durgin, Mandell Degerness, ceph-devel

Turns out we're having the 'rbd map' hang on startup again, after we
started using the wip-3.5 patch set.  How critical is the
libceph_protect_ceph_con_open_with_mutex commit?  That's the one I
removed before, which seemed to get rid of the problem (although I'm
not completely sure it got rid of it entirely, the hang at least
seemed to happen much less often).

It seems like we only started having this issue after we started
patching the 3.5 ceph client (we started patching to try and get rid
of a kernel oops, which the patches seem to have fixed).


On Thu, Nov 15, 2012 at 4:25 PM, Sage Weil <sage@inktank.com> wrote:
> On Thu, 15 Nov 2012, Nick Bartos wrote:
>> Sorry, I guess this e-mail got missed.  I believe those patches came
>> from the ceph/linux-3.5.5-ceph branch.  I'm now using the wip-3.5
>> branch patches, which all seem to be fine.  We'll stick with 3.5 and
>> this backport for now until we can figure out what's wrong with 3.6.
>>
>> I typically ignore the wip branches just due to the naming when I'm
>> looking for updates.  Where should I typically look for updates that
>> aren't in released kernels?  Also, is there anything else in the wip*
>> branches that you think we may find particularly useful?
>
> You were looking in the right place.  The problem was we weren't super
> organized with our stable patches, and changed our minds about what to
> send upstream.  These are 'wip' in the sense that they were in preparation
> for going upstream.  The goal is to push them to the mainline stable
> kernels and ideally not keep them in our tree at all.
>
> wip-3.5 is an oddity because the mainline stable kernel is EOL'd, but
> we're keeping it so that ubuntu can pick it up for quantal.
>
> I'll make sure these are more clearly marked as stable.
>
> sage
>
>
>>
>>
>> On Mon, Nov 12, 2012 at 3:16 PM, Sage Weil <sage@inktank.com> wrote:
>> > On Mon, 12 Nov 2012, Nick Bartos wrote:
>> >> After removing 8-libceph-protect-ceph_con_open-with-mutex.patch, it
>> >> seems we no longer have this hang.
>> >
>> > Hmm, that's a bit disconcerting.  Did this series come from our old 3.5
>> > stable series?  I recently prepared a new one that backports *all* of the
>> > fixes from 3.6 to 3.5 (and 3.4); see wip-3.5 in ceph-client.git.  I would
>> > be curious if you see problems with that.
>> >
>> > So far, with these fixes in place, we have not seen any unexplained kernel
>> > crashes in this code.
>> >
>> > I take it you're going back to a 3.5 kernel because you weren't able to
>> > get rid of the sync problem with 3.6?
>> >
>> > sage
>> >
>> >
>> >
>> >>
>> >> On Thu, Nov 8, 2012 at 5:43 PM, Josh Durgin <josh.durgin@inktank.com> wrote:
>> >> > On 11/08/2012 02:10 PM, Mandell Degerness wrote:
>> >> >>
>> >> >> We are seeing a somewhat random, but frequent hang on our systems
>> >> >> during startup.  The hang happens at the point where an "rbd map
>> >> >> <rbdvol>" command is run.
>> >> >>
>> >> >> I've attached the ceph logs from the cluster.  The map command happens
>> >> >> at Nov  8 18:41:09 on server 172.18.0.15.  The process which hung can
>> >> >> be seen in the log as 172.18.0.15:0/1143980479.
>> >> >>
>> >> >> It appears that the TCP socket to the OSD is opened but then times
>> >> >> out 15 minutes later; the process only receives data once the socket
>> >> >> is closed on the client side, at which point it retries.
>> >> >>
>> >> >> Please help.
>> >> >>
>> >> >> We are using ceph version 0.48.2argonaut
>> >> >> (commit:3e02b2fad88c2a95d9c0c86878f10d1beb780bfe).
>> >> >>
>> >> >> We are using a 3.5.7 kernel with the following list of patches applied:
>> >> >>
>> >> >> 1-libceph-encapsulate-out-message-data-setup.patch
>> >> >> 2-libceph-dont-mark-footer-complete-before-it-is.patch
>> >> >> 3-libceph-move-init-of-bio_iter.patch
>> >> >> 4-libceph-dont-use-bio_iter-as-a-flag.patch
>> >> >> 5-libceph-resubmit-linger-ops-when-pg-mapping-changes.patch
>> >> >> 6-libceph-re-initialize-bio_iter-on-start-of-message-receive.patch
>> >> >> 7-ceph-close-old-con-before-reopening-on-mds-reconnect.patch
>> >> >> 8-libceph-protect-ceph_con_open-with-mutex.patch
>> >> >> 9-libceph-reset-connection-retry-on-successfully-negotiation.patch
>> >> >> 10-rbd-only-reset-capacity-when-pointing-to-head.patch
>> >> >> 11-rbd-set-image-size-when-header-is-updated.patch
>> >> >> 12-libceph-fix-crypto-key-null-deref-memory-leak.patch
>> >> >> 13-ceph-tolerate-and-warn-on-extraneous-dentry-from-mds.patch
>> >> >> 14-ceph-avoid-divide-by-zero-in-__validate_layout.patch
>> >> >> 15-rbd-drop-dev-reference-on-error-in-rbd_open.patch
>> >> >> 16-ceph-Fix-oops-when-handling-mdsmap-that-decreases-max_mds.patch
>> >> >> 17-libceph-check-for-invalid-mapping.patch
>> >> >> 18-ceph-propagate-layout-error-on-osd-request-creation.patch
>> >> >> 19-rbd-BUG-on-invalid-layout.patch
>> >> >> 20-ceph-return-EIO-on-invalid-layout-on-GET_DATALOC-ioctl.patch
>> >> >> 21-ceph-avoid-32-bit-page-index-overflow.patch
>> >> >> 23-ceph-fix-dentry-reference-leak-in-encode_fh.patch
>> >> >>
>> >> >> Any suggestions?
>> >> >
>> >> >
>> >> > The log shows your monitors' clocks aren't synchronized closely
>> >> > enough for them to make much progress (including authenticating new
>> >> > connections). That's probably the real issue; 0.2s is a pretty large
>> >> > clock drift.
>> >> >
>> >> >
>> >> >> One thought is that the following patch (which we could not apply) is
>> >> >> what is required:
>> >> >>
>> >> >> 22-rbd-reset-BACKOFF-if-unable-to-re-queue.patch
>> >> >
>> >> >
>> >> > This is certainly useful too, but I don't think it's the cause of
>> >> > the delay in this case.
>> >> >
>> >> > Josh
>>
>>


* Re: rbd map command hangs for 15 minutes during system start up
  2012-11-16 18:36           ` Nick Bartos
@ 2012-11-16 19:16             ` Sage Weil
  2012-11-16 22:01               ` Nick Bartos
From: Sage Weil @ 2012-11-16 19:16 UTC (permalink / raw)
  To: Nick Bartos; +Cc: Josh Durgin, Mandell Degerness, ceph-devel

I just realized I was mixing up this thread with the other deadlock 
thread.  

On Fri, 16 Nov 2012, Nick Bartos wrote:
> Turns out we're having the 'rbd map' hang on startup again, after we
> started using the wip-3.5 patch set.  How critical is the
> libceph_protect_ceph_con_open_with_mutex commit?  That's the one I
> removed before, which seemed to get rid of the problem (although I'm
> not completely sure it got rid of it entirely, the hang at least
> seemed to happen much less often).
> 
> It seems like we only started having this issue after we started
> patching the 3.5 ceph client (we started patching to try and get rid
> of a kernel oops, which the patches seem to have fixed).

Right.  That patch fixes a real bug.  It also seems pretty unlikely that 
this patch is related to the startup hang.  The original log showed clock 
drift on the monitor that could very easily cause this sort of hang.  Can 
you confirm that that isn't the case with this recent instance of the 
problem?  And/or attach a log?

Thanks-
sage
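One quick way to check is to grep the monitor logs for skew warnings. A minimal sketch; the log path and exact message wording below are assumptions (approximating argonaut-era output), and the sample log line is fabricated for illustration:

```shell
# Create a sample log line of the kind a monitor emits when a peer's clock
# is off (wording approximate); in practice grep your real mon log, e.g.
# /var/log/ceph/ceph-mon.a.log.
log=/tmp/ceph-mon.sample.log
cat > "$log" <<'EOF'
2012-11-08 18:41:09.000 mon.a [WRN] message from mon.b was stamped 0.200000s in the future, clocks not synchronized
EOF
grep -c "clocks not synchronized" "$log"
```

Comparing the skew values in such lines against "mon clock drift allowed" tells you whether authentication could be stalling on clock drift.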


> 
> 
> On Thu, Nov 15, 2012 at 4:25 PM, Sage Weil <sage@inktank.com> wrote:
> > On Thu, 15 Nov 2012, Nick Bartos wrote:
> >> Sorry I guess this e-mail got missed.  I believe those patches came
> >> from the ceph/linux-3.5.5-ceph branch.  I'm now using the wip-3.5
> >> branch patches, which seem to all be fine.  We'll stick with 3.5 and
> >> this backport for now until we can figure out what's wrong with 3.6.
> >>
> >> I typically ignore the wip branches just due to the naming when I'm
> >> looking for updates.  Where should I typically look for updates that
> >> aren't in released kernels?  Also, is there anything else in the wip*
> >> branches that you think we may find particularly useful?
> >
> > You were looking in the right place.  The problem was we weren't super
> > organized with our stable patches, and changed our minds about what to
> > send upstream.  These are 'wip' in the sense that they were in preparation
> > for going upstream.  The goal is to push them to the mainline stable
> > kernels and ideally not keep them in our tree at all.
> >
> > wip-3.5 is an oddity because the mainline stable kernel is EOL'd, but
> > we're keeping it so that ubuntu can pick it up for quantal.
> >
> > I'll make sure these are more clearly marked as stable.
> >
> > sage
> >
> >
> >>
> >>
> >> On Mon, Nov 12, 2012 at 3:16 PM, Sage Weil <sage@inktank.com> wrote:
> >> > On Mon, 12 Nov 2012, Nick Bartos wrote:
> >> >> After removing 8-libceph-protect-ceph_con_open-with-mutex.patch, it
> >> >> seems we no longer have this hang.
> >> >
> >> > Hmm, that's a bit disconcerting.  Did this series come from our old 3.5
> >> > stable series?  I recently prepared a new one that backports *all* of the
> >> > fixes from 3.6 to 3.5 (and 3.4); see wip-3.5 in ceph-client.git.  I would
> >> > be curious if you see problems with that.
> >> >
> >> > So far, with these fixes in place, we have not seen any unexplained kernel
> >> > crashes in this code.
> >> >
> >> > I take it you're going back to a 3.5 kernel because you weren't able to
> >> > get rid of the sync problem with 3.6?
> >> >
> >> > sage
> >> >
> >> >
> >> >
> >> >>
> >> >> On Thu, Nov 8, 2012 at 5:43 PM, Josh Durgin <josh.durgin@inktank.com> wrote:
> >> >> > On 11/08/2012 02:10 PM, Mandell Degerness wrote:
> >> >> >>
> >> >> >> We are seeing a somewhat random, but frequent hang on our systems
> >> >> >> during startup.  The hang happens at the point where an "rbd map
> >> >> >> <rbdvol>" command is run.
> >> >> >>
> >> >> >> I've attached the ceph logs from the cluster.  The map command happens
> >> >> >> at Nov  8 18:41:09 on server 172.18.0.15.  The process which hung can
> >> >> >> be seen in the log as 172.18.0.15:0/1143980479.
> >> >> >>
> >> >> >> It appears as if the TCP socket is opened to the OSD, but then times
> >> >> >> out 15 minutes later, the process gets data when the socket is closed
> >> >> >> on the client server and it retries.
> >> >> >>
> >> >> >> Please help.
> >> >> >>
> >> >> >> We are using ceph version 0.48.2argonaut
> >> >> >> (commit:3e02b2fad88c2a95d9c0c86878f10d1beb780bfe).
> >> >> >>
> >> >> >> We are using a 3.5.7 kernel with the following list of patches applied:
> >> >> >>
> >> >> >> 1-libceph-encapsulate-out-message-data-setup.patch
> >> >> >> 2-libceph-dont-mark-footer-complete-before-it-is.patch
> >> >> >> 3-libceph-move-init-of-bio_iter.patch
> >> >> >> 4-libceph-dont-use-bio_iter-as-a-flag.patch
> >> >> >> 5-libceph-resubmit-linger-ops-when-pg-mapping-changes.patch
> >> >> >> 6-libceph-re-initialize-bio_iter-on-start-of-message-receive.patch
> >> >> >> 7-ceph-close-old-con-before-reopening-on-mds-reconnect.patch
> >> >> >> 8-libceph-protect-ceph_con_open-with-mutex.patch
> >> >> >> 9-libceph-reset-connection-retry-on-successfully-negotiation.patch
> >> >> >> 10-rbd-only-reset-capacity-when-pointing-to-head.patch
> >> >> >> 11-rbd-set-image-size-when-header-is-updated.patch
> >> >> >> 12-libceph-fix-crypto-key-null-deref-memory-leak.patch
> >> >> >> 13-ceph-tolerate-and-warn-on-extraneous-dentry-from-mds.patch
> >> >> >> 14-ceph-avoid-divide-by-zero-in-__validate_layout.patch
> >> >> >> 15-rbd-drop-dev-reference-on-error-in-rbd_open.patch
> >> >> >> 16-ceph-Fix-oops-when-handling-mdsmap-that-decreases-max_mds.patch
> >> >> >> 17-libceph-check-for-invalid-mapping.patch
> >> >> >> 18-ceph-propagate-layout-error-on-osd-request-creation.patch
> >> >> >> 19-rbd-BUG-on-invalid-layout.patch
> >> >> >> 20-ceph-return-EIO-on-invalid-layout-on-GET_DATALOC-ioctl.patch
> >> >> >> 21-ceph-avoid-32-bit-page-index-overflow.patch
> >> >> >> 23-ceph-fix-dentry-reference-leak-in-encode_fh.patch
> >> >> >>
> >> >> >> Any suggestions?
> >> >> >
> >> >> >
> >> >> > The log shows your monitors don't have time synchronized enough among
> >> >> > them to make much progress (including authenticating new connections).
> >> >> > That's probably the real issue. 0.2s is pretty large clock drift.
> >> >> >
> >> >> >
> >> >> >> One thought is that the following patch (which we could not apply) is
> >> >> >> what is required:
> >> >> >>
> >> >> >> 22-rbd-reset-BACKOFF-if-unable-to-re-queue.patch
> >> >> >
> >> >> >
> >> >> > This is certainly useful too, but I don't think it's the cause of
> >> >> > the delay in this case.
> >> >> >
> >> >> > Josh
> --


* Re: rbd map command hangs for 15 minutes during system start up
  2012-11-16 19:16             ` Sage Weil
@ 2012-11-16 22:01               ` Nick Bartos
  2012-11-16 22:13                 ` Sage Weil
  0 siblings, 1 reply; 56+ messages in thread
From: Nick Bartos @ 2012-11-16 22:01 UTC (permalink / raw)
  To: Sage Weil; +Cc: Josh Durgin, Mandell Degerness, ceph-devel

How far off do the clocks need to be before there is a problem?  It
would seem to be hard to ensure a very large cluster has all of its
nodes synchronized within 50ms (which seems to be the default for "mon
clock drift allowed").  Does the mon clock drift allowed parameter
change anything other than the log messages?  Are there any other
tuning options that may help, assuming that this is the issue and it's
not feasible to get the clocks synchronized to better than 500ms
across all nodes?
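For reference, the options in question are set in the [mon] section of
ceph.conf.  A sketch with illustrative values only (option names and
defaults as I understand them for argonaut-era releases, not
recommendations):

```ini
[mon]
    ; warning threshold for clock skew between monitors (default 0.05 s)
    mon clock drift allowed = 0.5
    ; paxos lease interval; skew must stay well below this (default 5 s)
    mon lease = 5
```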

I'm trying to get a good way of reproducing this and get a trace on
the ceph processes to see what they're waiting on.  I'll let you know
when I have more info.


On Fri, Nov 16, 2012 at 11:16 AM, Sage Weil <sage@inktank.com> wrote:
> I just realized I was mixing up this thread with the other deadlock
> thread.
>
> On Fri, 16 Nov 2012, Nick Bartos wrote:
>> Turns out we're having the 'rbd map' hang on startup again, after we
>> started using the wip-3.5 patch set.  How critical is the
>> libceph_protect_ceph_con_open_with_mutex commit?  That's the one I
>> removed before which seemed to get rid of the problem (although I'm
>> not completely sure if it completely got rid of it, at least seemed to
>> happen much less often).
>>
>> It seems like we only started having this issue after we started
>> patching the 3.5 ceph client (we started patching to try and get rid
>> of a kernel oops, which the patches seem to have fixed).
>
> Right.  That patch fixes a real bug.  It also seems pretty unlikely that
> this patch is related to the startup hang.  The original log showed clock
> drift on the monitor that could very easily cause this sort of hang.  Can
> you confirm that that isn't the case with this recent instance of the
> problem?  And/or attach a log?
>
> Thanks-
> sage
>
>
>>
>>
>> On Thu, Nov 15, 2012 at 4:25 PM, Sage Weil <sage@inktank.com> wrote:
>> > On Thu, 15 Nov 2012, Nick Bartos wrote:
>> >> Sorry I guess this e-mail got missed.  I believe those patches came
>> >> from the ceph/linux-3.5.5-ceph branch.  I'm now using the wip-3.5
>> >> branch patches, which seem to all be fine.  We'll stick with 3.5 and
>> >> this backport for now until we can figure out what's wrong with 3.6.
>> >>
>> >> I typically ignore the wip branches just due to the naming when I'm
>> >> looking for updates.  Where should I typically look for updates that
>> >> aren't in released kernels?  Also, is there anything else in the wip*
>> >> branches that you think we may find particularly useful?
>> >
>> > You were looking in the right place.  The problem was we weren't super
>> > organized with our stable patches, and changed our minds about what to
>> > send upstream.  These are 'wip' in the sense that they were in preparation
>> > for going upstream.  The goal is to push them to the mainline stable
>> > kernels and ideally not keep them in our tree at all.
>> >
>> > wip-3.5 is an oddity because the mainline stable kernel is EOL'd, but
>> > we're keeping it so that ubuntu can pick it up for quantal.
>> >
>> > I'll make sure these are more clearly marked as stable.
>> >
>> > sage
>> >
>> >
>> >>
>> >>
>> >> On Mon, Nov 12, 2012 at 3:16 PM, Sage Weil <sage@inktank.com> wrote:
>> >> > On Mon, 12 Nov 2012, Nick Bartos wrote:
>> >> >> After removing 8-libceph-protect-ceph_con_open-with-mutex.patch, it
>> >> >> seems we no longer have this hang.
>> >> >
>> >> > Hmm, that's a bit disconcerting.  Did this series come from our old 3.5
>> >> > stable series?  I recently prepared a new one that backports *all* of the
>> >> > fixes from 3.6 to 3.5 (and 3.4); see wip-3.5 in ceph-client.git.  I would
>> >> > be curious if you see problems with that.
>> >> >
>> >> > So far, with these fixes in place, we have not seen any unexplained kernel
>> >> > crashes in this code.
>> >> >
>> >> > I take it you're going back to a 3.5 kernel because you weren't able to
>> >> > get rid of the sync problem with 3.6?
>> >> >
>> >> > sage
>> >> >
>> >> >
>> >> >
>> >> >>
>> >> >> On Thu, Nov 8, 2012 at 5:43 PM, Josh Durgin <josh.durgin@inktank.com> wrote:
>> >> >> > On 11/08/2012 02:10 PM, Mandell Degerness wrote:
>> >> >> >>
>> >> >> >> We are seeing a somewhat random, but frequent hang on our systems
>> >> >> >> during startup.  The hang happens at the point where an "rbd map
>> >> >> >> <rbdvol>" command is run.
>> >> >> >>
>> >> >> >> I've attached the ceph logs from the cluster.  The map command happens
>> >> >> >> at Nov  8 18:41:09 on server 172.18.0.15.  The process which hung can
>> >> >> >> be seen in the log as 172.18.0.15:0/1143980479.
>> >> >> >>
>> >> >> >> It appears as if the TCP socket is opened to the OSD, but then times
>> >> >> >> out 15 minutes later, the process gets data when the socket is closed
>> >> >> >> on the client server and it retries.
>> >> >> >>
>> >> >> >> Please help.
>> >> >> >>
>> >> >> >> We are using ceph version 0.48.2argonaut
>> >> >> >> (commit:3e02b2fad88c2a95d9c0c86878f10d1beb780bfe).
>> >> >> >>
>> >> >> >> We are using a 3.5.7 kernel with the following list of patches applied:
>> >> >> >>
>> >> >> >> 1-libceph-encapsulate-out-message-data-setup.patch
>> >> >> >> 2-libceph-dont-mark-footer-complete-before-it-is.patch
>> >> >> >> 3-libceph-move-init-of-bio_iter.patch
>> >> >> >> 4-libceph-dont-use-bio_iter-as-a-flag.patch
>> >> >> >> 5-libceph-resubmit-linger-ops-when-pg-mapping-changes.patch
>> >> >> >> 6-libceph-re-initialize-bio_iter-on-start-of-message-receive.patch
>> >> >> >> 7-ceph-close-old-con-before-reopening-on-mds-reconnect.patch
>> >> >> >> 8-libceph-protect-ceph_con_open-with-mutex.patch
>> >> >> >> 9-libceph-reset-connection-retry-on-successfully-negotiation.patch
>> >> >> >> 10-rbd-only-reset-capacity-when-pointing-to-head.patch
>> >> >> >> 11-rbd-set-image-size-when-header-is-updated.patch
>> >> >> >> 12-libceph-fix-crypto-key-null-deref-memory-leak.patch
>> >> >> >> 13-ceph-tolerate-and-warn-on-extraneous-dentry-from-mds.patch
>> >> >> >> 14-ceph-avoid-divide-by-zero-in-__validate_layout.patch
>> >> >> >> 15-rbd-drop-dev-reference-on-error-in-rbd_open.patch
>> >> >> >> 16-ceph-Fix-oops-when-handling-mdsmap-that-decreases-max_mds.patch
>> >> >> >> 17-libceph-check-for-invalid-mapping.patch
>> >> >> >> 18-ceph-propagate-layout-error-on-osd-request-creation.patch
>> >> >> >> 19-rbd-BUG-on-invalid-layout.patch
>> >> >> >> 20-ceph-return-EIO-on-invalid-layout-on-GET_DATALOC-ioctl.patch
>> >> >> >> 21-ceph-avoid-32-bit-page-index-overflow.patch
>> >> >> >> 23-ceph-fix-dentry-reference-leak-in-encode_fh.patch
>> >> >> >>
>> >> >> >> Any suggestions?
>> >> >> >
>> >> >> >
>> >> >> > The log shows your monitors don't have time synchronized enough among
>> >> >> > them to make much progress (including authenticating new connections).
>> >> >> > That's probably the real issue. 0.2s is pretty large clock drift.
>> >> >> >
>> >> >> >
>> >> >> >> One thought is that the following patch (which we could not apply) is
>> >> >> >> what is required:
>> >> >> >>
>> >> >> >> 22-rbd-reset-BACKOFF-if-unable-to-re-queue.patch
>> >> >> >
>> >> >> >
>> >> >> > This is certainly useful too, but I don't think it's the cause of
>> >> >> > the delay in this case.
>> >> >> >
>> >> >> > Josh


* Re: rbd map command hangs for 15 minutes during system start up
  2012-11-16 22:01               ` Nick Bartos
@ 2012-11-16 22:13                 ` Sage Weil
  2012-11-16 22:16                   ` Nick Bartos
  0 siblings, 1 reply; 56+ messages in thread
From: Sage Weil @ 2012-11-16 22:13 UTC (permalink / raw)
  To: Nick Bartos; +Cc: Josh Durgin, Mandell Degerness, ceph-devel

You can safely set the clock drift allowed as high as 500ms.  The real 
limitation is that it needs to be well under the lease interval, which is 
currently 5 seconds by default.

You might be able to reproduce more easily by lowering the threshold...
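To see why the drift has to stay well under the lease interval, here is a
toy calculation (an assumed simplification, not Ceph's actual lease code):
a lease stamped by the leader is honored until stamp + interval as measured
by the receiving monitor's own clock, so any skew is subtracted directly
from the usable lease time.

```shell
# Toy model: usable lease time seen by a monitor whose clock runs 'skew'
# seconds ahead of the leader's.  Values are illustrative.
lease=5
for skew in 0.05 0.5 4.9; do
  awk -v lease="$lease" -v skew="$skew" \
    'BEGIN { printf "skew=%ss -> usable lease=%.2fs\n", skew, lease - skew }'
done
```

With the 5s default, 0.5s of skew still leaves 4.5s of lease, which is why
that threshold is safe; skew approaching 5s leaves essentially nothing.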

sage


On Fri, 16 Nov 2012, Nick Bartos wrote:

> How far off do the clocks need to be before there is a problem?  It
> would seem to be hard to ensure a very large cluster has all of its
> nodes synchronized within 50ms (which seems to be the default for "mon
> clock drift allowed").  Does the mon clock drift allowed parameter
> change anything other than the log messages?  Are there any other
> tuning options that may help, assuming that this is the issue and it's
> not feasible to get the clocks synchronized to better than 500ms
> across all nodes?
> 
> I'm trying to get a good way of reproducing this and get a trace on
> the ceph processes to see what they're waiting on.  I'll let you know
> when I have more info.
> 
> 
> On Fri, Nov 16, 2012 at 11:16 AM, Sage Weil <sage@inktank.com> wrote:
> > I just realized I was mixing up this thread with the other deadlock
> > thread.
> >
> > On Fri, 16 Nov 2012, Nick Bartos wrote:
> >> Turns out we're having the 'rbd map' hang on startup again, after we
> >> started using the wip-3.5 patch set.  How critical is the
> >> libceph_protect_ceph_con_open_with_mutex commit?  That's the one I
> >> removed before which seemed to get rid of the problem (although I'm
> >> not completely sure if it completely got rid of it, at least seemed to
> >> happen much less often).
> >>
> >> It seems like we only started having this issue after we started
> >> patching the 3.5 ceph client (we started patching to try and get rid
> >> of a kernel oops, which the patches seem to have fixed).
> >
> > Right.  That patch fixes a real bug.  It also seems pretty unlikely that
> > this patch is related to the startup hang.  The original log showed clock
> > drift on the monitor that could very easily cause this sort of hang.  Can
> > you confirm that that isn't the case with this recent instance of the
> > problem?  And/or attach a log?
> >
> > Thanks-
> > sage
> >
> >
> >>
> >>
> >> On Thu, Nov 15, 2012 at 4:25 PM, Sage Weil <sage@inktank.com> wrote:
> >> > On Thu, 15 Nov 2012, Nick Bartos wrote:
> >> >> Sorry I guess this e-mail got missed.  I believe those patches came
> >> >> from the ceph/linux-3.5.5-ceph branch.  I'm now using the wip-3.5
> >> >> branch patches, which seem to all be fine.  We'll stick with 3.5 and
> >> >> this backport for now until we can figure out what's wrong with 3.6.
> >> >>
> >> >> I typically ignore the wip branches just due to the naming when I'm
> >> >> looking for updates.  Where should I typically look for updates that
> >> >> aren't in released kernels?  Also, is there anything else in the wip*
> >> >> branches that you think we may find particularly useful?
> >> >
> >> > You were looking in the right place.  The problem was we weren't super
> >> > organized with our stable patches, and changed our minds about what to
> >> > send upstream.  These are 'wip' in the sense that they were in preparation
> >> > for going upstream.  The goal is to push them to the mainline stable
> >> > kernels and ideally not keep them in our tree at all.
> >> >
> >> > wip-3.5 is an oddity because the mainline stable kernel is EOL'd, but
> >> > we're keeping it so that ubuntu can pick it up for quantal.
> >> >
> >> > I'll make sure these are more clearly marked as stable.
> >> >
> >> > sage
> >> >
> >> >
> >> >>
> >> >>
> >> >> On Mon, Nov 12, 2012 at 3:16 PM, Sage Weil <sage@inktank.com> wrote:
> >> >> > On Mon, 12 Nov 2012, Nick Bartos wrote:
> >> >> >> After removing 8-libceph-protect-ceph_con_open-with-mutex.patch, it
> >> >> >> seems we no longer have this hang.
> >> >> >
> >> >> > Hmm, that's a bit disconcerting.  Did this series come from our old 3.5
> >> >> > stable series?  I recently prepared a new one that backports *all* of the
> >> >> > fixes from 3.6 to 3.5 (and 3.4); see wip-3.5 in ceph-client.git.  I would
> >> >> > be curious if you see problems with that.
> >> >> >
> >> >> > So far, with these fixes in place, we have not seen any unexplained kernel
> >> >> > crashes in this code.
> >> >> >
> >> >> > I take it you're going back to a 3.5 kernel because you weren't able to
> >> >> > get rid of the sync problem with 3.6?
> >> >> >
> >> >> > sage
> >> >> >
> >> >> >
> >> >> >
> >> >> >>
> >> >> >> On Thu, Nov 8, 2012 at 5:43 PM, Josh Durgin <josh.durgin@inktank.com> wrote:
> >> >> >> > On 11/08/2012 02:10 PM, Mandell Degerness wrote:
> >> >> >> >>
> >> >> >> >> We are seeing a somewhat random, but frequent hang on our systems
> >> >> >> >> during startup.  The hang happens at the point where an "rbd map
> >> >> >> >> <rbdvol>" command is run.
> >> >> >> >>
> >> >> >> >> I've attached the ceph logs from the cluster.  The map command happens
> >> >> >> >> at Nov  8 18:41:09 on server 172.18.0.15.  The process which hung can
> >> >> >> >> be seen in the log as 172.18.0.15:0/1143980479.
> >> >> >> >>
> >> >> >> >> It appears as if the TCP socket is opened to the OSD, but then times
> >> >> >> >> out 15 minutes later, the process gets data when the socket is closed
> >> >> >> >> on the client server and it retries.
> >> >> >> >>
> >> >> >> >> Please help.
> >> >> >> >>
> >> >> >> >> We are using ceph version 0.48.2argonaut
> >> >> >> >> (commit:3e02b2fad88c2a95d9c0c86878f10d1beb780bfe).
> >> >> >> >>
> >> >> >> >> We are using a 3.5.7 kernel with the following list of patches applied:
> >> >> >> >>
> >> >> >> >> 1-libceph-encapsulate-out-message-data-setup.patch
> >> >> >> >> 2-libceph-dont-mark-footer-complete-before-it-is.patch
> >> >> >> >> 3-libceph-move-init-of-bio_iter.patch
> >> >> >> >> 4-libceph-dont-use-bio_iter-as-a-flag.patch
> >> >> >> >> 5-libceph-resubmit-linger-ops-when-pg-mapping-changes.patch
> >> >> >> >> 6-libceph-re-initialize-bio_iter-on-start-of-message-receive.patch
> >> >> >> >> 7-ceph-close-old-con-before-reopening-on-mds-reconnect.patch
> >> >> >> >> 8-libceph-protect-ceph_con_open-with-mutex.patch
> >> >> >> >> 9-libceph-reset-connection-retry-on-successfully-negotiation.patch
> >> >> >> >> 10-rbd-only-reset-capacity-when-pointing-to-head.patch
> >> >> >> >> 11-rbd-set-image-size-when-header-is-updated.patch
> >> >> >> >> 12-libceph-fix-crypto-key-null-deref-memory-leak.patch
> >> >> >> >> 13-ceph-tolerate-and-warn-on-extraneous-dentry-from-mds.patch
> >> >> >> >> 14-ceph-avoid-divide-by-zero-in-__validate_layout.patch
> >> >> >> >> 15-rbd-drop-dev-reference-on-error-in-rbd_open.patch
> >> >> >> >> 16-ceph-Fix-oops-when-handling-mdsmap-that-decreases-max_mds.patch
> >> >> >> >> 17-libceph-check-for-invalid-mapping.patch
> >> >> >> >> 18-ceph-propagate-layout-error-on-osd-request-creation.patch
> >> >> >> >> 19-rbd-BUG-on-invalid-layout.patch
> >> >> >> >> 20-ceph-return-EIO-on-invalid-layout-on-GET_DATALOC-ioctl.patch
> >> >> >> >> 21-ceph-avoid-32-bit-page-index-overflow.patch
> >> >> >> >> 23-ceph-fix-dentry-reference-leak-in-encode_fh.patch
> >> >> >> >>
> >> >> >> >> Any suggestions?
> >> >> >> >
> >> >> >> >
> >> >> >> > The log shows your monitors don't have time synchronized enough among
> >> >> >> > them to make much progress (including authenticating new connections).
> >> >> >> > That's probably the real issue. 0.2s is pretty large clock drift.
> >> >> >> >
> >> >> >> >
> >> >> >> >> One thought is that the following patch (which we could not apply) is
> >> >> >> >> what is required:
> >> >> >> >>
> >> >> >> >> 22-rbd-reset-BACKOFF-if-unable-to-re-queue.patch
> >> >> >> >
> >> >> >> >
> >> >> >> > This is certainly useful too, but I don't think it's the cause of
> >> >> >> > the delay in this case.
> >> >> >> >
> >> >> >> > Josh


* Re: rbd map command hangs for 15 minutes during system start up
  2012-11-16 22:13                 ` Sage Weil
@ 2012-11-16 22:16                   ` Nick Bartos
  2012-11-16 22:21                     ` Sage Weil
  2012-11-16 22:23                     ` Gregory Farnum
  0 siblings, 2 replies; 56+ messages in thread
From: Nick Bartos @ 2012-11-16 22:16 UTC (permalink / raw)
  To: Sage Weil; +Cc: Josh Durgin, Mandell Degerness, ceph-devel

Should I be lowering the clock drift allowed, or the lease interval to
help reproduce it?

On Fri, Nov 16, 2012 at 2:13 PM, Sage Weil <sage@inktank.com> wrote:
> You can safely set the clock drift allowed as high as 500ms.  The real
> limitation is that it needs to be well under the lease interval, which is
> currently 5 seconds by default.
>
> You might be able to reproduce more easily by lowering the threshold...
>
> sage
>
>
> On Fri, 16 Nov 2012, Nick Bartos wrote:
>
>> How far off do the clocks need to be before there is a problem?  It
>> would seem to be hard to ensure a very large cluster has all of its
>> nodes synchronized within 50ms (which seems to be the default for "mon
>> clock drift allowed").  Does the mon clock drift allowed parameter
>> change anything other than the log messages?  Are there any other
>> tuning options that may help, assuming that this is the issue and it's
>> not feasible to get the clocks synchronized to better than 500ms
>> across all nodes?
>>
>> I'm trying to get a good way of reproducing this and get a trace on
>> the ceph processes to see what they're waiting on.  I'll let you know
>> when I have more info.
>>
>>
>> On Fri, Nov 16, 2012 at 11:16 AM, Sage Weil <sage@inktank.com> wrote:
>> > I just realized I was mixing up this thread with the other deadlock
>> > thread.
>> >
>> > On Fri, 16 Nov 2012, Nick Bartos wrote:
>> >> Turns out we're having the 'rbd map' hang on startup again, after we
>> >> started using the wip-3.5 patch set.  How critical is the
>> >> libceph_protect_ceph_con_open_with_mutex commit?  That's the one I
>> >> removed before which seemed to get rid of the problem (although I'm
>> >> not completely sure if it completely got rid of it, at least seemed to
>> >> happen much less often).
>> >>
>> >> It seems like we only started having this issue after we started
>> >> patching the 3.5 ceph client (we started patching to try and get rid
>> >> of a kernel oops, which the patches seem to have fixed).
>> >
>> > Right.  That patch fixes a real bug.  It also seems pretty unlikely that
>> > this patch is related to the startup hang.  The original log showed clock
>> > drift on the monitor that could very easily cause this sort of hang.  Can
>> > you confirm that that isn't the case with this recent instance of the
>> > problem?  And/or attach a log?
>> >
>> > Thanks-
>> > sage
>> >
>> >
>> >>
>> >>
>> >> On Thu, Nov 15, 2012 at 4:25 PM, Sage Weil <sage@inktank.com> wrote:
>> >> > On Thu, 15 Nov 2012, Nick Bartos wrote:
>> >> >> Sorry I guess this e-mail got missed.  I believe those patches came
>> >> >> from the ceph/linux-3.5.5-ceph branch.  I'm now using the wip-3.5
>> >> >> branch patches, which seem to all be fine.  We'll stick with 3.5 and
>> >> >> this backport for now until we can figure out what's wrong with 3.6.
>> >> >>
>> >> >> I typically ignore the wip branches just due to the naming when I'm
>> >> >> looking for updates.  Where should I typically look for updates that
>> >> >> aren't in released kernels?  Also, is there anything else in the wip*
>> >> >> branches that you think we may find particularly useful?
>> >> >
>> >> > You were looking in the right place.  The problem was we weren't super
>> >> > organized with our stable patches, and changed our minds about what to
>> >> > send upstream.  These are 'wip' in the sense that they were in preparation
>> >> > for going upstream.  The goal is to push them to the mainline stable
>> >> > kernels and ideally not keep them in our tree at all.
>> >> >
>> >> > wip-3.5 is an oddity because the mainline stable kernel is EOL'd, but
>> >> > we're keeping it so that ubuntu can pick it up for quantal.
>> >> >
>> >> > I'll make sure these are more clearly marked as stable.
>> >> >
>> >> > sage
>> >> >
>> >> >
>> >> >>
>> >> >>
>> >> >> On Mon, Nov 12, 2012 at 3:16 PM, Sage Weil <sage@inktank.com> wrote:
>> >> >> > On Mon, 12 Nov 2012, Nick Bartos wrote:
>> >> >> >> After removing 8-libceph-protect-ceph_con_open-with-mutex.patch, it
>> >> >> >> seems we no longer have this hang.
>> >> >> >
>> >> >> > Hmm, that's a bit disconcerting.  Did this series come from our old 3.5
>> >> >> > stable series?  I recently prepared a new one that backports *all* of the
>> >> >> > fixes from 3.6 to 3.5 (and 3.4); see wip-3.5 in ceph-client.git.  I would
>> >> >> > be curious if you see problems with that.
>> >> >> >
>> >> >> > So far, with these fixes in place, we have not seen any unexplained kernel
>> >> >> > crashes in this code.
>> >> >> >
>> >> >> > I take it you're going back to a 3.5 kernel because you weren't able to
>> >> >> > get rid of the sync problem with 3.6?
>> >> >> >
>> >> >> > sage
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >>
>> >> >> >> On Thu, Nov 8, 2012 at 5:43 PM, Josh Durgin <josh.durgin@inktank.com> wrote:
>> >> >> >> > On 11/08/2012 02:10 PM, Mandell Degerness wrote:
>> >> >> >> >>
>> >> >> >> >> We are seeing a somewhat random, but frequent hang on our systems
>> >> >> >> >> during startup.  The hang happens at the point where an "rbd map
>> >> >> >> >> <rbdvol>" command is run.
>> >> >> >> >>
>> >> >> >> >> I've attached the ceph logs from the cluster.  The map command happens
>> >> >> >> >> at Nov  8 18:41:09 on server 172.18.0.15.  The process which hung can
>> >> >> >> >> be seen in the log as 172.18.0.15:0/1143980479.
>> >> >> >> >>
>> >> >> >> >> It appears as if the TCP socket is opened to the OSD, but then times
>> >> >> >> >> out 15 minutes later, the process gets data when the socket is closed
>> >> >> >> >> on the client server and it retries.
>> >> >> >> >>
>> >> >> >> >> Please help.
>> >> >> >> >>
>> >> >> >> >> We are using ceph version 0.48.2argonaut
>> >> >> >> >> (commit:3e02b2fad88c2a95d9c0c86878f10d1beb780bfe).
>> >> >> >> >>
>> >> >> >> >> We are using a 3.5.7 kernel with the following list of patches applied:
>> >> >> >> >>
>> >> >> >> >> 1-libceph-encapsulate-out-message-data-setup.patch
>> >> >> >> >> 2-libceph-dont-mark-footer-complete-before-it-is.patch
>> >> >> >> >> 3-libceph-move-init-of-bio_iter.patch
>> >> >> >> >> 4-libceph-dont-use-bio_iter-as-a-flag.patch
>> >> >> >> >> 5-libceph-resubmit-linger-ops-when-pg-mapping-changes.patch
>> >> >> >> >> 6-libceph-re-initialize-bio_iter-on-start-of-message-receive.patch
>> >> >> >> >> 7-ceph-close-old-con-before-reopening-on-mds-reconnect.patch
>> >> >> >> >> 8-libceph-protect-ceph_con_open-with-mutex.patch
>> >> >> >> >> 9-libceph-reset-connection-retry-on-successfully-negotiation.patch
>> >> >> >> >> 10-rbd-only-reset-capacity-when-pointing-to-head.patch
>> >> >> >> >> 11-rbd-set-image-size-when-header-is-updated.patch
>> >> >> >> >> 12-libceph-fix-crypto-key-null-deref-memory-leak.patch
>> >> >> >> >> 13-ceph-tolerate-and-warn-on-extraneous-dentry-from-mds.patch
>> >> >> >> >> 14-ceph-avoid-divide-by-zero-in-__validate_layout.patch
>> >> >> >> >> 15-rbd-drop-dev-reference-on-error-in-rbd_open.patch
>> >> >> >> >> 16-ceph-Fix-oops-when-handling-mdsmap-that-decreases-max_mds.patch
>> >> >> >> >> 17-libceph-check-for-invalid-mapping.patch
>> >> >> >> >> 18-ceph-propagate-layout-error-on-osd-request-creation.patch
>> >> >> >> >> 19-rbd-BUG-on-invalid-layout.patch
>> >> >> >> >> 20-ceph-return-EIO-on-invalid-layout-on-GET_DATALOC-ioctl.patch
>> >> >> >> >> 21-ceph-avoid-32-bit-page-index-overflow.patch
>> >> >> >> >> 23-ceph-fix-dentry-reference-leak-in-encode_fh.patch
>> >> >> >> >>
>> >> >> >> >> Any suggestions?
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > The log shows your monitors don't have time synchronized closely enough
>> >> >> >> > among them to make much progress (including authenticating new connections).
>> >> >> >> > That's probably the real issue. 0.2s is a pretty large clock drift.
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >> One thought is that the following patch (which we could not apply) is
>> >> >> >> >> what is required:
>> >> >> >> >>
>> >> >> >> >> 22-rbd-reset-BACKOFF-if-unable-to-re-queue.patch
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > This is certainly useful too, but I don't think it's the cause of
>> >> >> >> > the delay in this case.
>> >> >> >> >
>> >> >> >> > Josh
>> >> >> >> > --
>> >> >> >> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> >> >> >> > the body of a message to majordomo@vger.kernel.org
>> >> >> >> > More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: rbd map command hangs for 15 minutes during system start up
  2012-11-16 22:16                   ` Nick Bartos
@ 2012-11-16 22:21                     ` Sage Weil
  2012-11-19 23:04                       ` Nick Bartos
  2012-11-16 22:23                     ` Gregory Farnum
  1 sibling, 1 reply; 56+ messages in thread
From: Sage Weil @ 2012-11-16 22:21 UTC (permalink / raw)
  To: Nick Bartos; +Cc: Josh Durgin, Mandell Degerness, ceph-devel

On Fri, 16 Nov 2012, Nick Bartos wrote:
> Should I be lowering the clock drift allowed, or the lease interval to
> help reproduce it?

clock drift allowed.



> 
> On Fri, Nov 16, 2012 at 2:13 PM, Sage Weil <sage@inktank.com> wrote:
> > You can safely set the clock drift allowed as high as 500ms.  The real
> > limitation is that it needs to be well under the lease interval, which is
> > currently 5 seconds by default.
> >
> > You might be able to reproduce more easily by lowering the threshold...
> >
> > sage
> >
> >
> > On Fri, 16 Nov 2012, Nick Bartos wrote:
> >
> >> How far off do the clocks need to be before there is a problem?  It
> >> would seem to be hard to ensure a very large cluster has all of its
> >> nodes synchronized within 50ms (which seems to be the default for "mon
> >> clock drift allowed").  Does the mon clock drift allowed parameter
> >> change anything other than the log messages?  Are there any other
> >> tuning options that may help, assuming that this is the issue and it's
> >> not feasible to get the clocks synchronized to better than 500ms across
> >> all nodes?
> >>
> >> I'm trying to get a good way of reproducing this and get a trace on
> >> the ceph processes to see what they're waiting on.  I'll let you know
> >> when I have more info.
> >>
> >>
> >> On Fri, Nov 16, 2012 at 11:16 AM, Sage Weil <sage@inktank.com> wrote:
> >> > I just realized I was mixing up this thread with the other deadlock
> >> > thread.
> >> >
> >> > On Fri, 16 Nov 2012, Nick Bartos wrote:
> >> >> Turns out we're having the 'rbd map' hang on startup again, after we
> >> >> started using the wip-3.5 patch set.  How critical is the
> >> >> libceph_protect_ceph_con_open_with_mutex commit?  That's the one I
> >> >> removed before, which seemed to get rid of the problem (although I'm
> >> >> not completely sure it eliminated it entirely; it at least seemed to
> >> >> happen much less often).
> >> >>
> >> >> It seems like we only started having this issue after we started
> >> >> patching the 3.5 ceph client (we started patching to try and get rid
> >> >> of a kernel oops, which the patches seem to have fixed).
> >> >
> >> > Right.  That patch fixes a real bug.  It also seems pretty unlikely that
> >> > this patch is related to the startup hang.  The original log showed clock
> >> > drift on the monitor that could very easily cause this sort of hang.  Can
> >> > you confirm that that isn't the case with this recent instance of the
> >> > problem?  And/or attach a log?
> >> >
> >> > Thanks-
> >> > sage
> >> >
> >> >
> >> >>
> >> >>
> >> >> On Thu, Nov 15, 2012 at 4:25 PM, Sage Weil <sage@inktank.com> wrote:
> >> >> > On Thu, 15 Nov 2012, Nick Bartos wrote:
> >> >> >> Sorry I guess this e-mail got missed.  I believe those patches came
> >> >> >> from the ceph/linux-3.5.5-ceph branch.  I'm now using the wip-3.5
> >> >> >> branch patches, which seem to all be fine.  We'll stick with 3.5 and
> >> >> >> this backport for now until we can figure out what's wrong with 3.6.
> >> >> >>
> >> >> >> I typically ignore the wip branches just due to the naming when I'm
> >> >> >> looking for updates.  Where should I typically look for updates that
> >> >> >> aren't in released kernels?  Also, is there anything else in the wip*
> >> >> >> branches that you think we may find particularly useful?
> >> >> >
> >> >> > You were looking in the right place.  The problem was we weren't super
> >> >> > organized with our stable patches, and changed our minds about what to
> >> >> > send upstream.  These are 'wip' in the sense that they were in preparation
> >> >> > for going upstream.  The goal is to push them to the mainline stable
> >> >> > kernels and ideally not keep them in our tree at all.
> >> >> >
> >> >> > wip-3.5 is an oddity because the mainline stable kernel is EOL'd, but
> >> >> > we're keeping it so that ubuntu can pick it up for quantal.
> >> >> >
> >> >> > I'll make sure these are more clearly marked as stable.
> >> >> >
> >> >> > sage
> >> >> >
> >> >> >
> >> >> >>
> >> >> >>
> >> >> >> On Mon, Nov 12, 2012 at 3:16 PM, Sage Weil <sage@inktank.com> wrote:
> >> >> >> > On Mon, 12 Nov 2012, Nick Bartos wrote:
> >> >> >> >> After removing 8-libceph-protect-ceph_con_open-with-mutex.patch, it
> >> >> >> >> seems we no longer have this hang.
> >> >> >> >
> >> >> >> > Hmm, that's a bit disconcerting.  Did this series come from our old 3.5
> >> >> >> > stable series?  I recently prepared a new one that backports *all* of the
> >> >> >> > fixes from 3.6 to 3.5 (and 3.4); see wip-3.5 in ceph-client.git.  I would
> >> >> >> > be curious if you see problems with that.
> >> >> >> >
> >> >> >> > So far, with these fixes in place, we have not seen any unexplained kernel
> >> >> >> > crashes in this code.
> >> >> >> >
> >> >> >> > I take it you're going back to a 3.5 kernel because you weren't able to
> >> >> >> > get rid of the sync problem with 3.6?
> >> >> >> >
> >> >> >> > sage
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> >>
> >> >> >> >> On Thu, Nov 8, 2012 at 5:43 PM, Josh Durgin <josh.durgin@inktank.com> wrote:
> >> >> >> >> > [...]


* Re: rbd map command hangs for 15 minutes during system start up
  2012-11-16 22:16                   ` Nick Bartos
  2012-11-16 22:21                     ` Sage Weil
@ 2012-11-16 22:23                     ` Gregory Farnum
  1 sibling, 0 replies; 56+ messages in thread
From: Gregory Farnum @ 2012-11-16 22:23 UTC (permalink / raw)
  To: Nick Bartos; +Cc: Sage Weil, Josh Durgin, Mandell Degerness, ceph-devel

To be clear, the monitor cluster needs to be within this clock drift —
the rest of the Ceph cluster can be off by as much as you care to.

(Well, there's also a limit imposed by cephx authorization which can
keep nodes out of the cluster, but that drift allowance is measured in
units of hours.)
-Greg
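
[For reference, the settings discussed in this thread live in ceph.conf on
the monitor hosts. A minimal sketch follows; the option names match the
argonaut-era monitor code discussed above, but the values are illustrative
only, and note that per this thread "mon clock drift allowed" governs only
the warning, not the Paxos logic itself.]

```ini
[mon]
; Warn when monitor clocks differ by more than this many seconds.
; Per this thread, this only controls the health warning.
mon clock drift allowed = 0.5

; Paxos lease interval in seconds (assumed option name; the thread
; only states the default lease interval is 5 seconds). Actual clock
; skew between monitors must stay well under this value.
mon lease = 5
```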

On Fri, Nov 16, 2012 at 2:16 PM, Nick Bartos <nick@pistoncloud.com> wrote:
> Should I be lowering the clock drift allowed, or the lease interval to
> help reproduce it?
>
> On Fri, Nov 16, 2012 at 2:13 PM, Sage Weil <sage@inktank.com> wrote:
>> You can safely set the clock drift allowed as high as 500ms.  The real
>> limitation is that it needs to be well under the lease interval, which is
>> currently 5 seconds by default.
>>
>> You might be able to reproduce more easily by lowering the threshold...
>>
>> sage
>>
>>
>> On Fri, 16 Nov 2012, Nick Bartos wrote:
>>> [...]


* Re: rbd map command hangs for 15 minutes during system start up
  2012-11-16 22:21                     ` Sage Weil
@ 2012-11-19 23:04                       ` Nick Bartos
  2012-11-19 23:34                         ` Gregory Farnum
  0 siblings, 1 reply; 56+ messages in thread
From: Nick Bartos @ 2012-11-19 23:04 UTC (permalink / raw)
  To: Sage Weil; +Cc: Josh Durgin, Mandell Degerness, ceph-devel

Making 'mon clock drift allowed' very small (0.00001) does not
reliably reproduce the hang.  I started looking at the code for 0.48.2
and it looks like this option is only used in
Paxos::warn_on_future_time, which only emits the warning; it has no
other effect.
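
[The constraint Sage described earlier, that skew must stay well under
the ~5 second lease interval, can be sketched with a toy model. This is
purely illustrative and is not Ceph's actual Paxos code; the function
name and numbers are made up for the example.]

```python
# Toy model (not Ceph code): why monitor clock skew must stay well
# under the Paxos lease interval. A peon treats the leader's lease as
# valid until lease_start + lease_interval, but it checks that against
# its *own* clock. If its clock runs ahead of the leader's by `skew`
# seconds, the usable lease window shrinks by that much; once skew
# reaches the lease interval, no lease is ever honored.

def usable_lease(lease_interval: float, skew: float) -> float:
    """Seconds of lease a skewed peon actually honors (floored at 0)."""
    return max(0.0, lease_interval - skew)

if __name__ == "__main__":
    lease = 5.0  # default lease interval per the thread, in seconds
    for skew in (0.05, 0.2, 0.5, 5.0, 6.0):
        print(f"skew={skew:>5}s -> usable lease {usable_lease(lease, skew):.2f}s")
```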


On Fri, Nov 16, 2012 at 2:21 PM, Sage Weil <sage@inktank.com> wrote:
> On Fri, 16 Nov 2012, Nick Bartos wrote:
>> Should I be lowering the clock drift allowed, or the lease interval to
>> help reproduce it?
>
> clock drift allowed.
>
>
>
>>
>> On Fri, Nov 16, 2012 at 2:13 PM, Sage Weil <sage@inktank.com> wrote:
>> > You can safely set the clock drift allowed as high as 500ms.  The real
>> > limitation is that it needs to be well under the lease interval, which is
>> > currently 5 seconds by default.
>> >
>> > You might be able to reproduce more easily by lowering the threshold...
>> >
>> > sage
>> >
>> >
>> > On Fri, 16 Nov 2012, Nick Bartos wrote:
>> >
>> >> How far off do the clocks need to be before there is a problem?  It
>> >> would seem to be hard to ensure a very large cluster has all of it's
>> >> nodes synchronized within 50ms (which seems to be the default for "mon
>> >> clock drift allowed").  Does the mon clock drift allowed parameter
>> >> change anything other than the log messages?  Are there any other
>> >> tuning options that may help, assuming that this is the issue and it's
>> >> not feasible to get the clocks more than 500ms in sync between all
>> >> nodes?
>> >>
>> >> I'm trying to get a good way of reproducing this and get a trace on
>> >> the ceph processes to see what they're waiting on.  I'll let you know
>> >> when I have more info.
>> >>
>> >>
>> >> On Fri, Nov 16, 2012 at 11:16 AM, Sage Weil <sage@inktank.com> wrote:
>> >> > I just realized I was mixing up this thread with the other deadlock
>> >> > thread.
>> >> >
>> >> > On Fri, 16 Nov 2012, Nick Bartos wrote:
>> >> >> Turns out we're having the 'rbd map' hang on startup again, after we
>> >> >> started using the wip-3.5 patch set.  How critical is the
>> >> >> libceph_protect_ceph_con_open_with_mutex commit?  That's the one I
>> >> >> removed before which seemed to get rid of the problem (although I'm
>> >> >> not completely sure if it completely got rid of it, at least seemed to
>> >> >> happen much less often).
>> >> >>
>> >> >> It seems like we only started having this issue after we started
>> >> >> patching the 3.5 ceph client (we started patching to try and get rid
>> >> >> of a kernel oops, which the patches seem to have fixed).
>> >> >
>> >> > Right.  That patch fixes a real bug.  It also seems pretty unlikely that
>> >> > this patch is related to the startup hang.  The original log showed clock
>> >> > drift on the monitor that could very easily cause this sort of hang.  Can
>> >> > you confirm that that isn't the case with this recent instance of the
>> >> > problem?  And/or attach a log?
>> >> >
>> >> > Thanks-
>> >> > sage
>> >> >
>> >> >
>> >> >>
>> >> >>
>> >> >> On Thu, Nov 15, 2012 at 4:25 PM, Sage Weil <sage@inktank.com> wrote:
>> >> >> > On Thu, 15 Nov 2012, Nick Bartos wrote:
>> >> >> >> Sorry I guess this e-mail got missed.  I believe those patches came
>> >> >> >> from the ceph/linux-3.5.5-ceph branch.  I'm now using the wip-3.5
>> >> >> >> branch patches, which seem to all be fine.  We'll stick with 3.5 and
>> >> >> >> this backport for now until we can figure out what's wrong with 3.6.
>> >> >> >>
>> >> >> >> I typically ignore the wip branches just due to the naming when I'm
>> >> >> >> looking for updates.  Where should I typically look for updates that
>> >> >> >> aren't in released kernels?  Also, is there anything else in the wip*
>> >> >> >> branches that you think we may find particularly useful?
>> >> >> >
>> >> >> > You were looking in the right place.  The problem was we weren't super
>> >> >> > organized with our stable patches, and changed our minds about what to
>> >> >> > send upstream.  These are 'wip' in the sense that they were in preparation
>> >> >> > for going upstream.  The goal is to push them to the mainline stable
>> >> >> > kernels and ideally not keep them in our tree at all.
>> >> >> >
>> >> >> > wip-3.5 is an oddity because the mainline stable kernel is EOL'd, but
>> >> >> > we're keeping it so that ubuntu can pick it up for quantal.
>> >> >> >
>> >> >> > I'll make sure these are more clearly marked as stable.
>> >> >> >
>> >> >> > sage
>> >> >> >
>> >> >> >
>> >> >> >>
>> >> >> >>
>> >> >> >> On Mon, Nov 12, 2012 at 3:16 PM, Sage Weil <sage@inktank.com> wrote:
>> >> >> >> > On Mon, 12 Nov 2012, Nick Bartos wrote:
>> >> >> >> >> After removing 8-libceph-protect-ceph_con_open-with-mutex.patch, it
>> >> >> >> >> seems we no longer have this hang.
>> >> >> >> >
>> >> >> >> > Hmm, that's a bit disconcerting.  Did this series come from our old 3.5
>> >> >> >> > stable series?  I recently prepared a new one that backports *all* of the
>> >> >> >> > fixes from 3.6 to 3.5 (and 3.4); see wip-3.5 in ceph-client.git.  I would
>> >> >> >> > be curious if you see problems with that.
>> >> >> >> >
>> >> >> >> > So far, with these fixes in place, we have not seen any unexplained kernel
>> >> >> >> > crashes in this code.
>> >> >> >> >
>> >> >> >> > I take it you're going back to a 3.5 kernel because you weren't able to
>> >> >> >> > get rid of the sync problem with 3.6?
>> >> >> >> >
>> >> >> >> > sage
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >>
>> >> >> >> >> On Thu, Nov 8, 2012 at 5:43 PM, Josh Durgin <josh.durgin@inktank.com> wrote:
>> >> >> >> >> > On 11/08/2012 02:10 PM, Mandell Degerness wrote:
>> >> >> >> >> >>
>> >> >> >> >> >> We are seeing a somewhat random, but frequent hang on our systems
>> >> >> >> >> >> during startup.  The hang happens at the point where an "rbd map
>> >> >> >> >> >> <rbdvol>" command is run.
>> >> >> >> >> >>
>> >> >> >> >> >> I've attached the ceph logs from the cluster.  The map command happens
>> >> >> >> >> >> at Nov  8 18:41:09 on server 172.18.0.15.  The process which hung can
>> >> >> >> >> >> be seen in the log as 172.18.0.15:0/1143980479.
>> >> >> >> >> >>
>> >> >> >> >> >> It appears as if the TCP socket is opened to the OSD, but then times
>> >> >> >> >> >> out 15 minutes later, the process gets data when the socket is closed
>> >> >> >> >> >> on the client server and it retries.
>> >> >> >> >> >>
>> >> >> >> >> >> Please help.
>> >> >> >> >> >>
>> >> >> >> >> >> We are using ceph version 0.48.2argonaut
>> >> >> >> >> >> (commit:3e02b2fad88c2a95d9c0c86878f10d1beb780bfe).
>> >> >> >> >> >>
>> >> >> >> >> >> We are using a 3.5.7 kernel with the following list of patches applied:
>> >> >> >> >> >>
>> >> >> >> >> >> 1-libceph-encapsulate-out-message-data-setup.patch
>> >> >> >> >> >> 2-libceph-dont-mark-footer-complete-before-it-is.patch
>> >> >> >> >> >> 3-libceph-move-init-of-bio_iter.patch
>> >> >> >> >> >> 4-libceph-dont-use-bio_iter-as-a-flag.patch
>> >> >> >> >> >> 5-libceph-resubmit-linger-ops-when-pg-mapping-changes.patch
>> >> >> >> >> >> 6-libceph-re-initialize-bio_iter-on-start-of-message-receive.patch
>> >> >> >> >> >> 7-ceph-close-old-con-before-reopening-on-mds-reconnect.patch
>> >> >> >> >> >> 8-libceph-protect-ceph_con_open-with-mutex.patch
>> >> >> >> >> >> 9-libceph-reset-connection-retry-on-successfully-negotiation.patch
>> >> >> >> >> >> 10-rbd-only-reset-capacity-when-pointing-to-head.patch
>> >> >> >> >> >> 11-rbd-set-image-size-when-header-is-updated.patch
>> >> >> >> >> >> 12-libceph-fix-crypto-key-null-deref-memory-leak.patch
>> >> >> >> >> >> 13-ceph-tolerate-and-warn-on-extraneous-dentry-from-mds.patch
>> >> >> >> >> >> 14-ceph-avoid-divide-by-zero-in-__validate_layout.patch
>> >> >> >> >> >> 15-rbd-drop-dev-reference-on-error-in-rbd_open.patch
>> >> >> >> >> >> 16-ceph-Fix-oops-when-handling-mdsmap-that-decreases-max_mds.patch
>> >> >> >> >> >> 17-libceph-check-for-invalid-mapping.patch
>> >> >> >> >> >> 18-ceph-propagate-layout-error-on-osd-request-creation.patch
>> >> >> >> >> >> 19-rbd-BUG-on-invalid-layout.patch
>> >> >> >> >> >> 20-ceph-return-EIO-on-invalid-layout-on-GET_DATALOC-ioctl.patch
>> >> >> >> >> >> 21-ceph-avoid-32-bit-page-index-overflow.patch
>> >> >> >> >> >> 23-ceph-fix-dentry-reference-leak-in-encode_fh.patch
>> >> >> >> >> >>
>> >> >> >> >> >> Any suggestions?
>> >> >> >> >> >
>> >> >> >> >> >
>> >> >> >> >> > The log shows your monitors don't have time sychronized enough among
>> >> >> >> >> > them to make much progress (including authenticating new connections).
>> >> >> >> >> > That's probably the real issue. 0.2s is pretty large clock drift.
>> >> >> >> >> >
>> >> >> >> >> >
>> >> >> >> >> >> One thought is that the following patch (which we could not apply) is
>> >> >> >> >> >> what is required:
>> >> >> >> >> >>
>> >> >> >> >> >> 22-rbd-reset-BACKOFF-if-unable-to-re-queue.patch
>> >> >> >> >> >
>> >> >> >> >> >
>> >> >> >> >> > This is certainly useful too, but I don't think it's the cause of
>> >> >> >> >> > the delay in this case.
>> >> >> >> >> >
>> >> >> >> >> > Josh
>> >> >> >> >> > --
>> >> >> >> >> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> >> >> >> >> > the body of a message to majordomo@vger.kernel.org
>> >> >> >> >> > More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: rbd map command hangs for 15 minutes during system start up
  2012-11-19 23:04                       ` Nick Bartos
@ 2012-11-19 23:34                         ` Gregory Farnum
  2012-11-20 21:53                           ` Nick Bartos
  0 siblings, 1 reply; 56+ messages in thread
From: Gregory Farnum @ 2012-11-19 23:34 UTC (permalink / raw)
  To: Nick Bartos; +Cc: Sage Weil, Josh Durgin, Mandell Degerness, ceph-devel

Hmm, yep — that param is actually only used for the warning; I guess
we forgot what it actually covers. :(

Have your monitor clocks been off by more than 5 seconds at any point?
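[Editor's note: the 5-second figure corresponds to the default monitor lease interval that Sage mentions further down the thread. A toy sketch of why drift beyond the lease stalls progress — the names and the model are illustrative only, not Ceph's actual code:]

```python
# Illustrative model: a peon judges lease expiry with its OWN clock
# against the leader's timestamp, so a peon clock running 'drift'
# seconds ahead makes the lease appear to expire 'drift' seconds early.
LEASE_INTERVAL = 5.0  # seconds, the Ceph default ('mon lease')

def lease_still_valid(leader_grant_time, leader_now, drift):
    # As seen by a peon whose clock is 'drift' seconds ahead, the lease
    # effectively expires 'drift' seconds before it really would.
    expiry_as_seen_by_peon = leader_grant_time + LEASE_INTERVAL - drift
    return leader_now < expiry_as_seen_by_peon

# 0.2s drift: a lease granted at t=0 still looks valid at t=4.0.
print(lease_still_valid(0.0, 4.0, 0.2))   # True
# Drift larger than the lease interval: every lease looks expired
# the moment it is granted, so the quorum can't make progress.
print(lease_still_valid(0.0, 0.0, 5.5))   # False
```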

On Mon, Nov 19, 2012 at 3:04 PM, Nick Bartos <nick@pistoncloud.com> wrote:
> Making 'mon clock drift allowed' very small (0.00001) does not
> reliably reproduce the hang.  I started looking at the code for 0.48.2
> and it looks like this is only used in Paxos::warn_on_future_time,
> which only handles the warning, nothing else.
>
>
> On Fri, Nov 16, 2012 at 2:21 PM, Sage Weil <sage@inktank.com> wrote:
>> On Fri, 16 Nov 2012, Nick Bartos wrote:
>>> Should I be lowering the clock drift allowed, or the lease interval to
>>> help reproduce it?
>>
>> clock drift allowed.
>>
>>
>>
>>>
>>> On Fri, Nov 16, 2012 at 2:13 PM, Sage Weil <sage@inktank.com> wrote:
>>> > You can safely set the clock drift allowed as high as 500ms.  The real
>>> > limitation is that it needs to be well under the lease interval, which is
>>> > currently 5 seconds by default.
>>> >
>>> > You might be able to reproduce more easily by lowering the threshold...
>>> >
>>> > sage
>>> >
>>> >
>>> > On Fri, 16 Nov 2012, Nick Bartos wrote:
>>> >
>>> >> How far off do the clocks need to be before there is a problem?  It
>>> >> would seem to be hard to ensure a very large cluster has all of its
>>> >> nodes synchronized within 50ms (which seems to be the default for "mon
>>> >> clock drift allowed").  Does the mon clock drift allowed parameter
>>> >> change anything other than the log messages?  Are there any other
>>> >> tuning options that may help, assuming that this is the issue and it's
>>> >> not feasible to get the clocks more than 500ms in sync between all
>>> >> nodes?
>>> >>
>>> >> I'm trying to get a good way of reproducing this and get a trace on
>>> >> the ceph processes to see what they're waiting on.  I'll let you know
>>> >> when I have more info.
>>> >>
>>> >>
>>> >> On Fri, Nov 16, 2012 at 11:16 AM, Sage Weil <sage@inktank.com> wrote:
>>> >> > I just realized I was mixing up this thread with the other deadlock
>>> >> > thread.
>>> >> >
>>> >> > On Fri, 16 Nov 2012, Nick Bartos wrote:
>>> >> >> Turns out we're having the 'rbd map' hang on startup again, after we
>>> >> >> started using the wip-3.5 patch set.  How critical is the
>>> >> >> libceph_protect_ceph_con_open_with_mutex commit?  That's the one I
>>> >> >> removed before which seemed to get rid of the problem (although I'm
>>> >> >> not completely sure if it completely got rid of it, at least seemed to
>>> >> >> happen much less often).
>>> >> >>
>>> >> >> It seems like we only started having this issue after we started
>>> >> >> patching the 3.5 ceph client (we started patching to try and get rid
>>> >> >> of a kernel oops, which the patches seem to have fixed).
>>> >> >
>>> >> > Right.  That patch fixes a real bug.  It also seems pretty unlikely that
>>> >> > this patch is related to the startup hang.  The original log showed clock
>>> >> > drift on the monitor that could very easily cause this sort of hang.  Can
>>> >> > you confirm that that isn't the case with this recent instance of the
>>> >> > problem?  And/or attach a log?
>>> >> >
>>> >> > Thanks-
>>> >> > sage
>>> >> >
>>> >> >
>>> >> >>
>>> >> >>
>>> >> >> On Thu, Nov 15, 2012 at 4:25 PM, Sage Weil <sage@inktank.com> wrote:
>>> >> >> > On Thu, 15 Nov 2012, Nick Bartos wrote:
>>> >> >> >> Sorry I guess this e-mail got missed.  I believe those patches came
>>> >> >> >> from the ceph/linux-3.5.5-ceph branch.  I'm now using the wip-3.5
>>> >> >> >> branch patches, which seem to all be fine.  We'll stick with 3.5 and
>>> >> >> >> this backport for now until we can figure out what's wrong with 3.6.
>>> >> >> >>
>>> >> >> >> I typically ignore the wip branches just due to the naming when I'm
>>> >> >> >> looking for updates.  Where should I typically look for updates that
>>> >> >> >> aren't in released kernels?  Also, is there anything else in the wip*
>>> >> >> >> branches that you think we may find particularly useful?
>>> >> >> >
>>> >> >> > You were looking in the right place.  The problem was we weren't super
>>> >> >> > organized with our stable patches, and changed our minds about what to
>>> >> >> > send upstream.  These are 'wip' in the sense that they were in preparation
>>> >> >> > for going upstream.  The goal is to push them to the mainline stable
>>> >> >> > kernels and ideally not keep them in our tree at all.
>>> >> >> >
>>> >> >> > wip-3.5 is an oddity because the mainline stable kernel is EOL'd, but
>>> >> >> > we're keeping it so that ubuntu can pick it up for quantal.
>>> >> >> >
>>> >> >> > I'll make sure these are more clearly marked as stable.
>>> >> >> >
>>> >> >> > sage
>>> >> >> >
>>> >> >> >
>>> >> >> >>
>>> >> >> >>
>>> >> >> >> On Mon, Nov 12, 2012 at 3:16 PM, Sage Weil <sage@inktank.com> wrote:
>>> >> >> >> > On Mon, 12 Nov 2012, Nick Bartos wrote:
>>> >> >> >> >> After removing 8-libceph-protect-ceph_con_open-with-mutex.patch, it
>>> >> >> >> >> seems we no longer have this hang.
>>> >> >> >> >
>>> >> >> >> > Hmm, that's a bit disconcerting.  Did this series come from our old 3.5
>>> >> >> >> > stable series?  I recently prepared a new one that backports *all* of the
>>> >> >> >> > fixes from 3.6 to 3.5 (and 3.4); see wip-3.5 in ceph-client.git.  I would
>>> >> >> >> > be curious if you see problems with that.
>>> >> >> >> >
>>> >> >> >> > So far, with these fixes in place, we have not seen any unexplained kernel
>>> >> >> >> > crashes in this code.
>>> >> >> >> >
>>> >> >> >> > I take it you're going back to a 3.5 kernel because you weren't able to
>>> >> >> >> > get rid of the sync problem with 3.6?
>>> >> >> >> >
>>> >> >> >> > sage
>>> >> >> >> >
>>> >> >> >> >
>>> >> >> >> >
>>> >> >> >> >>
>>> >> >> >> >> On Thu, Nov 8, 2012 at 5:43 PM, Josh Durgin <josh.durgin@inktank.com> wrote:
>>> >> >> >> >> > On 11/08/2012 02:10 PM, Mandell Degerness wrote:
>>> >> >> >> >> >>
>>> >> >> >> >> >> We are seeing a somewhat random, but frequent hang on our systems
>>> >> >> >> >> >> during startup.  The hang happens at the point where an "rbd map
>>> >> >> >> >> >> <rbdvol>" command is run.
>>> >> >> >> >> >>
>>> >> >> >> >> >> I've attached the ceph logs from the cluster.  The map command happens
>>> >> >> >> >> >> at Nov  8 18:41:09 on server 172.18.0.15.  The process which hung can
>>> >> >> >> >> >> be seen in the log as 172.18.0.15:0/1143980479.
>>> >> >> >> >> >>
>>> >> >> >> >> >> It appears as if the TCP socket is opened to the OSD, but then times
>>> >> >> >> >> >> out 15 minutes later, the process gets data when the socket is closed
>>> >> >> >> >> >> on the client server and it retries.
>>> >> >> >> >> >>
>>> >> >> >> >> >> Please help.
>>> >> >> >> >> >>
>>> >> >> >> >> >> We are using ceph version 0.48.2argonaut
>>> >> >> >> >> >> (commit:3e02b2fad88c2a95d9c0c86878f10d1beb780bfe).
>>> >> >> >> >> >>
>>> >> >> >> >> >> We are using a 3.5.7 kernel with the following list of patches applied:
>>> >> >> >> >> >>
>>> >> >> >> >> >> 1-libceph-encapsulate-out-message-data-setup.patch
>>> >> >> >> >> >> 2-libceph-dont-mark-footer-complete-before-it-is.patch
>>> >> >> >> >> >> 3-libceph-move-init-of-bio_iter.patch
>>> >> >> >> >> >> 4-libceph-dont-use-bio_iter-as-a-flag.patch
>>> >> >> >> >> >> 5-libceph-resubmit-linger-ops-when-pg-mapping-changes.patch
>>> >> >> >> >> >> 6-libceph-re-initialize-bio_iter-on-start-of-message-receive.patch
>>> >> >> >> >> >> 7-ceph-close-old-con-before-reopening-on-mds-reconnect.patch
>>> >> >> >> >> >> 8-libceph-protect-ceph_con_open-with-mutex.patch
>>> >> >> >> >> >> 9-libceph-reset-connection-retry-on-successfully-negotiation.patch
>>> >> >> >> >> >> 10-rbd-only-reset-capacity-when-pointing-to-head.patch
>>> >> >> >> >> >> 11-rbd-set-image-size-when-header-is-updated.patch
>>> >> >> >> >> >> 12-libceph-fix-crypto-key-null-deref-memory-leak.patch
>>> >> >> >> >> >> 13-ceph-tolerate-and-warn-on-extraneous-dentry-from-mds.patch
>>> >> >> >> >> >> 14-ceph-avoid-divide-by-zero-in-__validate_layout.patch
>>> >> >> >> >> >> 15-rbd-drop-dev-reference-on-error-in-rbd_open.patch
>>> >> >> >> >> >> 16-ceph-Fix-oops-when-handling-mdsmap-that-decreases-max_mds.patch
>>> >> >> >> >> >> 17-libceph-check-for-invalid-mapping.patch
>>> >> >> >> >> >> 18-ceph-propagate-layout-error-on-osd-request-creation.patch
>>> >> >> >> >> >> 19-rbd-BUG-on-invalid-layout.patch
>>> >> >> >> >> >> 20-ceph-return-EIO-on-invalid-layout-on-GET_DATALOC-ioctl.patch
>>> >> >> >> >> >> 21-ceph-avoid-32-bit-page-index-overflow.patch
>>> >> >> >> >> >> 23-ceph-fix-dentry-reference-leak-in-encode_fh.patch
>>> >> >> >> >> >>
>>> >> >> >> >> >> Any suggestions?
>>> >> >> >> >> >
>>> >> >> >> >> >
>>> >> >> >> >> >> > The log shows your monitors don't have time synchronized enough among
>>> >> >> >> >> > them to make much progress (including authenticating new connections).
>>> >> >> >> >> > That's probably the real issue. 0.2s is pretty large clock drift.
>>> >> >> >> >> >
>>> >> >> >> >> >
>>> >> >> >> >> >> One thought is that the following patch (which we could not apply) is
>>> >> >> >> >> >> what is required:
>>> >> >> >> >> >>
>>> >> >> >> >> >> 22-rbd-reset-BACKOFF-if-unable-to-re-queue.patch
>>> >> >> >> >> >
>>> >> >> >> >> >
>>> >> >> >> >> > This is certainly useful too, but I don't think it's the cause of
>>> >> >> >> >> > the delay in this case.
>>> >> >> >> >> >
>>> >> >> >> >> > Josh

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: rbd map command hangs for 15 minutes during system start up
  2012-11-19 23:34                         ` Gregory Farnum
@ 2012-11-20 21:53                           ` Nick Bartos
  2012-11-21  1:31                             ` Nick Bartos
  0 siblings, 1 reply; 56+ messages in thread
From: Nick Bartos @ 2012-11-20 21:53 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Sage Weil, Josh Durgin, Mandell Degerness, ceph-devel

I reproduced the problem and got several sysrq states captured.
During this run, the monitor running on the host complained a few
times about the clocks being off, but all messages were for under 0.55
seconds.

Here are the kernel logs.  Note that there are several traces, I
thought multiple during the incident may help:
https://raw.github.com/gist/4121395/a6dda7552ed8a45725ee5d632fe3ba38703f8cfc/gistfile1.txt
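[Editor's note: for reference, task-state traces like the ones above can be captured with the kernel's magic sysrq facility. A minimal sketch, assuming root and the standard procfs paths; 'w' dumps blocked (uninterruptible) tasks, 't' dumps all tasks:]

```shell
# Sketch: dump task backtraces via magic sysrq (requires root).
if [ -w /proc/sysrq-trigger ]; then
    echo 1 > /proc/sys/kernel/sysrq   # enable all sysrq functions
    echo w > /proc/sysrq-trigger      # dump blocked (D-state) tasks
    echo t > /proc/sysrq-trigger      # dump all task states
    dmesg | tail -n 500               # the traces land in the ring buffer
else
    echo "run as root to trigger sysrq" >&2
fi
```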


On Mon, Nov 19, 2012 at 3:34 PM, Gregory Farnum <greg@inktank.com> wrote:
> Hmm, yep — that param is actually only used for the warning; I guess
> we forgot what it actually covers. :(
>
> Have your monitor clocks been off by more than 5 seconds at any point?
>
> On Mon, Nov 19, 2012 at 3:04 PM, Nick Bartos <nick@pistoncloud.com> wrote:
>> Making 'mon clock drift allowed' very small (0.00001) does not
>> reliably reproduce the hang.  I started looking at the code for 0.48.2
>> and it looks like this is only used in Paxos::warn_on_future_time,
>> which only handles the warning, nothing else.
>>
>>
>> On Fri, Nov 16, 2012 at 2:21 PM, Sage Weil <sage@inktank.com> wrote:
>>> On Fri, 16 Nov 2012, Nick Bartos wrote:
>>>> Should I be lowering the clock drift allowed, or the lease interval to
>>>> help reproduce it?
>>>
>>> clock drift allowed.
>>>
>>>
>>>
>>>>
>>>> On Fri, Nov 16, 2012 at 2:13 PM, Sage Weil <sage@inktank.com> wrote:
>>>> > You can safely set the clock drift allowed as high as 500ms.  The real
>>>> > limitation is that it needs to be well under the lease interval, which is
>>>> > currently 5 seconds by default.
>>>> >
>>>> > You might be able to reproduce more easily by lowering the threshold...
>>>> >
>>>> > sage
>>>> >
>>>> >
>>>> > On Fri, 16 Nov 2012, Nick Bartos wrote:
>>>> >
>>>> >> How far off do the clocks need to be before there is a problem?  It
>>>> >> would seem to be hard to ensure a very large cluster has all of its
>>>> >> nodes synchronized within 50ms (which seems to be the default for "mon
>>>> >> clock drift allowed").  Does the mon clock drift allowed parameter
>>>> >> change anything other than the log messages?  Are there any other
>>>> >> tuning options that may help, assuming that this is the issue and it's
>>>> >> not feasible to get the clocks more than 500ms in sync between all
>>>> >> nodes?
>>>> >>
>>>> >> I'm trying to get a good way of reproducing this and get a trace on
>>>> >> the ceph processes to see what they're waiting on.  I'll let you know
>>>> >> when I have more info.
>>>> >>
>>>> >>
>>>> >> On Fri, Nov 16, 2012 at 11:16 AM, Sage Weil <sage@inktank.com> wrote:
>>>> >> > I just realized I was mixing up this thread with the other deadlock
>>>> >> > thread.
>>>> >> >
>>>> >> > On Fri, 16 Nov 2012, Nick Bartos wrote:
>>>> >> >> Turns out we're having the 'rbd map' hang on startup again, after we
>>>> >> >> started using the wip-3.5 patch set.  How critical is the
>>>> >> >> libceph_protect_ceph_con_open_with_mutex commit?  That's the one I
>>>> >> >> removed before which seemed to get rid of the problem (although I'm
>>>> >> >> not completely sure if it completely got rid of it, at least seemed to
>>>> >> >> happen much less often).
>>>> >> >>
>>>> >> >> It seems like we only started having this issue after we started
>>>> >> >> patching the 3.5 ceph client (we started patching to try and get rid
>>>> >> >> of a kernel oops, which the patches seem to have fixed).
>>>> >> >
>>>> >> > Right.  That patch fixes a real bug.  It also seems pretty unlikely that
>>>> >> > this patch is related to the startup hang.  The original log showed clock
>>>> >> > drift on the monitor that could very easily cause this sort of hang.  Can
>>>> >> > you confirm that that isn't the case with this recent instance of the
>>>> >> > problem?  And/or attach a log?
>>>> >> >
>>>> >> > Thanks-
>>>> >> > sage
>>>> >> >
>>>> >> >
>>>> >> >>
>>>> >> >>
>>>> >> >> On Thu, Nov 15, 2012 at 4:25 PM, Sage Weil <sage@inktank.com> wrote:
>>>> >> >> > On Thu, 15 Nov 2012, Nick Bartos wrote:
>>>> >> >> >> Sorry I guess this e-mail got missed.  I believe those patches came
>>>> >> >> >> from the ceph/linux-3.5.5-ceph branch.  I'm now using the wip-3.5
>>>> >> >> >> branch patches, which seem to all be fine.  We'll stick with 3.5 and
>>>> >> >> >> this backport for now until we can figure out what's wrong with 3.6.
>>>> >> >> >>
>>>> >> >> >> I typically ignore the wip branches just due to the naming when I'm
>>>> >> >> >> looking for updates.  Where should I typically look for updates that
>>>> >> >> >> aren't in released kernels?  Also, is there anything else in the wip*
>>>> >> >> >> branches that you think we may find particularly useful?
>>>> >> >> >
>>>> >> >> > You were looking in the right place.  The problem was we weren't super
>>>> >> >> > organized with our stable patches, and changed our minds about what to
>>>> >> >> > send upstream.  These are 'wip' in the sense that they were in preparation
>>>> >> >> > for going upstream.  The goal is to push them to the mainline stable
>>>> >> >> > kernels and ideally not keep them in our tree at all.
>>>> >> >> >
>>>> >> >> > wip-3.5 is an oddity because the mainline stable kernel is EOL'd, but
>>>> >> >> > we're keeping it so that ubuntu can pick it up for quantal.
>>>> >> >> >
>>>> >> >> > I'll make sure these are more clearly marked as stable.
>>>> >> >> >
>>>> >> >> > sage
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >>
>>>> >> >> >>
>>>> >> >> >> On Mon, Nov 12, 2012 at 3:16 PM, Sage Weil <sage@inktank.com> wrote:
>>>> >> >> >> > On Mon, 12 Nov 2012, Nick Bartos wrote:
>>>> >> >> >> >> After removing 8-libceph-protect-ceph_con_open-with-mutex.patch, it
>>>> >> >> >> >> seems we no longer have this hang.
>>>> >> >> >> >
>>>> >> >> >> > Hmm, that's a bit disconcerting.  Did this series come from our old 3.5
>>>> >> >> >> > stable series?  I recently prepared a new one that backports *all* of the
>>>> >> >> >> > fixes from 3.6 to 3.5 (and 3.4); see wip-3.5 in ceph-client.git.  I would
>>>> >> >> >> > be curious if you see problems with that.
>>>> >> >> >> >
>>>> >> >> >> > So far, with these fixes in place, we have not seen any unexplained kernel
>>>> >> >> >> > crashes in this code.
>>>> >> >> >> >
>>>> >> >> >> > I take it you're going back to a 3.5 kernel because you weren't able to
>>>> >> >> >> > get rid of the sync problem with 3.6?
>>>> >> >> >> >
>>>> >> >> >> > sage
>>>> >> >> >> >
>>>> >> >> >> >
>>>> >> >> >> >
>>>> >> >> >> >>
>>>> >> >> >> >> On Thu, Nov 8, 2012 at 5:43 PM, Josh Durgin <josh.durgin@inktank.com> wrote:
>>>> >> >> >> >> > On 11/08/2012 02:10 PM, Mandell Degerness wrote:
>>>> >> >> >> >> >>
>>>> >> >> >> >> >> We are seeing a somewhat random, but frequent hang on our systems
>>>> >> >> >> >> >> during startup.  The hang happens at the point where an "rbd map
>>>> >> >> >> >> >> <rbdvol>" command is run.
>>>> >> >> >> >> >>
>>>> >> >> >> >> >> I've attached the ceph logs from the cluster.  The map command happens
>>>> >> >> >> >> >> at Nov  8 18:41:09 on server 172.18.0.15.  The process which hung can
>>>> >> >> >> >> >> be seen in the log as 172.18.0.15:0/1143980479.
>>>> >> >> >> >> >>
>>>> >> >> >> >> >> It appears as if the TCP socket is opened to the OSD, but then times
>>>> >> >> >> >> >> out 15 minutes later, the process gets data when the socket is closed
>>>> >> >> >> >> >> on the client server and it retries.
>>>> >> >> >> >> >>
>>>> >> >> >> >> >> Please help.
>>>> >> >> >> >> >>
>>>> >> >> >> >> >> We are using ceph version 0.48.2argonaut
>>>> >> >> >> >> >> (commit:3e02b2fad88c2a95d9c0c86878f10d1beb780bfe).
>>>> >> >> >> >> >>
>>>> >> >> >> >> >> We are using a 3.5.7 kernel with the following list of patches applied:
>>>> >> >> >> >> >>
>>>> >> >> >> >> >> 1-libceph-encapsulate-out-message-data-setup.patch
>>>> >> >> >> >> >> 2-libceph-dont-mark-footer-complete-before-it-is.patch
>>>> >> >> >> >> >> 3-libceph-move-init-of-bio_iter.patch
>>>> >> >> >> >> >> 4-libceph-dont-use-bio_iter-as-a-flag.patch
>>>> >> >> >> >> >> 5-libceph-resubmit-linger-ops-when-pg-mapping-changes.patch
>>>> >> >> >> >> >> 6-libceph-re-initialize-bio_iter-on-start-of-message-receive.patch
>>>> >> >> >> >> >> 7-ceph-close-old-con-before-reopening-on-mds-reconnect.patch
>>>> >> >> >> >> >> 8-libceph-protect-ceph_con_open-with-mutex.patch
>>>> >> >> >> >> >> 9-libceph-reset-connection-retry-on-successfully-negotiation.patch
>>>> >> >> >> >> >> 10-rbd-only-reset-capacity-when-pointing-to-head.patch
>>>> >> >> >> >> >> 11-rbd-set-image-size-when-header-is-updated.patch
>>>> >> >> >> >> >> 12-libceph-fix-crypto-key-null-deref-memory-leak.patch
>>>> >> >> >> >> >> 13-ceph-tolerate-and-warn-on-extraneous-dentry-from-mds.patch
>>>> >> >> >> >> >> 14-ceph-avoid-divide-by-zero-in-__validate_layout.patch
>>>> >> >> >> >> >> 15-rbd-drop-dev-reference-on-error-in-rbd_open.patch
>>>> >> >> >> >> >> 16-ceph-Fix-oops-when-handling-mdsmap-that-decreases-max_mds.patch
>>>> >> >> >> >> >> 17-libceph-check-for-invalid-mapping.patch
>>>> >> >> >> >> >> 18-ceph-propagate-layout-error-on-osd-request-creation.patch
>>>> >> >> >> >> >> 19-rbd-BUG-on-invalid-layout.patch
>>>> >> >> >> >> >> 20-ceph-return-EIO-on-invalid-layout-on-GET_DATALOC-ioctl.patch
>>>> >> >> >> >> >> 21-ceph-avoid-32-bit-page-index-overflow.patch
>>>> >> >> >> >> >> 23-ceph-fix-dentry-reference-leak-in-encode_fh.patch
>>>> >> >> >> >> >>
>>>> >> >> >> >> >> Any suggestions?
>>>> >> >> >> >> >
>>>> >> >> >> >> >
>>>> >> >> >> >> > The log shows your monitors don't have time synchronized enough among
>>>> >> >> >> >> > them to make much progress (including authenticating new connections).
>>>> >> >> >> >> > That's probably the real issue. 0.2s is pretty large clock drift.
>>>> >> >> >> >> >
>>>> >> >> >> >> >
>>>> >> >> >> >> >> One thought is that the following patch (which we could not apply) is
>>>> >> >> >> >> >> what is required:
>>>> >> >> >> >> >>
>>>> >> >> >> >> >> 22-rbd-reset-BACKOFF-if-unable-to-re-queue.patch
>>>> >> >> >> >> >
>>>> >> >> >> >> >
>>>> >> >> >> >> > This is certainly useful too, but I don't think it's the cause of
>>>> >> >> >> >> > the delay in this case.
>>>> >> >> >> >> >
>>>> >> >> >> >> > Josh

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: rbd map command hangs for 15 minutes during system start up
  2012-11-20 21:53                           ` Nick Bartos
@ 2012-11-21  1:31                             ` Nick Bartos
  2012-11-21 16:50                               ` Sage Weil
  0 siblings, 1 reply; 56+ messages in thread
From: Nick Bartos @ 2012-11-21  1:31 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Sage Weil, Josh Durgin, Mandell Degerness, ceph-devel

Since I now have a decent script which can reproduce this, I decided
to re-test with the same 3.5.7 kernel, but just not applying the
patches from the wip-3.5 branch.  With the patches, I can only go 2
builds before I run into a hang.  Without the patches, I have gone 9
consecutive builds (and still going) without seeing the hang.  So it
seems like a reasonable assumption that the problem was introduced in
one of those patches.

We started seeing the problem before applying all the 3.5 patches, so
it seems like one of these is the culprit:

1-libceph-encapsulate-out-message-data-setup.patch
2-libceph-dont-mark-footer-complete-before-it-is.patch
3-libceph-move-init-of-bio_iter.patch
4-libceph-dont-use-bio_iter-as-a-flag.patch
5-libceph-resubmit-linger-ops-when-pg-mapping-changes.patch
6-libceph-re-initialize-bio_iter-on-start-of-message-receive.patch
7-ceph-close-old-con-before-reopening-on-mds-reconnect.patch
8-libceph-protect-ceph_con_open-with-mutex.patch
9-libceph-reset-connection-retry-on-successfully-negotiation.patch
10-rbd-only-reset-capacity-when-pointing-to-head.patch
11-rbd-set-image-size-when-header-is-updated.patch
12-libceph-fix-crypto-key-null-deref-memory-leak.patch
13-ceph-tolerate-and-warn-on-extraneous-dentry-from-mds.patch
14-ceph-avoid-divide-by-zero-in-__validate_layout.patch
15-rbd-drop-dev-reference-on-error-in-rbd_open.patch
16-ceph-Fix-oops-when-handling-mdsmap-that-decreases-max_mds.patch
17-libceph-check-for-invalid-mapping.patch
18-ceph-propagate-layout-error-on-osd-request-creation.patch
19-rbd-BUG-on-invalid-layout.patch
20-ceph-return-EIO-on-invalid-layout-on-GET_DATALOC-ioctl.patch
21-ceph-avoid-32-bit-page-index-overflow.patch
23-ceph-fix-dentry-reference-leak-in-encode_fh.patch

I'll start doing some other builds to try and narrow down the patch
introducing the problem more specifically.


On Tue, Nov 20, 2012 at 1:53 PM, Nick Bartos <nick@pistoncloud.com> wrote:
> I reproduced the problem and got several sysrq states captured.
> During this run, the monitor running on the host complained a few
> times about the clocks being off, but all messages were for under 0.55
> seconds.
>
> Here are the kernel logs.  Note that there are several traces, I
> thought multiple during the incident may help:
> https://raw.github.com/gist/4121395/a6dda7552ed8a45725ee5d632fe3ba38703f8cfc/gistfile1.txt
>
>
> On Mon, Nov 19, 2012 at 3:34 PM, Gregory Farnum <greg@inktank.com> wrote:
>> Hmm, yep — that param is actually only used for the warning; I guess
>> we forgot what it actually covers. :(
>>
>> Have your monitor clocks been off by more than 5 seconds at any point?
>>
>> On Mon, Nov 19, 2012 at 3:04 PM, Nick Bartos <nick@pistoncloud.com> wrote:
>>> Making 'mon clock drift allowed' very small (0.00001) does not
>>> reliably reproduce the hang.  I started looking at the code for 0.48.2
>>> and it looks like this is only used in Paxos::warn_on_future_time,
>>> which only handles the warning, nothing else.
>>>
>>>
>>> On Fri, Nov 16, 2012 at 2:21 PM, Sage Weil <sage@inktank.com> wrote:
>>>> On Fri, 16 Nov 2012, Nick Bartos wrote:
>>>>> Should I be lowering the clock drift allowed, or the lease interval to
>>>>> help reproduce it?
>>>>
>>>> clock drift allowed.
>>>>
>>>>
>>>>
>>>>>
>>>>> On Fri, Nov 16, 2012 at 2:13 PM, Sage Weil <sage@inktank.com> wrote:
>>>>> > You can safely set the clock drift allowed as high as 500ms.  The real
>>>>> > limitation is that it needs to be well under the lease interval, which is
>>>>> > currently 5 seconds by default.
>>>>> >
>>>>> > You might be able to reproduce more easily by lowering the threshold...
>>>>> >
>>>>> > sage
>>>>> >
>>>>> >
>>>>> > On Fri, 16 Nov 2012, Nick Bartos wrote:
>>>>> >
>>>>> >> How far off do the clocks need to be before there is a problem?  It
>>>>> >> would seem to be hard to ensure a very large cluster has all of its
>>>>> >> nodes synchronized within 50ms (which seems to be the default for "mon
>>>>> >> clock drift allowed").  Does the mon clock drift allowed parameter
>>>>> >> change anything other than the log messages?  Are there any other
>>>>> >> tuning options that may help, assuming that this is the issue and it's
>>>>> >> not feasible to get the clocks more than 500ms in sync between all
>>>>> >> nodes?
>>>>> >>
>>>>> >> I'm trying to get a good way of reproducing this and get a trace on
>>>>> >> the ceph processes to see what they're waiting on.  I'll let you know
>>>>> >> when I have more info.
>>>>> >>
>>>>> >>
>>>>> >> On Fri, Nov 16, 2012 at 11:16 AM, Sage Weil <sage@inktank.com> wrote:
>>>>> >> > I just realized I was mixing up this thread with the other deadlock
>>>>> >> > thread.
>>>>> >> >
>>>>> >> > On Fri, 16 Nov 2012, Nick Bartos wrote:
>>>>> >> >> Turns out we're having the 'rbd map' hang on startup again, after we
>>>>> >> >> started using the wip-3.5 patch set.  How critical is the
>>>>> >> >> libceph_protect_ceph_con_open_with_mutex commit?  That's the one I
>>>>> >> >> removed before which seemed to get rid of the problem (although I'm
>>>>> >> >> not completely sure if it completely got rid of it, at least seemed to
>>>>> >> >> happen much less often).
>>>>> >> >>
>>>>> >> >> It seems like we only started having this issue after we started
>>>>> >> >> patching the 3.5 ceph client (we started patching to try and get rid
>>>>> >> >> of a kernel oops, which the patches seem to have fixed).
>>>>> >> >
>>>>> >> > Right.  That patch fixes a real bug.  It also seems pretty unlikely that
>>>>> >> > this patch is related to the startup hang.  The original log showed clock
>>>>> >> > drift on the monitor that could very easily cause this sort of hang.  Can
>>>>> >> > you confirm that that isn't the case with this recent instance of the
>>>>> >> > problem?  And/or attach a log?
>>>>> >> >
>>>>> >> > Thanks-
>>>>> >> > sage
>>>>> >> >
>>>>> >> >
>>>>> >> >>
>>>>> >> >>
>>>>> >> >> On Thu, Nov 15, 2012 at 4:25 PM, Sage Weil <sage@inktank.com> wrote:
>>>>> >> >> > On Thu, 15 Nov 2012, Nick Bartos wrote:
>>>>> >> >> >> Sorry I guess this e-mail got missed.  I believe those patches came
>>>>> >> >> >> from the ceph/linux-3.5.5-ceph branch.  I'm now using the wip-3.5
>>>>> >> >> >> branch patches, which seem to all be fine.  We'll stick with 3.5 and
>>>>> >> >> >> this backport for now until we can figure out what's wrong with 3.6.
>>>>> >> >> >>
>>>>> >> >> >> I typically ignore the wip branches just due to the naming when I'm
>>>>> >> >> >> looking for updates.  Where should I typically look for updates that
>>>>> >> >> >> aren't in released kernels?  Also, is there anything else in the wip*
>>>>> >> >> >> branches that you think we may find particularly useful?
>>>>> >> >> >
>>>>> >> >> > You were looking in the right place.  The problem was we weren't super
>>>>> >> >> > organized with our stable patches, and changed our minds about what to
>>>>> >> >> > send upstream.  These are 'wip' in the sense that they were in preparation
>>>>> >> >> > for going upstream.  The goal is to push them to the mainline stable
>>>>> >> >> > kernels and ideally not keep them in our tree at all.
>>>>> >> >> >
>>>>> >> >> > wip-3.5 is an oddity because the mainline stable kernel is EOL'd, but
>>>>> >> >> > we're keeping it so that ubuntu can pick it up for quantal.
>>>>> >> >> >
>>>>> >> >> > I'll make sure these are more clearly marked as stable.
>>>>> >> >> >
>>>>> >> >> > sage
>>>>> >> >> >
>>>>> >> >> >
>>>>> >> >> >>
>>>>> >> >> >>
>>>>> >> >> >> On Mon, Nov 12, 2012 at 3:16 PM, Sage Weil <sage@inktank.com> wrote:
>>>>> >> >> >> > On Mon, 12 Nov 2012, Nick Bartos wrote:
>>>>> >> >> >> >> After removing 8-libceph-protect-ceph_con_open-with-mutex.patch, it
>>>>> >> >> >> >> seems we no longer have this hang.
>>>>> >> >> >> >
>>>>> >> >> >> > Hmm, that's a bit disconcerting.  Did this series come from our old 3.5
>>>>> >> >> >> > stable series?  I recently prepared a new one that backports *all* of the
>>>>> >> >> >> > fixes from 3.6 to 3.5 (and 3.4); see wip-3.5 in ceph-client.git.  I would
>>>>> >> >> >> > be curious if you see problems with that.
>>>>> >> >> >> >
>>>>> >> >> >> > So far, with these fixes in place, we have not seen any unexplained kernel
>>>>> >> >> >> > crashes in this code.
>>>>> >> >> >> >
>>>>> >> >> >> > I take it you're going back to a 3.5 kernel because you weren't able to
>>>>> >> >> >> > get rid of the sync problem with 3.6?
>>>>> >> >> >> >
>>>>> >> >> >> > sage
>>>>> >> >> >> >
>>>>> >> >> >> >
>>>>> >> >> >> >
>>>>> >> >> >> >>
>>>>> >> >> >> >> On Thu, Nov 8, 2012 at 5:43 PM, Josh Durgin <josh.durgin@inktank.com> wrote:
>>>>> >> >> >> >> > On 11/08/2012 02:10 PM, Mandell Degerness wrote:
>>>>> >> >> >> >> >>
>>>>> >> >> >> >> >> We are seeing a somewhat random, but frequent hang on our systems
>>>>> >> >> >> >> >> during startup.  The hang happens at the point where an "rbd map
>>>>> >> >> >> >> >> <rbdvol>" command is run.
>>>>> >> >> >> >> >>
>>>>> >> >> >> >> >> I've attached the ceph logs from the cluster.  The map command happens
>>>>> >> >> >> >> >> at Nov  8 18:41:09 on server 172.18.0.15.  The process which hung can
>>>>> >> >> >> >> >> be seen in the log as 172.18.0.15:0/1143980479.
>>>>> >> >> >> >> >>
>>>>> >> >> >> >> >> It appears as if the TCP socket is opened to the OSD, but then times
>>>>> >> >> >> >> >> out 15 minutes later, the process gets data when the socket is closed
>>>>> >> >> >> >> >> on the client server and it retries.
>>>>> >> >> >> >> >>
>>>>> >> >> >> >> >> Please help.
>>>>> >> >> >> >> >>
>>>>> >> >> >> >> >> We are using ceph version 0.48.2argonaut
>>>>> >> >> >> >> >> (commit:3e02b2fad88c2a95d9c0c86878f10d1beb780bfe).
>>>>> >> >> >> >> >>
>>>>> >> >> >> >> >> We are using a 3.5.7 kernel with the following list of patches applied:
>>>>> >> >> >> >> >>
>>>>> >> >> >> >> >> 1-libceph-encapsulate-out-message-data-setup.patch
>>>>> >> >> >> >> >> 2-libceph-dont-mark-footer-complete-before-it-is.patch
>>>>> >> >> >> >> >> 3-libceph-move-init-of-bio_iter.patch
>>>>> >> >> >> >> >> 4-libceph-dont-use-bio_iter-as-a-flag.patch
>>>>> >> >> >> >> >> 5-libceph-resubmit-linger-ops-when-pg-mapping-changes.patch
>>>>> >> >> >> >> >> 6-libceph-re-initialize-bio_iter-on-start-of-message-receive.patch
>>>>> >> >> >> >> >> 7-ceph-close-old-con-before-reopening-on-mds-reconnect.patch
>>>>> >> >> >> >> >> 8-libceph-protect-ceph_con_open-with-mutex.patch
>>>>> >> >> >> >> >> 9-libceph-reset-connection-retry-on-successfully-negotiation.patch
>>>>> >> >> >> >> >> 10-rbd-only-reset-capacity-when-pointing-to-head.patch
>>>>> >> >> >> >> >> 11-rbd-set-image-size-when-header-is-updated.patch
>>>>> >> >> >> >> >> 12-libceph-fix-crypto-key-null-deref-memory-leak.patch
>>>>> >> >> >> >> >> 13-ceph-tolerate-and-warn-on-extraneous-dentry-from-mds.patch
>>>>> >> >> >> >> >> 14-ceph-avoid-divide-by-zero-in-__validate_layout.patch
>>>>> >> >> >> >> >> 15-rbd-drop-dev-reference-on-error-in-rbd_open.patch
>>>>> >> >> >> >> >> 16-ceph-Fix-oops-when-handling-mdsmap-that-decreases-max_mds.patch
>>>>> >> >> >> >> >> 17-libceph-check-for-invalid-mapping.patch
>>>>> >> >> >> >> >> 18-ceph-propagate-layout-error-on-osd-request-creation.patch
>>>>> >> >> >> >> >> 19-rbd-BUG-on-invalid-layout.patch
>>>>> >> >> >> >> >> 20-ceph-return-EIO-on-invalid-layout-on-GET_DATALOC-ioctl.patch
>>>>> >> >> >> >> >> 21-ceph-avoid-32-bit-page-index-overflow.patch
>>>>> >> >> >> >> >> 23-ceph-fix-dentry-reference-leak-in-encode_fh.patch
>>>>> >> >> >> >> >>
>>>>> >> >> >> >> >> Any suggestions?
>>>>> >> >> >> >> >
>>>>> >> >> >> >> >
>>>>> >> >> >> >> > The log shows your monitors' clocks aren't synchronized closely enough
>>>>> >> >> >> >> > for them to make much progress (including authenticating new connections).
>>>>> >> >> >> >> > That's probably the real issue; 0.2s is pretty large clock drift.
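To make the numbers concrete: the drift in question is the spread between monitor clock samples, compared against the 'mon clock drift allowed' threshold (0.05s by default). A minimal sketch, with hypothetical monitor names and readings:

```python
def max_clock_skew(mon_times):
    """Largest pairwise clock offset, in seconds, among monitors.

    `mon_times` maps monitor name -> a clock sample taken at nominally
    the same instant (names and values here are made up for illustration).
    """
    samples = list(mon_times.values())
    return max(samples) - min(samples)

# A 0.2s spread, as seen in the log, is four times the default
# 'mon clock drift allowed' of 0.05s.
skew = max_clock_skew({"mon.a": 100.00, "mon.b": 100.20, "mon.c": 100.05})
```

Running NTP against a common source on all monitor hosts is the usual way to keep this spread down.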
>>>>> >> >> >> >> >
>>>>> >> >> >> >> >
>>>>> >> >> >> >> >> One thought is that the following patch (which we could not apply) is
>>>>> >> >> >> >> >> what is required:
>>>>> >> >> >> >> >>
>>>>> >> >> >> >> >> 22-rbd-reset-BACKOFF-if-unable-to-re-queue.patch
>>>>> >> >> >> >> >
>>>>> >> >> >> >> >
>>>>> >> >> >> >> > This is certainly useful too, but I don't think it's the cause of
>>>>> >> >> >> >> > the delay in this case.
>>>>> >> >> >> >> >
>>>>> >> >> >> >> > Josh
>>>>> >> >> >> >> > --
>>>>> >> >> >> >> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>> >> >> >> >> > the body of a message to majordomo@vger.kernel.org
>>>>> >> >> >> >> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>> >> >> >> >>
>>>>> >> >> >> >>
>>>>> >> >> >>
>>>>> >> >> >>
>>>>> >> >>
>>>>> >> >>
>>>>> >>
>>>>> >>
>>>>>
>>>>>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: rbd map command hangs for 15 minutes during system start up
  2012-11-21  1:31                             ` Nick Bartos
@ 2012-11-21 16:50                               ` Sage Weil
  2012-11-21 17:02                                 ` Nick Bartos
  0 siblings, 1 reply; 56+ messages in thread
From: Sage Weil @ 2012-11-21 16:50 UTC (permalink / raw)
  To: Nick Bartos; +Cc: Gregory Farnum, Josh Durgin, Mandell Degerness, ceph-devel

On Tue, 20 Nov 2012, Nick Bartos wrote:
> Since I now have a decent script which can reproduce this, I decided
> to re-test with the same 3.5.7 kernel, but just not applying the
> patches from the wip-3.5 branch.  With the patches, I can only go 2
> builds before I run into a hang.  Without the patches, I have gone 9
> consecutive builds (and still going) without seeing the hang.  So it
> seems like a reasonable assumption that the problem was introduced in
> one of those patches.
> 
> We started seeing the problem before applying all the 3.5 patches, so
> it seems like one of these is the culprit:
> 
> 1-libceph-encapsulate-out-message-data-setup.patch
> 2-libceph-dont-mark-footer-complete-before-it-is.patch
> 3-libceph-move-init-of-bio_iter.patch
> 4-libceph-dont-use-bio_iter-as-a-flag.patch
> 5-libceph-resubmit-linger-ops-when-pg-mapping-changes.patch
> 6-libceph-re-initialize-bio_iter-on-start-of-message-receive.patch
> 7-ceph-close-old-con-before-reopening-on-mds-reconnect.patch
> 8-libceph-protect-ceph_con_open-with-mutex.patch
> 9-libceph-reset-connection-retry-on-successfully-negotiation.patch
> 10-rbd-only-reset-capacity-when-pointing-to-head.patch
> 11-rbd-set-image-size-when-header-is-updated.patch
> 12-libceph-fix-crypto-key-null-deref-memory-leak.patch
> 13-ceph-tolerate-and-warn-on-extraneous-dentry-from-mds.patch
> 14-ceph-avoid-divide-by-zero-in-__validate_layout.patch
> 15-rbd-drop-dev-reference-on-error-in-rbd_open.patch
> 16-ceph-Fix-oops-when-handling-mdsmap-that-decreases-max_mds.patch
> 17-libceph-check-for-invalid-mapping.patch
> 18-ceph-propagate-layout-error-on-osd-request-creation.patch
> 19-rbd-BUG-on-invalid-layout.patch
> 20-ceph-return-EIO-on-invalid-layout-on-GET_DATALOC-ioctl.patch
> 21-ceph-avoid-32-bit-page-index-overflow.patch
> 23-ceph-fix-dentry-reference-leak-in-encode_fh.patch
> 
> I'll start doing some other builds to try and narrow down the patch
> introducing the problem more specifically.

Thanks for hunting this down.  I'm very curious what the culprit is...

sage



> 
> 
> On Tue, Nov 20, 2012 at 1:53 PM, Nick Bartos <nick@pistoncloud.com> wrote:
> > I reproduced the problem and got several sysrq states captured.
> > During this run, the monitor running on the host complained a few
> > times about the clocks being off, but all messages were for under 0.55
> > seconds.
> >
> > Here are the kernel logs.  Note that there are several traces, I
> > thought multiple during the incident may help:
> > https://raw.github.com/gist/4121395/a6dda7552ed8a45725ee5d632fe3ba38703f8cfc/gistfile1.txt
> >
> >
> > On Mon, Nov 19, 2012 at 3:34 PM, Gregory Farnum <greg@inktank.com> wrote:
> >> Hmm, yep, that param is actually only used for the warning; I guess
> >> we forgot what it actually covers. :(
> >>
> >> Have your monitor clocks been off by more than 5 seconds at any point?
> >>
> >> On Mon, Nov 19, 2012 at 3:04 PM, Nick Bartos <nick@pistoncloud.com> wrote:
> >>> Making 'mon clock drift allowed' very small (0.00001) does not
> >>> reliably reproduce the hang.  I started looking at the code for 0.48.2
> >>> and it looks like this is only used in Paxos::warn_on_future_time,
> >>> which only handles the warning, nothing else.
> >>>
> >>>
> >>> On Fri, Nov 16, 2012 at 2:21 PM, Sage Weil <sage@inktank.com> wrote:
> >>>> On Fri, 16 Nov 2012, Nick Bartos wrote:
> >>>>> Should I be lowering the clock drift allowed, or the lease interval to
> >>>>> help reproduce it?
> >>>>
> >>>> clock drift allowed.
> >>>>
> >>>>
> >>>>
> >>>>>
> >>>>> On Fri, Nov 16, 2012 at 2:13 PM, Sage Weil <sage@inktank.com> wrote:
> >>>>> > You can safely set the clock drift allowed as high as 500ms.  The real
> >>>>> > limitation is that it needs to be well under the lease interval, which is
> >>>>> > currently 5 seconds by default.
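The two settings Sage refers to live in the monitor section of ceph.conf; a minimal sketch (argonaut-era option names, with the values from this thread rather than recommendations):

```ini
[mon]
    ; warning threshold for clock drift, in seconds; keep well under the lease
    mon clock drift allowed = 0.5
    ; paxos lease interval, in seconds (the default)
    mon lease = 5
```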
> >>>>> >
> >>>>> > You might be able to reproduce more easily by lowering the threshold...
> >>>>> >
> >>>>> > sage
> >>>>> >
> >>>>> >
> >>>>> > On Fri, 16 Nov 2012, Nick Bartos wrote:
> >>>>> >
> >>>>> >> How far off do the clocks need to be before there is a problem?  It
> >>>>> >> would seem to be hard to ensure a very large cluster has all of its
> >>>>> >> nodes synchronized within 50ms (which seems to be the default for "mon
> >>>>> >> clock drift allowed").  Does the mon clock drift allowed parameter
> >>>>> >> change anything other than the log messages?  Are there any other
> >>>>> >> tuning options that may help, assuming that this is the issue and it's
> >>>>> >> not feasible to get the clocks more than 500ms in sync between all
> >>>>> >> nodes?
> >>>>> >>
> >>>>> >> I'm trying to get a good way of reproducing this and get a trace on
> >>>>> >> the ceph processes to see what they're waiting on.  I'll let you know
> >>>>> >> when I have more info.
> >>>>> >>
> >>>>> >>
> >>>>> >> On Fri, Nov 16, 2012 at 11:16 AM, Sage Weil <sage@inktank.com> wrote:
> >>>>> >> > I just realized I was mixing up this thread with the other deadlock
> >>>>> >> > thread.
> >>>>> >> >
> >>>>> >> > On Fri, 16 Nov 2012, Nick Bartos wrote:
> >>>>> >> >> Turns out we're having the 'rbd map' hang on startup again, after we
> >>>>> >> >> started using the wip-3.5 patch set.  How critical is the
> >>>>> >> >> libceph_protect_ceph_con_open_with_mutex commit?  That's the one I
> >>>>> >> >> removed before, which seemed to get rid of the problem (although I'm
> >>>>> >> >> not completely sure it got rid of it entirely; it at least seemed to
> >>>>> >> >> happen much less often).
> >>>>> >> >>
> >>>>> >> >> It seems like we only started having this issue after we started
> >>>>> >> >> patching the 3.5 ceph client (we started patching to try and get rid
> >>>>> >> >> of a kernel oops, which the patches seem to have fixed).
> >>>>> >> >
> >>>>> >> > Right.  That patch fixes a real bug.  It also seems pretty unlikely that
> >>>>> >> > this patch is related to the startup hang.  The original log showed clock
> >>>>> >> > drift on the monitor that could very easily cause this sort of hang.  Can
> >>>>> >> > you confirm that that isn't the case with this recent instance of the
> >>>>> >> > problem?  And/or attach a log?
> >>>>> >> >
> >>>>> >> > Thanks-
> >>>>> >> > sage
> >>>>> >> >
> >>>>> >> >
> >>>>> >> >>
> >>>>> >> >>
> >>>>> >> >> On Thu, Nov 15, 2012 at 4:25 PM, Sage Weil <sage@inktank.com> wrote:
> >>>>> >> >> > On Thu, 15 Nov 2012, Nick Bartos wrote:
> >>>>> >> >> >> Sorry I guess this e-mail got missed.  I believe those patches came
> >>>>> >> >> >> from the ceph/linux-3.5.5-ceph branch.  I'm now using the wip-3.5
> >>>>> >> >> >> branch patches, which seem to all be fine.  We'll stick with 3.5 and
> >>>>> >> >> >> this backport for now until we can figure out what's wrong with 3.6.
> >>>>> >> >> >>
> >>>>> >> >> >> I typically ignore the wip branches just due to the naming when I'm
> >>>>> >> >> >> looking for updates.  Where should I typically look for updates that
> >>>>> >> >> >> aren't in released kernels?  Also, is there anything else in the wip*
> >>>>> >> >> >> branches that you think we may find particularly useful?
> >>>>> >> >> >
> >>>>> >> >> > You were looking in the right place.  The problem was we weren't super
> >>>>> >> >> > organized with our stable patches, and changed our minds about what to
> >>>>> >> >> > send upstream.  These are 'wip' in the sense that they were in preparation
> >>>>> >> >> > for going upstream.  The goal is to push them to the mainline stable
> >>>>> >> >> > kernels and ideally not keep them in our tree at all.
> >>>>> >> >> >
> >>>>> >> >> > wip-3.5 is an oddity because the mainline stable kernel is EOL'd, but
> >>>>> >> >> > we're keeping it so that ubuntu can pick it up for quantal.
> >>>>> >> >> >
> >>>>> >> >> > I'll make sure these are more clearly marked as stable.
> >>>>> >> >> >
> >>>>> >> >> > sage
> >>>>> >> >> >
> >>>>> >> >> >
> >>>>> >> >> >>
> >>>>> >> >> >>
> >>>>> >> >> >> On Mon, Nov 12, 2012 at 3:16 PM, Sage Weil <sage@inktank.com> wrote:
> >>>>> >> >> >> > On Mon, 12 Nov 2012, Nick Bartos wrote:
> >>>>> >> >> >> >> After removing 8-libceph-protect-ceph_con_open-with-mutex.patch, it
> >>>>> >> >> >> >> seems we no longer have this hang.
> >>>>> >> >> >> >
> >>>>> >> >> >> > Hmm, that's a bit disconcerting.  Did this series come from our old 3.5
> >>>>> >> >> >> > stable series?  I recently prepared a new one that backports *all* of the
> >>>>> >> >> >> > fixes from 3.6 to 3.5 (and 3.4); see wip-3.5 in ceph-client.git.  I would
> >>>>> >> >> >> > be curious if you see problems with that.
> >>>>> >> >> >> >
> >>>>> >> >> >> > So far, with these fixes in place, we have not seen any unexplained kernel
> >>>>> >> >> >> > crashes in this code.
> >>>>> >> >> >> >
> >>>>> >> >> >> > I take it you're going back to a 3.5 kernel because you weren't able to
> >>>>> >> >> >> > get rid of the sync problem with 3.6?
> >>>>> >> >> >> >
> >>>>> >> >> >> > sage
> >>>>> >> >> >> >
> >>>>> >> >> >> >
> >>>>> >> >> >> >
> >>>>> >> >> >> >>
> >>>>> >> >> >> >> On Thu, Nov 8, 2012 at 5:43 PM, Josh Durgin <josh.durgin@inktank.com> wrote:
> >>>>> >> >> >> >> > On 11/08/2012 02:10 PM, Mandell Degerness wrote:
> >>>>> >> >> >> >> >>
> >>>>> >> >> >> >> >> We are seeing a somewhat random, but frequent hang on our systems
> >>>>> >> >> >> >> >> during startup.  The hang happens at the point where an "rbd map
> >>>>> >> >> >> >> >> <rbdvol>" command is run.
> >>>>> >> >> >> >> >>
> >>>>> >> >> >> >> >> I've attached the ceph logs from the cluster.  The map command happens
> >>>>> >> >> >> >> >> at Nov  8 18:41:09 on server 172.18.0.15.  The process which hung can
> >>>>> >> >> >> >> >> be seen in the log as 172.18.0.15:0/1143980479.
> >>>>> >> >> >> >> >>
> >>>>> >> >> >> >> >> It appears as if the TCP socket is opened to the OSD, but then times
> >>>>> >> >> >> >> >> out 15 minutes later, the process gets data when the socket is closed
> >>>>> >> >> >> >> >> on the client server and it retries.
> >>>>> >> >> >> >> >>
> >>>>> >> >> >> >> >> Please help.
> >>>>> >> >> >> >> >>
> >>>>> >> >> >> >> >> We are using ceph version 0.48.2argonaut
> >>>>> >> >> >> >> >> (commit:3e02b2fad88c2a95d9c0c86878f10d1beb780bfe).
> >>>>> >> >> >> >> >>
> >>>>> >> >> >> >> >> We are using a 3.5.7 kernel with the following list of patches applied:
> >>>>> >> >> >> >> >>
> >>>>> >> >> >> >> >> 1-libceph-encapsulate-out-message-data-setup.patch
> >>>>> >> >> >> >> >> 2-libceph-dont-mark-footer-complete-before-it-is.patch
> >>>>> >> >> >> >> >> 3-libceph-move-init-of-bio_iter.patch
> >>>>> >> >> >> >> >> 4-libceph-dont-use-bio_iter-as-a-flag.patch
> >>>>> >> >> >> >> >> 5-libceph-resubmit-linger-ops-when-pg-mapping-changes.patch
> >>>>> >> >> >> >> >> 6-libceph-re-initialize-bio_iter-on-start-of-message-receive.patch
> >>>>> >> >> >> >> >> 7-ceph-close-old-con-before-reopening-on-mds-reconnect.patch
> >>>>> >> >> >> >> >> 8-libceph-protect-ceph_con_open-with-mutex.patch
> >>>>> >> >> >> >> >> 9-libceph-reset-connection-retry-on-successfully-negotiation.patch
> >>>>> >> >> >> >> >> 10-rbd-only-reset-capacity-when-pointing-to-head.patch
> >>>>> >> >> >> >> >> 11-rbd-set-image-size-when-header-is-updated.patch
> >>>>> >> >> >> >> >> 12-libceph-fix-crypto-key-null-deref-memory-leak.patch
> >>>>> >> >> >> >> >> 13-ceph-tolerate-and-warn-on-extraneous-dentry-from-mds.patch
> >>>>> >> >> >> >> >> 14-ceph-avoid-divide-by-zero-in-__validate_layout.patch
> >>>>> >> >> >> >> >> 15-rbd-drop-dev-reference-on-error-in-rbd_open.patch
> >>>>> >> >> >> >> >> 16-ceph-Fix-oops-when-handling-mdsmap-that-decreases-max_mds.patch
> >>>>> >> >> >> >> >> 17-libceph-check-for-invalid-mapping.patch
> >>>>> >> >> >> >> >> 18-ceph-propagate-layout-error-on-osd-request-creation.patch
> >>>>> >> >> >> >> >> 19-rbd-BUG-on-invalid-layout.patch
> >>>>> >> >> >> >> >> 20-ceph-return-EIO-on-invalid-layout-on-GET_DATALOC-ioctl.patch
> >>>>> >> >> >> >> >> 21-ceph-avoid-32-bit-page-index-overflow.patch
> >>>>> >> >> >> >> >> 23-ceph-fix-dentry-reference-leak-in-encode_fh.patch
> >>>>> >> >> >> >> >>
> >>>>> >> >> >> >> >> Any suggestions?
> >>>>> >> >> >> >> >
> >>>>> >> >> >> >> >
> >>>>> >> >> >> >> > The log shows your monitors' clocks aren't synchronized closely enough
> >>>>> >> >> >> >> > for them to make much progress (including authenticating new connections).
> >>>>> >> >> >> >> > That's probably the real issue; 0.2s is pretty large clock drift.
> >>>>> >> >> >> >> >
> >>>>> >> >> >> >> >
> >>>>> >> >> >> >> >> One thought is that the following patch (which we could not apply) is
> >>>>> >> >> >> >> >> what is required:
> >>>>> >> >> >> >> >>
> >>>>> >> >> >> >> >> 22-rbd-reset-BACKOFF-if-unable-to-re-queue.patch
> >>>>> >> >> >> >> >
> >>>>> >> >> >> >> >
> >>>>> >> >> >> >> > This is certainly useful too, but I don't think it's the cause of
> >>>>> >> >> >> >> > the delay in this case.
> >>>>> >> >> >> >> >
> >>>>> >> >> >> >> > Josh
> >>>>> >> >> >> >>
> >>>>> >> >> >> >>
> >>>>> >> >> >>
> >>>>> >> >> >>
> >>>>> >> >>
> >>>>> >> >>
> >>>>> >>
> >>>>> >>
> >>>>>
> >>>>>
> 
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: rbd map command hangs for 15 minutes during system start up
  2012-11-21 16:50                               ` Sage Weil
@ 2012-11-21 17:02                                 ` Nick Bartos
  2012-11-21 17:34                                   ` Nick Bartos
  0 siblings, 1 reply; 56+ messages in thread
From: Nick Bartos @ 2012-11-21 17:02 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, Josh Durgin, Mandell Degerness, ceph-devel

It's really looking like it's the
libceph_resubmit_linger_ops_when_pg_mapping_changes commit.  When
patches 1-50 (listed below) are applied to 3.5.7, the hang is present.
 So far I have gone through 4 successful installs with no hang with
only 1-49 applied.  I'm still leaving my test run to make sure it's
not a fluke, but since previously it hangs within the first couple of
builds, it really looks like this is where the problem originated.

1-libceph_eliminate_connection_state_DEAD.patch
2-libceph_kill_bad_proto_ceph_connection_op.patch
3-libceph_rename_socket_callbacks.patch
4-libceph_rename_kvec_reset_and_kvec_add_functions.patch
5-libceph_embed_ceph_messenger_structure_in_ceph_client.patch
6-libceph_start_separating_connection_flags_from_state.patch
7-libceph_start_tracking_connection_socket_state.patch
8-libceph_provide_osd_number_when_creating_osd.patch
9-libceph_set_CLOSED_state_bit_in_con_init.patch
10-libceph_embed_ceph_connection_structure_in_mon_client.patch
11-libceph_drop_connection_refcounting_for_mon_client.patch
12-libceph_init_monitor_connection_when_opening.patch
13-libceph_fully_initialize_connection_in_con_init.patch
14-libceph_tweak_ceph_alloc_msg.patch
15-libceph_have_messages_point_to_their_connection.patch
16-libceph_have_messages_take_a_connection_reference.patch
17-libceph_make_ceph_con_revoke_a_msg_operation.patch
18-libceph_make_ceph_con_revoke_message_a_msg_op.patch
19-libceph_fix_overflow_in___decode_pool_names.patch
20-libceph_fix_overflow_in_osdmap_decode.patch
21-libceph_fix_overflow_in_osdmap_apply_incremental.patch
22-libceph_transition_socket_state_prior_to_actual_connect.patch
23-libceph_fix_NULL_dereference_in_reset_connection.patch
24-libceph_use_con_get_put_methods.patch
25-libceph_drop_ceph_con_get_put_helpers_and_nref_member.patch
26-libceph_encapsulate_out_message_data_setup.patch
27-libceph_encapsulate_advancing_msg_page.patch
28-libceph_don_t_mark_footer_complete_before_it_is.patch
29-libceph_move_init_bio__functions_up.patch
30-libceph_move_init_of_bio_iter.patch
31-libceph_don_t_use_bio_iter_as_a_flag.patch
32-libceph_SOCK_CLOSED_is_a_flag_not_a_state.patch
33-libceph_don_t_change_socket_state_on_sock_event.patch
34-libceph_just_set_SOCK_CLOSED_when_state_changes.patch
35-libceph_don_t_touch_con_state_in_con_close_socket.patch
36-libceph_clear_CONNECTING_in_ceph_con_close.patch
37-libceph_clear_NEGOTIATING_when_done.patch
38-libceph_define_and_use_an_explicit_CONNECTED_state.patch
39-libceph_separate_banner_and_connect_writes.patch
40-libceph_distinguish_two_phases_of_connect_sequence.patch
41-libceph_small_changes_to_messenger.c.patch
42-libceph_add_some_fine_ASCII_art.patch
43-libceph_set_peer_name_on_con_open_not_init.patch
44-libceph_initialize_mon_client_con_only_once.patch
45-libceph_allow_sock_transition_from_CONNECTING_to_CLOSED.patch
46-libceph_initialize_msgpool_message_types.patch
47-libceph_prevent_the_race_of_incoming_work_during_teardown.patch
48-libceph_report_socket_read_write_error_message.patch
49-libceph_fix_mutex_coverage_for_ceph_con_close.patch
50-libceph_resubmit_linger_ops_when_pg_mapping_changes.patch
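Narrowing the series down by testing prefixes one at a time is a linear scan; a bisection over prefixes of the patch list would cut roughly 50 candidate builds down to about six. A sketch (hypothetical helper, not the actual build tooling; assumes a single culprit patch and a reliable reproducer):

```python
def first_bad_patch(patches, hangs):
    """Return the first patch whose inclusion reproduces the hang.

    `hangs(n)` stands in for building a kernel with patches[:n] applied
    and checking whether the 'rbd map' hang reproduces. Assumed
    monotonic: once the culprit is included, every later prefix hangs.
    """
    lo, hi = 0, len(patches)          # patches[:lo] is good, patches[:hi] is bad
    assert hangs(hi) and not hangs(lo)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if hangs(mid):
            hi = mid                  # culprit is within patches[:mid]
        else:
            lo = mid                  # patches[:mid] is clean
    return patches[hi - 1]
```

With a 50-patch series and patch 50 as the culprit, this converges in six test builds instead of up to fifty.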


On Wed, Nov 21, 2012 at 8:50 AM, Sage Weil <sage@inktank.com> wrote:
> Thanks for hunting this down.  I'm very curious what the culprit is...
>
> sage

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: rbd map command hangs for 15 minutes during system start up
  2012-11-21 17:02                                 ` Nick Bartos
@ 2012-11-21 17:34                                   ` Nick Bartos
  2012-11-21 21:41                                     ` Nick Bartos
  0 siblings, 1 reply; 56+ messages in thread
From: Nick Bartos @ 2012-11-21 17:34 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, Josh Durgin, Mandell Degerness, ceph-devel

With 8 successful installs already done, I'm reasonably confident that
it's patch #50.  I'm making another build which applies all patches
from the 3.5 backport branch, excluding that specific one.  I'll let
you know if that turns up any unexpected failures.

What will the potential fall out be for removing that specific patch?


On Wed, Nov 21, 2012 at 9:02 AM, Nick Bartos <nick@pistoncloud.com> wrote:
> It's really looking like it's the
> libceph_resubmit_linger_ops_when_pg_mapping_changes commit.  When
> patches 1-50 (listed below) are applied to 3.5.7, the hang is present.
>  So far I have gone through 4 successful installs with no hang with
> only 1-49 applied.  I'm still leaving my test run to make sure it's
> not a fluke, but since previously it hangs within the first couple of
> builds, it really looks like this is where the problem originated.
>
> 1-libceph_eliminate_connection_state_DEAD.patch
> 2-libceph_kill_bad_proto_ceph_connection_op.patch
> 3-libceph_rename_socket_callbacks.patch
> 4-libceph_rename_kvec_reset_and_kvec_add_functions.patch
> 5-libceph_embed_ceph_messenger_structure_in_ceph_client.patch
> 6-libceph_start_separating_connection_flags_from_state.patch
> 7-libceph_start_tracking_connection_socket_state.patch
> 8-libceph_provide_osd_number_when_creating_osd.patch
> 9-libceph_set_CLOSED_state_bit_in_con_init.patch
> 10-libceph_embed_ceph_connection_structure_in_mon_client.patch
> 11-libceph_drop_connection_refcounting_for_mon_client.patch
> 12-libceph_init_monitor_connection_when_opening.patch
> 13-libceph_fully_initialize_connection_in_con_init.patch
> 14-libceph_tweak_ceph_alloc_msg.patch
> 15-libceph_have_messages_point_to_their_connection.patch
> 16-libceph_have_messages_take_a_connection_reference.patch
> 17-libceph_make_ceph_con_revoke_a_msg_operation.patch
> 18-libceph_make_ceph_con_revoke_message_a_msg_op.patch
> 19-libceph_fix_overflow_in___decode_pool_names.patch
> 20-libceph_fix_overflow_in_osdmap_decode.patch
> 21-libceph_fix_overflow_in_osdmap_apply_incremental.patch
> 22-libceph_transition_socket_state_prior_to_actual_connect.patch
> 23-libceph_fix_NULL_dereference_in_reset_connection.patch
> 24-libceph_use_con_get_put_methods.patch
> 25-libceph_drop_ceph_con_get_put_helpers_and_nref_member.patch
> 26-libceph_encapsulate_out_message_data_setup.patch
> 27-libceph_encapsulate_advancing_msg_page.patch
> 28-libceph_don_t_mark_footer_complete_before_it_is.patch
> 29-libceph_move_init_bio__functions_up.patch
> 30-libceph_move_init_of_bio_iter.patch
> 31-libceph_don_t_use_bio_iter_as_a_flag.patch
> 32-libceph_SOCK_CLOSED_is_a_flag_not_a_state.patch
> 33-libceph_don_t_change_socket_state_on_sock_event.patch
> 34-libceph_just_set_SOCK_CLOSED_when_state_changes.patch
> 35-libceph_don_t_touch_con_state_in_con_close_socket.patch
> 36-libceph_clear_CONNECTING_in_ceph_con_close.patch
> 37-libceph_clear_NEGOTIATING_when_done.patch
> 38-libceph_define_and_use_an_explicit_CONNECTED_state.patch
> 39-libceph_separate_banner_and_connect_writes.patch
> 40-libceph_distinguish_two_phases_of_connect_sequence.patch
> 41-libceph_small_changes_to_messenger.c.patch
> 42-libceph_add_some_fine_ASCII_art.patch
> 43-libceph_set_peer_name_on_con_open_not_init.patch
> 44-libceph_initialize_mon_client_con_only_once.patch
> 45-libceph_allow_sock_transition_from_CONNECTING_to_CLOSED.patch
> 46-libceph_initialize_msgpool_message_types.patch
> 47-libceph_prevent_the_race_of_incoming_work_during_teardown.patch
> 48-libceph_report_socket_read_write_error_message.patch
> 49-libceph_fix_mutex_coverage_for_ceph_con_close.patch
> 50-libceph_resubmit_linger_ops_when_pg_mapping_changes.patch
>
>
> On Wed, Nov 21, 2012 at 8:50 AM, Sage Weil <sage@inktank.com> wrote:
>> Thanks for hunting this down.  I'm very curious what the culprit is...
>>
>> sage

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: rbd map command hangs for 15 minutes during system start up
  2012-11-21 17:34                                   ` Nick Bartos
@ 2012-11-21 21:41                                     ` Nick Bartos
  2012-11-22  4:47                                       ` Sage Weil
  0 siblings, 1 reply; 56+ messages in thread
From: Nick Bartos @ 2012-11-21 21:41 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, Josh Durgin, Mandell Degerness, ceph-devel

FYI the build which included all 3.5 backports except patch #50 is
still going strong after 21 builds.

On Wed, Nov 21, 2012 at 9:34 AM, Nick Bartos <nick@pistoncloud.com> wrote:
> With 8 successful installs already done, I'm reasonably confident that
> it's patch #50.  I'm making another build which applies all patches
> from the 3.5 backport branch, excluding that specific one.  I'll let
> you know if that turns up any unexpected failures.
>
> What will the potential fallout be from removing that specific patch?
>
>
> On Wed, Nov 21, 2012 at 9:02 AM, Nick Bartos <nick@pistoncloud.com> wrote:
>> It's really looking like it's the
>> libceph_resubmit_linger_ops_when_pg_mapping_changes commit.  When
>> patches 1-50 (listed below) are applied to 3.5.7, the hang is present.
>>  So far I have gone through 4 successful installs with no hang with
>> only 1-49 applied.  I'm still leaving my test run to make sure it's
>> not a fluke, but since previously it hangs within the first couple of
>> builds, it really looks like this is where the problem originated.
>>
>> 1-libceph_eliminate_connection_state_DEAD.patch
>> 2-libceph_kill_bad_proto_ceph_connection_op.patch
>> 3-libceph_rename_socket_callbacks.patch
>> 4-libceph_rename_kvec_reset_and_kvec_add_functions.patch
>> 5-libceph_embed_ceph_messenger_structure_in_ceph_client.patch
>> 6-libceph_start_separating_connection_flags_from_state.patch
>> 7-libceph_start_tracking_connection_socket_state.patch
>> 8-libceph_provide_osd_number_when_creating_osd.patch
>> 9-libceph_set_CLOSED_state_bit_in_con_init.patch
>> 10-libceph_embed_ceph_connection_structure_in_mon_client.patch
>> 11-libceph_drop_connection_refcounting_for_mon_client.patch
>> 12-libceph_init_monitor_connection_when_opening.patch
>> 13-libceph_fully_initialize_connection_in_con_init.patch
>> 14-libceph_tweak_ceph_alloc_msg.patch
>> 15-libceph_have_messages_point_to_their_connection.patch
>> 16-libceph_have_messages_take_a_connection_reference.patch
>> 17-libceph_make_ceph_con_revoke_a_msg_operation.patch
>> 18-libceph_make_ceph_con_revoke_message_a_msg_op.patch
>> 19-libceph_fix_overflow_in___decode_pool_names.patch
>> 20-libceph_fix_overflow_in_osdmap_decode.patch
>> 21-libceph_fix_overflow_in_osdmap_apply_incremental.patch
>> 22-libceph_transition_socket_state_prior_to_actual_connect.patch
>> 23-libceph_fix_NULL_dereference_in_reset_connection.patch
>> 24-libceph_use_con_get_put_methods.patch
>> 25-libceph_drop_ceph_con_get_put_helpers_and_nref_member.patch
>> 26-libceph_encapsulate_out_message_data_setup.patch
>> 27-libceph_encapsulate_advancing_msg_page.patch
>> 28-libceph_don_t_mark_footer_complete_before_it_is.patch
>> 29-libceph_move_init_bio__functions_up.patch
>> 30-libceph_move_init_of_bio_iter.patch
>> 31-libceph_don_t_use_bio_iter_as_a_flag.patch
>> 32-libceph_SOCK_CLOSED_is_a_flag_not_a_state.patch
>> 33-libceph_don_t_change_socket_state_on_sock_event.patch
>> 34-libceph_just_set_SOCK_CLOSED_when_state_changes.patch
>> 35-libceph_don_t_touch_con_state_in_con_close_socket.patch
>> 36-libceph_clear_CONNECTING_in_ceph_con_close.patch
>> 37-libceph_clear_NEGOTIATING_when_done.patch
>> 38-libceph_define_and_use_an_explicit_CONNECTED_state.patch
>> 39-libceph_separate_banner_and_connect_writes.patch
>> 40-libceph_distinguish_two_phases_of_connect_sequence.patch
>> 41-libceph_small_changes_to_messenger.c.patch
>> 42-libceph_add_some_fine_ASCII_art.patch
>> 43-libceph_set_peer_name_on_con_open_not_init.patch
>> 44-libceph_initialize_mon_client_con_only_once.patch
>> 45-libceph_allow_sock_transition_from_CONNECTING_to_CLOSED.patch
>> 46-libceph_initialize_msgpool_message_types.patch
>> 47-libceph_prevent_the_race_of_incoming_work_during_teardown.patch
>> 48-libceph_report_socket_read_write_error_message.patch
>> 49-libceph_fix_mutex_coverage_for_ceph_con_close.patch
>> 50-libceph_resubmit_linger_ops_when_pg_mapping_changes.patch
>>
>>
>> On Wed, Nov 21, 2012 at 8:50 AM, Sage Weil <sage@inktank.com> wrote:
>>> Thanks for hunting this down.  I'm very curious what the culprit is...
>>>
>>> sage

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: rbd map command hangs for 15 minutes during system start up
  2012-11-21 21:41                                     ` Nick Bartos
@ 2012-11-22  4:47                                       ` Sage Weil
  2012-11-22  5:49                                         ` Nick Bartos
  0 siblings, 1 reply; 56+ messages in thread
From: Sage Weil @ 2012-11-22  4:47 UTC (permalink / raw)
  To: Nick Bartos
  Cc: elder, Gregory Farnum, Josh Durgin, Mandell Degerness, ceph-devel

On Wed, 21 Nov 2012, Nick Bartos wrote:
> FYI the build which included all 3.5 backports except patch #50 is
> still going strong after 21 builds.

Okay, that one at least makes some sense.  I've opened

	http://tracker.newdream.net/issues/3519

How easy is this to reproduce?  If it is something you can trigger with 
debugging enabled ('echo module libceph +p > 
/sys/kernel/debug/dynamic_debug/control') that would help tremendously.
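
A minimal sketch of toggling that debug output (the control path is the
standard debugfs location; it is made overridable here only so the commands
can be exercised against a scratch file — root and a mounted debugfs are
assumed for real use):

```shell
# Toggle libceph dynamic debug messages.  CTL defaults to the standard
# debugfs control file; override it only for dry runs.
CTL="${CTL:-/sys/kernel/debug/dynamic_debug/control}"

libceph_debug() {  # usage: libceph_debug +p   (or -p to disable)
    echo "module libceph $1" > "$CTL"
}

# Typical sequence: enable, reproduce the hang, save the log, disable.
#   libceph_debug +p
#   rbd map <rbdvol>                 # reproduce the hang
#   dmesg > /tmp/libceph-debug.txt
#   libceph_debug -p
```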

I'm guessing that during this startup time the OSDs are still in the 
process of starting?

Alex, I bet that a test that does a lot of map/unmap stuff in a loop while 
thrashing OSDs could hit this.
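
A rough sketch of what that loop could look like (the image name and device
are placeholders, the map/unmap commands are made overridable so the loop
logic can be dry-run, and the OSD thrashing would happen in a separate
process):

```shell
# Hypothetical stress test: map and unmap an rbd image repeatedly
# while OSDs are being restarted elsewhere.  MAP_CMD and UNMAP_CMD
# default to plausible rbd invocations; the image name and device
# node are placeholders, not taken from this thread.
MAP_CMD="${MAP_CMD:-rbd map testimg}"
UNMAP_CMD="${UNMAP_CMD:-rbd unmap /dev/rbd0}"

map_unmap_loop() {  # usage: map_unmap_loop <iterations>
    i=0
    while [ "$i" -lt "$1" ]; do
        $MAP_CMD   || { echo "map failed/hung on iteration $i" >&2; return 1; }
        $UNMAP_CMD || { echo "unmap failed on iteration $i" >&2; return 1; }
        i=$((i + 1))
    done
}
```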

Thanks!
sage


> 
> On Wed, Nov 21, 2012 at 9:34 AM, Nick Bartos <nick@pistoncloud.com> wrote:
> > With 8 successful installs already done, I'm reasonably confident that
> > it's patch #50.  I'm making another build which applies all patches
> > from the 3.5 backport branch, excluding that specific one.  I'll let
> > you know if that turns up any unexpected failures.
> >
> > What will the potential fall out be for removing that specific patch?
> >
> >
> > On Wed, Nov 21, 2012 at 9:02 AM, Nick Bartos <nick@pistoncloud.com> wrote:
> >> It's really looking like it's the
> >> libceph_resubmit_linger_ops_when_pg_mapping_changes commit.  When
> >> patches 1-50 (listed below) are applied to 3.5.7, the hang is present.
> >>  So far I have gone through 4 successful installs with no hang with
> >> only 1-49 applied.  I'm still leaving my test run to make sure it's
> >> not a fluke, but since previously it hangs within the first couple of
> >> builds, it really looks like this is where the problem originated.
> >>
> >> [patches 1-50 snipped; identical to the list quoted earlier in the thread]
> >>
> >>
> >> On Wed, Nov 21, 2012 at 8:50 AM, Sage Weil <sage@inktank.com> wrote:
> >>> Thanks for hunting this down.  I'm very curious what the culprit is...
> >>>
> >>> sage
> 
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: rbd map command hangs for 15 minutes during system start up
  2012-11-22  4:47                                       ` Sage Weil
@ 2012-11-22  5:49                                         ` Nick Bartos
  2012-11-22 18:04                                           ` Nick Bartos
  0 siblings, 1 reply; 56+ messages in thread
From: Nick Bartos @ 2012-11-22  5:49 UTC (permalink / raw)
  To: Sage Weil
  Cc: elder, Gregory Farnum, Josh Durgin, Mandell Degerness, ceph-devel

It's very easy to reproduce now with my automated install script: the
most I've seen it succeed with that patch is 2 installs in a row before
hanging on the 3rd, and it hangs on most builds.  So it shouldn't take
much to get it to do it again.  I'll try to get to that tomorrow,
when I'm a bit more rested and my brain is working better.

Yes, during this the OSDs are probably all syncing up.  All the osd and
mon daemons have started by the time the rbd commands are run, though.

On Wed, Nov 21, 2012 at 8:47 PM, Sage Weil <sage@inktank.com> wrote:
> On Wed, 21 Nov 2012, Nick Bartos wrote:
>> FYI the build which included all 3.5 backports except patch #50 is
>> still going strong after 21 builds.
>
> Okay, that one at least makes some sense.  I've opened
>
>         http://tracker.newdream.net/issues/3519
>
> How easy is this to reproduce?  If it is something you can trigger with
> debugging enabled ('echo module libceph +p >
> /sys/kernel/debug/dynamic_debug/control') that would help tremendously.
>
> I'm guessing that during this startup time the OSDs are still in the
> process of starting?
>
> Alex, I bet that a test that does a lot of map/unmap stuff in a loop while
> thrashing OSDs could hit this.
>
> Thanks!
> sage
>
>
>>
>> On Wed, Nov 21, 2012 at 9:34 AM, Nick Bartos <nick@pistoncloud.com> wrote:
>> > With 8 successful installs already done, I'm reasonably confident that
>> > it's patch #50.  I'm making another build which applies all patches
>> > from the 3.5 backport branch, excluding that specific one.  I'll let
>> > you know if that turns up any unexpected failures.
>> >
>> > What will the potential fall out be for removing that specific patch?
>> >
>> >
>> > On Wed, Nov 21, 2012 at 9:02 AM, Nick Bartos <nick@pistoncloud.com> wrote:
>> >> It's really looking like it's the
>> >> libceph_resubmit_linger_ops_when_pg_mapping_changes commit.  When
>> >> patches 1-50 (listed below) are applied to 3.5.7, the hang is present.
>> >>  So far I have gone through 4 successful installs with no hang with
>> >> only 1-49 applied.  I'm still leaving my test run to make sure it's
>> >> not a fluke, but since previously it hangs within the first couple of
>> >> builds, it really looks like this is where the problem originated.
>> >>
>> >> [patches 1-50 snipped; identical to the list quoted earlier in the thread]
>> >>
>> >>
>> >> On Wed, Nov 21, 2012 at 8:50 AM, Sage Weil <sage@inktank.com> wrote:
>> >>> Thanks for hunting this down.  I'm very curious what the culprit is...
>> >>>
>> >>> sage
>>
>>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: rbd map command hangs for 15 minutes during system start up
  2012-11-22  5:49                                         ` Nick Bartos
@ 2012-11-22 18:04                                           ` Nick Bartos
  2012-11-29 20:37                                             ` Alex Elder
  0 siblings, 1 reply; 56+ messages in thread
From: Nick Bartos @ 2012-11-22 18:04 UTC (permalink / raw)
  To: Sage Weil
  Cc: elder, Gregory Farnum, Josh Durgin, Mandell Degerness, ceph-devel

Here are the ceph log messages (including the libceph kernel debug
stuff you asked for) from a node boot with the rbd command hung for a
couple of minutes:

https://raw.github.com/gist/4132395/7cb5f0150179b012429c6e57749120dd88616cce/gistfile1.txt

On Wed, Nov 21, 2012 at 9:49 PM, Nick Bartos <nick@pistoncloud.com> wrote:
> It's very easy to reproduce now with my automated install script, the
> most I've seen it succeed with that patch is 2 in a row, and hanging
> on the 3rd, although it hangs on most builds.  So it shouldn't take
> much to get it to do it again.  I'll try and get to that tomorrow,
> when I'm a bit more rested and my brain is working better.
>
> Yes, during this the OSDs are probably all syncing up.  All the osd and
> mon daemons have started by the time the rbd commands are run, though.
>
> On Wed, Nov 21, 2012 at 8:47 PM, Sage Weil <sage@inktank.com> wrote:
>> On Wed, 21 Nov 2012, Nick Bartos wrote:
>>> FYI the build which included all 3.5 backports except patch #50 is
>>> still going strong after 21 builds.
>>
>> Okay, that one at least makes some sense.  I've opened
>>
>>         http://tracker.newdream.net/issues/3519
>>
>> How easy is this to reproduce?  If it is something you can trigger with
>> debugging enabled ('echo module libceph +p >
>> /sys/kernel/debug/dynamic_debug/control') that would help tremendously.
>>
>> I'm guessing that during this startup time the OSDs are still in the
>> process of starting?
>>
>> Alex, I bet that a test that does a lot of map/unmap stuff in a loop while
>> thrashing OSDs could hit this.
>>
>> Thanks!
>> sage
>>
>>
>>>
>>> On Wed, Nov 21, 2012 at 9:34 AM, Nick Bartos <nick@pistoncloud.com> wrote:
>>> > With 8 successful installs already done, I'm reasonably confident that
>>> > it's patch #50.  I'm making another build which applies all patches
>>> > from the 3.5 backport branch, excluding that specific one.  I'll let
>>> > you know if that turns up any unexpected failures.
>>> >
>>> > What will the potential fall out be for removing that specific patch?
>>> >
>>> >
>>> > On Wed, Nov 21, 2012 at 9:02 AM, Nick Bartos <nick@pistoncloud.com> wrote:
>>> >> It's really looking like it's the
>>> >> libceph_resubmit_linger_ops_when_pg_mapping_changes commit.  When
>>> >> patches 1-50 (listed below) are applied to 3.5.7, the hang is present.
>>> >>  So far I have gone through 4 successful installs with no hang with
>>> >> only 1-49 applied.  I'm still leaving my test run to make sure it's
>>> >> not a fluke, but since previously it hangs within the first couple of
>>> >> builds, it really looks like this is where the problem originated.
>>> >>
>>> >> [patches 1-50 snipped; identical to the list quoted earlier in the thread]
>>> >>
>>> >>
>>> >> On Wed, Nov 21, 2012 at 8:50 AM, Sage Weil <sage@inktank.com> wrote:
>>> >>> Thanks for hunting this down.  I'm very curious what the culprit is...
>>> >>>
>>> >>> sage
>>>
>>>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: rbd map command hangs for 15 minutes during system start up
  2012-11-22 18:04                                           ` Nick Bartos
@ 2012-11-29 20:37                                             ` Alex Elder
  2012-11-30 18:49                                               ` Nick Bartos
  2012-11-30 23:22                                               ` Alex Elder
  0 siblings, 2 replies; 56+ messages in thread
From: Alex Elder @ 2012-11-29 20:37 UTC (permalink / raw)
  To: Nick Bartos
  Cc: Sage Weil, Gregory Farnum, Josh Durgin, Mandell Degerness, ceph-devel

On 11/22/2012 12:04 PM, Nick Bartos wrote:
> Here are the ceph log messages (including the libceph kernel debug
> stuff you asked for) from a node boot with the rbd command hung for a
> couple of minutes:

Nick, I have put together a branch that includes two fixes
that might be helpful.  I don't expect these fixes will
necessarily *fix* what you're seeing, but one of them
pulls a big hunk of processing out of the picture and
might help eliminate some potential causes.  I had to
pull in several other patches as prerequisites in order
to get those fixes to apply cleanly.

Would you be able to give it a try, and let us know what
results you get?  The branch contains:
- Linux 3.5.5
- Plus the first 49 patches you listed
- Plus four patches, which are prerequisites...
    libceph: define ceph_extract_encoded_string()
    rbd: define some new format constants
    rbd: define rbd_dev_image_id()
    rbd: kill create_snap sysfs entry
- ...for these two bug fixes:
    libceph: remove 'osdtimeout' option
    ceph: don't reference req after put

The branch is available in the ceph-client git repository
under the name "wip-nick" and has commit id dd9323aa.
    https://github.com/ceph/ceph-client/tree/wip-nick

> https://raw.github.com/gist/4132395/7cb5f0150179b012429c6e57749120dd88616cce/gistfile1.txt

This full debug output is very helpful.  Please supply
that again as well.

Thanks.

					-Alex

> On Wed, Nov 21, 2012 at 9:49 PM, Nick Bartos <nick@pistoncloud.com> wrote:
>> It's very easy to reproduce now with my automated install script, the
>> most I've seen it succeed with that patch is 2 in a row, and hanging
>> on the 3rd, although it hangs on most builds.  So it shouldn't take
>> much to get it to do it again.  I'll try and get to that tomorrow,
>> when I'm a bit more rested and my brain is working better.
>>
>> Yes, during this the OSDs are probably all syncing up.  All the osd and
>> mon daemons have started by the time the rbd commands are run, though.
>>
>> On Wed, Nov 21, 2012 at 8:47 PM, Sage Weil <sage@inktank.com> wrote:
>>> On Wed, 21 Nov 2012, Nick Bartos wrote:
>>>> FYI the build which included all 3.5 backports except patch #50 is
>>>> still going strong after 21 builds.
>>>
>>> Okay, that one at least makes some sense.  I've opened
>>>
>>>         http://tracker.newdream.net/issues/3519
>>>
>>> How easy is this to reproduce?  If it is something you can trigger with
>>> debugging enabled ('echo module libceph +p >
>>> /sys/kernel/debug/dynamic_debug/control') that would help tremendously.
>>>
>>> I'm guessing that during this startup time the OSDs are still in the
>>> process of starting?
>>>
>>> Alex, I bet that a test that does a lot of map/unmap stuff in a loop while
>>> thrashing OSDs could hit this.
>>>
>>> Thanks!
>>> sage
>>>
>>>
>>>>
>>>> On Wed, Nov 21, 2012 at 9:34 AM, Nick Bartos <nick@pistoncloud.com> wrote:
>>>>> With 8 successful installs already done, I'm reasonably confident that
>>>>> it's patch #50.  I'm making another build which applies all patches
>>>>> from the 3.5 backport branch, excluding that specific one.  I'll let
>>>>> you know if that turns up any unexpected failures.
>>>>>
>>>>> What will the potential fall out be for removing that specific patch?
>>>>>
>>>>>
>>>>> On Wed, Nov 21, 2012 at 9:02 AM, Nick Bartos <nick@pistoncloud.com> wrote:
>>>>>> It's really looking like it's the
>>>>>> libceph_resubmit_linger_ops_when_pg_mapping_changes commit.  When
>>>>>> patches 1-50 (listed below) are applied to 3.5.7, the hang is present.
>>>>>>  So far I have gone through 4 successful installs with no hang with
>>>>>> only 1-49 applied.  I'm still leaving my test run to make sure it's
>>>>>> not a fluke, but since previously it hangs within the first couple of
>>>>>> builds, it really looks like this is where the problem originated.
>>>>>>
>>>>>> [patches 1-50 snipped; identical to the list quoted earlier in the thread]
>>>>>>
>>>>>>
>>>>>> On Wed, Nov 21, 2012 at 8:50 AM, Sage Weil <sage@inktank.com> wrote:
>>>>>>> Thanks for hunting this down.  I'm very curious what the culprit is...
>>>>>>>
>>>>>>> sage
>>>>
>>>>


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: rbd map command hangs for 15 minutes during system start up
  2012-11-29 20:37                                             ` Alex Elder
@ 2012-11-30 18:49                                               ` Nick Bartos
  2012-11-30 19:10                                                 ` Alex Elder
  2012-11-30 23:22                                               ` Alex Elder
  1 sibling, 1 reply; 56+ messages in thread
From: Nick Bartos @ 2012-11-30 18:49 UTC (permalink / raw)
  To: Alex Elder
  Cc: Sage Weil, Gregory Farnum, Josh Durgin, Mandell Degerness, ceph-devel

My initial tests using a 3.5.7 kernel with the 55 patches from
wip-nick are going well.  So far I've gone through 8 installs without
an incident; I'll let it run for a bit longer to see if the hang crops
up again.

Can I get a branch with these patches integrated into all of the
backported patches to 3.5.x?  I'd like to get this into our main
testing branch, which is currently running 3.5.7 with the patches from
wip-3.5 excluding the
libceph_resubmit_linger_ops_when_pg_mapping_changes patch.

Note that we had a case of an rbd map hang with our main testing
branch, but I don't have a script that can reproduce that yet.  It
happened after the cluster was all up and working, while we were doing
a rolling reboot (cycling through each node).


On Thu, Nov 29, 2012 at 12:37 PM, Alex Elder <elder@inktank.com> wrote:
> On 11/22/2012 12:04 PM, Nick Bartos wrote:
>> Here are the ceph log messages (including the libceph kernel debug
>> stuff you asked for) from a node boot with the rbd command hung for a
>> couple of minutes:
>
> Nick, I have put together a branch that includes two fixes
> that might be helpful.  I don't expect these fixes will
> necessarily *fix* what you're seeing, but one of them
> pulls a big hunk of processing out of the picture and
> might help eliminate some potential causes.  I had to
> pull in several other patches as prerequisites in order
> to get those fixes to apply cleanly.
>
> Would you be able to give it a try, and let us know what
> results you get?  The branch contains:
> - Linux 3.5.5
> - Plus the first 49 patches you listed
> - Plus four patches, which are prerequisites...
>     libceph: define ceph_extract_encoded_string()
>     rbd: define some new format constants
>     rbd: define rbd_dev_image_id()
>     rbd: kill create_snap sysfs entry
> - ...for these two bug fixes:
>     libceph: remove 'osdtimeout' option
>     ceph: don't reference req after put
>
> The branch is available in the ceph-client git repository
> under the name "wip-nick" and has commit id dd9323aa.
>     https://github.com/ceph/ceph-client/tree/wip-nick
>
>> https://raw.github.com/gist/4132395/7cb5f0150179b012429c6e57749120dd88616cce/gistfile1.txt
>
> This full debug output is very helpful.  Please supply
> that again as well.
>
> Thanks.
>
>                                         -Alex
>
>> On Wed, Nov 21, 2012 at 9:49 PM, Nick Bartos <nick@pistoncloud.com> wrote:
>>> It's very easy to reproduce now with my automated install script, the
>>> most I've seen it succeed with that patch is 2 in a row, and hanging
>>> on the 3rd, although it hangs on most builds.  So it shouldn't take
>>> much to get it to do it again.  I'll try and get to that tomorrow,
>>> when I'm a bit more rested and my brain is working better.
>>>
>>> Yes, during this the OSDs are probably all syncing up.  All the osd and
>>> mon daemons have started by the time the rbd commands are run, though.
>>>
>>> On Wed, Nov 21, 2012 at 8:47 PM, Sage Weil <sage@inktank.com> wrote:
>>>> On Wed, 21 Nov 2012, Nick Bartos wrote:
>>>>> FYI the build which included all 3.5 backports except patch #50 is
>>>>> still going strong after 21 builds.
>>>>
>>>> Okay, that one at least makes some sense.  I've opened
>>>>
>>>>         http://tracker.newdream.net/issues/3519
>>>>
>>>> How easy is this to reproduce?  If it is something you can trigger with
>>>> debugging enabled ('echo module libceph +p >
>>>> /sys/kernel/debug/dynamic_debug/control') that would help tremendously.
>>>>
>>>> I'm guessing that during this startup time the OSDs are still in the
>>>> process of starting?
>>>>
>>>> Alex, I bet that a test that does a lot of map/unmap stuff in a loop while
>>>> thrashing OSDs could hit this.
>>>>
>>>> Thanks!
>>>> sage
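
[Editorial note: Sage's two suggestions above (enabling libceph dynamic
debug, then exercising map/unmap while OSDs are being thrashed) could be
combined into a reproducer along these lines.  It prints the commands by
default; the pool/image names and the loop count are assumptions, not
from the thread.]

```shell
# Hypothetical reproducer sketch: turn on libceph kernel debug, then map
# and unmap an rbd image repeatedly.  Needs root and debugfs when run
# for real; here run() only prints each command.
run() { echo "+ $*"; }          # replace with: run() { "$@"; }  to execute

run sh -c "echo 'module libceph +p' > /sys/kernel/debug/dynamic_debug/control"
i=1
while [ "$i" -le 20 ]; do       # 20 iterations is an arbitrary choice
    run rbd map rbd/testimg     # pool/image names are placeholders
    run rbd unmap /dev/rbd0
    i=$((i + 1))
done
run dmesg                       # collect the libceph debug output
```

Restarting OSD daemons in a second loop while this runs would cover the
"thrashing OSDs" half of the suggestion.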
>>>>
>>>>
>>>>>
>>>>> On Wed, Nov 21, 2012 at 9:34 AM, Nick Bartos <nick@pistoncloud.com> wrote:
>>>>>> With 8 successful installs already done, I'm reasonably confident that
>>>>>> it's patch #50.  I'm making another build which applies all patches
>>>>>> from the 3.5 backport branch, excluding that specific one.  I'll let
>>>>>> you know if that turns up any unexpected failures.
>>>>>>
>>>>>> What will the potential fallout be for removing that specific patch?
>>>>>>
>>>>>>
>>>>>> On Wed, Nov 21, 2012 at 9:02 AM, Nick Bartos <nick@pistoncloud.com> wrote:
>>>>>>> It's really looking like it's the
>>>>>>> libceph_resubmit_linger_ops_when_pg_mapping_changes commit.  When
>>>>>>> patches 1-50 (listed below) are applied to 3.5.7, the hang is present.
>>>>>>>  So far I have gone through 4 successful installs with no hang with
>>>>>>> only 1-49 applied.  I'm still leaving my test run to make sure it's
>>>>>>> not a fluke, but since previously it hangs within the first couple of
>>>>>>> builds, it really looks like this is where the problem originated.
>>>>>>>
>>>>>>> 1-libceph_eliminate_connection_state_DEAD.patch
>>>>>>> 2-libceph_kill_bad_proto_ceph_connection_op.patch
>>>>>>> 3-libceph_rename_socket_callbacks.patch
>>>>>>> 4-libceph_rename_kvec_reset_and_kvec_add_functions.patch
>>>>>>> 5-libceph_embed_ceph_messenger_structure_in_ceph_client.patch
>>>>>>> 6-libceph_start_separating_connection_flags_from_state.patch
>>>>>>> 7-libceph_start_tracking_connection_socket_state.patch
>>>>>>> 8-libceph_provide_osd_number_when_creating_osd.patch
>>>>>>> 9-libceph_set_CLOSED_state_bit_in_con_init.patch
>>>>>>> 10-libceph_embed_ceph_connection_structure_in_mon_client.patch
>>>>>>> 11-libceph_drop_connection_refcounting_for_mon_client.patch
>>>>>>> 12-libceph_init_monitor_connection_when_opening.patch
>>>>>>> 13-libceph_fully_initialize_connection_in_con_init.patch
>>>>>>> 14-libceph_tweak_ceph_alloc_msg.patch
>>>>>>> 15-libceph_have_messages_point_to_their_connection.patch
>>>>>>> 16-libceph_have_messages_take_a_connection_reference.patch
>>>>>>> 17-libceph_make_ceph_con_revoke_a_msg_operation.patch
>>>>>>> 18-libceph_make_ceph_con_revoke_message_a_msg_op.patch
>>>>>>> 19-libceph_fix_overflow_in___decode_pool_names.patch
>>>>>>> 20-libceph_fix_overflow_in_osdmap_decode.patch
>>>>>>> 21-libceph_fix_overflow_in_osdmap_apply_incremental.patch
>>>>>>> 22-libceph_transition_socket_state_prior_to_actual_connect.patch
>>>>>>> 23-libceph_fix_NULL_dereference_in_reset_connection.patch
>>>>>>> 24-libceph_use_con_get_put_methods.patch
>>>>>>> 25-libceph_drop_ceph_con_get_put_helpers_and_nref_member.patch
>>>>>>> 26-libceph_encapsulate_out_message_data_setup.patch
>>>>>>> 27-libceph_encapsulate_advancing_msg_page.patch
>>>>>>> 28-libceph_don_t_mark_footer_complete_before_it_is.patch
>>>>>>> 29-libceph_move_init_bio__functions_up.patch
>>>>>>> 30-libceph_move_init_of_bio_iter.patch
>>>>>>> 31-libceph_don_t_use_bio_iter_as_a_flag.patch
>>>>>>> 32-libceph_SOCK_CLOSED_is_a_flag_not_a_state.patch
>>>>>>> 33-libceph_don_t_change_socket_state_on_sock_event.patch
>>>>>>> 34-libceph_just_set_SOCK_CLOSED_when_state_changes.patch
>>>>>>> 35-libceph_don_t_touch_con_state_in_con_close_socket.patch
>>>>>>> 36-libceph_clear_CONNECTING_in_ceph_con_close.patch
>>>>>>> 37-libceph_clear_NEGOTIATING_when_done.patch
>>>>>>> 38-libceph_define_and_use_an_explicit_CONNECTED_state.patch
>>>>>>> 39-libceph_separate_banner_and_connect_writes.patch
>>>>>>> 40-libceph_distinguish_two_phases_of_connect_sequence.patch
>>>>>>> 41-libceph_small_changes_to_messenger.c.patch
>>>>>>> 42-libceph_add_some_fine_ASCII_art.patch
>>>>>>> 43-libceph_set_peer_name_on_con_open_not_init.patch
>>>>>>> 44-libceph_initialize_mon_client_con_only_once.patch
>>>>>>> 45-libceph_allow_sock_transition_from_CONNECTING_to_CLOSED.patch
>>>>>>> 46-libceph_initialize_msgpool_message_types.patch
>>>>>>> 47-libceph_prevent_the_race_of_incoming_work_during_teardown.patch
>>>>>>> 48-libceph_report_socket_read_write_error_message.patch
>>>>>>> 49-libceph_fix_mutex_coverage_for_ceph_con_close.patch
>>>>>>> 50-libceph_resubmit_linger_ops_when_pg_mapping_changes.patch
>>>>>>>
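
[Editorial note: the bisection method described above (apply the series
in numeric order, excluding the suspect patch) can be sketched as below.
The directory layout is an assumption, and the real `patch` invocation
is shown only in a comment; the function just prints what it would do.]

```shell
# Sketch: walk a numbered patch series in numeric order, skipping one
# suspect patch.  Prints "apply"/"skip" per file rather than patching.
SKIP=50

apply_series() {
    dir=$1
    # Sort on the leading number before the first "-" in each filename.
    for f in $(ls "$dir" | sort -t- -k1,1n); do
        n=${f%%-*}
        if [ "$n" = "$SKIP" ]; then
            echo "skip $f"
        else
            echo "apply $f"     # real run: patch -p1 -d linux-3.5.7 < "$dir/$f"
        fi
    done
}

# Demonstration with stub patch files in a temporary directory.
dir=$(mktemp -d)
touch "$dir/2-b.patch" "$dir/10-c.patch" "$dir/50-x.patch"
apply_series "$dir"
```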
>>>>>>>
>>>>>>> On Wed, Nov 21, 2012 at 8:50 AM, Sage Weil <sage@inktank.com> wrote:
>>>>>>>> Thanks for hunting this down.  I'm very curious what the culprit is...
>>>>>>>>
>>>>>>>> sage
>>>>>
>>>>>
>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: rbd map command hangs for 15 minutes during system start up
  2012-11-30 18:49                                               ` Nick Bartos
@ 2012-11-30 19:10                                                 ` Alex Elder
  2012-11-30 19:31                                                   ` Sage Weil
  0 siblings, 1 reply; 56+ messages in thread
From: Alex Elder @ 2012-11-30 19:10 UTC (permalink / raw)
  To: Nick Bartos
  Cc: Sage Weil, Gregory Farnum, Josh Durgin, Mandell Degerness, ceph-devel

On 11/30/2012 12:49 PM, Nick Bartos wrote:
> My initial tests using a 3.5.7 kernel with the 55 patches from
> wip-nick are going well.  So far I've gone through 8 installs without
> incident; I'll let it run a bit longer to see if it crops up again.

This is great news!  Now I wonder which of the two fixes took
care of the problem...

> Can I get a branch with these patches integrated into all of the
> backported patches to 3.5.x?  I'd like to get this into our main
> testing branch, which is currently running 3.5.7 with the patches from
> wip-3.5 excluding the
> libceph_resubmit_linger_ops_when_pg_mapping_changes patch.

I will put together a new branch that includes the remainder
of those patches for you shortly.

> Note that we had a case of an rbd map hang with our main testing
> branch, but I don't have a script that can reproduce that yet.  It was
> after the cluster was all up and working, and we were doing a rolling
> reboot (cycling through each node).

If you are able to reproduce this please let us know.

					-Alex

> 
> 
> On Thu, Nov 29, 2012 at 12:37 PM, Alex Elder <elder@inktank.com> wrote:
>> On 11/22/2012 12:04 PM, Nick Bartos wrote:
>>> Here are the ceph log messages (including the libceph kernel debug
>>> stuff you asked for) from a node boot with the rbd command hung for a
>>> couple of minutes:
>>
>> Nick, I have put together a branch that includes two fixes
>> that might be helpful.  I don't expect these fixes will
>> necessarily *fix* what you're seeing, but one of them
>> pulls a big hunk of processing out of the picture and
>> might help eliminate some potential causes.  I had to
>> pull in several other patches as prerequisites in order
>> to get those fixes to apply cleanly.
>>
>> Would you be able to give it a try, and let us know what
>> results you get?  The branch contains:
>> - Linux 3.5.5
>> - Plus the first 49 patches you listed
>> - Plus four patches, which are prerequisites...
>>     libceph: define ceph_extract_encoded_string()
>>     rbd: define some new format constants
>>     rbd: define rbd_dev_image_id()
>>     rbd: kill create_snap sysfs entry
>> - ...for these two bug fixes:
>>     libceph: remove 'osdtimeout' option
>>     ceph: don't reference req after put
>>
>> The branch is available in the ceph-client git repository
>> under the name "wip-nick" and has commit id dd9323aa.
>>     https://github.com/ceph/ceph-client/tree/wip-nick
>>
>>> https://raw.github.com/gist/4132395/7cb5f0150179b012429c6e57749120dd88616cce/gistfile1.txt
>>
>> This full debug output is very helpful.  Please supply
>> that again as well.
>>
>> Thanks.
>>
>>                                         -Alex
>>
>>> On Wed, Nov 21, 2012 at 9:49 PM, Nick Bartos <nick@pistoncloud.com> wrote:
>>>> It's very easy to reproduce now with my automated install script, the
>>>> most I've seen it succeed with that patch is 2 in a row, and hanging
>>>> on the 3rd, although it hangs on most builds.  So it shouldn't take
>>>> much to get it to do it again.  I'll try and get to that tomorrow,
>>>> when I'm a bit more rested and my brain is working better.
>>>>
>>>> Yes, during this the OSDs are probably all syncing up.  All the osd and
>>>> mon daemons have started by the time the rbd commands are run, though.
>>>>
>>>> On Wed, Nov 21, 2012 at 8:47 PM, Sage Weil <sage@inktank.com> wrote:
>>>>> On Wed, 21 Nov 2012, Nick Bartos wrote:
>>>>>> FYI the build which included all 3.5 backports except patch #50 is
>>>>>> still going strong after 21 builds.
>>>>>
>>>>> Okay, that one at least makes some sense.  I've opened
>>>>>
>>>>>         http://tracker.newdream.net/issues/3519
>>>>>
>>>>> How easy is this to reproduce?  If it is something you can trigger with
>>>>> debugging enabled ('echo module libceph +p >
>>>>> /sys/kernel/debug/dynamic_debug/control') that would help tremendously.
>>>>>
>>>>> I'm guessing that during this startup time the OSDs are still in the
>>>>> process of starting?
>>>>>
>>>>> Alex, I bet that a test that does a lot of map/unmap stuff in a loop while
>>>>> thrashing OSDs could hit this.
>>>>>
>>>>> Thanks!
>>>>> sage
>>>>>
>>>>>
>>>>>>
>>>>>> On Wed, Nov 21, 2012 at 9:34 AM, Nick Bartos <nick@pistoncloud.com> wrote:
>>>>>>> With 8 successful installs already done, I'm reasonably confident that
>>>>>>> it's patch #50.  I'm making another build which applies all patches
>>>>>>> from the 3.5 backport branch, excluding that specific one.  I'll let
>>>>>>> you know if that turns up any unexpected failures.
>>>>>>>
>>>>>>> What will the potential fallout be for removing that specific patch?
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Nov 21, 2012 at 9:02 AM, Nick Bartos <nick@pistoncloud.com> wrote:
>>>>>>>> It's really looking like it's the
>>>>>>>> libceph_resubmit_linger_ops_when_pg_mapping_changes commit.  When
>>>>>>>> patches 1-50 (listed below) are applied to 3.5.7, the hang is present.
>>>>>>>>  So far I have gone through 4 successful installs with no hang with
>>>>>>>> only 1-49 applied.  I'm still leaving my test run to make sure it's
>>>>>>>> not a fluke, but since previously it hangs within the first couple of
>>>>>>>> builds, it really looks like this is where the problem originated.
>>>>>>>>
>>>>>>>> 1-libceph_eliminate_connection_state_DEAD.patch
>>>>>>>> 2-libceph_kill_bad_proto_ceph_connection_op.patch
>>>>>>>> 3-libceph_rename_socket_callbacks.patch
>>>>>>>> 4-libceph_rename_kvec_reset_and_kvec_add_functions.patch
>>>>>>>> 5-libceph_embed_ceph_messenger_structure_in_ceph_client.patch
>>>>>>>> 6-libceph_start_separating_connection_flags_from_state.patch
>>>>>>>> 7-libceph_start_tracking_connection_socket_state.patch
>>>>>>>> 8-libceph_provide_osd_number_when_creating_osd.patch
>>>>>>>> 9-libceph_set_CLOSED_state_bit_in_con_init.patch
>>>>>>>> 10-libceph_embed_ceph_connection_structure_in_mon_client.patch
>>>>>>>> 11-libceph_drop_connection_refcounting_for_mon_client.patch
>>>>>>>> 12-libceph_init_monitor_connection_when_opening.patch
>>>>>>>> 13-libceph_fully_initialize_connection_in_con_init.patch
>>>>>>>> 14-libceph_tweak_ceph_alloc_msg.patch
>>>>>>>> 15-libceph_have_messages_point_to_their_connection.patch
>>>>>>>> 16-libceph_have_messages_take_a_connection_reference.patch
>>>>>>>> 17-libceph_make_ceph_con_revoke_a_msg_operation.patch
>>>>>>>> 18-libceph_make_ceph_con_revoke_message_a_msg_op.patch
>>>>>>>> 19-libceph_fix_overflow_in___decode_pool_names.patch
>>>>>>>> 20-libceph_fix_overflow_in_osdmap_decode.patch
>>>>>>>> 21-libceph_fix_overflow_in_osdmap_apply_incremental.patch
>>>>>>>> 22-libceph_transition_socket_state_prior_to_actual_connect.patch
>>>>>>>> 23-libceph_fix_NULL_dereference_in_reset_connection.patch
>>>>>>>> 24-libceph_use_con_get_put_methods.patch
>>>>>>>> 25-libceph_drop_ceph_con_get_put_helpers_and_nref_member.patch
>>>>>>>> 26-libceph_encapsulate_out_message_data_setup.patch
>>>>>>>> 27-libceph_encapsulate_advancing_msg_page.patch
>>>>>>>> 28-libceph_don_t_mark_footer_complete_before_it_is.patch
>>>>>>>> 29-libceph_move_init_bio__functions_up.patch
>>>>>>>> 30-libceph_move_init_of_bio_iter.patch
>>>>>>>> 31-libceph_don_t_use_bio_iter_as_a_flag.patch
>>>>>>>> 32-libceph_SOCK_CLOSED_is_a_flag_not_a_state.patch
>>>>>>>> 33-libceph_don_t_change_socket_state_on_sock_event.patch
>>>>>>>> 34-libceph_just_set_SOCK_CLOSED_when_state_changes.patch
>>>>>>>> 35-libceph_don_t_touch_con_state_in_con_close_socket.patch
>>>>>>>> 36-libceph_clear_CONNECTING_in_ceph_con_close.patch
>>>>>>>> 37-libceph_clear_NEGOTIATING_when_done.patch
>>>>>>>> 38-libceph_define_and_use_an_explicit_CONNECTED_state.patch
>>>>>>>> 39-libceph_separate_banner_and_connect_writes.patch
>>>>>>>> 40-libceph_distinguish_two_phases_of_connect_sequence.patch
>>>>>>>> 41-libceph_small_changes_to_messenger.c.patch
>>>>>>>> 42-libceph_add_some_fine_ASCII_art.patch
>>>>>>>> 43-libceph_set_peer_name_on_con_open_not_init.patch
>>>>>>>> 44-libceph_initialize_mon_client_con_only_once.patch
>>>>>>>> 45-libceph_allow_sock_transition_from_CONNECTING_to_CLOSED.patch
>>>>>>>> 46-libceph_initialize_msgpool_message_types.patch
>>>>>>>> 47-libceph_prevent_the_race_of_incoming_work_during_teardown.patch
>>>>>>>> 48-libceph_report_socket_read_write_error_message.patch
>>>>>>>> 49-libceph_fix_mutex_coverage_for_ceph_con_close.patch
>>>>>>>> 50-libceph_resubmit_linger_ops_when_pg_mapping_changes.patch
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Nov 21, 2012 at 8:50 AM, Sage Weil <sage@inktank.com> wrote:
>>>>>>>>> Thanks for hunting this down.  I'm very curious what the culprit is...
>>>>>>>>>
>>>>>>>>> sage
>>>>>>
>>>>>>
>>



* Re: rbd map command hangs for 15 minutes during system start up
  2012-11-30 19:10                                                 ` Alex Elder
@ 2012-11-30 19:31                                                   ` Sage Weil
  0 siblings, 0 replies; 56+ messages in thread
From: Sage Weil @ 2012-11-30 19:31 UTC (permalink / raw)
  To: Alex Elder
  Cc: Nick Bartos, Gregory Farnum, Josh Durgin, Mandell Degerness, ceph-devel

On Fri, 30 Nov 2012, Alex Elder wrote:
> On 11/30/2012 12:49 PM, Nick Bartos wrote:
> > My initial tests using a 3.5.7 kernel with the 55 patches from
> > wip-nick are going well.  So far I've gone through 8 installs without
> > incident; I'll let it run a bit longer to see if it crops up again.
> 
> This is great news!  Now I wonder which of the two fixes took
> care of the problem...
> 
> > Can I get a branch with these patches integrated into all of the
> > backported patches to 3.5.x?  I'd like to get this into our main
> > testing branch, which is currently running 3.5.7 with the patches from
> > wip-3.5 excluding the
> > libceph_resubmit_linger_ops_when_pg_mapping_changes patch.
> 
> I will put together a new branch that includes the remainder
> of those patches for you shortly.
> 
> > Note that we had a case of an rbd map hang with our main testing
> > branch, but I don't have a script that can reproduce that yet.  It was
> > after the cluster was all up and working, and we were doing a rolling
> > reboot (cycling through each node).
> 
> If you are able to reproduce this please let us know.

It sounds to me like it might be the same problem.  If we're lucky, those 
2 patches will resolve this as well.

(says the optimist!)
sage


> 
> 					-Alex
> 
> > 
> > 
> > On Thu, Nov 29, 2012 at 12:37 PM, Alex Elder <elder@inktank.com> wrote:
> >> On 11/22/2012 12:04 PM, Nick Bartos wrote:
> >>> Here are the ceph log messages (including the libceph kernel debug
> >>> stuff you asked for) from a node boot with the rbd command hung for a
> >>> couple of minutes:
> >>
> >> Nick, I have put together a branch that includes two fixes
> >> that might be helpful.  I don't expect these fixes will
> >> necessarily *fix* what you're seeing, but one of them
> >> pulls a big hunk of processing out of the picture and
> >> might help eliminate some potential causes.  I had to
> >> pull in several other patches as prerequisites in order
> >> to get those fixes to apply cleanly.
> >>
> >> Would you be able to give it a try, and let us know what
> >> results you get?  The branch contains:
> >> - Linux 3.5.5
> >> - Plus the first 49 patches you listed
> >> - Plus four patches, which are prerequisites...
> >>     libceph: define ceph_extract_encoded_string()
> >>     rbd: define some new format constants
> >>     rbd: define rbd_dev_image_id()
> >>     rbd: kill create_snap sysfs entry
> >> - ...for these two bug fixes:
> >>     libceph: remove 'osdtimeout' option
> >>     ceph: don't reference req after put
> >>
> >> The branch is available in the ceph-client git repository
> >> under the name "wip-nick" and has commit id dd9323aa.
> >>     https://github.com/ceph/ceph-client/tree/wip-nick
> >>
> >>> https://raw.github.com/gist/4132395/7cb5f0150179b012429c6e57749120dd88616cce/gistfile1.txt
> >>
> >> This full debug output is very helpful.  Please supply
> >> that again as well.
> >>
> >> Thanks.
> >>
> >>                                         -Alex
> >>
> >>> On Wed, Nov 21, 2012 at 9:49 PM, Nick Bartos <nick@pistoncloud.com> wrote:
> >>>> It's very easy to reproduce now with my automated install script, the
> >>>> most I've seen it succeed with that patch is 2 in a row, and hanging
> >>>> on the 3rd, although it hangs on most builds.  So it shouldn't take
> >>>> much to get it to do it again.  I'll try and get to that tomorrow,
> >>>> when I'm a bit more rested and my brain is working better.
> >>>>
> >>>> Yes, during this the OSDs are probably all syncing up.  All the osd and
> >>>> mon daemons have started by the time the rbd commands are run, though.
> >>>>
> >>>> On Wed, Nov 21, 2012 at 8:47 PM, Sage Weil <sage@inktank.com> wrote:
> >>>>> On Wed, 21 Nov 2012, Nick Bartos wrote:
> >>>>>> FYI the build which included all 3.5 backports except patch #50 is
> >>>>>> still going strong after 21 builds.
> >>>>>
> >>>>> Okay, that one at least makes some sense.  I've opened
> >>>>>
> >>>>>         http://tracker.newdream.net/issues/3519
> >>>>>
> >>>>> How easy is this to reproduce?  If it is something you can trigger with
> >>>>> debugging enabled ('echo module libceph +p >
> >>>>> /sys/kernel/debug/dynamic_debug/control') that would help tremendously.
> >>>>>
> >>>>> I'm guessing that during this startup time the OSDs are still in the
> >>>>> process of starting?
> >>>>>
> >>>>> Alex, I bet that a test that does a lot of map/unmap stuff in a loop while
> >>>>> thrashing OSDs could hit this.
> >>>>>
> >>>>> Thanks!
> >>>>> sage
> >>>>>
> >>>>>
> >>>>>>
> >>>>>> On Wed, Nov 21, 2012 at 9:34 AM, Nick Bartos <nick@pistoncloud.com> wrote:
> >>>>>>> With 8 successful installs already done, I'm reasonably confident that
> >>>>>>> it's patch #50.  I'm making another build which applies all patches
> >>>>>>> from the 3.5 backport branch, excluding that specific one.  I'll let
> >>>>>>> you know if that turns up any unexpected failures.
> >>>>>>>
> >>>>>>> What will the potential fallout be for removing that specific patch?
> >>>>>>>
> >>>>>>>
> >>>>>>> On Wed, Nov 21, 2012 at 9:02 AM, Nick Bartos <nick@pistoncloud.com> wrote:
> >>>>>>>> It's really looking like it's the
> >>>>>>>> libceph_resubmit_linger_ops_when_pg_mapping_changes commit.  When
> >>>>>>>> patches 1-50 (listed below) are applied to 3.5.7, the hang is present.
> >>>>>>>>  So far I have gone through 4 successful installs with no hang with
> >>>>>>>> only 1-49 applied.  I'm still leaving my test run to make sure it's
> >>>>>>>> not a fluke, but since previously it hangs within the first couple of
> >>>>>>>> builds, it really looks like this is where the problem originated.
> >>>>>>>>
> >>>>>>>> 1-libceph_eliminate_connection_state_DEAD.patch
> >>>>>>>> 2-libceph_kill_bad_proto_ceph_connection_op.patch
> >>>>>>>> 3-libceph_rename_socket_callbacks.patch
> >>>>>>>> 4-libceph_rename_kvec_reset_and_kvec_add_functions.patch
> >>>>>>>> 5-libceph_embed_ceph_messenger_structure_in_ceph_client.patch
> >>>>>>>> 6-libceph_start_separating_connection_flags_from_state.patch
> >>>>>>>> 7-libceph_start_tracking_connection_socket_state.patch
> >>>>>>>> 8-libceph_provide_osd_number_when_creating_osd.patch
> >>>>>>>> 9-libceph_set_CLOSED_state_bit_in_con_init.patch
> >>>>>>>> 10-libceph_embed_ceph_connection_structure_in_mon_client.patch
> >>>>>>>> 11-libceph_drop_connection_refcounting_for_mon_client.patch
> >>>>>>>> 12-libceph_init_monitor_connection_when_opening.patch
> >>>>>>>> 13-libceph_fully_initialize_connection_in_con_init.patch
> >>>>>>>> 14-libceph_tweak_ceph_alloc_msg.patch
> >>>>>>>> 15-libceph_have_messages_point_to_their_connection.patch
> >>>>>>>> 16-libceph_have_messages_take_a_connection_reference.patch
> >>>>>>>> 17-libceph_make_ceph_con_revoke_a_msg_operation.patch
> >>>>>>>> 18-libceph_make_ceph_con_revoke_message_a_msg_op.patch
> >>>>>>>> 19-libceph_fix_overflow_in___decode_pool_names.patch
> >>>>>>>> 20-libceph_fix_overflow_in_osdmap_decode.patch
> >>>>>>>> 21-libceph_fix_overflow_in_osdmap_apply_incremental.patch
> >>>>>>>> 22-libceph_transition_socket_state_prior_to_actual_connect.patch
> >>>>>>>> 23-libceph_fix_NULL_dereference_in_reset_connection.patch
> >>>>>>>> 24-libceph_use_con_get_put_methods.patch
> >>>>>>>> 25-libceph_drop_ceph_con_get_put_helpers_and_nref_member.patch
> >>>>>>>> 26-libceph_encapsulate_out_message_data_setup.patch
> >>>>>>>> 27-libceph_encapsulate_advancing_msg_page.patch
> >>>>>>>> 28-libceph_don_t_mark_footer_complete_before_it_is.patch
> >>>>>>>> 29-libceph_move_init_bio__functions_up.patch
> >>>>>>>> 30-libceph_move_init_of_bio_iter.patch
> >>>>>>>> 31-libceph_don_t_use_bio_iter_as_a_flag.patch
> >>>>>>>> 32-libceph_SOCK_CLOSED_is_a_flag_not_a_state.patch
> >>>>>>>> 33-libceph_don_t_change_socket_state_on_sock_event.patch
> >>>>>>>> 34-libceph_just_set_SOCK_CLOSED_when_state_changes.patch
> >>>>>>>> 35-libceph_don_t_touch_con_state_in_con_close_socket.patch
> >>>>>>>> 36-libceph_clear_CONNECTING_in_ceph_con_close.patch
> >>>>>>>> 37-libceph_clear_NEGOTIATING_when_done.patch
> >>>>>>>> 38-libceph_define_and_use_an_explicit_CONNECTED_state.patch
> >>>>>>>> 39-libceph_separate_banner_and_connect_writes.patch
> >>>>>>>> 40-libceph_distinguish_two_phases_of_connect_sequence.patch
> >>>>>>>> 41-libceph_small_changes_to_messenger.c.patch
> >>>>>>>> 42-libceph_add_some_fine_ASCII_art.patch
> >>>>>>>> 43-libceph_set_peer_name_on_con_open_not_init.patch
> >>>>>>>> 44-libceph_initialize_mon_client_con_only_once.patch
> >>>>>>>> 45-libceph_allow_sock_transition_from_CONNECTING_to_CLOSED.patch
> >>>>>>>> 46-libceph_initialize_msgpool_message_types.patch
> >>>>>>>> 47-libceph_prevent_the_race_of_incoming_work_during_teardown.patch
> >>>>>>>> 48-libceph_report_socket_read_write_error_message.patch
> >>>>>>>> 49-libceph_fix_mutex_coverage_for_ceph_con_close.patch
> >>>>>>>> 50-libceph_resubmit_linger_ops_when_pg_mapping_changes.patch
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Wed, Nov 21, 2012 at 8:50 AM, Sage Weil <sage@inktank.com> wrote:
> >>>>>>>>> Thanks for hunting this down.  I'm very curious what the culprit is...
> >>>>>>>>>
> >>>>>>>>> sage
> >>>>>>
> >>>>>>
> >>
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 


* Re: rbd map command hangs for 15 minutes during system start up
  2012-11-29 20:37                                             ` Alex Elder
  2012-11-30 18:49                                               ` Nick Bartos
@ 2012-11-30 23:22                                               ` Alex Elder
  2012-12-02  5:34                                                 ` Nick Bartos
  1 sibling, 1 reply; 56+ messages in thread
From: Alex Elder @ 2012-11-30 23:22 UTC (permalink / raw)
  To: Nick Bartos
  Cc: Sage Weil, Gregory Farnum, Josh Durgin, Mandell Degerness, ceph-devel

On 11/29/2012 02:37 PM, Alex Elder wrote:
> On 11/22/2012 12:04 PM, Nick Bartos wrote:
>> Here are the ceph log messages (including the libceph kernel debug
>> stuff you asked for) from a node boot with the rbd command hung for a
>> couple of minutes:

I'm sorry, but I did something stupid...

Yes, the branch I gave you includes these fixes.  However
it does *not* include the commit that was giving you trouble
to begin with.

So...

I have updated that same branch (wip-nick) to contain:
- Linux 3.5.5
- Plus the first *50* (not 49) patches you listed
- Plus the ones I added before.

The new commit id for that branch begins with be3198d6.
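
[Editorial note: fetching and verifying the updated branch might look
like the sketch below.  The repository URL, branch name, and commit id
prefix come from this thread; everything else (the dry-run wrapper, the
clone directory) is an assumption.]

```shell
# Sketch: check out the updated wip-nick branch and confirm its head
# commit.  cmd() prints each command instead of executing it; drop the
# echo to run them for real.
REPO=https://github.com/ceph/ceph-client.git
BRANCH=wip-nick

cmd() { echo "+ $*"; }
cmd git clone --branch "$BRANCH" --single-branch "$REPO"
cmd cd ceph-client
cmd git rev-parse --short=8 HEAD    # expect be3198d6
```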

I'm really sorry for this mistake.  Please try this new
branch and report back what you find.

					-Alex


> Nick, I have put together a branch that includes two fixes
> that might be helpful.  I don't expect these fixes will
> necessarily *fix* what you're seeing, but one of them
> pulls a big hunk of processing out of the picture and
> might help eliminate some potential causes.  I had to
> pull in several other patches as prerequisites in order
> to get those fixes to apply cleanly.
> 
> Would you be able to give it a try, and let us know what
> results you get?  The branch contains:
> - Linux 3.5.5
> - Plus the first 49 patches you listed
> - Plus four patches, which are prerequisites...
>     libceph: define ceph_extract_encoded_string()
>     rbd: define some new format constants
>     rbd: define rbd_dev_image_id()
>     rbd: kill create_snap sysfs entry
> - ...for these two bug fixes:
>     libceph: remove 'osdtimeout' option
>     ceph: don't reference req after put
> 
> The branch is available in the ceph-client git repository
> under the name "wip-nick" and has commit id dd9323aa.
>     https://github.com/ceph/ceph-client/tree/wip-nick
> 
>> https://raw.github.com/gist/4132395/7cb5f0150179b012429c6e57749120dd88616cce/gistfile1.txt
> 
> This full debug output is very helpful.  Please supply
> that again as well.
> 
> Thanks.
> 
> 					-Alex
> 
>> On Wed, Nov 21, 2012 at 9:49 PM, Nick Bartos <nick@pistoncloud.com> wrote:
>>> It's very easy to reproduce now with my automated install script, the
>>> most I've seen it succeed with that patch is 2 in a row, and hanging
>>> on the 3rd, although it hangs on most builds.  So it shouldn't take
>>> much to get it to do it again.  I'll try and get to that tomorrow,
>>> when I'm a bit more rested and my brain is working better.
>>>
>>> Yes, during this the OSDs are probably all syncing up.  All the osd and
>>> mon daemons have started by the time the rbd commands are run, though.
>>>
>>> On Wed, Nov 21, 2012 at 8:47 PM, Sage Weil <sage@inktank.com> wrote:
>>>> On Wed, 21 Nov 2012, Nick Bartos wrote:
>>>>> FYI the build which included all 3.5 backports except patch #50 is
>>>>> still going strong after 21 builds.
>>>>
>>>> Okay, that one at least makes some sense.  I've opened
>>>>
>>>>         http://tracker.newdream.net/issues/3519
>>>>
>>>> How easy is this to reproduce?  If it is something you can trigger with
>>>> debugging enabled ('echo module libceph +p >
>>>> /sys/kernel/debug/dynamic_debug/control') that would help tremendously.
>>>>
>>>> I'm guessing that during this startup time the OSDs are still in the
>>>> process of starting?
>>>>
>>>> Alex, I bet that a test that does a lot of map/unmap stuff in a loop while
>>>> thrashing OSDs could hit this.
>>>>
>>>> Thanks!
>>>> sage
>>>>
>>>>
>>>>>
>>>>> On Wed, Nov 21, 2012 at 9:34 AM, Nick Bartos <nick@pistoncloud.com> wrote:
>>>>>> With 8 successful installs already done, I'm reasonably confident that
>>>>>> it's patch #50.  I'm making another build which applies all patches
>>>>>> from the 3.5 backport branch, excluding that specific one.  I'll let
>>>>>> you know if that turns up any unexpected failures.
>>>>>>
>>>>>> What will the potential fallout be for removing that specific patch?
>>>>>>
>>>>>>
>>>>>> On Wed, Nov 21, 2012 at 9:02 AM, Nick Bartos <nick@pistoncloud.com> wrote:
>>>>>>> It's really looking like it's the
>>>>>>> libceph_resubmit_linger_ops_when_pg_mapping_changes commit.  When
>>>>>>> patches 1-50 (listed below) are applied to 3.5.7, the hang is present.
>>>>>>>  So far I have gone through 4 successful installs with no hang with
>>>>>>> only 1-49 applied.  I'm still leaving my test run to make sure it's
>>>>>>> not a fluke, but since previously it hangs within the first couple of
>>>>>>> builds, it really looks like this is where the problem originated.
>>>>>>>
>>>>>>> 1-libceph_eliminate_connection_state_DEAD.patch
>>>>>>> 2-libceph_kill_bad_proto_ceph_connection_op.patch
>>>>>>> 3-libceph_rename_socket_callbacks.patch
>>>>>>> 4-libceph_rename_kvec_reset_and_kvec_add_functions.patch
>>>>>>> 5-libceph_embed_ceph_messenger_structure_in_ceph_client.patch
>>>>>>> 6-libceph_start_separating_connection_flags_from_state.patch
>>>>>>> 7-libceph_start_tracking_connection_socket_state.patch
>>>>>>> 8-libceph_provide_osd_number_when_creating_osd.patch
>>>>>>> 9-libceph_set_CLOSED_state_bit_in_con_init.patch
>>>>>>> 10-libceph_embed_ceph_connection_structure_in_mon_client.patch
>>>>>>> 11-libceph_drop_connection_refcounting_for_mon_client.patch
>>>>>>> 12-libceph_init_monitor_connection_when_opening.patch
>>>>>>> 13-libceph_fully_initialize_connection_in_con_init.patch
>>>>>>> 14-libceph_tweak_ceph_alloc_msg.patch
>>>>>>> 15-libceph_have_messages_point_to_their_connection.patch
>>>>>>> 16-libceph_have_messages_take_a_connection_reference.patch
>>>>>>> 17-libceph_make_ceph_con_revoke_a_msg_operation.patch
>>>>>>> 18-libceph_make_ceph_con_revoke_message_a_msg_op.patch
>>>>>>> 19-libceph_fix_overflow_in___decode_pool_names.patch
>>>>>>> 20-libceph_fix_overflow_in_osdmap_decode.patch
>>>>>>> 21-libceph_fix_overflow_in_osdmap_apply_incremental.patch
>>>>>>> 22-libceph_transition_socket_state_prior_to_actual_connect.patch
>>>>>>> 23-libceph_fix_NULL_dereference_in_reset_connection.patch
>>>>>>> 24-libceph_use_con_get_put_methods.patch
>>>>>>> 25-libceph_drop_ceph_con_get_put_helpers_and_nref_member.patch
>>>>>>> 26-libceph_encapsulate_out_message_data_setup.patch
>>>>>>> 27-libceph_encapsulate_advancing_msg_page.patch
>>>>>>> 28-libceph_don_t_mark_footer_complete_before_it_is.patch
>>>>>>> 29-libceph_move_init_bio__functions_up.patch
>>>>>>> 30-libceph_move_init_of_bio_iter.patch
>>>>>>> 31-libceph_don_t_use_bio_iter_as_a_flag.patch
>>>>>>> 32-libceph_SOCK_CLOSED_is_a_flag_not_a_state.patch
>>>>>>> 33-libceph_don_t_change_socket_state_on_sock_event.patch
>>>>>>> 34-libceph_just_set_SOCK_CLOSED_when_state_changes.patch
>>>>>>> 35-libceph_don_t_touch_con_state_in_con_close_socket.patch
>>>>>>> 36-libceph_clear_CONNECTING_in_ceph_con_close.patch
>>>>>>> 37-libceph_clear_NEGOTIATING_when_done.patch
>>>>>>> 38-libceph_define_and_use_an_explicit_CONNECTED_state.patch
>>>>>>> 39-libceph_separate_banner_and_connect_writes.patch
>>>>>>> 40-libceph_distinguish_two_phases_of_connect_sequence.patch
>>>>>>> 41-libceph_small_changes_to_messenger.c.patch
>>>>>>> 42-libceph_add_some_fine_ASCII_art.patch
>>>>>>> 43-libceph_set_peer_name_on_con_open_not_init.patch
>>>>>>> 44-libceph_initialize_mon_client_con_only_once.patch
>>>>>>> 45-libceph_allow_sock_transition_from_CONNECTING_to_CLOSED.patch
>>>>>>> 46-libceph_initialize_msgpool_message_types.patch
>>>>>>> 47-libceph_prevent_the_race_of_incoming_work_during_teardown.patch
>>>>>>> 48-libceph_report_socket_read_write_error_message.patch
>>>>>>> 49-libceph_fix_mutex_coverage_for_ceph_con_close.patch
>>>>>>> 50-libceph_resubmit_linger_ops_when_pg_mapping_changes.patch
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Nov 21, 2012 at 8:50 AM, Sage Weil <sage@inktank.com> wrote:
>>>>>>>> Thanks for hunting this down.  I'm very curious what the culprit is...
>>>>>>>>
>>>>>>>> sage
>>>>>
>>>>>
> 


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: rbd map command hangs for 15 minutes during system start up
  2012-11-30 23:22                                               ` Alex Elder
@ 2012-12-02  5:34                                                 ` Nick Bartos
  2012-12-03  4:43                                                   ` Alex Elder
  0 siblings, 1 reply; 56+ messages in thread
From: Nick Bartos @ 2012-12-02  5:34 UTC (permalink / raw)
  To: Alex Elder
  Cc: Sage Weil, Gregory Farnum, Josh Durgin, Mandell Degerness, ceph-devel

Unfortunately the hangs happen with the new set of patches.  Here's
some debug info:

https://gist.github.com/raw/4187123/90194ce172130244a9c1c968ed185eee7282d809/gistfile1.txt
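
In the meantime, while the root cause is being chased down, one way to keep
boot from blocking for the full 15 minutes is to give each map attempt its
own timeout and retry.  This is only a rough workaround sketch, not tested
against a real cluster, and the pool/volume names in the usage comment are
illustrative:

```shell
# Workaround sketch: wrap a command (e.g. "rbd map") in a per-attempt
# timeout with retries, instead of letting it block until the kernel
# gives up on the stuck socket.
map_with_timeout() {
    local tries=$1 per_try=$2
    shift 2
    local i
    for ((i = 1; i <= tries; i++)); do
        if timeout "$per_try" "$@"; then
            return 0
        fi
        echo "attempt $i/$tries failed or timed out, retrying" >&2
        sleep 1
    done
    return 1
}

# e.g. in the boot script (names illustrative):
#   map_with_timeout 5 60 rbd map rbd/myvol
```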


On Fri, Nov 30, 2012 at 3:22 PM, Alex Elder <elder@inktank.com> wrote:
> On 11/29/2012 02:37 PM, Alex Elder wrote:
>> On 11/22/2012 12:04 PM, Nick Bartos wrote:
>>> Here are the ceph log messages (including the libceph kernel debug
>>> stuff you asked for) from a node boot with the rbd command hung for a
>>> couple of minutes:
>
> I'm sorry, but I did something stupid...
>
> Yes, the branch I gave you includes these fixes.  However
> it does *not* include the commit that was giving you trouble
> to begin with.
>
> So...
>
> I have updated that same branch (wip-nick) to contain:
> - Linux 3.5.5
> - Plus the first *50* (not 49) patches you listed
> - Plus the ones I added before.
>
> The new commit id for that branch begins with be3198d6.
>
> I'm really sorry for this mistake.  Please try this new
> branch and report back what you find.
>
>                                         -Alex
>
>
>> Nick, I have put together a branch that includes two fixes
>> that might be helpful.  I don't expect these fixes will
>> necessarily *fix* what you're seeing, but one of them
>> pulls a big hunk of processing out of the picture and
>> might help eliminate some potential causes.  I had to
>> pull in several other patches as prerequisites in order
>> to get those fixes to apply cleanly.
>>
>> Would you be able to give it a try, and let us know what
>> results you get?  The branch contains:
>> - Linux 3.5.5
>> - Plus the first 49 patches you listed
>> - Plus four patches, which are prerequisites...
>>     libceph: define ceph_extract_encoded_string()
>>     rbd: define some new format constants
>>     rbd: define rbd_dev_image_id()
>>     rbd: kill create_snap sysfs entry
>> - ...for these two bug fixes:
>>     libceph: remove 'osdtimeout' option
>>     ceph: don't reference req after put
>>
>> The branch is available in the ceph-client git repository
>> under the name "wip-nick" and has commit id dd9323aa.
>>     https://github.com/ceph/ceph-client/tree/wip-nick
>>
>>> https://raw.github.com/gist/4132395/7cb5f0150179b012429c6e57749120dd88616cce/gistfile1.txt
>>
>> This full debug output is very helpful.  Please supply
>> that again as well.
>>
>> Thanks.
>>
>>                                       -Alex
>>
>>> On Wed, Nov 21, 2012 at 9:49 PM, Nick Bartos <nick@pistoncloud.com> wrote:
>>>> It's very easy to reproduce now with my automated install script; the
>>>> most I've seen it succeed with that patch is 2 in a row before hanging
>>>> on the 3rd, although it hangs on most builds.  So it shouldn't take
>>>> much to get it to do it again.  I'll try and get to that tomorrow,
>>>> when I'm a bit more rested and my brain is working better.
>>>>
>>>> Yes, during this the OSDs are probably all syncing up.  All the osd and
>>>> mon daemons have started by the time the rbd commands are run, though.
>>>>
>>>> On Wed, Nov 21, 2012 at 8:47 PM, Sage Weil <sage@inktank.com> wrote:
>>>>> On Wed, 21 Nov 2012, Nick Bartos wrote:
>>>>>> FYI the build which included all 3.5 backports except patch #50 is
>>>>>> still going strong after 21 builds.
>>>>>
>>>>> Okay, that one at least makes some sense.  I've opened
>>>>>
>>>>>         http://tracker.newdream.net/issues/3519
>>>>>
>>>>> How easy is this to reproduce?  If it is something you can trigger with
>>>>> debugging enabled ('echo module libceph +p >
>>>>> /sys/kernel/debug/dynamic_debug/control') that would help tremendously.
>>>>>
>>>>> I'm guessing that during this startup time the OSDs are still in the
>>>>> process of starting?
>>>>>
>>>>> Alex, I bet that a test that does a lot of map/unmap stuff in a loop while
>>>>> thrashing OSDs could hit this.
>>>>>
>>>>> Thanks!
>>>>> sage
>>>>>
>>>>>
>>>>>>
>>>>>> On Wed, Nov 21, 2012 at 9:34 AM, Nick Bartos <nick@pistoncloud.com> wrote:
>>>>>>> With 8 successful installs already done, I'm reasonably confident that
>>>>>>> it's patch #50.  I'm making another build which applies all patches
>>>>>>> from the 3.5 backport branch, excluding that specific one.  I'll let
>>>>>>> you know if that turns up any unexpected failures.
>>>>>>>
>>>>>>> What will the potential fallout be for removing that specific patch?
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Nov 21, 2012 at 9:02 AM, Nick Bartos <nick@pistoncloud.com> wrote:
>>>>>>>> It's really looking like it's the
>>>>>>>> libceph_resubmit_linger_ops_when_pg_mapping_changes commit.  When
>>>>>>>> patches 1-50 (listed below) are applied to 3.5.7, the hang is present.
>>>>>>>>  So far I have gone through 4 successful installs with no hang with
>>>>>>>> only 1-49 applied.  I'm still leaving my test run to make sure it's
>>>>>>>> not a fluke, but since previously it hangs within the first couple of
>>>>>>>> builds, it really looks like this is where the problem originated.
>>>>>>>>
>>>>>>>> 1-libceph_eliminate_connection_state_DEAD.patch
>>>>>>>> 2-libceph_kill_bad_proto_ceph_connection_op.patch
>>>>>>>> 3-libceph_rename_socket_callbacks.patch
>>>>>>>> 4-libceph_rename_kvec_reset_and_kvec_add_functions.patch
>>>>>>>> 5-libceph_embed_ceph_messenger_structure_in_ceph_client.patch
>>>>>>>> 6-libceph_start_separating_connection_flags_from_state.patch
>>>>>>>> 7-libceph_start_tracking_connection_socket_state.patch
>>>>>>>> 8-libceph_provide_osd_number_when_creating_osd.patch
>>>>>>>> 9-libceph_set_CLOSED_state_bit_in_con_init.patch
>>>>>>>> 10-libceph_embed_ceph_connection_structure_in_mon_client.patch
>>>>>>>> 11-libceph_drop_connection_refcounting_for_mon_client.patch
>>>>>>>> 12-libceph_init_monitor_connection_when_opening.patch
>>>>>>>> 13-libceph_fully_initialize_connection_in_con_init.patch
>>>>>>>> 14-libceph_tweak_ceph_alloc_msg.patch
>>>>>>>> 15-libceph_have_messages_point_to_their_connection.patch
>>>>>>>> 16-libceph_have_messages_take_a_connection_reference.patch
>>>>>>>> 17-libceph_make_ceph_con_revoke_a_msg_operation.patch
>>>>>>>> 18-libceph_make_ceph_con_revoke_message_a_msg_op.patch
>>>>>>>> 19-libceph_fix_overflow_in___decode_pool_names.patch
>>>>>>>> 20-libceph_fix_overflow_in_osdmap_decode.patch
>>>>>>>> 21-libceph_fix_overflow_in_osdmap_apply_incremental.patch
>>>>>>>> 22-libceph_transition_socket_state_prior_to_actual_connect.patch
>>>>>>>> 23-libceph_fix_NULL_dereference_in_reset_connection.patch
>>>>>>>> 24-libceph_use_con_get_put_methods.patch
>>>>>>>> 25-libceph_drop_ceph_con_get_put_helpers_and_nref_member.patch
>>>>>>>> 26-libceph_encapsulate_out_message_data_setup.patch
>>>>>>>> 27-libceph_encapsulate_advancing_msg_page.patch
>>>>>>>> 28-libceph_don_t_mark_footer_complete_before_it_is.patch
>>>>>>>> 29-libceph_move_init_bio__functions_up.patch
>>>>>>>> 30-libceph_move_init_of_bio_iter.patch
>>>>>>>> 31-libceph_don_t_use_bio_iter_as_a_flag.patch
>>>>>>>> 32-libceph_SOCK_CLOSED_is_a_flag_not_a_state.patch
>>>>>>>> 33-libceph_don_t_change_socket_state_on_sock_event.patch
>>>>>>>> 34-libceph_just_set_SOCK_CLOSED_when_state_changes.patch
>>>>>>>> 35-libceph_don_t_touch_con_state_in_con_close_socket.patch
>>>>>>>> 36-libceph_clear_CONNECTING_in_ceph_con_close.patch
>>>>>>>> 37-libceph_clear_NEGOTIATING_when_done.patch
>>>>>>>> 38-libceph_define_and_use_an_explicit_CONNECTED_state.patch
>>>>>>>> 39-libceph_separate_banner_and_connect_writes.patch
>>>>>>>> 40-libceph_distinguish_two_phases_of_connect_sequence.patch
>>>>>>>> 41-libceph_small_changes_to_messenger.c.patch
>>>>>>>> 42-libceph_add_some_fine_ASCII_art.patch
>>>>>>>> 43-libceph_set_peer_name_on_con_open_not_init.patch
>>>>>>>> 44-libceph_initialize_mon_client_con_only_once.patch
>>>>>>>> 45-libceph_allow_sock_transition_from_CONNECTING_to_CLOSED.patch
>>>>>>>> 46-libceph_initialize_msgpool_message_types.patch
>>>>>>>> 47-libceph_prevent_the_race_of_incoming_work_during_teardown.patch
>>>>>>>> 48-libceph_report_socket_read_write_error_message.patch
>>>>>>>> 49-libceph_fix_mutex_coverage_for_ceph_con_close.patch
>>>>>>>> 50-libceph_resubmit_linger_ops_when_pg_mapping_changes.patch
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Nov 21, 2012 at 8:50 AM, Sage Weil <sage@inktank.com> wrote:
>>>>>>>>> Thanks for hunting this down.  I'm very curious what the culprit is...
>>>>>>>>>
>>>>>>>>> sage
>>>>>>
>>>>>>
>>
>
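
For reference, Sage's dynamic-debug suggestion quoted above can be wrapped in
a small guard so it degrades to a warning instead of an error when debugfs
isn't mounted or the script isn't running as root.  A sketch (the control
path is exactly as quoted; the function name is mine):

```shell
# Enable libceph dynamic debug output, as suggested above.  Takes an
# optional override of the control file path so the logic is testable.
enable_libceph_debug() {
    ctrl=${1:-/sys/kernel/debug/dynamic_debug/control}
    if [ -w "$ctrl" ]; then
        echo 'module libceph +p' > "$ctrl"
    else
        echo "warning: $ctrl not writable; mount debugfs and run as root" >&2
        return 1
    fi
}
```

The resulting debug messages show up in dmesg/syslog once enabled.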

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: rbd map command hangs for 15 minutes during system start up
  2012-12-02  5:34                                                 ` Nick Bartos
@ 2012-12-03  4:43                                                   ` Alex Elder
  2012-12-10 21:57                                                     ` Alex Elder
  0 siblings, 1 reply; 56+ messages in thread
From: Alex Elder @ 2012-12-03  4:43 UTC (permalink / raw)
  To: Nick Bartos
  Cc: Sage Weil, Gregory Farnum, Josh Durgin, Mandell Degerness, ceph-devel

On 12/01/2012 11:34 PM, Nick Bartos wrote:
> Unfortunately the hangs happen with the new set of patches.  Here's
> some debug info:
>
> https://gist.github.com/raw/4187123/90194ce172130244a9c1c968ed185eee7282d809/gistfile1.txt

Well I'm sorry to hear that but I'm glad to have the new info.

In retrospect, running the new patches *without* the one that
seems to cause the hang (#50) was good validation that they
didn't lead to any new problems.

I'll look at this some more in the morning, and I think I'll
confer with Sage whenever he's available for ideas on how to
proceed.

					-Alex


> On Fri, Nov 30, 2012 at 3:22 PM, Alex Elder <elder@inktank.com> wrote:
>> On 11/29/2012 02:37 PM, Alex Elder wrote:
>>> On 11/22/2012 12:04 PM, Nick Bartos wrote:
>>>> Here are the ceph log messages (including the libceph kernel debug
>>>> stuff you asked for) from a node boot with the rbd command hung for a
>>>> couple of minutes:
>>
>> I'm sorry, but I did something stupid...
>>
>> Yes, the branch I gave you includes these fixes.  However
>> it does *not* include the commit that was giving you trouble
>> to begin with.
>>
>> So...
>>
>> I have updated that same branch (wip-nick) to contain:
>> - Linux 3.5.5
>> - Plus the first *50* (not 49) patches you listed
>> - Plus the ones I added before.
>>
>> The new commit id for that branch begins with be3198d6.
>>
>> I'm really sorry for this mistake.  Please try this new
>> branch and report back what you find.
>>
>>                                          -Alex
>>
>>
>>> Nick, I have put together a branch that includes two fixes
>>> that might be helpful.  I don't expect these fixes will
>>> necessarily *fix* what you're seeing, but one of them
>>> pulls a big hunk of processing out of the picture and
>>> might help eliminate some potential causes.  I had to
>>> pull in several other patches as prerequisites in order
>>> to get those fixes to apply cleanly.
>>>
>>> Would you be able to give it a try, and let us know what
>>> results you get?  The branch contains:
>>> - Linux 3.5.5
>>> - Plus the first 49 patches you listed
>>> - Plus four patches, which are prerequisites...
>>>      libceph: define ceph_extract_encoded_string()
>>>      rbd: define some new format constants
>>>      rbd: define rbd_dev_image_id()
>>>      rbd: kill create_snap sysfs entry
>>> - ...for these two bug fixes:
>>>      libceph: remove 'osdtimeout' option
>>>      ceph: don't reference req after put
>>>
>>> The branch is available in the ceph-client git repository
>>> under the name "wip-nick" and has commit id dd9323aa.
>>>      https://github.com/ceph/ceph-client/tree/wip-nick
>>>
>>>> https://raw.github.com/gist/4132395/7cb5f0150179b012429c6e57749120dd88616cce/gistfile1.txt
>>>
>>> This full debug output is very helpful.  Please supply
>>> that again as well.
>>>
>>> Thanks.
>>>
>>>                                        -Alex
>>>
>>>> On Wed, Nov 21, 2012 at 9:49 PM, Nick Bartos <nick@pistoncloud.com> wrote:
>>>>> It's very easy to reproduce now with my automated install script; the
>>>>> most I've seen it succeed with that patch is 2 in a row before hanging
>>>>> on the 3rd, although it hangs on most builds.  So it shouldn't take
>>>>> much to get it to do it again.  I'll try and get to that tomorrow,
>>>>> when I'm a bit more rested and my brain is working better.
>>>>>
>>>>> Yes, during this the OSDs are probably all syncing up.  All the osd and
>>>>> mon daemons have started by the time the rbd commands are run, though.
>>>>>
>>>>> On Wed, Nov 21, 2012 at 8:47 PM, Sage Weil <sage@inktank.com> wrote:
>>>>>> On Wed, 21 Nov 2012, Nick Bartos wrote:
>>>>>>> FYI the build which included all 3.5 backports except patch #50 is
>>>>>>> still going strong after 21 builds.
>>>>>>
>>>>>> Okay, that one at least makes some sense.  I've opened
>>>>>>
>>>>>>          http://tracker.newdream.net/issues/3519
>>>>>>
>>>>>> How easy is this to reproduce?  If it is something you can trigger with
>>>>>> debugging enabled ('echo module libceph +p >
>>>>>> /sys/kernel/debug/dynamic_debug/control') that would help tremendously.
>>>>>>
>>>>>> I'm guessing that during this startup time the OSDs are still in the
>>>>>> process of starting?
>>>>>>
>>>>>> Alex, I bet that a test that does a lot of map/unmap stuff in a loop while
>>>>>> thrashing OSDs could hit this.
>>>>>>
>>>>>> Thanks!
>>>>>> sage
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> On Wed, Nov 21, 2012 at 9:34 AM, Nick Bartos <nick@pistoncloud.com> wrote:
>>>>>>>> With 8 successful installs already done, I'm reasonably confident that
>>>>>>>> it's patch #50.  I'm making another build which applies all patches
>>>>>>>> from the 3.5 backport branch, excluding that specific one.  I'll let
>>>>>>>> you know if that turns up any unexpected failures.
>>>>>>>>
>>>>>>>> What will the potential fallout be for removing that specific patch?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Nov 21, 2012 at 9:02 AM, Nick Bartos <nick@pistoncloud.com> wrote:
>>>>>>>>> It's really looking like it's the
>>>>>>>>> libceph_resubmit_linger_ops_when_pg_mapping_changes commit.  When
>>>>>>>>> patches 1-50 (listed below) are applied to 3.5.7, the hang is present.
>>>>>>>>>   So far I have gone through 4 successful installs with no hang with
>>>>>>>>> only 1-49 applied.  I'm still leaving my test run to make sure it's
>>>>>>>>> not a fluke, but since previously it hangs within the first couple of
>>>>>>>>> builds, it really looks like this is where the problem originated.
>>>>>>>>>
>>>>>>>>> 1-libceph_eliminate_connection_state_DEAD.patch
>>>>>>>>> 2-libceph_kill_bad_proto_ceph_connection_op.patch
>>>>>>>>> 3-libceph_rename_socket_callbacks.patch
>>>>>>>>> 4-libceph_rename_kvec_reset_and_kvec_add_functions.patch
>>>>>>>>> 5-libceph_embed_ceph_messenger_structure_in_ceph_client.patch
>>>>>>>>> 6-libceph_start_separating_connection_flags_from_state.patch
>>>>>>>>> 7-libceph_start_tracking_connection_socket_state.patch
>>>>>>>>> 8-libceph_provide_osd_number_when_creating_osd.patch
>>>>>>>>> 9-libceph_set_CLOSED_state_bit_in_con_init.patch
>>>>>>>>> 10-libceph_embed_ceph_connection_structure_in_mon_client.patch
>>>>>>>>> 11-libceph_drop_connection_refcounting_for_mon_client.patch
>>>>>>>>> 12-libceph_init_monitor_connection_when_opening.patch
>>>>>>>>> 13-libceph_fully_initialize_connection_in_con_init.patch
>>>>>>>>> 14-libceph_tweak_ceph_alloc_msg.patch
>>>>>>>>> 15-libceph_have_messages_point_to_their_connection.patch
>>>>>>>>> 16-libceph_have_messages_take_a_connection_reference.patch
>>>>>>>>> 17-libceph_make_ceph_con_revoke_a_msg_operation.patch
>>>>>>>>> 18-libceph_make_ceph_con_revoke_message_a_msg_op.patch
>>>>>>>>> 19-libceph_fix_overflow_in___decode_pool_names.patch
>>>>>>>>> 20-libceph_fix_overflow_in_osdmap_decode.patch
>>>>>>>>> 21-libceph_fix_overflow_in_osdmap_apply_incremental.patch
>>>>>>>>> 22-libceph_transition_socket_state_prior_to_actual_connect.patch
>>>>>>>>> 23-libceph_fix_NULL_dereference_in_reset_connection.patch
>>>>>>>>> 24-libceph_use_con_get_put_methods.patch
>>>>>>>>> 25-libceph_drop_ceph_con_get_put_helpers_and_nref_member.patch
>>>>>>>>> 26-libceph_encapsulate_out_message_data_setup.patch
>>>>>>>>> 27-libceph_encapsulate_advancing_msg_page.patch
>>>>>>>>> 28-libceph_don_t_mark_footer_complete_before_it_is.patch
>>>>>>>>> 29-libceph_move_init_bio__functions_up.patch
>>>>>>>>> 30-libceph_move_init_of_bio_iter.patch
>>>>>>>>> 31-libceph_don_t_use_bio_iter_as_a_flag.patch
>>>>>>>>> 32-libceph_SOCK_CLOSED_is_a_flag_not_a_state.patch
>>>>>>>>> 33-libceph_don_t_change_socket_state_on_sock_event.patch
>>>>>>>>> 34-libceph_just_set_SOCK_CLOSED_when_state_changes.patch
>>>>>>>>> 35-libceph_don_t_touch_con_state_in_con_close_socket.patch
>>>>>>>>> 36-libceph_clear_CONNECTING_in_ceph_con_close.patch
>>>>>>>>> 37-libceph_clear_NEGOTIATING_when_done.patch
>>>>>>>>> 38-libceph_define_and_use_an_explicit_CONNECTED_state.patch
>>>>>>>>> 39-libceph_separate_banner_and_connect_writes.patch
>>>>>>>>> 40-libceph_distinguish_two_phases_of_connect_sequence.patch
>>>>>>>>> 41-libceph_small_changes_to_messenger.c.patch
>>>>>>>>> 42-libceph_add_some_fine_ASCII_art.patch
>>>>>>>>> 43-libceph_set_peer_name_on_con_open_not_init.patch
>>>>>>>>> 44-libceph_initialize_mon_client_con_only_once.patch
>>>>>>>>> 45-libceph_allow_sock_transition_from_CONNECTING_to_CLOSED.patch
>>>>>>>>> 46-libceph_initialize_msgpool_message_types.patch
>>>>>>>>> 47-libceph_prevent_the_race_of_incoming_work_during_teardown.patch
>>>>>>>>> 48-libceph_report_socket_read_write_error_message.patch
>>>>>>>>> 49-libceph_fix_mutex_coverage_for_ceph_con_close.patch
>>>>>>>>> 50-libceph_resubmit_linger_ops_when_pg_mapping_changes.patch
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Nov 21, 2012 at 8:50 AM, Sage Weil <sage@inktank.com> wrote:
>>>>>>>>>> Thanks for hunting this down.  I'm very curious what the culprit is...
>>>>>>>>>>
>>>>>>>>>> sage
>>>>>>>
>>>>>>>
>>>
>>
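
The map/unmap-in-a-loop test Alex and Sage discuss above might look roughly
like the sketch below.  The map and unmap commands are passed in as
parameters so the loop itself can be exercised without a live cluster; the
pool/image names in the usage comment are illustrative:

```shell
# Stress sketch: run N map/unmap cycles, stopping at the first failure.
# The commands are arguments so this can run against a real cluster or a stub.
thrash_cycles() {
    local n=$1 map_cmd=$2 unmap_cmd=$3 i
    for ((i = 1; i <= n; i++)); do
        $map_cmd   || { echo "map failed on cycle $i" >&2; return 1; }
        $unmap_cmd || { echo "unmap failed on cycle $i" >&2; return 1; }
    done
    echo "completed $n cycles"
}

# Against a real cluster (while an OSD-thrashing script runs elsewhere):
#   thrash_cycles 100 'rbd map rbd/testimg' 'rbd unmap /dev/rbd0'
```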


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: rbd map command hangs for 15 minutes during system start up
  2012-12-03  4:43                                                   ` Alex Elder
@ 2012-12-10 21:57                                                     ` Alex Elder
  2012-12-11 17:26                                                       ` Nick Bartos
  0 siblings, 1 reply; 56+ messages in thread
From: Alex Elder @ 2012-12-10 21:57 UTC (permalink / raw)
  To: Nick Bartos
  Cc: Sage Weil, Gregory Farnum, Josh Durgin, Mandell Degerness, ceph-devel

On 12/02/2012 10:43 PM, Alex Elder wrote:
> On 12/01/2012 11:34 PM, Nick Bartos wrote:
>> Unfortunately the hangs happen with the new set of patches.  Here's
>> some debug info:
>>
>> https://gist.github.com/raw/4187123/90194ce172130244a9c1c968ed185eee7282d809/gistfile1.txt
>>
> 
> Well I'm sorry to hear that but I'm glad to have the new info.
> 
> In retrospect, running the new patches *without* the one that
> seems to cause the hang (#50) was good validation that they
> didn't lead to any new problems.
> 
> I'll look at this some more in the morning, and I think I'll
> confer with Sage whenever he's available for ideas on how to
> proceed.

Over the course of last week I have been finding and fixing a
few problems in rbd, the osd client, and the messenger in the
Linux kernel code.  I've added a handful of new patches to the
end of the ones I gave you last time.

At this point I don't expect these changes to directly affect
the hangs you have been seeing, but a couple of these are
real problems you could (also) hit, and I'd like to avoid
that.

I haven't done rigorous testing on this but I believe
the changes are correct (and Sage has looked at them
and says they look OK to him).

The new version is available in the branch "wip-nick-new"
in the ceph-client git repository.  If you reproduce your
hang with this updated code (or do not), please let me know.

Thanks.

					-Alex


>                     -Alex
> 
> 
>> On Fri, Nov 30, 2012 at 3:22 PM, Alex Elder <elder@inktank.com> wrote:
>>> On 11/29/2012 02:37 PM, Alex Elder wrote:
>>>> On 11/22/2012 12:04 PM, Nick Bartos wrote:
>>>>> Here are the ceph log messages (including the libceph kernel debug
>>>>> stuff you asked for) from a node boot with the rbd command hung for a
>>>>> couple of minutes:
>>>
>>> I'm sorry, but I did something stupid...
>>>
>>> Yes, the branch I gave you includes these fixes.  However
>>> it does *not* include the commit that was giving you trouble
>>> to begin with.
>>>
>>> So...
>>>
>>> I have updated that same branch (wip-nick) to contain:
>>> - Linux 3.5.5
>>> - Plus the first *50* (not 49) patches you listed
>>> - Plus the ones I added before.
>>>
>>> The new commit id for that branch begins with be3198d6.
>>>
>>> I'm really sorry for this mistake.  Please try this new
>>> branch and report back what you find.
>>>
>>>                                          -Alex
>>>
>>>
>>>> Nick, I have put together a branch that includes two fixes
>>>> that might be helpful.  I don't expect these fixes will
>>>> necessarily *fix* what you're seeing, but one of them
>>>> pulls a big hunk of processing out of the picture and
>>>> might help eliminate some potential causes.  I had to
>>>> pull in several other patches as prerequisites in order
>>>> to get those fixes to apply cleanly.
>>>>
>>>> Would you be able to give it a try, and let us know what
>>>> results you get?  The branch contains:
>>>> - Linux 3.5.5
>>>> - Plus the first 49 patches you listed
>>>> - Plus four patches, which are prerequisites...
>>>>      libceph: define ceph_extract_encoded_string()
>>>>      rbd: define some new format constants
>>>>      rbd: define rbd_dev_image_id()
>>>>      rbd: kill create_snap sysfs entry
>>>> - ...for these two bug fixes:
>>>>      libceph: remove 'osdtimeout' option
>>>>      ceph: don't reference req after put
>>>>
>>>> The branch is available in the ceph-client git repository
>>>> under the name "wip-nick" and has commit id dd9323aa.
>>>>      https://github.com/ceph/ceph-client/tree/wip-nick
>>>>
>>>>> https://raw.github.com/gist/4132395/7cb5f0150179b012429c6e57749120dd88616cce/gistfile1.txt
>>>>>
>>>>
>>>> This full debug output is very helpful.  Please supply
>>>> that again as well.
>>>>
>>>> Thanks.
>>>>
>>>>                                        -Alex
>>>>
>>>>> On Wed, Nov 21, 2012 at 9:49 PM, Nick Bartos <nick@pistoncloud.com>
>>>>> wrote:
>>>>>> It's very easy to reproduce now with my automated install script; the
>>>>>> most I've seen it succeed with that patch is 2 in a row before hanging
>>>>>> on the 3rd, although it hangs on most builds.  So it shouldn't take
>>>>>> much to get it to do it again.  I'll try and get to that tomorrow,
>>>>>> when I'm a bit more rested and my brain is working better.
>>>>>>
>>>>>> Yes, during this the OSDs are probably all syncing up.  All the osd and
>>>>>> mon daemons have started by the time the rbd commands are run, though.
>>>>>>
>>>>>> On Wed, Nov 21, 2012 at 8:47 PM, Sage Weil <sage@inktank.com> wrote:
>>>>>>> On Wed, 21 Nov 2012, Nick Bartos wrote:
>>>>>>>> FYI the build which included all 3.5 backports except patch #50 is
>>>>>>>> still going strong after 21 builds.
>>>>>>>
>>>>>>> Okay, that one at least makes some sense.  I've opened
>>>>>>>
>>>>>>>          http://tracker.newdream.net/issues/3519
>>>>>>>
>>>>>>> How easy is this to reproduce?  If it is something you can
>>>>>>> trigger with
>>>>>>> debugging enabled ('echo module libceph +p >
>>>>>>> /sys/kernel/debug/dynamic_debug/control') that would help
>>>>>>> tremendously.
>>>>>>>
>>>>>>> I'm guessing that during this startup time the OSDs are still in the
>>>>>>> process of starting?
>>>>>>>
>>>>>>> Alex, I bet that a test that does a lot of map/unmap stuff in a
>>>>>>> loop while
>>>>>>> thrashing OSDs could hit this.
>>>>>>>
>>>>>>> Thanks!
>>>>>>> sage
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Nov 21, 2012 at 9:34 AM, Nick Bartos
>>>>>>>> <nick@pistoncloud.com> wrote:
>>>>>>>>> With 8 successful installs already done, I'm reasonably
>>>>>>>>> confident that
>>>>>>>>> it's patch #50.  I'm making another build which applies all
>>>>>>>>> patches
>>>>>>>>> from the 3.5 backport branch, excluding that specific one. 
>>>>>>>>> I'll let
>>>>>>>>> you know if that turns up any unexpected failures.
>>>>>>>>>
>>>>>>>>> What will the potential fallout be for removing that specific
>>>>>>>>> patch?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Nov 21, 2012 at 9:02 AM, Nick Bartos
>>>>>>>>> <nick@pistoncloud.com> wrote:
>>>>>>>>>> It's really looking like it's the
>>>>>>>>>> libceph_resubmit_linger_ops_when_pg_mapping_changes commit.  When
>>>>>>>>>> patches 1-50 (listed below) are applied to 3.5.7, the hang is
>>>>>>>>>> present.
>>>>>>>>>>   So far I have gone through 4 successful installs with no
>>>>>>>>>> hang with
>>>>>>>>>> only 1-49 applied.  I'm still leaving my test run to make sure
>>>>>>>>>> it's
>>>>>>>>>> not a fluke, but since previously it hangs within the first
>>>>>>>>>> couple of
>>>>>>>>>> builds, it really looks like this is where the problem
>>>>>>>>>> originated.
>>>>>>>>>>
>>>>>>>>>> 1-libceph_eliminate_connection_state_DEAD.patch
>>>>>>>>>> 2-libceph_kill_bad_proto_ceph_connection_op.patch
>>>>>>>>>> 3-libceph_rename_socket_callbacks.patch
>>>>>>>>>> 4-libceph_rename_kvec_reset_and_kvec_add_functions.patch
>>>>>>>>>> 5-libceph_embed_ceph_messenger_structure_in_ceph_client.patch
>>>>>>>>>> 6-libceph_start_separating_connection_flags_from_state.patch
>>>>>>>>>> 7-libceph_start_tracking_connection_socket_state.patch
>>>>>>>>>> 8-libceph_provide_osd_number_when_creating_osd.patch
>>>>>>>>>> 9-libceph_set_CLOSED_state_bit_in_con_init.patch
>>>>>>>>>> 10-libceph_embed_ceph_connection_structure_in_mon_client.patch
>>>>>>>>>> 11-libceph_drop_connection_refcounting_for_mon_client.patch
>>>>>>>>>> 12-libceph_init_monitor_connection_when_opening.patch
>>>>>>>>>> 13-libceph_fully_initialize_connection_in_con_init.patch
>>>>>>>>>> 14-libceph_tweak_ceph_alloc_msg.patch
>>>>>>>>>> 15-libceph_have_messages_point_to_their_connection.patch
>>>>>>>>>> 16-libceph_have_messages_take_a_connection_reference.patch
>>>>>>>>>> 17-libceph_make_ceph_con_revoke_a_msg_operation.patch
>>>>>>>>>> 18-libceph_make_ceph_con_revoke_message_a_msg_op.patch
>>>>>>>>>> 19-libceph_fix_overflow_in___decode_pool_names.patch
>>>>>>>>>> 20-libceph_fix_overflow_in_osdmap_decode.patch
>>>>>>>>>> 21-libceph_fix_overflow_in_osdmap_apply_incremental.patch
>>>>>>>>>> 22-libceph_transition_socket_state_prior_to_actual_connect.patch
>>>>>>>>>> 23-libceph_fix_NULL_dereference_in_reset_connection.patch
>>>>>>>>>> 24-libceph_use_con_get_put_methods.patch
>>>>>>>>>> 25-libceph_drop_ceph_con_get_put_helpers_and_nref_member.patch
>>>>>>>>>> 26-libceph_encapsulate_out_message_data_setup.patch
>>>>>>>>>> 27-libceph_encapsulate_advancing_msg_page.patch
>>>>>>>>>> 28-libceph_don_t_mark_footer_complete_before_it_is.patch
>>>>>>>>>> 29-libceph_move_init_bio__functions_up.patch
>>>>>>>>>> 30-libceph_move_init_of_bio_iter.patch
>>>>>>>>>> 31-libceph_don_t_use_bio_iter_as_a_flag.patch
>>>>>>>>>> 32-libceph_SOCK_CLOSED_is_a_flag_not_a_state.patch
>>>>>>>>>> 33-libceph_don_t_change_socket_state_on_sock_event.patch
>>>>>>>>>> 34-libceph_just_set_SOCK_CLOSED_when_state_changes.patch
>>>>>>>>>> 35-libceph_don_t_touch_con_state_in_con_close_socket.patch
>>>>>>>>>> 36-libceph_clear_CONNECTING_in_ceph_con_close.patch
>>>>>>>>>> 37-libceph_clear_NEGOTIATING_when_done.patch
>>>>>>>>>> 38-libceph_define_and_use_an_explicit_CONNECTED_state.patch
>>>>>>>>>> 39-libceph_separate_banner_and_connect_writes.patch
>>>>>>>>>> 40-libceph_distinguish_two_phases_of_connect_sequence.patch
>>>>>>>>>> 41-libceph_small_changes_to_messenger.c.patch
>>>>>>>>>> 42-libceph_add_some_fine_ASCII_art.patch
>>>>>>>>>> 43-libceph_set_peer_name_on_con_open_not_init.patch
>>>>>>>>>> 44-libceph_initialize_mon_client_con_only_once.patch
>>>>>>>>>> 45-libceph_allow_sock_transition_from_CONNECTING_to_CLOSED.patch
>>>>>>>>>> 46-libceph_initialize_msgpool_message_types.patch
>>>>>>>>>> 47-libceph_prevent_the_race_of_incoming_work_during_teardown.patch
>>>>>>>>>>
>>>>>>>>>> 48-libceph_report_socket_read_write_error_message.patch
>>>>>>>>>> 49-libceph_fix_mutex_coverage_for_ceph_con_close.patch
>>>>>>>>>> 50-libceph_resubmit_linger_ops_when_pg_mapping_changes.patch
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Nov 21, 2012 at 8:50 AM, Sage Weil <sage@inktank.com>
>>>>>>>>>> wrote:
>>>>>>>>>>> Thanks for hunting this down.  I'm very curious what the
>>>>>>>>>>> culprit is...
>>>>>>>>>>>
>>>>>>>>>>> sage
>>>>>>>>
>>>>>>>>
>>>>
>>>
> 


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: rbd map command hangs for 15 minutes during system start up
  2012-12-10 21:57                                                     ` Alex Elder
@ 2012-12-11 17:26                                                       ` Nick Bartos
  2012-12-11 18:01                                                         ` Alex Elder
  0 siblings, 1 reply; 56+ messages in thread
From: Nick Bartos @ 2012-12-11 17:26 UTC (permalink / raw)
  To: Alex Elder
  Cc: Sage Weil, Gregory Farnum, Josh Durgin, Mandell Degerness, ceph-devel

Thanks! I'm creating a build with the new patches now.  I'll let you
know how testing goes.

On Mon, Dec 10, 2012 at 1:57 PM, Alex Elder <elder@inktank.com> wrote:
> On 12/02/2012 10:43 PM, Alex Elder wrote:
>> On 12/01/2012 11:34 PM, Nick Bartos wrote:
>>> Unfortunately the hangs happen with the new set of patches.  Here's
>>> some debug info:
>>>
>>> https://gist.github.com/raw/4187123/90194ce172130244a9c1c968ed185eee7282d809/gistfile1.txt
>>>
>>
>> Well I'm sorry to hear that but I'm glad to have the new info.
>>
>> In retrospect, running the new patches *without* the one that
>> seems to cause the hang (#50) was good validation that they
>> didn't lead to any new problems.
>>
>> I'll look at this some more in the morning, and I think I'll
>> confer with Sage whenever he's available for ideas on how to
>> proceed.
>
> Over the course of last week I have been finding and fixing a
> few problems in rbd, the osd client, and the messenger in the
> Linux kernel code.  I've added a handful of new patches to the
> end of the ones I gave you last time.
>
> At this point I don't expect these changes to directly affect
> the hangs you have been seeing, but a couple of these are
> real problems you could (also) hit, and I'd like to avoid
> that.
>
> I haven't done rigorous testing on this but I believe
> the changes are correct (and Sage has looked at them
> and says they look OK to him).
>
> The new version is available in the branch "wip-nick-new"
> in the ceph-client git repository.  If you reproduce your
> hang with this updated code (or do not), please let me know.
>
> Thanks.
>
>                                         -Alex
>
>
>>                     -Alex
>>
>>
>>> On Fri, Nov 30, 2012 at 3:22 PM, Alex Elder <elder@inktank.com> wrote:
>>>> On 11/29/2012 02:37 PM, Alex Elder wrote:
>>>>> On 11/22/2012 12:04 PM, Nick Bartos wrote:
>>>>>> Here are the ceph log messages (including the libceph kernel debug
>>>>>> stuff you asked for) from a node boot with the rbd command hung for a
>>>>>> couple of minutes:
>>>>
>>>> I'm sorry, but I did something stupid...
>>>>
>>>> Yes, the branch I gave you includes these fixes.  However
>>>> it does *not* include the commit that was giving you trouble
>>>> to begin with.
>>>>
>>>> So...
>>>>
>>>> I have updated that same branch (wip-nick) to contain:
>>>> - Linux 3.5.5
>>>> - Plus the first *50* (not 49) patches you listed
>>>> - Plus the ones I added before.
>>>>
>>>> The new commit id for that branch begins with be3198d6.
>>>>
>>>> I'm really sorry for this mistake.  Please try this new
>>>> branch and report back what you find.
>>>>
>>>>                                          -Alex
>>>>
>>>>
>>>>> Nick, I have put together a branch that includes two fixes
>>>>> that might be helpful.  I don't expect these fixes will
>>>>> necessarily *fix* what you're seeing, but one of them
>>>>> pulls a big hunk of processing out of the picture and
>>>>> might help eliminate some potential causes.  I had to
>>>>> pull in several other patches as prerequisites in order
>>>>> to get those fixes to apply cleanly.
>>>>>
>>>>> Would you be able to give it a try, and let us know what
>>>>> results you get?  The branch contains:
>>>>> - Linux 3.5.5
>>>>> - Plus the first 49 patches you listed
>>>>> - Plus four patches, which are prerequisites...
>>>>>      libceph: define ceph_extract_encoded_string()
>>>>>      rbd: define some new format constants
>>>>>      rbd: define rbd_dev_image_id()
>>>>>      rbd: kill create_snap sysfs entry
>>>>> - ...for these two bug fixes:
>>>>>      libceph: remove 'osdtimeout' option
>>>>>      ceph: don't reference req after put
>>>>>
>>>>> The branch is available in the ceph-client git repository
>>>>> under the name "wip-nick" and has commit id dd9323aa.
>>>>>      https://github.com/ceph/ceph-client/tree/wip-nick
>>>>>
>>>>>> https://raw.github.com/gist/4132395/7cb5f0150179b012429c6e57749120dd88616cce/gistfile1.txt
>>>>>>
>>>>>
>>>>> This full debug output is very helpful.  Please supply
>>>>> that again as well.
>>>>>
>>>>> Thanks.
>>>>>
>>>>>                                        -Alex
>>>>>
>>>>>> On Wed, Nov 21, 2012 at 9:49 PM, Nick Bartos <nick@pistoncloud.com>
>>>>>> wrote:
>>>>>>> It's very easy to reproduce now with my automated install script, the
>>>>>>> most I've seen it succeed with that patch is 2 in a row, and hanging
>>>>>>> on the 3rd, although it hangs on most builds.  So it shouldn't take
>>>>>>> much to get it to do it again.  I'll try and get to that tomorrow,
>>>>>>> when I'm a bit more rested and my brain is working better.
>>>>>>>
>>>>>>> Yes during this the OSDs are probably all syncing up.  All the osd
>>>>>>> and
>>>>>>> mon daemons have started by the time the rbd commands are run,
>>>>>>> though.
>>>>>>>
>>>>>>> On Wed, Nov 21, 2012 at 8:47 PM, Sage Weil <sage@inktank.com> wrote:
>>>>>>>> On Wed, 21 Nov 2012, Nick Bartos wrote:
>>>>>>>>> FYI the build which included all 3.5 backports except patch #50 is
>>>>>>>>> still going strong after 21 builds.
>>>>>>>>
>>>>>>>> Okay, that one at least makes some sense.  I've opened
>>>>>>>>
>>>>>>>>          http://tracker.newdream.net/issues/3519
>>>>>>>>
>>>>>>>> How easy is this to reproduce?  If it is something you can
>>>>>>>> trigger with
>>>>>>>> debugging enabled ('echo module libceph +p >
>>>>>>>> /sys/kernel/debug/dynamic_debug/control') that would help
>>>>>>>> tremendously.
>>>>>>>>
>>>>>>>> I'm guessing that during this startup time the OSDs are still in the
>>>>>>>> process of starting?
>>>>>>>>
>>>>>>>> Alex, I bet that a test that does a lot of map/unmap stuff in a
>>>>>>>> loop while
>>>>>>>> thrashing OSDs could hit this.
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> sage
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Nov 21, 2012 at 9:34 AM, Nick Bartos
>>>>>>>>> <nick@pistoncloud.com> wrote:
>>>>>>>>>> With 8 successful installs already done, I'm reasonably
>>>>>>>>>> confident that
>>>>>>>>>> it's patch #50.  I'm making another build which applies all
>>>>>>>>>> patches
>>>>>>>>>> from the 3.5 backport branch, excluding that specific one.
>>>>>>>>>> I'll let
>>>>>>>>>> you know if that turns up any unexpected failures.
>>>>>>>>>>
>>>>>>>>>> What will the potential fallout be for removing that specific
>>>>>>>>>> patch?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Nov 21, 2012 at 9:02 AM, Nick Bartos
>>>>>>>>>> <nick@pistoncloud.com> wrote:
>>>>>>>>>>> It's really looking like it's the
>>>>>>>>>>> libceph_resubmit_linger_ops_when_pg_mapping_changes commit.  When
>>>>>>>>>>> patches 1-50 (listed below) are applied to 3.5.7, the hang is
>>>>>>>>>>> present.
>>>>>>>>>>>   So far I have gone through 4 successful installs with no
>>>>>>>>>>> hang with
>>>>>>>>>>> only 1-49 applied.  I'm still leaving my test run to make sure
>>>>>>>>>>> it's
>>>>>>>>>>> not a fluke, but since previously it hangs within the first
>>>>>>>>>>> couple of
>>>>>>>>>>> builds, it really looks like this is where the problem
>>>>>>>>>>> originated.
>>>>>>>>>>>
>>>>>>>>>>> 1-libceph_eliminate_connection_state_DEAD.patch
>>>>>>>>>>> 2-libceph_kill_bad_proto_ceph_connection_op.patch
>>>>>>>>>>> 3-libceph_rename_socket_callbacks.patch
>>>>>>>>>>> 4-libceph_rename_kvec_reset_and_kvec_add_functions.patch
>>>>>>>>>>> 5-libceph_embed_ceph_messenger_structure_in_ceph_client.patch
>>>>>>>>>>> 6-libceph_start_separating_connection_flags_from_state.patch
>>>>>>>>>>> 7-libceph_start_tracking_connection_socket_state.patch
>>>>>>>>>>> 8-libceph_provide_osd_number_when_creating_osd.patch
>>>>>>>>>>> 9-libceph_set_CLOSED_state_bit_in_con_init.patch
>>>>>>>>>>> 10-libceph_embed_ceph_connection_structure_in_mon_client.patch
>>>>>>>>>>> 11-libceph_drop_connection_refcounting_for_mon_client.patch
>>>>>>>>>>> 12-libceph_init_monitor_connection_when_opening.patch
>>>>>>>>>>> 13-libceph_fully_initialize_connection_in_con_init.patch
>>>>>>>>>>> 14-libceph_tweak_ceph_alloc_msg.patch
>>>>>>>>>>> 15-libceph_have_messages_point_to_their_connection.patch
>>>>>>>>>>> 16-libceph_have_messages_take_a_connection_reference.patch
>>>>>>>>>>> 17-libceph_make_ceph_con_revoke_a_msg_operation.patch
>>>>>>>>>>> 18-libceph_make_ceph_con_revoke_message_a_msg_op.patch
>>>>>>>>>>> 19-libceph_fix_overflow_in___decode_pool_names.patch
>>>>>>>>>>> 20-libceph_fix_overflow_in_osdmap_decode.patch
>>>>>>>>>>> 21-libceph_fix_overflow_in_osdmap_apply_incremental.patch
>>>>>>>>>>> 22-libceph_transition_socket_state_prior_to_actual_connect.patch
>>>>>>>>>>> 23-libceph_fix_NULL_dereference_in_reset_connection.patch
>>>>>>>>>>> 24-libceph_use_con_get_put_methods.patch
>>>>>>>>>>> 25-libceph_drop_ceph_con_get_put_helpers_and_nref_member.patch
>>>>>>>>>>> 26-libceph_encapsulate_out_message_data_setup.patch
>>>>>>>>>>> 27-libceph_encapsulate_advancing_msg_page.patch
>>>>>>>>>>> 28-libceph_don_t_mark_footer_complete_before_it_is.patch
>>>>>>>>>>> 29-libceph_move_init_bio__functions_up.patch
>>>>>>>>>>> 30-libceph_move_init_of_bio_iter.patch
>>>>>>>>>>> 31-libceph_don_t_use_bio_iter_as_a_flag.patch
>>>>>>>>>>> 32-libceph_SOCK_CLOSED_is_a_flag_not_a_state.patch
>>>>>>>>>>> 33-libceph_don_t_change_socket_state_on_sock_event.patch
>>>>>>>>>>> 34-libceph_just_set_SOCK_CLOSED_when_state_changes.patch
>>>>>>>>>>> 35-libceph_don_t_touch_con_state_in_con_close_socket.patch
>>>>>>>>>>> 36-libceph_clear_CONNECTING_in_ceph_con_close.patch
>>>>>>>>>>> 37-libceph_clear_NEGOTIATING_when_done.patch
>>>>>>>>>>> 38-libceph_define_and_use_an_explicit_CONNECTED_state.patch
>>>>>>>>>>> 39-libceph_separate_banner_and_connect_writes.patch
>>>>>>>>>>> 40-libceph_distinguish_two_phases_of_connect_sequence.patch
>>>>>>>>>>> 41-libceph_small_changes_to_messenger.c.patch
>>>>>>>>>>> 42-libceph_add_some_fine_ASCII_art.patch
>>>>>>>>>>> 43-libceph_set_peer_name_on_con_open_not_init.patch
>>>>>>>>>>> 44-libceph_initialize_mon_client_con_only_once.patch
>>>>>>>>>>> 45-libceph_allow_sock_transition_from_CONNECTING_to_CLOSED.patch
>>>>>>>>>>> 46-libceph_initialize_msgpool_message_types.patch
>>>>>>>>>>> 47-libceph_prevent_the_race_of_incoming_work_during_teardown.patch
>>>>>>>>>>>
>>>>>>>>>>> 48-libceph_report_socket_read_write_error_message.patch
>>>>>>>>>>> 49-libceph_fix_mutex_coverage_for_ceph_con_close.patch
>>>>>>>>>>> 50-libceph_resubmit_linger_ops_when_pg_mapping_changes.patch
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Nov 21, 2012 at 8:50 AM, Sage Weil <sage@inktank.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>> Thanks for hunting this down.  I'm very curious what the
>>>>>>>>>>>> culprit is...
>>>>>>>>>>>>
>>>>>>>>>>>> sage
>>>>>>>>>
>>>>>>>>>
>>>>>
>>>>
>>
>
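[Editor's note: Sage's suggested stress test above (lots of map/unmap in a loop while thrashing OSDs) could be sketched roughly as below. This is a minimal sketch, not code from the thread: the image name `test0`, the iteration count, and the `/dev/rbd0` device path are all made-up placeholders, and the `RBD` override exists only so the loop can be dry-run without a live cluster.]

```shell
# Sketch of a map/unmap stress loop; pool/image names are hypothetical.
# Run this while restarting ceph-osd daemons in another shell to
# approximate the "thrashing OSDs" part of the suggestion.
# RBD can be overridden, e.g. RBD="echo rbd" for a dry run.
RBD=${RBD:-rbd}

stress_map_unmap() {
    img=$1; n=$2; i=1
    while [ "$i" -le "$n" ]; do
        $RBD map "$img" || return 1      # this is the call that hangs
        $RBD unmap /dev/rbd0 || return 1
        i=$((i + 1))
    done
}

# Example dry run (no cluster needed): just echoes the commands.
RBD="echo rbd" stress_map_unmap test0 2
```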

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: rbd map command hangs for 15 minutes during system start up
  2012-12-11 17:26                                                       ` Nick Bartos
@ 2012-12-11 18:01                                                         ` Alex Elder
  2012-12-11 19:44                                                           ` Alex Elder
  0 siblings, 1 reply; 56+ messages in thread
From: Alex Elder @ 2012-12-11 18:01 UTC (permalink / raw)
  To: Nick Bartos
  Cc: Sage Weil, Gregory Farnum, Josh Durgin, Mandell Degerness, ceph-devel

On 12/11/2012 11:26 AM, Nick Bartos wrote:
> Thanks! I'm creating a build with the new patches now.  I'll let you
> know how testing goes.

FYI, I've been testing with these changes and have *not* been
hitting the kinds of problems I'd seen previously.  However
those problems were different from yours, so I'm offering no
promises...  But there's a chance it'll be more helpful than
I thought.

I am preparing yet another branch for you, this time adding
all the rest of the commits you started with, just in case
this does improve things.

Please keep me informed how your testing goes.

					-Alex


> On Mon, Dec 10, 2012 at 1:57 PM, Alex Elder <elder@inktank.com> wrote:
>> On 12/02/2012 10:43 PM, Alex Elder wrote:
>>> On 12/01/2012 11:34 PM, Nick Bartos wrote:
>>>> Unfortunately the hangs happen with the new set of patches.  Here's
>>>> some debug info:
>>>>
>>>> https://gist.github.com/raw/4187123/90194ce172130244a9c1c968ed185eee7282d809/gistfile1.txt
>>>>
>>>
>>> Well I'm sorry to hear that but I'm glad to have the new info.
>>>
>>> In retrospect running the new patches *without* the one that
>>> seems to cause the hang (#50) was good validation that they
>>> didn't lead to any new problems.
>>>
>>> I'll look at this some more in the morning, and I think I'll
>>> confer with Sage whenever he's available for ideas on how to
>>> proceed.
>>
>> Over the course of last week I have been finding and fixing a
>> few problems in rbd, the osd client, and the messenger in the
>> Linux kernel code.  I've added a handful of new patches to the
>> end of the ones I gave you last time.
>>
>> At this point I don't expect these changes to directly affect
>> the hangs you have been seeing, but a couple of these are
>> real problems you could (also) hit, and I'd like to avoid
>> that.
>>
>> I haven't done rigorous testing on this but I believe
>> the changes are correct (and Sage has looked at them
>> and says they look OK to him).
>>
>> The new version is available in the branch "wip-nick-new"
>> in the ceph-client git repository.  If you reproduce your
>> hang with this updated code (or do not), please let me know.
>>
>> Thanks.
>>
>>                                         -Alex
>>
>>
>>>                     -Alex
>>>
>>>
>>>> On Fri, Nov 30, 2012 at 3:22 PM, Alex Elder <elder@inktank.com> wrote:
>>>>> On 11/29/2012 02:37 PM, Alex Elder wrote:
>>>>>> On 11/22/2012 12:04 PM, Nick Bartos wrote:
>>>>>>> Here are the ceph log messages (including the libceph kernel debug
>>>>>>> stuff you asked for) from a node boot with the rbd command hung for a
>>>>>>> couple of minutes:
>>>>>
>>>>> I'm sorry, but I did something stupid...
>>>>>
>>>>> Yes, the branch I gave you includes these fixes.  However
>>>>> it does *not* include the commit that was giving you trouble
>>>>> to begin with.
>>>>>
>>>>> So...
>>>>>
>>>>> I have updated that same branch (wip-nick) to contain:
>>>>> - Linux 3.5.5
>>>>> - Plus the first *50* (not 49) patches you listed
>>>>> - Plus the ones I added before.
>>>>>
>>>>> The new commit id for that branch begins with be3198d6.
>>>>>
>>>>> I'm really sorry for this mistake.  Please try this new
>>>>> branch and report back what you find.
>>>>>
>>>>>                                          -Alex
>>>>>
>>>>>
>>>>>> Nick, I have put together a branch that includes two fixes
>>>>>> that might be helpful.  I don't expect these fixes will
>>>>>> necessarily *fix* what you're seeing, but one of them
>>>>>> pulls a big hunk of processing out of the picture and
>>>>>> might help eliminate some potential causes.  I had to
>>>>>> pull in several other patches as prerequisites in order
>>>>>> to get those fixes to apply cleanly.
>>>>>>
>>>>>> Would you be able to give it a try, and let us know what
>>>>>> results you get?  The branch contains:
>>>>>> - Linux 3.5.5
>>>>>> - Plus the first 49 patches you listed
>>>>>> - Plus four patches, which are prerequisites...
>>>>>>      libceph: define ceph_extract_encoded_string()
>>>>>>      rbd: define some new format constants
>>>>>>      rbd: define rbd_dev_image_id()
>>>>>>      rbd: kill create_snap sysfs entry
>>>>>> - ...for these two bug fixes:
>>>>>>      libceph: remove 'osdtimeout' option
>>>>>>      ceph: don't reference req after put
>>>>>>
>>>>>> The branch is available in the ceph-client git repository
>>>>>> under the name "wip-nick" and has commit id dd9323aa.
>>>>>>      https://github.com/ceph/ceph-client/tree/wip-nick
>>>>>>
>>>>>>> https://raw.github.com/gist/4132395/7cb5f0150179b012429c6e57749120dd88616cce/gistfile1.txt
>>>>>>>
>>>>>>
>>>>>> This full debug output is very helpful.  Please supply
>>>>>> that again as well.
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>>                                        -Alex
>>>>>>
>>>>>>> On Wed, Nov 21, 2012 at 9:49 PM, Nick Bartos <nick@pistoncloud.com>
>>>>>>> wrote:
>>>>>>>> It's very easy to reproduce now with my automated install script, the
>>>>>>>> most I've seen it succeed with that patch is 2 in a row, and hanging
>>>>>>>> on the 3rd, although it hangs on most builds.  So it shouldn't take
>>>>>>>> much to get it to do it again.  I'll try and get to that tomorrow,
>>>>>>>> when I'm a bit more rested and my brain is working better.
>>>>>>>>
>>>>>>>> Yes during this the OSDs are probably all syncing up.  All the osd
>>>>>>>> and
>>>>>>>> mon daemons have started by the time the rbd commands are run,
>>>>>>>> though.
>>>>>>>>
>>>>>>>> On Wed, Nov 21, 2012 at 8:47 PM, Sage Weil <sage@inktank.com> wrote:
>>>>>>>>> On Wed, 21 Nov 2012, Nick Bartos wrote:
>>>>>>>>>> FYI the build which included all 3.5 backports except patch #50 is
>>>>>>>>>> still going strong after 21 builds.
>>>>>>>>>
>>>>>>>>> Okay, that one at least makes some sense.  I've opened
>>>>>>>>>
>>>>>>>>>          http://tracker.newdream.net/issues/3519
>>>>>>>>>
>>>>>>>>> How easy is this to reproduce?  If it is something you can
>>>>>>>>> trigger with
>>>>>>>>> debugging enabled ('echo module libceph +p >
>>>>>>>>> /sys/kernel/debug/dynamic_debug/control') that would help
>>>>>>>>> tremendously.
>>>>>>>>>
>>>>>>>>> I'm guessing that during this startup time the OSDs are still in the
>>>>>>>>> process of starting?
>>>>>>>>>
>>>>>>>>> Alex, I bet that a test that does a lot of map/unmap stuff in a
>>>>>>>>> loop while
>>>>>>>>> thrashing OSDs could hit this.
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>> sage
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Nov 21, 2012 at 9:34 AM, Nick Bartos
>>>>>>>>>> <nick@pistoncloud.com> wrote:
>>>>>>>>>>> With 8 successful installs already done, I'm reasonably
>>>>>>>>>>> confident that
>>>>>>>>>>> it's patch #50.  I'm making another build which applies all
>>>>>>>>>>> patches
>>>>>>>>>>> from the 3.5 backport branch, excluding that specific one.
>>>>>>>>>>> I'll let
>>>>>>>>>>> you know if that turns up any unexpected failures.
>>>>>>>>>>>
>>>>>>>>>>> What will the potential fallout be for removing that specific
>>>>>>>>>>> patch?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Nov 21, 2012 at 9:02 AM, Nick Bartos
>>>>>>>>>>> <nick@pistoncloud.com> wrote:
>>>>>>>>>>>> It's really looking like it's the
>>>>>>>>>>>> libceph_resubmit_linger_ops_when_pg_mapping_changes commit.  When
>>>>>>>>>>>> patches 1-50 (listed below) are applied to 3.5.7, the hang is
>>>>>>>>>>>> present.
>>>>>>>>>>>>   So far I have gone through 4 successful installs with no
>>>>>>>>>>>> hang with
>>>>>>>>>>>> only 1-49 applied.  I'm still leaving my test run to make sure
>>>>>>>>>>>> it's
>>>>>>>>>>>> not a fluke, but since previously it hangs within the first
>>>>>>>>>>>> couple of
>>>>>>>>>>>> builds, it really looks like this is where the problem
>>>>>>>>>>>> originated.
>>>>>>>>>>>>
>>>>>>>>>>>> 1-libceph_eliminate_connection_state_DEAD.patch
>>>>>>>>>>>> 2-libceph_kill_bad_proto_ceph_connection_op.patch
>>>>>>>>>>>> 3-libceph_rename_socket_callbacks.patch
>>>>>>>>>>>> 4-libceph_rename_kvec_reset_and_kvec_add_functions.patch
>>>>>>>>>>>> 5-libceph_embed_ceph_messenger_structure_in_ceph_client.patch
>>>>>>>>>>>> 6-libceph_start_separating_connection_flags_from_state.patch
>>>>>>>>>>>> 7-libceph_start_tracking_connection_socket_state.patch
>>>>>>>>>>>> 8-libceph_provide_osd_number_when_creating_osd.patch
>>>>>>>>>>>> 9-libceph_set_CLOSED_state_bit_in_con_init.patch
>>>>>>>>>>>> 10-libceph_embed_ceph_connection_structure_in_mon_client.patch
>>>>>>>>>>>> 11-libceph_drop_connection_refcounting_for_mon_client.patch
>>>>>>>>>>>> 12-libceph_init_monitor_connection_when_opening.patch
>>>>>>>>>>>> 13-libceph_fully_initialize_connection_in_con_init.patch
>>>>>>>>>>>> 14-libceph_tweak_ceph_alloc_msg.patch
>>>>>>>>>>>> 15-libceph_have_messages_point_to_their_connection.patch
>>>>>>>>>>>> 16-libceph_have_messages_take_a_connection_reference.patch
>>>>>>>>>>>> 17-libceph_make_ceph_con_revoke_a_msg_operation.patch
>>>>>>>>>>>> 18-libceph_make_ceph_con_revoke_message_a_msg_op.patch
>>>>>>>>>>>> 19-libceph_fix_overflow_in___decode_pool_names.patch
>>>>>>>>>>>> 20-libceph_fix_overflow_in_osdmap_decode.patch
>>>>>>>>>>>> 21-libceph_fix_overflow_in_osdmap_apply_incremental.patch
>>>>>>>>>>>> 22-libceph_transition_socket_state_prior_to_actual_connect.patch
>>>>>>>>>>>> 23-libceph_fix_NULL_dereference_in_reset_connection.patch
>>>>>>>>>>>> 24-libceph_use_con_get_put_methods.patch
>>>>>>>>>>>> 25-libceph_drop_ceph_con_get_put_helpers_and_nref_member.patch
>>>>>>>>>>>> 26-libceph_encapsulate_out_message_data_setup.patch
>>>>>>>>>>>> 27-libceph_encapsulate_advancing_msg_page.patch
>>>>>>>>>>>> 28-libceph_don_t_mark_footer_complete_before_it_is.patch
>>>>>>>>>>>> 29-libceph_move_init_bio__functions_up.patch
>>>>>>>>>>>> 30-libceph_move_init_of_bio_iter.patch
>>>>>>>>>>>> 31-libceph_don_t_use_bio_iter_as_a_flag.patch
>>>>>>>>>>>> 32-libceph_SOCK_CLOSED_is_a_flag_not_a_state.patch
>>>>>>>>>>>> 33-libceph_don_t_change_socket_state_on_sock_event.patch
>>>>>>>>>>>> 34-libceph_just_set_SOCK_CLOSED_when_state_changes.patch
>>>>>>>>>>>> 35-libceph_don_t_touch_con_state_in_con_close_socket.patch
>>>>>>>>>>>> 36-libceph_clear_CONNECTING_in_ceph_con_close.patch
>>>>>>>>>>>> 37-libceph_clear_NEGOTIATING_when_done.patch
>>>>>>>>>>>> 38-libceph_define_and_use_an_explicit_CONNECTED_state.patch
>>>>>>>>>>>> 39-libceph_separate_banner_and_connect_writes.patch
>>>>>>>>>>>> 40-libceph_distinguish_two_phases_of_connect_sequence.patch
>>>>>>>>>>>> 41-libceph_small_changes_to_messenger.c.patch
>>>>>>>>>>>> 42-libceph_add_some_fine_ASCII_art.patch
>>>>>>>>>>>> 43-libceph_set_peer_name_on_con_open_not_init.patch
>>>>>>>>>>>> 44-libceph_initialize_mon_client_con_only_once.patch
>>>>>>>>>>>> 45-libceph_allow_sock_transition_from_CONNECTING_to_CLOSED.patch
>>>>>>>>>>>> 46-libceph_initialize_msgpool_message_types.patch
>>>>>>>>>>>> 47-libceph_prevent_the_race_of_incoming_work_during_teardown.patch
>>>>>>>>>>>>
>>>>>>>>>>>> 48-libceph_report_socket_read_write_error_message.patch
>>>>>>>>>>>> 49-libceph_fix_mutex_coverage_for_ceph_con_close.patch
>>>>>>>>>>>> 50-libceph_resubmit_linger_ops_when_pg_mapping_changes.patch
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Nov 21, 2012 at 8:50 AM, Sage Weil <sage@inktank.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> Thanks for hunting this down.  I'm very curious what the
>>>>>>>>>>>>> culprit is...
>>>>>>>>>>>>>
>>>>>>>>>>>>> sage
>>>>>>>>>>
>>>>>>>>>>
>>>>>>
>>>>>
>>>
>>
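[Editor's note: the dynamic-debug toggle Sage quotes above can be wrapped in a small helper so the extra libceph output is only enabled around the command being reproduced. A sketch follows; it assumes a kernel built with CONFIG_DYNAMIC_DEBUG, debugfs mounted at the standard location, and root privileges. The `CONTROL` override and the `libceph_debug` helper name are illustrative, not from the thread.]

```shell
# Enable libceph pr_debug output, run one command, then disable it again.
# The resulting debug messages appear in the kernel log (dmesg).
# CONTROL can be pointed at a scratch file to exercise the helper
# without debugfs.
CONTROL=${CONTROL:-/sys/kernel/debug/dynamic_debug/control}

libceph_debug() {
    echo 'module libceph +p' > "$CONTROL"   # turn on all libceph debug sites
    "$@"                                    # e.g. rbd map <rbdvol>
    status=$?
    echo 'module libceph -p' > "$CONTROL"   # turn them back off
    return "$status"
}
```

Used as `libceph_debug rbd map <rbdvol>`, followed by `dmesg` to collect the trace that the thread asks for.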


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: rbd map command hangs for 15 minutes during system start up
  2012-12-11 18:01                                                         ` Alex Elder
@ 2012-12-11 19:44                                                           ` Alex Elder
  2012-12-13  0:57                                                             ` Nick Bartos
  0 siblings, 1 reply; 56+ messages in thread
From: Alex Elder @ 2012-12-11 19:44 UTC (permalink / raw)
  To: Nick Bartos
  Cc: Sage Weil, Gregory Farnum, Josh Durgin, Mandell Degerness, ceph-devel

On 12/11/2012 12:01 PM, Alex Elder wrote:
> On 12/11/2012 11:26 AM, Nick Bartos wrote:
>> Thanks! I'm creating a build with the new patches now.  I'll let you
>> know how testing goes.
> 
> FYI, I've been testing with these changes and have *not* been
> hitting the kinds of problems I'd seen previously.  However
> those problems were different from yours, so I'm offering no
> promises...  But there's a chance it'll be more helpful than
> I thought.
> 
> I am preparing yet another branch for you, this time adding
> all the rest of the commits you started with, just in case
> this does improve things.

This new branch is ready.  Feel free to try it out, and again
let me know how it works for you.

The branch is "wip-nick-newer"

					-Alex


> Please keep me informed how your testing goes.
> 
> 					-Alex
> 
> 
>> On Mon, Dec 10, 2012 at 1:57 PM, Alex Elder <elder@inktank.com> wrote:
>>> On 12/02/2012 10:43 PM, Alex Elder wrote:
>>>> On 12/01/2012 11:34 PM, Nick Bartos wrote:
>>>>> Unfortunately the hangs happen with the new set of patches.  Here's
>>>>> some debug info:
>>>>>
>>>>> https://gist.github.com/raw/4187123/90194ce172130244a9c1c968ed185eee7282d809/gistfile1.txt
>>>>>
>>>>
>>>> Well I'm sorry to hear that but I'm glad to have the new info.
>>>>
>>>> In retrospect running the new patches *without* the one that
>>>> seems to cause the hang (#50) was good validation that they
>>>> didn't lead to any new problems.
>>>>
>>>> I'll look at this some more in the morning, and I think I'll
>>>> confer with Sage whenever he's available for ideas on how to
>>>> proceed.
>>>
>>> Over the course of last week I have been finding and fixing a
>>> few problems in rbd, the osd client, and the messenger in the
>>> Linux kernel code.  I've added a handful of new patches to the
>>> end of the ones I gave you last time.
>>>
>>> At this point I don't expect these changes to directly affect
>>> the hangs you have been seeing, but a couple of these are
>>> real problems you could (also) hit, and I'd like to avoid
>>> that.
>>>
>>> I haven't done rigorous testing on this but I believe
>>> the changes are correct (and Sage has looked at them
>>> and says they look OK to him).
>>>
>>> The new version is available in the branch "wip-nick-new"
>>> in the ceph-client git repository.  If you reproduce your
>>> hang with this updated code (or do not), please let me know.
>>>
>>> Thanks.
>>>
>>>                                         -Alex
>>>
>>>
>>>>                     -Alex
>>>>
>>>>
>>>>> On Fri, Nov 30, 2012 at 3:22 PM, Alex Elder <elder@inktank.com> wrote:
>>>>>> On 11/29/2012 02:37 PM, Alex Elder wrote:
>>>>>>> On 11/22/2012 12:04 PM, Nick Bartos wrote:
>>>>>>>> Here are the ceph log messages (including the libceph kernel debug
>>>>>>>> stuff you asked for) from a node boot with the rbd command hung for a
>>>>>>>> couple of minutes:
>>>>>>
>>>>>> I'm sorry, but I did something stupid...
>>>>>>
>>>>>> Yes, the branch I gave you includes these fixes.  However
>>>>>> it does *not* include the commit that was giving you trouble
>>>>>> to begin with.
>>>>>>
>>>>>> So...
>>>>>>
>>>>>> I have updated that same branch (wip-nick) to contain:
>>>>>> - Linux 3.5.5
>>>>>> - Plus the first *50* (not 49) patches you listed
>>>>>> - Plus the ones I added before.
>>>>>>
>>>>>> The new commit id for that branch begins with be3198d6.
>>>>>>
>>>>>> I'm really sorry for this mistake.  Please try this new
>>>>>> branch and report back what you find.
>>>>>>
>>>>>>                                          -Alex
>>>>>>
>>>>>>
>>>>>>> Nick, I have put together a branch that includes two fixes
>>>>>>> that might be helpful.  I don't expect these fixes will
>>>>>>> necessarily *fix* what you're seeing, but one of them
>>>>>>> pulls a big hunk of processing out of the picture and
>>>>>>> might help eliminate some potential causes.  I had to
>>>>>>> pull in several other patches as prerequisites in order
>>>>>>> to get those fixes to apply cleanly.
>>>>>>>
>>>>>>> Would you be able to give it a try, and let us know what
>>>>>>> results you get?  The branch contains:
>>>>>>> - Linux 3.5.5
>>>>>>> - Plus the first 49 patches you listed
>>>>>>> - Plus four patches, which are prerequisites...
>>>>>>>      libceph: define ceph_extract_encoded_string()
>>>>>>>      rbd: define some new format constants
>>>>>>>      rbd: define rbd_dev_image_id()
>>>>>>>      rbd: kill create_snap sysfs entry
>>>>>>> - ...for these two bug fixes:
>>>>>>>      libceph: remove 'osdtimeout' option
>>>>>>>      ceph: don't reference req after put
>>>>>>>
>>>>>>> The branch is available in the ceph-client git repository
>>>>>>> under the name "wip-nick" and has commit id dd9323aa.
>>>>>>>      https://github.com/ceph/ceph-client/tree/wip-nick
>>>>>>>
>>>>>>>> https://raw.github.com/gist/4132395/7cb5f0150179b012429c6e57749120dd88616cce/gistfile1.txt
>>>>>>>>
>>>>>>>
>>>>>>> This full debug output is very helpful.  Please supply
>>>>>>> that again as well.
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>>                                        -Alex
>>>>>>>
>>>>>>>> On Wed, Nov 21, 2012 at 9:49 PM, Nick Bartos <nick@pistoncloud.com>
>>>>>>>> wrote:
>>>>>>>>> It's very easy to reproduce now with my automated install script, the
>>>>>>>>> most I've seen it succeed with that patch is 2 in a row, and hanging
>>>>>>>>> on the 3rd, although it hangs on most builds.  So it shouldn't take
>>>>>>>>> much to get it to do it again.  I'll try and get to that tomorrow,
>>>>>>>>> when I'm a bit more rested and my brain is working better.
>>>>>>>>>
>>>>>>>>> Yes during this the OSDs are probably all syncing up.  All the osd
>>>>>>>>> and
>>>>>>>>> mon daemons have started by the time the rbd commands are run,
>>>>>>>>> though.
>>>>>>>>>
>>>>>>>>> On Wed, Nov 21, 2012 at 8:47 PM, Sage Weil <sage@inktank.com> wrote:
>>>>>>>>>> On Wed, 21 Nov 2012, Nick Bartos wrote:
>>>>>>>>>>> FYI the build which included all 3.5 backports except patch #50 is
>>>>>>>>>>> still going strong after 21 builds.
>>>>>>>>>>
>>>>>>>>>> Okay, that one at least makes some sense.  I've opened
>>>>>>>>>>
>>>>>>>>>>          http://tracker.newdream.net/issues/3519
>>>>>>>>>>
>>>>>>>>>> How easy is this to reproduce?  If it is something you can
>>>>>>>>>> trigger with
>>>>>>>>>> debugging enabled ('echo module libceph +p >
>>>>>>>>>> /sys/kernel/debug/dynamic_debug/control') that would help
>>>>>>>>>> tremendously.
>>>>>>>>>>
>>>>>>>>>> I'm guessing that during this startup time the OSDs are still in the
>>>>>>>>>> process of starting?
>>>>>>>>>>
>>>>>>>>>> Alex, I bet that a test that does a lot of map/unmap stuff in a
>>>>>>>>>> loop while
>>>>>>>>>> thrashing OSDs could hit this.
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>> sage
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Nov 21, 2012 at 9:34 AM, Nick Bartos
>>>>>>>>>>> <nick@pistoncloud.com> wrote:
>>>>>>>>>>>> With 8 successful installs already done, I'm reasonably
>>>>>>>>>>>> confident that
>>>>>>>>>>>> it's patch #50.  I'm making another build which applies all
>>>>>>>>>>>> patches
>>>>>>>>>>>> from the 3.5 backport branch, excluding that specific one.
>>>>>>>>>>>> I'll let
>>>>>>>>>>>> you know if that turns up any unexpected failures.
>>>>>>>>>>>>
>>>>>>>>>>>> What will the potential fallout be for removing that specific
>>>>>>>>>>>> patch?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Nov 21, 2012 at 9:02 AM, Nick Bartos
>>>>>>>>>>>> <nick@pistoncloud.com> wrote:
>>>>>>>>>>>>> It's really looking like it's the
>>>>>>>>>>>>> libceph_resubmit_linger_ops_when_pg_mapping_changes commit.  When
>>>>>>>>>>>>> patches 1-50 (listed below) are applied to 3.5.7, the hang is
>>>>>>>>>>>>> present.
>>>>>>>>>>>>>   So far I have gone through 4 successful installs with no
>>>>>>>>>>>>> hang with
>>>>>>>>>>>>> only 1-49 applied.  I'm still leaving my test run to make sure
>>>>>>>>>>>>> it's
>>>>>>>>>>>>> not a fluke, but since previously it hangs within the first
>>>>>>>>>>>>> couple of
>>>>>>>>>>>>> builds, it really looks like this is where the problem
>>>>>>>>>>>>> originated.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1-libceph_eliminate_connection_state_DEAD.patch
>>>>>>>>>>>>> 2-libceph_kill_bad_proto_ceph_connection_op.patch
>>>>>>>>>>>>> 3-libceph_rename_socket_callbacks.patch
>>>>>>>>>>>>> 4-libceph_rename_kvec_reset_and_kvec_add_functions.patch
>>>>>>>>>>>>> 5-libceph_embed_ceph_messenger_structure_in_ceph_client.patch
>>>>>>>>>>>>> 6-libceph_start_separating_connection_flags_from_state.patch
>>>>>>>>>>>>> 7-libceph_start_tracking_connection_socket_state.patch
>>>>>>>>>>>>> 8-libceph_provide_osd_number_when_creating_osd.patch
>>>>>>>>>>>>> 9-libceph_set_CLOSED_state_bit_in_con_init.patch
>>>>>>>>>>>>> 10-libceph_embed_ceph_connection_structure_in_mon_client.patch
>>>>>>>>>>>>> 11-libceph_drop_connection_refcounting_for_mon_client.patch
>>>>>>>>>>>>> 12-libceph_init_monitor_connection_when_opening.patch
>>>>>>>>>>>>> 13-libceph_fully_initialize_connection_in_con_init.patch
>>>>>>>>>>>>> 14-libceph_tweak_ceph_alloc_msg.patch
>>>>>>>>>>>>> 15-libceph_have_messages_point_to_their_connection.patch
>>>>>>>>>>>>> 16-libceph_have_messages_take_a_connection_reference.patch
>>>>>>>>>>>>> 17-libceph_make_ceph_con_revoke_a_msg_operation.patch
>>>>>>>>>>>>> 18-libceph_make_ceph_con_revoke_message_a_msg_op.patch
>>>>>>>>>>>>> 19-libceph_fix_overflow_in___decode_pool_names.patch
>>>>>>>>>>>>> 20-libceph_fix_overflow_in_osdmap_decode.patch
>>>>>>>>>>>>> 21-libceph_fix_overflow_in_osdmap_apply_incremental.patch
>>>>>>>>>>>>> 22-libceph_transition_socket_state_prior_to_actual_connect.patch
>>>>>>>>>>>>> 23-libceph_fix_NULL_dereference_in_reset_connection.patch
>>>>>>>>>>>>> 24-libceph_use_con_get_put_methods.patch
>>>>>>>>>>>>> 25-libceph_drop_ceph_con_get_put_helpers_and_nref_member.patch
>>>>>>>>>>>>> 26-libceph_encapsulate_out_message_data_setup.patch
>>>>>>>>>>>>> 27-libceph_encapsulate_advancing_msg_page.patch
>>>>>>>>>>>>> 28-libceph_don_t_mark_footer_complete_before_it_is.patch
>>>>>>>>>>>>> 29-libceph_move_init_bio__functions_up.patch
>>>>>>>>>>>>> 30-libceph_move_init_of_bio_iter.patch
>>>>>>>>>>>>> 31-libceph_don_t_use_bio_iter_as_a_flag.patch
>>>>>>>>>>>>> 32-libceph_SOCK_CLOSED_is_a_flag_not_a_state.patch
>>>>>>>>>>>>> 33-libceph_don_t_change_socket_state_on_sock_event.patch
>>>>>>>>>>>>> 34-libceph_just_set_SOCK_CLOSED_when_state_changes.patch
>>>>>>>>>>>>> 35-libceph_don_t_touch_con_state_in_con_close_socket.patch
>>>>>>>>>>>>> 36-libceph_clear_CONNECTING_in_ceph_con_close.patch
>>>>>>>>>>>>> 37-libceph_clear_NEGOTIATING_when_done.patch
>>>>>>>>>>>>> 38-libceph_define_and_use_an_explicit_CONNECTED_state.patch
>>>>>>>>>>>>> 39-libceph_separate_banner_and_connect_writes.patch
>>>>>>>>>>>>> 40-libceph_distinguish_two_phases_of_connect_sequence.patch
>>>>>>>>>>>>> 41-libceph_small_changes_to_messenger.c.patch
>>>>>>>>>>>>> 42-libceph_add_some_fine_ASCII_art.patch
>>>>>>>>>>>>> 43-libceph_set_peer_name_on_con_open_not_init.patch
>>>>>>>>>>>>> 44-libceph_initialize_mon_client_con_only_once.patch
>>>>>>>>>>>>> 45-libceph_allow_sock_transition_from_CONNECTING_to_CLOSED.patch
>>>>>>>>>>>>> 46-libceph_initialize_msgpool_message_types.patch
>>>>>>>>>>>>> 47-libceph_prevent_the_race_of_incoming_work_during_teardown.patch
>>>>>>>>>>>>>
>>>>>>>>>>>>> 48-libceph_report_socket_read_write_error_message.patch
>>>>>>>>>>>>> 49-libceph_fix_mutex_coverage_for_ceph_con_close.patch
>>>>>>>>>>>>> 50-libceph_resubmit_linger_ops_when_pg_mapping_changes.patch
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Nov 21, 2012 at 8:50 AM, Sage Weil <sage@inktank.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> Thanks for hunting this down.  I'm very curious what the
>>>>>>>>>>>>>> culprit is...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> sage
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
> 


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: rbd map command hangs for 15 minutes during system start up
  2012-12-11 19:44                                                           ` Alex Elder
@ 2012-12-13  0:57                                                             ` Nick Bartos
  2012-12-13 19:00                                                               ` Nick Bartos
  0 siblings, 1 reply; 56+ messages in thread
From: Nick Bartos @ 2012-12-13  0:57 UTC (permalink / raw)
  To: Alex Elder
  Cc: Sage Weil, Gregory Farnum, Josh Durgin, Mandell Degerness, ceph-devel

Using wip-nick-newer, the problem still presented itself after 4
successful runs (so it may be a fluke, but it got slightly further
than before).  The log is here:
https://gist.github.com/raw/4273114/9085ed00d5bdd5ebab9a94b48f4a562d1fbac431/rbd-hang-1355359129.log

Unfortunately I forgot to enable libceph debugging, I'll do that in a
bit and get you another log later.
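[For reference, the libceph debugging in question is the dynamic-debug toggle Sage describes further down the thread. A minimal sketch of enabling it, assuming a kernel built with CONFIG_DYNAMIC_DEBUG and debugfs mounted at the usual /sys/kernel/debug:]

```shell
# Enable dynamic debug output for the ceph-related kernel modules.
# Assumes root, CONFIG_DYNAMIC_DEBUG, and debugfs at /sys/kernel/debug.
CTRL=/sys/kernel/debug/dynamic_debug/control
for mod in libceph rbd; do
    echo "module $mod +p" > "$CTRL"
done
# The extra pr_debug output then lands in the kernel log (dmesg/syslog).
```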


On Tue, Dec 11, 2012 at 11:44 AM, Alex Elder <elder@inktank.com> wrote:
> On 12/11/2012 12:01 PM, Alex Elder wrote:
>> On 12/11/2012 11:26 AM, Nick Bartos wrote:
>>> Thanks! I'm creating a build with the new patches now.  I'll let you
>>> know how testing goes.
>>
>> FYI, I've been testing with these changes and have *not* been
>> hitting the kinds of problems I'd been previously.  However
>> those problems were different from yours, so I'm offering no
>> promises...  But there's a chance it'll be more helpful than
>> I thought.
>>
>> I am preparing yet another branch for you, this time adding
>> all the rest of the commits you started with, just in case
>> this does improve things.
>
> This new branch is ready.  Feel free to try it out, and again
> let me know how it works for you.
>
> The branch is "wip-nick-newer"
>
>                                         -Alex
>
>
>> Please keep me informed how your testing goes.
>>
>>                                       -Alex
>>
>>
>>> On Mon, Dec 10, 2012 at 1:57 PM, Alex Elder <elder@inktank.com> wrote:
>>>> On 12/02/2012 10:43 PM, Alex Elder wrote:
>>>>> On 12/01/2012 11:34 PM, Nick Bartos wrote:
>>>>>> Unfortunately the hangs happen with the new set of patches.  Here's
>>>>>> some debug info:
>>>>>>
>>>>>> https://gist.github.com/raw/4187123/90194ce172130244a9c1c968ed185eee7282d809/gistfile1.txt
>>>>>>
>>>>>
>>>>> Well I'm sorry to hear that but I'm glad to have the new info.
>>>>>
>>>>> In retrospect running the new patches *without* the one that
>>>>> seems to cause the hang (#50) was good validation that they
>>>>> didn't lead to any new problems.
>>>>>
>>>>> I'll look at this some more in the morning, and I think I'll
>>>>> confer with Sage whenever he's available for ideas on how to
>>>>> proceed.
>>>>
>>>> Over the course of last week I have been finding and fixing a
>>>> few problems in rbd, the osd client, and the messenger in the
>>>> Linux kernel code.  I've added a handful of new patches to the
>>>> end of the ones I gave you last time.
>>>>
>>>> At this point I don't expect these changes to directly affect
>>>> the hangs you have been seeing, but a couple of these are
>>>> real problems you could (also) hit, and I'd like to avoid
>>>> that.
>>>>
>>>> I haven't done rigorous testing on this but I believe
>>>> the changes are correct (and Sage has looked at them
>>>> and says they look OK to him).
>>>>
>>>> The new version is available in the branch "wip-nick-new"
>>>> in the ceph-client git repository.  If you reproduce your
>>>> hang with this updated code (or do not), please let me know.
>>>>
>>>> Thanks.
>>>>
>>>>                                         -Alex
>>>>
>>>>
>>>>>                     -Alex
>>>>>
>>>>>
>>>>>> On Fri, Nov 30, 2012 at 3:22 PM, Alex Elder <elder@inktank.com> wrote:
>>>>>>> On 11/29/2012 02:37 PM, Alex Elder wrote:
>>>>>>>> On 11/22/2012 12:04 PM, Nick Bartos wrote:
>>>>>>>>> Here are the ceph log messages (including the libceph kernel debug
>>>>>>>>> stuff you asked for) from a node boot with the rbd command hung for a
>>>>>>>>> couple of minutes:
>>>>>>>
>>>>>>> I'm sorry, but I did something stupid...
>>>>>>>
>>>>>>> Yes, the branch I gave you includes these fixes.  However
>>>>>>> it does *not* include the commit that was giving you trouble
>>>>>>> to begin with.
>>>>>>>
>>>>>>> So...
>>>>>>>
>>>>>>> I have updated that same branch (wip-nick) to contain:
>>>>>>> - Linux 3.5.5
>>>>>>> - Plus the first *50* (not 49) patches you listed
>>>>>>> - Plus the ones I added before.
>>>>>>>
>>>>>>> The new commit id for that branch begins with be3198d6.
>>>>>>>
>>>>>>> I'm really sorry for this mistake.  Please try this new
>>>>>>> branch and report back what you find.
>>>>>>>
>>>>>>>                                          -Alex
>>>>>>>
>>>>>>>
>>>>>>>> Nick, I have put together a branch that includes two fixes
>>>>>>>> that might be helpful.  I don't expect these fixes will
>>>>>>>> necessarily *fix* what you're seeing, but one of them
>>>>>>>> pulls a big hunk of processing out of the picture and
>>>>>>>> might help eliminate some potential causes.  I had to
>>>>>>>> pull in several other patches as prerequisites in order
>>>>>>>> to get those fixes to apply cleanly.
>>>>>>>>
>>>>>>>> Would you be able to give it a try, and let us know what
>>>>>>>> results you get?  The branch contains:
>>>>>>>> - Linux 3.5.5
>>>>>>>> - Plus the first 49 patches you listed
>>>>>>>> - Plus four patches, which are prerequisites...
>>>>>>>>      libceph: define ceph_extract_encoded_string()
>>>>>>>>      rbd: define some new format constants
>>>>>>>>      rbd: define rbd_dev_image_id()
>>>>>>>>      rbd: kill create_snap sysfs entry
>>>>>>>> - ...for these two bug fixes:
>>>>>>>>      libceph: remove 'osdtimeout' option
>>>>>>>>      ceph: don't reference req after put
>>>>>>>>
>>>>>>>> The branch is available in the ceph-client git repository
>>>>>>>> under the name "wip-nick" and has commit id dd9323aa.
>>>>>>>>      https://github.com/ceph/ceph-client/tree/wip-nick
>>>>>>>>
>>>>>>>>> https://raw.github.com/gist/4132395/7cb5f0150179b012429c6e57749120dd88616cce/gistfile1.txt
>>>>>>>>>
>>>>>>>>
>>>>>>>> This full debug output is very helpful.  Please supply
>>>>>>>> that again as well.
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>>                                        -Alex
>>>>>>>>
>>>>>>>>> On Wed, Nov 21, 2012 at 9:49 PM, Nick Bartos <nick@pistoncloud.com>
>>>>>>>>> wrote:
>>>>>>>>>> It's very easy to reproduce now with my automated install script, the
>>>>>>>>>> most I've seen it succeed with that patch is 2 in a row, and hanging
>>>>>>>>>> on the 3rd, although it hangs on most builds.  So it shouldn't take
>>>>>>>>>> much to get it to do it again.  I'll try and get to that tomorrow,
>>>>>>>>>> when I'm a bit more rested and my brain is working better.
>>>>>>>>>>
>>>>>>>>>> Yes during this the OSDs are probably all syncing up.  All the osd
>>>>>>>>>> and
>>>>>>>>>> mon daemons have started by the time the rbd commands are run,
>>>>>>>>>> though.
>>>>>>>>>>
>>>>>>>>>> On Wed, Nov 21, 2012 at 8:47 PM, Sage Weil <sage@inktank.com> wrote:
>>>>>>>>>>> On Wed, 21 Nov 2012, Nick Bartos wrote:
>>>>>>>>>>>> FYI the build which included all 3.5 backports except patch #50 is
>>>>>>>>>>>> still going strong after 21 builds.
>>>>>>>>>>>
>>>>>>>>>>> Okay, that one at least makes some sense.  I've opened
>>>>>>>>>>>
>>>>>>>>>>>          http://tracker.newdream.net/issues/3519
>>>>>>>>>>>
>>>>>>>>>>> How easy is this to reproduce?  If it is something you can
>>>>>>>>>>> trigger with
>>>>>>>>>>> debugging enabled ('echo module libceph +p >
>>>>>>>>>>> /sys/kernel/debug/dynamic_debug/control') that would help
>>>>>>>>>>> tremendously.
>>>>>>>>>>>
>>>>>>>>>>> I'm guessing that during this startup time the OSDs are still in the
>>>>>>>>>>> process of starting?
>>>>>>>>>>>
>>>>>>>>>>> Alex, I bet that a test that does a lot of map/unmap stuff in a
>>>>>>>>>>> loop while
>>>>>>>>>>> thrashing OSDs could hit this.
>>>>>>>>>>>
>>>>>>>>>>> Thanks!
>>>>>>>>>>> sage
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Nov 21, 2012 at 9:34 AM, Nick Bartos
>>>>>>>>>>>> <nick@pistoncloud.com> wrote:
>>>>>>>>>>>>> With 8 successful installs already done, I'm reasonably
>>>>>>>>>>>>> confident that
>>>>>>>>>>>>> it's patch #50.  I'm making another build which applies all
>>>>>>>>>>>>> patches
>>>>>>>>>>>>> from the 3.5 backport branch, excluding that specific one.
>>>>>>>>>>>>> I'll let
>>>>>>>>>>>>> you know if that turns up any unexpected failures.
>>>>>>>>>>>>>
>>>>>>>>>>>>> What will the potential fallout be for removing that specific
>>>>>>>>>>>>> patch?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Nov 21, 2012 at 9:02 AM, Nick Bartos
>>>>>>>>>>>>> <nick@pistoncloud.com> wrote:
>>>>>>>>>>>>>> It's really looking like it's the
>>>>>>>>>>>>>> libceph_resubmit_linger_ops_when_pg_mapping_changes commit.  When
>>>>>>>>>>>>>> patches 1-50 (listed below) are applied to 3.5.7, the hang is
>>>>>>>>>>>>>> present.
>>>>>>>>>>>>>>   So far I have gone through 4 successful installs with no
>>>>>>>>>>>>>> hang with
>>>>>>>>>>>>>> only 1-49 applied.  I'm still leaving my test run to make sure
>>>>>>>>>>>>>> it's
>>>>>>>>>>>>>> not a fluke, but since previously it hangs within the first
>>>>>>>>>>>>>> couple of
>>>>>>>>>>>>>> builds, it really looks like this is where the problem
>>>>>>>>>>>>>> originated.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1-libceph_eliminate_connection_state_DEAD.patch
>>>>>>>>>>>>>> 2-libceph_kill_bad_proto_ceph_connection_op.patch
>>>>>>>>>>>>>> 3-libceph_rename_socket_callbacks.patch
>>>>>>>>>>>>>> 4-libceph_rename_kvec_reset_and_kvec_add_functions.patch
>>>>>>>>>>>>>> 5-libceph_embed_ceph_messenger_structure_in_ceph_client.patch
>>>>>>>>>>>>>> 6-libceph_start_separating_connection_flags_from_state.patch
>>>>>>>>>>>>>> 7-libceph_start_tracking_connection_socket_state.patch
>>>>>>>>>>>>>> 8-libceph_provide_osd_number_when_creating_osd.patch
>>>>>>>>>>>>>> 9-libceph_set_CLOSED_state_bit_in_con_init.patch
>>>>>>>>>>>>>> 10-libceph_embed_ceph_connection_structure_in_mon_client.patch
>>>>>>>>>>>>>> 11-libceph_drop_connection_refcounting_for_mon_client.patch
>>>>>>>>>>>>>> 12-libceph_init_monitor_connection_when_opening.patch
>>>>>>>>>>>>>> 13-libceph_fully_initialize_connection_in_con_init.patch
>>>>>>>>>>>>>> 14-libceph_tweak_ceph_alloc_msg.patch
>>>>>>>>>>>>>> 15-libceph_have_messages_point_to_their_connection.patch
>>>>>>>>>>>>>> 16-libceph_have_messages_take_a_connection_reference.patch
>>>>>>>>>>>>>> 17-libceph_make_ceph_con_revoke_a_msg_operation.patch
>>>>>>>>>>>>>> 18-libceph_make_ceph_con_revoke_message_a_msg_op.patch
>>>>>>>>>>>>>> 19-libceph_fix_overflow_in___decode_pool_names.patch
>>>>>>>>>>>>>> 20-libceph_fix_overflow_in_osdmap_decode.patch
>>>>>>>>>>>>>> 21-libceph_fix_overflow_in_osdmap_apply_incremental.patch
>>>>>>>>>>>>>> 22-libceph_transition_socket_state_prior_to_actual_connect.patch
>>>>>>>>>>>>>> 23-libceph_fix_NULL_dereference_in_reset_connection.patch
>>>>>>>>>>>>>> 24-libceph_use_con_get_put_methods.patch
>>>>>>>>>>>>>> 25-libceph_drop_ceph_con_get_put_helpers_and_nref_member.patch
>>>>>>>>>>>>>> 26-libceph_encapsulate_out_message_data_setup.patch
>>>>>>>>>>>>>> 27-libceph_encapsulate_advancing_msg_page.patch
>>>>>>>>>>>>>> 28-libceph_don_t_mark_footer_complete_before_it_is.patch
>>>>>>>>>>>>>> 29-libceph_move_init_bio__functions_up.patch
>>>>>>>>>>>>>> 30-libceph_move_init_of_bio_iter.patch
>>>>>>>>>>>>>> 31-libceph_don_t_use_bio_iter_as_a_flag.patch
>>>>>>>>>>>>>> 32-libceph_SOCK_CLOSED_is_a_flag_not_a_state.patch
>>>>>>>>>>>>>> 33-libceph_don_t_change_socket_state_on_sock_event.patch
>>>>>>>>>>>>>> 34-libceph_just_set_SOCK_CLOSED_when_state_changes.patch
>>>>>>>>>>>>>> 35-libceph_don_t_touch_con_state_in_con_close_socket.patch
>>>>>>>>>>>>>> 36-libceph_clear_CONNECTING_in_ceph_con_close.patch
>>>>>>>>>>>>>> 37-libceph_clear_NEGOTIATING_when_done.patch
>>>>>>>>>>>>>> 38-libceph_define_and_use_an_explicit_CONNECTED_state.patch
>>>>>>>>>>>>>> 39-libceph_separate_banner_and_connect_writes.patch
>>>>>>>>>>>>>> 40-libceph_distinguish_two_phases_of_connect_sequence.patch
>>>>>>>>>>>>>> 41-libceph_small_changes_to_messenger.c.patch
>>>>>>>>>>>>>> 42-libceph_add_some_fine_ASCII_art.patch
>>>>>>>>>>>>>> 43-libceph_set_peer_name_on_con_open_not_init.patch
>>>>>>>>>>>>>> 44-libceph_initialize_mon_client_con_only_once.patch
>>>>>>>>>>>>>> 45-libceph_allow_sock_transition_from_CONNECTING_to_CLOSED.patch
>>>>>>>>>>>>>> 46-libceph_initialize_msgpool_message_types.patch
>>>>>>>>>>>>>> 47-libceph_prevent_the_race_of_incoming_work_during_teardown.patch
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 48-libceph_report_socket_read_write_error_message.patch
>>>>>>>>>>>>>> 49-libceph_fix_mutex_coverage_for_ceph_con_close.patch
>>>>>>>>>>>>>> 50-libceph_resubmit_linger_ops_when_pg_mapping_changes.patch
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Nov 21, 2012 at 8:50 AM, Sage Weil <sage@inktank.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> Thanks for hunting this down.  I'm very curious what the
>>>>>>>>>>>>>>> culprit is...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> sage
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>
>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: rbd map command hangs for 15 minutes during system start up
  2012-12-13  0:57                                                             ` Nick Bartos
@ 2012-12-13 19:00                                                               ` Nick Bartos
  2012-12-13 19:07                                                                 ` Alex Elder
  2012-12-14 16:46                                                                 ` Alex Elder
  0 siblings, 2 replies; 56+ messages in thread
From: Nick Bartos @ 2012-12-13 19:00 UTC (permalink / raw)
  To: Alex Elder
  Cc: Sage Weil, Gregory Farnum, Josh Durgin, Mandell Degerness, ceph-devel

Here's another log with the kernel debugging enabled:
https://gist.github.com/raw/4278697/1c9e41d275e614783fbbdee8ca5842680f46c249/rbd-hang-1355424455.log

Note that it hung on the 2nd try.


On Wed, Dec 12, 2012 at 4:57 PM, Nick Bartos <nick@pistoncloud.com> wrote:
> Using wip-nick-newer, the problem still presented itself after 4
> successful runs (so it may be a fluke, but it got slightly further
> than before).  The log is here:
> https://gist.github.com/raw/4273114/9085ed00d5bdd5ebab9a94b48f4a562d1fbac431/rbd-hang-1355359129.log
>
> Unfortunately I forgot to enable libceph debugging, I'll do that in a
> bit and get you another log later.
>
>
> On Tue, Dec 11, 2012 at 11:44 AM, Alex Elder <elder@inktank.com> wrote:
>> On 12/11/2012 12:01 PM, Alex Elder wrote:
>>> On 12/11/2012 11:26 AM, Nick Bartos wrote:
>>>> Thanks! I'm creating a build with the new patches now.  I'll let you
>>>> know how testing goes.
>>>
>>> FYI, I've been testing with these changes and have *not* been
>>> hitting the kinds of problems I'd been previously.  However
>>> those problems were different from yours, so I'm offering no
>>> promises...  But there's a chance it'll be more helpful than
>>> I thought.
>>>
>>> I am preparing yet another branch for you, this time adding
>>> all the rest of the commits you started with, just in case
>>> this does improve things.
>>
>> This new branch is ready.  Feel free to try it out, and again
>> let me know how it works for you.
>>
>> The branch is "wip-nick-newer"
>>
>>                                         -Alex
>>
>>
>>> Please keep me informed how your testing goes.
>>>
>>>                                       -Alex
>>>
>>>
>>>> On Mon, Dec 10, 2012 at 1:57 PM, Alex Elder <elder@inktank.com> wrote:
>>>>> On 12/02/2012 10:43 PM, Alex Elder wrote:
>>>>>> On 12/01/2012 11:34 PM, Nick Bartos wrote:
>>>>>>> Unfortunately the hangs happen with the new set of patches.  Here's
>>>>>>> some debug info:
>>>>>>>
>>>>>>> https://gist.github.com/raw/4187123/90194ce172130244a9c1c968ed185eee7282d809/gistfile1.txt
>>>>>>>
>>>>>>
>>>>>> Well I'm sorry to hear that but I'm glad to have the new info.
>>>>>>
>>>>>> In retrospect running the new patches *without* the one that
>>>>>> seems to cause the hang (#50) was good validation that they
>>>>>> didn't lead to any new problems.
>>>>>>
>>>>>> I'll look at this some more in the morning, and I think I'll
>>>>>> confer with Sage whenever he's available for ideas on how to
>>>>>> proceed.
>>>>>
>>>>> Over the course of last week I have been finding and fixing a
>>>>> few problems in rbd, the osd client, and the messenger in the
>>>>> Linux kernel code.  I've added a handful of new patches to the
>>>>> end of the ones I gave you last time.
>>>>>
>>>>> At this point I don't expect these changes to directly affect
>>>>> the hangs you have been seeing, but a couple of these are
>>>>> real problems you could (also) hit, and I'd like to avoid
>>>>> that.
>>>>>
>>>>> I haven't done rigorous testing on this but I believe
>>>>> the changes are correct (and Sage has looked at them
>>>>> and says they look OK to him).
>>>>>
>>>>> The new version is available in the branch "wip-nick-new"
>>>>> in the ceph-client git repository.  If you reproduce your
>>>>> hang with this updated code (or do not), please let me know.
>>>>>
>>>>> Thanks.
>>>>>
>>>>>                                         -Alex
>>>>>
>>>>>
>>>>>>                     -Alex
>>>>>>
>>>>>>
>>>>>>> On Fri, Nov 30, 2012 at 3:22 PM, Alex Elder <elder@inktank.com> wrote:
>>>>>>>> On 11/29/2012 02:37 PM, Alex Elder wrote:
>>>>>>>>> On 11/22/2012 12:04 PM, Nick Bartos wrote:
>>>>>>>>>> Here are the ceph log messages (including the libceph kernel debug
>>>>>>>>>> stuff you asked for) from a node boot with the rbd command hung for a
>>>>>>>>>> couple of minutes:
>>>>>>>>
>>>>>>>> I'm sorry, but I did something stupid...
>>>>>>>>
>>>>>>>> Yes, the branch I gave you includes these fixes.  However
>>>>>>>> it does *not* include the commit that was giving you trouble
>>>>>>>> to begin with.
>>>>>>>>
>>>>>>>> So...
>>>>>>>>
>>>>>>>> I have updated that same branch (wip-nick) to contain:
>>>>>>>> - Linux 3.5.5
>>>>>>>> - Plus the first *50* (not 49) patches you listed
>>>>>>>> - Plus the ones I added before.
>>>>>>>>
>>>>>>>> The new commit id for that branch begins with be3198d6.
>>>>>>>>
>>>>>>>> I'm really sorry for this mistake.  Please try this new
>>>>>>>> branch and report back what you find.
>>>>>>>>
>>>>>>>>                                          -Alex
>>>>>>>>
>>>>>>>>
>>>>>>>>> Nick, I have put together a branch that includes two fixes
>>>>>>>>> that might be helpful.  I don't expect these fixes will
>>>>>>>>> necessarily *fix* what you're seeing, but one of them
>>>>>>>>> pulls a big hunk of processing out of the picture and
>>>>>>>>> might help eliminate some potential causes.  I had to
>>>>>>>>> pull in several other patches as prerequisites in order
>>>>>>>>> to get those fixes to apply cleanly.
>>>>>>>>>
>>>>>>>>> Would you be able to give it a try, and let us know what
>>>>>>>>> results you get?  The branch contains:
>>>>>>>>> - Linux 3.5.5
>>>>>>>>> - Plus the first 49 patches you listed
>>>>>>>>> - Plus four patches, which are prerequisites...
>>>>>>>>>      libceph: define ceph_extract_encoded_string()
>>>>>>>>>      rbd: define some new format constants
>>>>>>>>>      rbd: define rbd_dev_image_id()
>>>>>>>>>      rbd: kill create_snap sysfs entry
>>>>>>>>> - ...for these two bug fixes:
>>>>>>>>>      libceph: remove 'osdtimeout' option
>>>>>>>>>      ceph: don't reference req after put
>>>>>>>>>
>>>>>>>>> The branch is available in the ceph-client git repository
>>>>>>>>> under the name "wip-nick" and has commit id dd9323aa.
>>>>>>>>>      https://github.com/ceph/ceph-client/tree/wip-nick
>>>>>>>>>
>>>>>>>>>> https://raw.github.com/gist/4132395/7cb5f0150179b012429c6e57749120dd88616cce/gistfile1.txt
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This full debug output is very helpful.  Please supply
>>>>>>>>> that again as well.
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>>                                        -Alex
>>>>>>>>>
>>>>>>>>>> On Wed, Nov 21, 2012 at 9:49 PM, Nick Bartos <nick@pistoncloud.com>
>>>>>>>>>> wrote:
>>>>>>>>>>> It's very easy to reproduce now with my automated install script, the
>>>>>>>>>>> most I've seen it succeed with that patch is 2 in a row, and hanging
>>>>>>>>>>> on the 3rd, although it hangs on most builds.  So it shouldn't take
>>>>>>>>>>> much to get it to do it again.  I'll try and get to that tomorrow,
>>>>>>>>>>> when I'm a bit more rested and my brain is working better.
>>>>>>>>>>>
>>>>>>>>>>> Yes during this the OSDs are probably all syncing up.  All the osd
>>>>>>>>>>> and
>>>>>>>>>>> mon daemons have started by the time the rbd commands are run,
>>>>>>>>>>> though.
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Nov 21, 2012 at 8:47 PM, Sage Weil <sage@inktank.com> wrote:
>>>>>>>>>>>> On Wed, 21 Nov 2012, Nick Bartos wrote:
>>>>>>>>>>>>> FYI the build which included all 3.5 backports except patch #50 is
>>>>>>>>>>>>> still going strong after 21 builds.
>>>>>>>>>>>>
>>>>>>>>>>>> Okay, that one at least makes some sense.  I've opened
>>>>>>>>>>>>
>>>>>>>>>>>>          http://tracker.newdream.net/issues/3519
>>>>>>>>>>>>
>>>>>>>>>>>> How easy is this to reproduce?  If it is something you can
>>>>>>>>>>>> trigger with
>>>>>>>>>>>> debugging enabled ('echo module libceph +p >
>>>>>>>>>>>> /sys/kernel/debug/dynamic_debug/control') that would help
>>>>>>>>>>>> tremendously.
>>>>>>>>>>>>
>>>>>>>>>>>> I'm guessing that during this startup time the OSDs are still in the
>>>>>>>>>>>> process of starting?
>>>>>>>>>>>>
>>>>>>>>>>>> Alex, I bet that a test that does a lot of map/unmap stuff in a
>>>>>>>>>>>> loop while
>>>>>>>>>>>> thrashing OSDs could hit this.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks!
>>>>>>>>>>>> sage
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Nov 21, 2012 at 9:34 AM, Nick Bartos
>>>>>>>>>>>>> <nick@pistoncloud.com> wrote:
>>>>>>>>>>>>>> With 8 successful installs already done, I'm reasonably
>>>>>>>>>>>>>> confident that
>>>>>>>>>>>>>> it's patch #50.  I'm making another build which applies all
>>>>>>>>>>>>>> patches
>>>>>>>>>>>>>> from the 3.5 backport branch, excluding that specific one.
>>>>>>>>>>>>>> I'll let
>>>>>>>>>>>>>> you know if that turns up any unexpected failures.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What will the potential fallout be for removing that specific
>>>>>>>>>>>>>> patch?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Nov 21, 2012 at 9:02 AM, Nick Bartos
>>>>>>>>>>>>>> <nick@pistoncloud.com> wrote:
>>>>>>>>>>>>>>> It's really looking like it's the
>>>>>>>>>>>>>>> libceph_resubmit_linger_ops_when_pg_mapping_changes commit.  When
>>>>>>>>>>>>>>> patches 1-50 (listed below) are applied to 3.5.7, the hang is
>>>>>>>>>>>>>>> present.
>>>>>>>>>>>>>>>   So far I have gone through 4 successful installs with no
>>>>>>>>>>>>>>> hang with
>>>>>>>>>>>>>>> only 1-49 applied.  I'm still leaving my test run to make sure
>>>>>>>>>>>>>>> it's
>>>>>>>>>>>>>>> not a fluke, but since previously it hangs within the first
>>>>>>>>>>>>>>> couple of
>>>>>>>>>>>>>>> builds, it really looks like this is where the problem
>>>>>>>>>>>>>>> originated.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1-libceph_eliminate_connection_state_DEAD.patch
>>>>>>>>>>>>>>> 2-libceph_kill_bad_proto_ceph_connection_op.patch
>>>>>>>>>>>>>>> 3-libceph_rename_socket_callbacks.patch
>>>>>>>>>>>>>>> 4-libceph_rename_kvec_reset_and_kvec_add_functions.patch
>>>>>>>>>>>>>>> 5-libceph_embed_ceph_messenger_structure_in_ceph_client.patch
>>>>>>>>>>>>>>> 6-libceph_start_separating_connection_flags_from_state.patch
>>>>>>>>>>>>>>> 7-libceph_start_tracking_connection_socket_state.patch
>>>>>>>>>>>>>>> 8-libceph_provide_osd_number_when_creating_osd.patch
>>>>>>>>>>>>>>> 9-libceph_set_CLOSED_state_bit_in_con_init.patch
>>>>>>>>>>>>>>> 10-libceph_embed_ceph_connection_structure_in_mon_client.patch
>>>>>>>>>>>>>>> 11-libceph_drop_connection_refcounting_for_mon_client.patch
>>>>>>>>>>>>>>> 12-libceph_init_monitor_connection_when_opening.patch
>>>>>>>>>>>>>>> 13-libceph_fully_initialize_connection_in_con_init.patch
>>>>>>>>>>>>>>> 14-libceph_tweak_ceph_alloc_msg.patch
>>>>>>>>>>>>>>> 15-libceph_have_messages_point_to_their_connection.patch
>>>>>>>>>>>>>>> 16-libceph_have_messages_take_a_connection_reference.patch
>>>>>>>>>>>>>>> 17-libceph_make_ceph_con_revoke_a_msg_operation.patch
>>>>>>>>>>>>>>> 18-libceph_make_ceph_con_revoke_message_a_msg_op.patch
>>>>>>>>>>>>>>> 19-libceph_fix_overflow_in___decode_pool_names.patch
>>>>>>>>>>>>>>> 20-libceph_fix_overflow_in_osdmap_decode.patch
>>>>>>>>>>>>>>> 21-libceph_fix_overflow_in_osdmap_apply_incremental.patch
>>>>>>>>>>>>>>> 22-libceph_transition_socket_state_prior_to_actual_connect.patch
>>>>>>>>>>>>>>> 23-libceph_fix_NULL_dereference_in_reset_connection.patch
>>>>>>>>>>>>>>> 24-libceph_use_con_get_put_methods.patch
>>>>>>>>>>>>>>> 25-libceph_drop_ceph_con_get_put_helpers_and_nref_member.patch
>>>>>>>>>>>>>>> 26-libceph_encapsulate_out_message_data_setup.patch
>>>>>>>>>>>>>>> 27-libceph_encapsulate_advancing_msg_page.patch
>>>>>>>>>>>>>>> 28-libceph_don_t_mark_footer_complete_before_it_is.patch
>>>>>>>>>>>>>>> 29-libceph_move_init_bio__functions_up.patch
>>>>>>>>>>>>>>> 30-libceph_move_init_of_bio_iter.patch
>>>>>>>>>>>>>>> 31-libceph_don_t_use_bio_iter_as_a_flag.patch
>>>>>>>>>>>>>>> 32-libceph_SOCK_CLOSED_is_a_flag_not_a_state.patch
>>>>>>>>>>>>>>> 33-libceph_don_t_change_socket_state_on_sock_event.patch
>>>>>>>>>>>>>>> 34-libceph_just_set_SOCK_CLOSED_when_state_changes.patch
>>>>>>>>>>>>>>> 35-libceph_don_t_touch_con_state_in_con_close_socket.patch
>>>>>>>>>>>>>>> 36-libceph_clear_CONNECTING_in_ceph_con_close.patch
>>>>>>>>>>>>>>> 37-libceph_clear_NEGOTIATING_when_done.patch
>>>>>>>>>>>>>>> 38-libceph_define_and_use_an_explicit_CONNECTED_state.patch
>>>>>>>>>>>>>>> 39-libceph_separate_banner_and_connect_writes.patch
>>>>>>>>>>>>>>> 40-libceph_distinguish_two_phases_of_connect_sequence.patch
>>>>>>>>>>>>>>> 41-libceph_small_changes_to_messenger.c.patch
>>>>>>>>>>>>>>> 42-libceph_add_some_fine_ASCII_art.patch
>>>>>>>>>>>>>>> 43-libceph_set_peer_name_on_con_open_not_init.patch
>>>>>>>>>>>>>>> 44-libceph_initialize_mon_client_con_only_once.patch
>>>>>>>>>>>>>>> 45-libceph_allow_sock_transition_from_CONNECTING_to_CLOSED.patch
>>>>>>>>>>>>>>> 46-libceph_initialize_msgpool_message_types.patch
>>>>>>>>>>>>>>> 47-libceph_prevent_the_race_of_incoming_work_during_teardown.patch
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 48-libceph_report_socket_read_write_error_message.patch
>>>>>>>>>>>>>>> 49-libceph_fix_mutex_coverage_for_ceph_con_close.patch
>>>>>>>>>>>>>>> 50-libceph_resubmit_linger_ops_when_pg_mapping_changes.patch
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Nov 21, 2012 at 8:50 AM, Sage Weil <sage@inktank.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> Thanks for hunting this down.  I'm very curious what the
>>>>>>>>>>>>>>>> culprit is...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> sage
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>
>>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: rbd map command hangs for 15 minutes during system start up
  2012-12-13 19:00                                                               ` Nick Bartos
@ 2012-12-13 19:07                                                                 ` Alex Elder
  2012-12-14 16:46                                                                 ` Alex Elder
  1 sibling, 0 replies; 56+ messages in thread
From: Alex Elder @ 2012-12-13 19:07 UTC (permalink / raw)
  To: Nick Bartos
  Cc: Sage Weil, Gregory Farnum, Josh Durgin, Mandell Degerness, ceph-devel

On 12/13/2012 01:00 PM, Nick Bartos wrote:
> Here's another log with the kernel debugging enabled:
> https://gist.github.com/raw/4278697/1c9e41d275e614783fbbdee8ca5842680f46c249/rbd-hang-1355424455.log
> 
> Note that it hung on the 2nd try.

OK, thanks for the info.  We'll keep looking.	-Alex

> 
> On Wed, Dec 12, 2012 at 4:57 PM, Nick Bartos <nick@pistoncloud.com> wrote:
>> Using wip-nick-newer, the problem still presented itself after 4
>> successful runs (so it may be a fluke, but it got slightly further
>> than before).  The log is here:
>> https://gist.github.com/raw/4273114/9085ed00d5bdd5ebab9a94b48f4a562d1fbac431/rbd-hang-1355359129.log
>>
>> Unfortunately I forgot to enable libceph debugging, I'll do that in a
>> bit and get you another log later.
>>
>>
>> On Tue, Dec 11, 2012 at 11:44 AM, Alex Elder <elder@inktank.com> wrote:
>>> On 12/11/2012 12:01 PM, Alex Elder wrote:
>>>> On 12/11/2012 11:26 AM, Nick Bartos wrote:
>>>>> Thanks! I'm creating a build with the new patches now.  I'll let you
>>>>> know how testing goes.
>>>>
>>>> FYI, I've been testing with these changes and have *not* been
>>>> hitting the kinds of problems I'd been previously.  However
>>>> those problems were different from yours, so I'm offering no
>>>> promises...  But there's a chance it'll be more helpful than
>>>> I thought.
>>>>
>>>> I am preparing yet another branch for you, this time adding
>>>> all the rest of the commits you started with, just in case
>>>> this does improve things.
>>>
>>> This new branch is ready.  Feel free to try it out, and again
>>> let me know how it works for you.
>>>
>>> The branch is "wip-nick-newer"
>>>
>>>                                         -Alex
>>>
>>>
>>>> Please keep me informed how your testing goes.
>>>>
>>>>                                       -Alex
>>>>
>>>>
>>>>> On Mon, Dec 10, 2012 at 1:57 PM, Alex Elder <elder@inktank.com> wrote:
>>>>>> On 12/02/2012 10:43 PM, Alex Elder wrote:
>>>>>>> On 12/01/2012 11:34 PM, Nick Bartos wrote:
>>>>>>>> Unfortunately the hangs happen with the new set of patches.  Here's
>>>>>>>> some debug info:
>>>>>>>>
>>>>>>>> https://gist.github.com/raw/4187123/90194ce172130244a9c1c968ed185eee7282d809/gistfile1.txt
>>>>>>>>
>>>>>>>
>>>>>>> Well I'm sorry to hear that but I'm glad to have the new info.
>>>>>>>
>>>>>>> In retrospect running the new patches *without* the one that
>>>>>>> seems to cause the hang (#50) was good validation that they
>>>>>>> didn't lead to any new problems.
>>>>>>>
>>>>>>> I'll look at this some more in the morning, and I think I'll
>>>>>>> confer with Sage whenever he's available for ideas on how to
>>>>>>> proceed.
>>>>>>
>>>>>> Over the course of last week I have been finding and fixing a
>>>>>> few problems in rbd, the osd client, and the messenger in the
>>>>>> Linux kernel code.  I've added a handful of new patches to the
>>>>>> end of the ones I gave you last time.
>>>>>>
>>>>>> At this point I don't expect these changes directly affect
>>>>>> the hangs you have been seeing, but a couple of these are
>>>>>> real problems you could (also) hit, and I'd like to avoid
>>>>>> that.
>>>>>>
>>>>>> I haven't done rigorous testing on this but I believe
>>>>>> the changes are correct (and Sage has looked at them
>>>>>> and says they look OK to him).
>>>>>>
>>>>>> The new version is available in the branch "wip-nick-new"
>>>>>> in the ceph-client git repository.  If you reproduce your
>>>>>> hang with this updated code (or do not), please let me know.
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>>                                         -Alex
>>>>>>
>>>>>>
>>>>>>>                     -Alex
>>>>>>>
>>>>>>>
>>>>>>>> On Fri, Nov 30, 2012 at 3:22 PM, Alex Elder <elder@inktank.com> wrote:
>>>>>>>>> On 11/29/2012 02:37 PM, Alex Elder wrote:
>>>>>>>>>> On 11/22/2012 12:04 PM, Nick Bartos wrote:
>>>>>>>>>>> Here are the ceph log messages (including the libceph kernel debug
>>>>>>>>>>> stuff you asked for) from a node boot with the rbd command hung for a
>>>>>>>>>>> couple of minutes:
>>>>>>>>>
>>>>>>>>> I'm sorry, but I did something stupid...
>>>>>>>>>
>>>>>>>>> Yes, the branch I gave you includes these fixes.  However
>>>>>>>>> it does *not* include the commit that was giving you trouble
>>>>>>>>> to begin with.
>>>>>>>>>
>>>>>>>>> So...
>>>>>>>>>
>>>>>>>>> I have updated that same branch (wip-nick) to contain:
>>>>>>>>> - Linux 3.5.5
>>>>>>>>> - Plus the first *50* (not 49) patches you listed
>>>>>>>>> - Plus the ones I added before.
>>>>>>>>>
>>>>>>>>> The new commit id for that branch begins with be3198d6.
>>>>>>>>>
>>>>>>>>> I'm really sorry for this mistake.  Please try this new
>>>>>>>>> branch and report back what you find.
>>>>>>>>>
>>>>>>>>>                                          -Alex
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Nick, I have put together a branch that includes two fixes
>>>>>>>>>> that might be helpful.  I don't expect these fixes will
>>>>>>>>>> necessarily *fix* what you're seeing, but one of them
>>>>>>>>>> pulls a big hunk of processing out of the picture and
>>>>>>>>>> might help eliminate some potential causes.  I had to
>>>>>>>>>> pull in several other patches as prerequisites in order
>>>>>>>>>> to get those fixes to apply cleanly.
>>>>>>>>>>
>>>>>>>>>> Would you be able to give it a try, and let us know what
>>>>>>>>>> results you get?  The branch contains:
>>>>>>>>>> - Linux 3.5.5
>>>>>>>>>> - Plus the first 49 patches you listed
>>>>>>>>>> - Plus four patches, which are prerequisites...
>>>>>>>>>>      libceph: define ceph_extract_encoded_string()
>>>>>>>>>>      rbd: define some new format constants
>>>>>>>>>>      rbd: define rbd_dev_image_id()
>>>>>>>>>>      rbd: kill create_snap sysfs entry
>>>>>>>>>> - ...for these two bug fixes:
>>>>>>>>>>      libceph: remove 'osdtimeout' option
>>>>>>>>>>      ceph: don't reference req after put
>>>>>>>>>>
>>>>>>>>>> The branch is available in the ceph-client git repository
>>>>>>>>>> under the name "wip-nick" and has commit id dd9323aa.
>>>>>>>>>>      https://github.com/ceph/ceph-client/tree/wip-nick
>>>>>>>>>>
>>>>>>>>>>> https://raw.github.com/gist/4132395/7cb5f0150179b012429c6e57749120dd88616cce/gistfile1.txt
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> This full debug output is very helpful.  Please supply
>>>>>>>>>> that again as well.
>>>>>>>>>>
>>>>>>>>>> Thanks.
>>>>>>>>>>
>>>>>>>>>>                                        -Alex
>>>>>>>>>>
>>>>>>>>>>> On Wed, Nov 21, 2012 at 9:49 PM, Nick Bartos <nick@pistoncloud.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>> It's very easy to reproduce now with my automated install script, the
>>>>>>>>>>>> most I've seen it succeed with that patch is 2 in a row, and hanging
>>>>>>>>>>>> on the 3rd, although it hangs on most builds.  So it shouldn't take
>>>>>>>>>>>> much to get it to do it again.  I'll try and get to that tomorrow,
>>>>>>>>>>>> when I'm a bit more rested and my brain is working better.
>>>>>>>>>>>>
>>>>>>>>>>>> Yes during this the OSDs are probably all syncing up.  All the osd
>>>>>>>>>>>> and
>>>>>>>>>>>> mon daemons have started by the time the rbd commands are run,
>>>>>>>>>>>> though.
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Nov 21, 2012 at 8:47 PM, Sage Weil <sage@inktank.com> wrote:
>>>>>>>>>>>>> On Wed, 21 Nov 2012, Nick Bartos wrote:
>>>>>>>>>>>>>> FYI the build which included all 3.5 backports except patch #50 is
>>>>>>>>>>>>>> still going strong after 21 builds.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Okay, that one at least makes some sense.  I've opened
>>>>>>>>>>>>>
>>>>>>>>>>>>>          http://tracker.newdream.net/issues/3519
>>>>>>>>>>>>>
>>>>>>>>>>>>> How easy is this to reproduce?  If it is something you can
>>>>>>>>>>>>> trigger with
>>>>>>>>>>>>> debugging enabled ('echo module libceph +p >
>>>>>>>>>>>>> /sys/kernel/debug/dynamic_debug/control') that would help
>>>>>>>>>>>>> tremendously.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm guessing that during this startup time the OSDs are still in the
>>>>>>>>>>>>> process of starting?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Alex, I bet that a test that does a lot of map/unmap stuff in a
>>>>>>>>>>>>> loop while
>>>>>>>>>>>>> thrashing OSDs could hit this.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>> sage
>>>>>>>>>>>>>
>>>>>>>>>>>>>
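The map/unmap-in-a-loop test Sage suggests above might be sketched as follows. The pool/image name "rbd/test", the device path /dev/rbd0, and the iteration count are illustrative assumptions, not from this thread; setting `RBD=echo` lets the loop dry-run on a machine without ceph installed.

```shell
# Sketch of a map/unmap stress loop, to be run while OSDs are thrashed.
# "rbd/test" and /dev/rbd0 are assumed names, not from the thread.
RBD="${RBD:-rbd}"   # set RBD=echo to dry-run without a ceph cluster

map_unmap_loop() {  # map_unmap_loop ITERATIONS
    i=1
    while [ "$i" -le "$1" ]; do
        $RBD map rbd/test || return 1     # a hang here reproduces the bug
        $RBD unmap /dev/rbd0 || return 1
        i=$((i + 1))
    done
}

# usage: map_unmap_loop 100   # while restarting OSDs in another shell
```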
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Nov 21, 2012 at 9:34 AM, Nick Bartos
>>>>>>>>>>>>>> <nick@pistoncloud.com> wrote:
>>>>>>>>>>>>>>> With 8 successful installs already done, I'm reasonably
>>>>>>>>>>>>>>> confident that
>>>>>>>>>>>>>>> it's patch #50.  I'm making another build which applies all
>>>>>>>>>>>>>>> patches
>>>>>>>>>>>>>>> from the 3.5 backport branch, excluding that specific one.
>>>>>>>>>>>>>>> I'll let
>>>>>>>>>>>>>>> you know if that turns up any unexpected failures.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What will the potential fall out be for removing that specific
>>>>>>>>>>>>>>> patch?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Nov 21, 2012 at 9:02 AM, Nick Bartos
>>>>>>>>>>>>>>> <nick@pistoncloud.com> wrote:
>>>>>>>>>>>>>>>> It's really looking like it's the
>>>>>>>>>>>>>>>> libceph_resubmit_linger_ops_when_pg_mapping_changes commit.  When
>>>>>>>>>>>>>>>> patches 1-50 (listed below) are applied to 3.5.7, the hang is
>>>>>>>>>>>>>>>> present.
>>>>>>>>>>>>>>>>   So far I have gone through 4 successful installs with no
>>>>>>>>>>>>>>>> hang with
>>>>>>>>>>>>>>>> only 1-49 applied.  I'm still leaving my test run to make sure
>>>>>>>>>>>>>>>> it's
>>>>>>>>>>>>>>>> not a fluke, but since previously it hangs within the first
>>>>>>>>>>>>>>>> couple of
>>>>>>>>>>>>>>>> builds, it really looks like this is where the problem
>>>>>>>>>>>>>>>> originated.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1-libceph_eliminate_connection_state_DEAD.patch
>>>>>>>>>>>>>>>> 2-libceph_kill_bad_proto_ceph_connection_op.patch
>>>>>>>>>>>>>>>> 3-libceph_rename_socket_callbacks.patch
>>>>>>>>>>>>>>>> 4-libceph_rename_kvec_reset_and_kvec_add_functions.patch
>>>>>>>>>>>>>>>> 5-libceph_embed_ceph_messenger_structure_in_ceph_client.patch
>>>>>>>>>>>>>>>> 6-libceph_start_separating_connection_flags_from_state.patch
>>>>>>>>>>>>>>>> 7-libceph_start_tracking_connection_socket_state.patch
>>>>>>>>>>>>>>>> 8-libceph_provide_osd_number_when_creating_osd.patch
>>>>>>>>>>>>>>>> 9-libceph_set_CLOSED_state_bit_in_con_init.patch
>>>>>>>>>>>>>>>> 10-libceph_embed_ceph_connection_structure_in_mon_client.patch
>>>>>>>>>>>>>>>> 11-libceph_drop_connection_refcounting_for_mon_client.patch
>>>>>>>>>>>>>>>> 12-libceph_init_monitor_connection_when_opening.patch
>>>>>>>>>>>>>>>> 13-libceph_fully_initialize_connection_in_con_init.patch
>>>>>>>>>>>>>>>> 14-libceph_tweak_ceph_alloc_msg.patch
>>>>>>>>>>>>>>>> 15-libceph_have_messages_point_to_their_connection.patch
>>>>>>>>>>>>>>>> 16-libceph_have_messages_take_a_connection_reference.patch
>>>>>>>>>>>>>>>> 17-libceph_make_ceph_con_revoke_a_msg_operation.patch
>>>>>>>>>>>>>>>> 18-libceph_make_ceph_con_revoke_message_a_msg_op.patch
>>>>>>>>>>>>>>>> 19-libceph_fix_overflow_in___decode_pool_names.patch
>>>>>>>>>>>>>>>> 20-libceph_fix_overflow_in_osdmap_decode.patch
>>>>>>>>>>>>>>>> 21-libceph_fix_overflow_in_osdmap_apply_incremental.patch
>>>>>>>>>>>>>>>> 22-libceph_transition_socket_state_prior_to_actual_connect.patch
>>>>>>>>>>>>>>>> 23-libceph_fix_NULL_dereference_in_reset_connection.patch
>>>>>>>>>>>>>>>> 24-libceph_use_con_get_put_methods.patch
>>>>>>>>>>>>>>>> 25-libceph_drop_ceph_con_get_put_helpers_and_nref_member.patch
>>>>>>>>>>>>>>>> 26-libceph_encapsulate_out_message_data_setup.patch
>>>>>>>>>>>>>>>> 27-libceph_encapsulate_advancing_msg_page.patch
>>>>>>>>>>>>>>>> 28-libceph_don_t_mark_footer_complete_before_it_is.patch
>>>>>>>>>>>>>>>> 29-libceph_move_init_bio__functions_up.patch
>>>>>>>>>>>>>>>> 30-libceph_move_init_of_bio_iter.patch
>>>>>>>>>>>>>>>> 31-libceph_don_t_use_bio_iter_as_a_flag.patch
>>>>>>>>>>>>>>>> 32-libceph_SOCK_CLOSED_is_a_flag_not_a_state.patch
>>>>>>>>>>>>>>>> 33-libceph_don_t_change_socket_state_on_sock_event.patch
>>>>>>>>>>>>>>>> 34-libceph_just_set_SOCK_CLOSED_when_state_changes.patch
>>>>>>>>>>>>>>>> 35-libceph_don_t_touch_con_state_in_con_close_socket.patch
>>>>>>>>>>>>>>>> 36-libceph_clear_CONNECTING_in_ceph_con_close.patch
>>>>>>>>>>>>>>>> 37-libceph_clear_NEGOTIATING_when_done.patch
>>>>>>>>>>>>>>>> 38-libceph_define_and_use_an_explicit_CONNECTED_state.patch
>>>>>>>>>>>>>>>> 39-libceph_separate_banner_and_connect_writes.patch
>>>>>>>>>>>>>>>> 40-libceph_distinguish_two_phases_of_connect_sequence.patch
>>>>>>>>>>>>>>>> 41-libceph_small_changes_to_messenger.c.patch
>>>>>>>>>>>>>>>> 42-libceph_add_some_fine_ASCII_art.patch
>>>>>>>>>>>>>>>> 43-libceph_set_peer_name_on_con_open_not_init.patch
>>>>>>>>>>>>>>>> 44-libceph_initialize_mon_client_con_only_once.patch
>>>>>>>>>>>>>>>> 45-libceph_allow_sock_transition_from_CONNECTING_to_CLOSED.patch
>>>>>>>>>>>>>>>> 46-libceph_initialize_msgpool_message_types.patch
>>>>>>>>>>>>>>>> 47-libceph_prevent_the_race_of_incoming_work_during_teardown.patch
>>>>>>>>>>>>>>>> 48-libceph_report_socket_read_write_error_message.patch
>>>>>>>>>>>>>>>> 49-libceph_fix_mutex_coverage_for_ceph_con_close.patch
>>>>>>>>>>>>>>>> 50-libceph_resubmit_linger_ops_when_pg_mapping_changes.patch
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, Nov 21, 2012 at 8:50 AM, Sage Weil <sage@inktank.com>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> Thanks for hunting this down.  I'm very curious what the
>>>>>>>>>>>>>>>>> culprit is...
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> sage
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: rbd map command hangs for 15 minutes during system start up
  2012-12-13 19:00                                                               ` Nick Bartos
  2012-12-13 19:07                                                                 ` Alex Elder
@ 2012-12-14 16:46                                                                 ` Alex Elder
  2012-12-14 16:53                                                                   ` Nick Bartos
  1 sibling, 1 reply; 56+ messages in thread
From: Alex Elder @ 2012-12-14 16:46 UTC (permalink / raw)
  To: Nick Bartos
  Cc: Sage Weil, Gregory Farnum, Josh Durgin, Mandell Degerness, ceph-devel

On 12/13/2012 01:00 PM, Nick Bartos wrote:
> Here's another log with the kernel debugging enabled:
> https://gist.github.com/raw/4278697/1c9e41d275e614783fbbdee8ca5842680f46c249/rbd-hang-1355424455.log
> 
> Note that it hung on the 2nd try.

Just to make sure I'm working with the right code base, can
you confirm that you're using a kernel built with the equivalent
of what's now in the "wip-nick-newer" branch (commit id 1728893)?


Also, looking at this log I don't think I see any rbd debug output.
Does that make sense to you?

How are you activating debugging to get these messages?
If it includes something like:

    echo module libceph +p > /sys/kernel/debug/dynamic_debug/control

it might be that you need to also do:

    echo module rbd +p > /sys/kernel/debug/dynamic_debug/control

This information would be helpful in providing some more context
about what rbd is doing that's leading to the various messaging
activity I see in this log.

Please send me a log with that info if you are able to produce
one.  Thanks a lot.
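For what it's worth, the two dynamic-debug writes above can be wrapped in one small script; this is a sketch (it assumes root and a kernel with CONFIG_DYNAMIC_DEBUG, and the control-file path is the standard debugfs location):

```shell
# Enable dynamic debug output for both kernel modules before reproducing.
# build_ctl only formats the control string, so it can be checked without
# touching debugfs; enable_debug does the actual (root-only) writes.
CTL=/sys/kernel/debug/dynamic_debug/control

build_ctl() { printf 'module %s +p\n' "$1"; }

enable_debug() {
    for m in libceph rbd; do
        build_ctl "$m" > "$CTL"
    done
}

# usage (as root): enable_debug     # then collect output from dmesg
```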

					-Alex

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: rbd map command hangs for 15 minutes during system start up
  2012-12-14 16:46                                                                 ` Alex Elder
@ 2012-12-14 16:53                                                                   ` Nick Bartos
  2012-12-14 18:03                                                                     ` Alex Elder
  0 siblings, 1 reply; 56+ messages in thread
From: Nick Bartos @ 2012-12-14 16:53 UTC (permalink / raw)
  To: Alex Elder
  Cc: Sage Weil, Gregory Farnum, Josh Durgin, Mandell Degerness, ceph-devel

The kernel is 3.5.7 with the following patches applied (and in the
order specified below):

001-libceph_eliminate_connection_state_DEAD_13_days_ago.patch
002-libceph_kill_bad_proto_ceph_connection_op_13_days_ago.patch
003-libceph_rename_socket_callbacks_13_days_ago.patch
004-libceph_rename_kvec_reset_and_kvec_add_functions_13_days_ago.patch
005-libceph_embed_ceph_messenger_structure_in_ceph_client_13_days_ago.patch
006-libceph_start_separating_connection_flags_from_state_13_days_ago.patch
007-libceph_start_tracking_connection_socket_state_13_days_ago.patch
008-libceph_provide_osd_number_when_creating_osd_13_days_ago.patch
009-libceph_set_CLOSED_state_bit_in_con_init_13_days_ago.patch
010-libceph_embed_ceph_connection_structure_in_mon_client_13_days_ago.patch
011-libceph_drop_connection_refcounting_for_mon_client_13_days_ago.patch
012-libceph_init_monitor_connection_when_opening_13_days_ago.patch
013-libceph_fully_initialize_connection_in_con_init_13_days_ago.patch
014-libceph_tweak_ceph_alloc_msg_13_days_ago.patch
015-libceph_have_messages_point_to_their_connection_13_days_ago.patch
016-libceph_have_messages_take_a_connection_reference_13_days_ago.patch
017-libceph_make_ceph_con_revoke_a_msg_operation_13_days_ago.patch
018-libceph_make_ceph_con_revoke_message_a_msg_op_13_days_ago.patch
019-libceph_fix_overflow_in___decode_pool_names_13_days_ago.patch
020-libceph_fix_overflow_in_osdmap_decode_13_days_ago.patch
021-libceph_fix_overflow_in_osdmap_apply_incremental_13_days_ago.patch
022-libceph_transition_socket_state_prior_to_actual_connect_13_days_ago.patch
023-libceph_fix_NULL_dereference_in_reset_connection_13_days_ago.patch
024-libceph_use_con_get_put_methods_13_days_ago.patch
025-libceph_drop_ceph_con_get_put_helpers_and_nref_member_13_days_ago.patch
026-libceph_encapsulate_out_message_data_setup_13_days_ago.patch
027-libceph_encapsulate_advancing_msg_page_13_days_ago.patch
028-libceph_don_t_mark_footer_complete_before_it_is_13_days_ago.patch
029-libceph_move_init_bio__functions_up_13_days_ago.patch
030-libceph_move_init_of_bio_iter_13_days_ago.patch
031-libceph_don_t_use_bio_iter_as_a_flag_13_days_ago.patch
032-libceph_SOCK_CLOSED_is_a_flag_not_a_state_13_days_ago.patch
033-libceph_don_t_change_socket_state_on_sock_event_13_days_ago.patch
034-libceph_just_set_SOCK_CLOSED_when_state_changes_13_days_ago.patch
035-libceph_don_t_touch_con_state_in_con_close_socket_13_days_ago.patch
036-libceph_clear_CONNECTING_in_ceph_con_close_13_days_ago.patch
037-libceph_clear_NEGOTIATING_when_done_13_days_ago.patch
038-libceph_define_and_use_an_explicit_CONNECTED_state_13_days_ago.patch
039-libceph_separate_banner_and_connect_writes_13_days_ago.patch
040-libceph_distinguish_two_phases_of_connect_sequence_13_days_ago.patch
041-libceph_small_changes_to_messenger.c_13_days_ago.patch
042-libceph_add_some_fine_ASCII_art_13_days_ago.patch
043-libceph_set_peer_name_on_con_open_not_init_13_days_ago.patch
044-libceph_initialize_mon_client_con_only_once_13_days_ago.patch
045-libceph_allow_sock_transition_from_CONNECTING_to_CLOSED_13_days_ago.patch
046-libceph_initialize_msgpool_message_types_13_days_ago.patch
047-libceph_prevent_the_race_of_incoming_work_during_teardown_13_days_ago.patch
048-libceph_report_socket_read_write_error_message_13_days_ago.patch
049-libceph_fix_mutex_coverage_for_ceph_con_close_13_days_ago.patch
050-libceph_resubmit_linger_ops_when_pg_mapping_changes_12_days_ago.patch
051-libceph_re_initialize_bio_iter_on_start_of_message_receive_28_hours_ago.patch
052-libceph_protect_ceph_con_open_with_mutex_28_hours_ago.patch
053-libceph_reset_connection_retry_on_successfully_negotiation_28_hours_ago.patch
054-libceph_fix_fault_locking_close_socket_on_lossy_fault_28_hours_ago.patch
055-libceph_move_msgr_clear_standby_under_con_mutex_protection_28_hours_ago.patch
056-libceph_move_ceph_con_send_closed_check_under_the_con_mutex_28_hours_ago.patch
057-libceph_drop_gratuitous_socket_close_calls_in_con_work_28_hours_ago.patch
058-libceph_close_socket_directly_from_ceph_con_close_28_hours_ago.patch
059-libceph_drop_unnecessary_CLOSED_check_in_socket_state_change_callback_28_hours_ago.patch
060-libceph_replace_connection_state_bits_with_states_28_hours_ago.patch
061-libceph_clean_up_con_flags_28_hours_ago.patch
062-libceph_clear_all_flags_on_con_close_28_hours_ago.patch
063-libceph_fix_handling_of_immediate_socket_connect_failure_28_hours_ago.patch
064-libceph_revoke_mon_client_messages_on_session_restart_28_hours_ago.patch
065-libceph_verify_state_after_retaking_con_lock_after_dispatch_28_hours_ago.patch
066-libceph_avoid_dropping_con_mutex_before_fault_28_hours_ago.patch
067-libceph_change_ceph_con_in_msg_alloc_convention_to_be_less_weird_28_hours_ago.patch
068-libceph_recheck_con_state_after_allocating_incoming_message_28_hours_ago.patch
069-libceph_fix_crypto_key_null_deref_memory_leak_28_hours_ago.patch
070-libceph_delay_debugfs_initialization_until_we_learn_global_id_28_hours_ago.patch
071-libceph_avoid_truncation_due_to_racing_banners_28_hours_ago.patch
072-libceph_only_kunmap_kmapped_pages_28_hours_ago.patch
073-rbd_reset_BACKOFF_if_unable_to_re-queue_28_hours_ago.patch
074-libceph_avoid_NULL_kref_put_when_osd_reset_races_with_alloc_msg_28_hours_ago.patch
075-ceph_fix_dentry_reference_leak_in_encode_fh_28_hours_ago.patch
076-ceph_Fix_oops_when_handling_mdsmap_that_decreases_max_mds_28_hours_ago.patch
077-libceph_check_for_invalid_mapping_28_hours_ago.patch
078-ceph_avoid_32-bit_page_index_overflow_28_hours_ago.patch
079-libceph_define_ceph_extract_encoded_string_28_hours_ago.patch
080-rbd_define_some_new_format_constants_28_hours_ago.patch
081-rbd_define_rbd_dev_image_id_28_hours_ago.patch
082-rbd_kill_create_snap_sysfs_entry_28_hours_ago.patch
083-libceph_remove_osdtimeout_option_28_hours_ago.patch
084-ceph_don_t_reference_req_after_put_28_hours_ago.patch
085-libceph_avoid_using_freed_osd_in___kick_osd_requests_28_hours_ago.patch
086-libceph_register_request_before_unregister_linger_28_hours_ago.patch
087-libceph_socket_can_close_in_any_connection_state_28_hours_ago.patch
088-libceph_init_osd-_o_node_in_create_osd_28_hours_ago.patch
089-rbd_remove_linger_unconditionally_28_hours_ago.patch
090-HEAD_ceph_wip-nick-newer_libceph_reformat___reset_osd_28_hours_ago.patch
linux-3.4.4-ignoresync-hack.patch

Yes I was only enabling debugging for libceph.  I'm adding debugging
for rbd as well.  I'll do a repro later today when a test cluster
opens up.


On Fri, Dec 14, 2012 at 8:46 AM, Alex Elder <elder@inktank.com> wrote:
> On 12/13/2012 01:00 PM, Nick Bartos wrote:
>> Here's another log with the kernel debugging enabled:
>> https://gist.github.com/raw/4278697/1c9e41d275e614783fbbdee8ca5842680f46c249/rbd-hang-1355424455.log
>>
>> Note that it hung on the 2nd try.
>
> Just to make sure I'm working with the right code base, can
> you confirm that you're using a kernel built with the equivalent
> of what's now in the "wip-nick-newer" branch (commit id 1728893)?
>
>
> Also, looking at this log I don't think I see any rbd debug output.
> Does that make sense to you?
>
> How are you activating debugging to get these messages?
> If it includes something like:
>
>     echo module libceph +p > /sys/kernel/debug/dynamic_debug/control
>
> it might be that you need to also do:
>
>     echo module rbd +p > /sys/kernel/debug/dynamic_debug/control
>
> This information would be helpful in providing some more context
> about what rbd is doing that's leading to the various messaging
> activity I see in this log.
>
> Please send me a log with that info if you are able to produce
> one.  Thanks a lot.
>
>                                         -Alex

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: rbd map command hangs for 15 minutes during system start up
  2012-12-14 16:53                                                                   ` Nick Bartos
@ 2012-12-14 18:03                                                                     ` Alex Elder
  2012-12-17 17:12                                                                       ` Nick Bartos
  0 siblings, 1 reply; 56+ messages in thread
From: Alex Elder @ 2012-12-14 18:03 UTC (permalink / raw)
  To: Nick Bartos
  Cc: Sage Weil, Gregory Farnum, Josh Durgin, Mandell Degerness, ceph-devel

On 12/14/2012 10:53 AM, Nick Bartos wrote:
> Yes I was only enabling debugging for libceph.  I'm adding debugging
> for rbd as well.  I'll do a repro later today when a test cluster
> opens up.

Excellent, thank you.	-Alex

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: rbd map command hangs for 15 minutes during system start up
  2012-12-14 18:03                                                                     ` Alex Elder
@ 2012-12-17 17:12                                                                       ` Nick Bartos
  2012-12-18 16:09                                                                         ` Alex Elder
  0 siblings, 1 reply; 56+ messages in thread
From: Nick Bartos @ 2012-12-17 17:12 UTC (permalink / raw)
  To: Alex Elder
  Cc: Sage Weil, Gregory Farnum, Josh Durgin, Mandell Degerness, ceph-devel

Here's a log with the rbd debugging enabled:

https://gist.github.com/raw/4319962/d9690fd92c169198efc5eecabf275ef1808929d2/rbd-hang-test-1355763470.log

On Fri, Dec 14, 2012 at 10:03 AM, Alex Elder <elder@inktank.com> wrote:
> On 12/14/2012 10:53 AM, Nick Bartos wrote:
>> Yes I was only enabling debugging for libceph.  I'm adding debugging
>> for rbd as well.  I'll do a repro later today when a test cluster
>> opens up.
>
> Excellent, thank you.   -Alex

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: rbd map command hangs for 15 minutes during system start up
  2012-12-17 17:12                                                                       ` Nick Bartos
@ 2012-12-18 16:09                                                                         ` Alex Elder
  2012-12-18 18:05                                                                           ` Nick Bartos
  0 siblings, 1 reply; 56+ messages in thread
From: Alex Elder @ 2012-12-18 16:09 UTC (permalink / raw)
  To: Nick Bartos
  Cc: Sage Weil, Gregory Farnum, Josh Durgin, Mandell Degerness, ceph-devel

On 12/17/2012 11:12 AM, Nick Bartos wrote:
> Here's a log with the rbd debugging enabled:
> 
> https://gist.github.com/raw/4319962/d9690fd92c169198efc5eecabf275ef1808929d2/rbd-hang-test-1355763470.log
> 
> On Fri, Dec 14, 2012 at 10:03 AM, Alex Elder <elder@inktank.com> wrote:
>> On 12/14/2012 10:53 AM, Nick Bartos wrote:
>>> Yes I was only enabling debugging for libceph.  I'm adding debugging
>>> for rbd as well.  I'll do a repro later today when a test cluster
>>> opens up.
>>
>> Excellent, thank you.   -Alex

I looked through these debugging messages.  Looking only at the
rbd debugging, what I see seems to indicate that rbd is idle at
the point the "hang" seems to start.  This suggests that the hang
is not due to rbd itself, but rather whatever it is that might
be responsible for using the rbd image once it has been mapped.

Is that possible?  I don't know what process you have that is
mapping the rbd image, and what is supposed to be the next thing
it does.  (I realize this may not make a lot of sense, given
a patch in rbd seems to have caused the hang to begin occurring.)

Also note that the debugging information available (i.e., the
lines in the code that can output debugging information) may
well be incomplete.  So if you don't find anything it may be
necessary to provide you with another update which might include
more debugging.

Anyway, could you provide a little more context about what
is going on sort of *around* rbd when activity seems to stop?

Thanks a lot.

					-Alex

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: rbd map command hangs for 15 minutes during system start up
  2012-12-18 16:09                                                                         ` Alex Elder
@ 2012-12-18 18:05                                                                           ` Nick Bartos
  2012-12-19 21:25                                                                             ` Alex Elder
  0 siblings, 1 reply; 56+ messages in thread
From: Nick Bartos @ 2012-12-18 18:05 UTC (permalink / raw)
  To: Alex Elder
  Cc: Sage Weil, Gregory Farnum, Josh Durgin, Mandell Degerness, ceph-devel

I've added the output of "ps -ef" in addition to triggering a trace
when a hang is detected.  Not much is generally running at that point,
but you can have a look:

https://gist.github.com/raw/4330223/2f131ee312ee43cb3d8c307a9bf2f454a7edfe57/rbd-hang-1355851498.txt
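The capture described above (a trace plus "ps -ef" once a hang is detected) could be automated with a wrapper along these lines. This is a sketch under assumptions: the timeout threshold, the output path, and the use of timeout(1) and sysrq are illustrative, not what our scripts actually do.

```shell
# Watchdog around a command such as "rbd map": if it runs past the
# timeout, record "ps -ef" and ask the kernel for task backtraces.
watchdog() {   # watchdog TIMEOUT_SECONDS CMD [ARGS...]
    tmo="$1"; shift
    timeout "$tmo" "$@"
    rc=$?
    if [ "$rc" -eq 124 ]; then                # 124: timeout(1) killed it
        ps -ef > "/tmp/rbd-hang-$(date +%s).txt"
        # dump all task stacks to dmesg (needs root and sysrq enabled)
        echo t 2>/dev/null > /proc/sysrq-trigger || true
    fi
    return "$rc"
}

# usage: watchdog 60 rbd map rbd/test
```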

Is it possible that there is some sort of deadlock going on?  We are
doing the rbd maps (and subsequent filesystem mounts) on the same
systems which are running the ceph-osd and ceph-mon processes.  To get
around the 'sync' deadlock problem, we are using a patch from Sage
which ignores system wide sync's on filesystems mounted with the
'mand' option (and we mount the underlying osd filesystems with
'mand').  However I am wondering if there is potential for other types
of deadlocks in this environment.

Also, we recently saw an rbd hang in a much older version, running
kernel 3.5.3 with only the sync hack patch, alongside ceph 0.48.1.
It's possible that this issue has been around for some time, and the
recent patches just made it happen more often (and thus more
reproducible) for us.


On Tue, Dec 18, 2012 at 8:09 AM, Alex Elder <elder@inktank.com> wrote:
> On 12/17/2012 11:12 AM, Nick Bartos wrote:
>> Here's a log with the rbd debugging enabled:
>>
>> https://gist.github.com/raw/4319962/d9690fd92c169198efc5eecabf275ef1808929d2/rbd-hang-test-1355763470.log
>>
>> On Fri, Dec 14, 2012 at 10:03 AM, Alex Elder <elder@inktank.com> wrote:
>>> On 12/14/2012 10:53 AM, Nick Bartos wrote:
>>>> Yes I was only enabling debugging for libceph.  I'm adding debugging
>>>> for rbd as well.  I'll do a repro later today when a test cluster
>>>> opens up.
>>>
>>> Excellent, thank you.   -Alex
>
> I looked through these debugging messages.  Looking only at the
> rbd debugging, what I see seems to indicate that rbd is idle at
> the point the "hang" seems to start.  This suggests that the hang
> is not due to rbd itself, but rather whatever it is that might
> be responsible for using the rbd image once it has been mapped.
>
> Is that possible?  I don't know what process you have that is
> mapping the rbd image, and what is supposed to be the next thing
> it does.  (I realize this may not make a lot of sense, given
>> a patch in rbd seems to have caused the hang to begin occurring.)
>
> Also note that the debugging information available (i.e., the
> lines in the code that can output debugging information) may
> well be incomplete.  So if you don't find anything it may be
> necessary to provide you with another update which might include
> more debugging.
>
> Anyway, could you provide a little more context about what
> is going on sort of *around* rbd when activity seems to stop?
>
> Thanks a lot.
>
>                                         -Alex

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: rbd map command hangs for 15 minutes during system start up
  2012-12-18 18:05                                                                           ` Nick Bartos
@ 2012-12-19 21:25                                                                             ` Alex Elder
  2012-12-19 22:42                                                                               ` Alex Elder
  0 siblings, 1 reply; 56+ messages in thread
From: Alex Elder @ 2012-12-19 21:25 UTC (permalink / raw)
  To: Nick Bartos
  Cc: Sage Weil, Gregory Farnum, Josh Durgin, Mandell Degerness, ceph-devel

On 12/18/2012 12:05 PM, Nick Bartos wrote:
> I've added the output of "ps -ef" in addition to triggering a trace
> when a hang is detected.  Not much is generally running at that point,
> but you can have a look:
> 
> https://gist.github.com/raw/4330223/2f131ee312ee43cb3d8c307a9bf2f454a7edfe57/rbd-hang-1355851498.txt

This helped a lot.  I updated the bug with a little more info.

    http://tracker.newdream.net/issues/3519

I also think I have now found something that could explain what you
are seeing, and am developing a fix.  I'll provide you an update
as soon as I have tested what I come up with, almost certainly
this afternoon.

					-Alex

> Is it possible that there is some sort of deadlock going on?  We are
> doing the rbd maps (and subsequent filesystem mounts) on the same
> systems which are running the ceph-osd and ceph-mon processes.  To get
> around the 'sync' deadlock problem, we are using a patch from Sage
> which ignores system wide sync's on filesystems mounted with the
> 'mand' option (and we mount the underlying osd filesystems with
> 'mand').  However I am wondering if there is potential for other types
> of deadlocks in this environment.
> 
> Also, we recently saw an rbd hang in a much older version, running
> kernel 3.5.3 with only the sync hack patch, alongside ceph 0.48.1.
> It's possible that this issue has been around for some time, and the
> recent patches just made it happen more often (and thus more
> reproducible) for us.
> 
> 
> On Tue, Dec 18, 2012 at 8:09 AM, Alex Elder <elder@inktank.com> wrote:
>> On 12/17/2012 11:12 AM, Nick Bartos wrote:
>>> Here's a log with the rbd debugging enabled:
>>>
>>> https://gist.github.com/raw/4319962/d9690fd92c169198efc5eecabf275ef1808929d2/rbd-hang-test-1355763470.log
>>>
>>> On Fri, Dec 14, 2012 at 10:03 AM, Alex Elder <elder@inktank.com> wrote:
>>>> On 12/14/2012 10:53 AM, Nick Bartos wrote:
>>>>> Yes I was only enabling debugging for libceph.  I'm adding debugging
>>>>> for rbd as well.  I'll do a repro later today when a test cluster
>>>>> opens up.
>>>>
>>>> Excellent, thank you.   -Alex
>>
>> I looked through these debugging messages.  Looking only at the
>> rbd debugging, what I see seems to indicate that rbd is idle at
>> the point the "hang" seems to start.  This suggests that the hang
>> is not due to rbd itself, but rather whatever it is that might
>> be responsible for using the rbd image once it has been mapped.
>>
>> Is that possible?  I don't know what process you have that is
>> mapping the rbd image, and what is supposed to be the next thing
>> it does.  (I realize this may not make a lot of sense, given
>> a patch in rbd seems to have caused the hang to begin occurring.)
>>
>> Also note that the debugging information available (i.e., the
>> lines in the code that can output debugging information) may
>> well be incomplete.  So if you don't find anything it may be
>> necessary to provide you with another update which might include
>> more debugging.
>>
>> Anyway, could you provide a little more context about what
>> is going on sort of *around* rbd when activity seems to stop?
>>
>> Thanks a lot.
>>
>>                                         -Alex


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: rbd map command hangs for 15 minutes during system start up
  2012-12-19 21:25                                                                             ` Alex Elder
@ 2012-12-19 22:42                                                                               ` Alex Elder
  2012-12-20 17:48                                                                                 ` Nick Bartos
  0 siblings, 1 reply; 56+ messages in thread
From: Alex Elder @ 2012-12-19 22:42 UTC (permalink / raw)
  To: Nick Bartos
  Cc: Sage Weil, Gregory Farnum, Josh Durgin, Mandell Degerness, ceph-devel

On 12/19/2012 03:25 PM, Alex Elder wrote:
> On 12/18/2012 12:05 PM, Nick Bartos wrote:
>> I've added the output of "ps -ef" in addition to triggering a trace
>> when a hang is detected.  Not much is generally running at that point,
>> but you can have a look:
>>
>> https://gist.github.com/raw/4330223/2f131ee312ee43cb3d8c307a9bf2f454a7edfe57/rbd-hang-1355851498.txt
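A watchdog of the kind described above (dump "ps -ef" and trigger a trace when a hang is detected) can be sketched roughly as follows; the timeout, pool, and volume names are hypothetical, and `echo t > /proc/sysrq-trigger` is one way to dump all task stack traces to the kernel log:

```shell
# Hypothetical hang watchdog: give "rbd map" two minutes; on failure or
# timeout, record the process list and all kernel task stack traces.
STAMP=$(date +%s)
if ! timeout 120 rbd map rbdvol --pool rbd; then
    ps -ef > "/tmp/rbd-hang-$STAMP.txt"
    echo t > /proc/sysrq-trigger        # dump task traces to the kernel log
    dmesg >> "/tmp/rbd-hang-$STAMP.txt"
fi
```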
> 
> This helped a lot.  I updated the bug with a little more info.
> 
>     http://tracker.newdream.net/issues/3519
> 
> I also think I have now found something that could explain what you
> are seeing, and am developing a fix.  I'll provide you an update
> as soon as I have tested what I come up with, almost certainly
> this afternoon.

Nick, I have a new branch for you to try with a new fix in place.
As you might have predicted, it's named "wip-nick-newest".

Please give it a try and let me know whether it resolves the hang
you've been seeing.  If it continues to hang, please provide the
logs as before; they've been very helpful.

Thanks a lot.

					-Alex
> 
> 					-Alex
> 
>> Is it possible that there is some sort of deadlock going on?  We are
>> doing the rbd maps (and subsequent filesystem mounts) on the same
>> systems which are running the ceph-osd and ceph-mon processes.  To get
>> around the 'sync' deadlock problem, we are using a patch from Sage
>> which ignores system wide sync's on filesystems mounted with the
>> 'mand' option (and we mount the underlying osd filesystems with
>> 'mand').  However I am wondering if there is potential for other types
>> of deadlocks in this environment.
>>
>> Also, we recently saw an rbd hang in a much older version, running
>> kernel 3.5.3 with only the sync hack patch, along side ceph 0.48.1.
>> It's possible that this issue was around for some time, just the
>> recent patches made it happen more often (and thus more reproducible)
>> for us.
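The startup sequence being described can be sketched roughly as follows (device, pool, and volume names are hypothetical; the `-o mand` mount option is what the sync-hack patch keys on to exempt the OSD filesystem from a global sync):

```shell
# All on one host: the OSD's backing store and the rbd client.
mount -o mand /dev/sdb1 /var/lib/ceph/osd/ceph-0  # OSD data fs, exempt from global sync
# ... ceph-mon and ceph-osd are started here ...
rbd map rbdvol --pool rbd       # the step that intermittently hangs
mount /dev/rbd0 /mnt/rbdvol     # subsequent filesystem mount on the mapped device
```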
>>
>>
>> On Tue, Dec 18, 2012 at 8:09 AM, Alex Elder <elder@inktank.com> wrote:
>>> On 12/17/2012 11:12 AM, Nick Bartos wrote:
>>>> Here's a log with the rbd debugging enabled:
>>>>
>>>> https://gist.github.com/raw/4319962/d9690fd92c169198efc5eecabf275ef1808929d2/rbd-hang-test-1355763470.log
>>>>
>>>> On Fri, Dec 14, 2012 at 10:03 AM, Alex Elder <elder@inktank.com> wrote:
>>>>> On 12/14/2012 10:53 AM, Nick Bartos wrote:
>>>>>> Yes I was only enabling debugging for libceph.  I'm adding debugging
>>>>>> for rbd as well.  I'll do a repro later today when a test cluster
>>>>>> opens up.
>>>>>
>>>>> Excellent, thank you.   -Alex
>>>
>>> I looked through these debugging messages.  Looking only at the
>>> rbd debugging, what I see seems to indicate that rbd is idle at
>>> the point the "hang" seems to start.  This suggests that the hang
>>> is not due to rbd itself, but rather whatever it is that might
>>> be responsible for using the rbd image once it has been mapped.
>>>
>>> Is that possible?  I don't know what process you have that is
>>> mapping the rbd image, and what is supposed to be the next thing
>>> it does.  (I realize this may not make a lot of sense, given
>>> a patch in rdb seems to have caused the hang to begin occurring.)
>>>
>>> Also note that the debugging information available (i.e., the
>>> lines in the code that can output debugging information) may
>>> well be incomplete.  So if you don't find anything it may be
>>> necessary to provide you with another update which might include
>>> more debugging.
>>>
>>> Anyway, could you provide a little more context about what
>>> is going on sort of *around* rbd when activity seems to stop?
>>>
>>> Thanks a lot.
>>>
>>>                                         -Alex
> 



* Re: rbd map command hangs for 15 minutes during system start up
  2012-12-19 22:42                                                                               ` Alex Elder
@ 2012-12-20 17:48                                                                                 ` Nick Bartos
  2012-12-20 21:59                                                                                   ` Alex Elder
  0 siblings, 1 reply; 56+ messages in thread
From: Nick Bartos @ 2012-12-20 17:48 UTC (permalink / raw)
  To: Alex Elder
  Cc: Sage Weil, Gregory Farnum, Josh Durgin, Mandell Degerness, ceph-devel

Unfortunately, we still have a hang:

https://gist.github.com/4347052/download


On Wed, Dec 19, 2012 at 2:42 PM, Alex Elder <elder@inktank.com> wrote:
> On 12/19/2012 03:25 PM, Alex Elder wrote:
>> On 12/18/2012 12:05 PM, Nick Bartos wrote:
>>> I've added the output of "ps -ef" in addition to triggering a trace
>>> when a hang is detected.  Not much is generally running at that point,
>>> but you can have a look:
>>>
>>> https://gist.github.com/raw/4330223/2f131ee312ee43cb3d8c307a9bf2f454a7edfe57/rbd-hang-1355851498.txt
>>
>> This helped a lot.  I updated the bug with a little more info.
>>
>>     http://tracker.newdream.net/issues/3519
>>
>> I also think I have now found something that could explain what you
>> are seeing, and am developing a fix.  I'll provide you an update
>> as soon as I have tested what I come up with, almost certainly
>> this afternoon.
>
> Nick, I have a new branch for you to try with a new fix in place.
> As you might have predicted, it's named "wip-nick-newest".
>
> Please give it a try to see if it resolved the hang you've
> been seeing and let me know how it goes.  If it continues
> to hang, please provide the logs as you have before, it's
> been very helpful.
>
> Thanks a lot.
>
>                                         -Alex
>>
>>                                       -Alex
>>
>>> Is it possible that there is some sort of deadlock going on?  We are
>>> doing the rbd maps (and subsequent filesystem mounts) on the same
>>> systems which are running the ceph-osd and ceph-mon processes.  To get
>>> around the 'sync' deadlock problem, we are using a patch from Sage
>>> which ignores system wide sync's on filesystems mounted with the
>>> 'mand' option (and we mount the underlying osd filesystems with
>>> 'mand').  However I am wondering if there is potential for other types
>>> of deadlocks in this environment.
>>>
>>> Also, we recently saw an rbd hang in a much older version, running
>>> kernel 3.5.3 with only the sync hack patch, along side ceph 0.48.1.
>>> It's possible that this issue was around for some time, just the
>>> recent patches made it happen more often (and thus more reproducible)
>>> for us.
>>>
>>>
>>> On Tue, Dec 18, 2012 at 8:09 AM, Alex Elder <elder@inktank.com> wrote:
>>>> On 12/17/2012 11:12 AM, Nick Bartos wrote:
>>>>> Here's a log with the rbd debugging enabled:
>>>>>
>>>>> https://gist.github.com/raw/4319962/d9690fd92c169198efc5eecabf275ef1808929d2/rbd-hang-test-1355763470.log
>>>>>
>>>>> On Fri, Dec 14, 2012 at 10:03 AM, Alex Elder <elder@inktank.com> wrote:
>>>>>> On 12/14/2012 10:53 AM, Nick Bartos wrote:
>>>>>>> Yes I was only enabling debugging for libceph.  I'm adding debugging
>>>>>>> for rbd as well.  I'll do a repro later today when a test cluster
>>>>>>> opens up.
>>>>>>
>>>>>> Excellent, thank you.   -Alex
>>>>
>>>> I looked through these debugging messages.  Looking only at the
>>>> rbd debugging, what I see seems to indicate that rbd is idle at
>>>> the point the "hang" seems to start.  This suggests that the hang
>>>> is not due to rbd itself, but rather whatever it is that might
>>>> be responsible for using the rbd image once it has been mapped.
>>>>
>>>> Is that possible?  I don't know what process you have that is
>>>> mapping the rbd image, and what is supposed to be the next thing
>>>> it does.  (I realize this may not make a lot of sense, given
>>>> a patch in rdb seems to have caused the hang to begin occurring.)
>>>>
>>>> Also note that the debugging information available (i.e., the
>>>> lines in the code that can output debugging information) may
>>>> well be incomplete.  So if you don't find anything it may be
>>>> necessary to provide you with another update which might include
>>>> more debugging.
>>>>
>>>> Anyway, could you provide a little more context about what
>>>> is going on sort of *around* rbd when activity seems to stop?
>>>>
>>>> Thanks a lot.
>>>>
>>>>                                         -Alex
>>
>


* Re: rbd map command hangs for 15 minutes during system start up
  2012-12-20 17:48                                                                                 ` Nick Bartos
@ 2012-12-20 21:59                                                                                   ` Alex Elder
  2012-12-26 17:45                                                                                     ` Nick Bartos
  0 siblings, 1 reply; 56+ messages in thread
From: Alex Elder @ 2012-12-20 21:59 UTC (permalink / raw)
  To: Nick Bartos
  Cc: Sage Weil, Gregory Farnum, Josh Durgin, Mandell Degerness, ceph-devel

On 12/20/2012 11:48 AM, Nick Bartos wrote:
> Unfortunately, we still have a hang:
> 
> https://gist.github.com/4347052/download

The saga continues, and each time we get a little more
information.  Please try branch: "wip-nick-newerest"

Thank you.

					-Alex


> On Wed, Dec 19, 2012 at 2:42 PM, Alex Elder <elder@inktank.com> wrote:
>> On 12/19/2012 03:25 PM, Alex Elder wrote:
>>> On 12/18/2012 12:05 PM, Nick Bartos wrote:
>>>> I've added the output of "ps -ef" in addition to triggering a trace
>>>> when a hang is detected.  Not much is generally running at that point,
>>>> but you can have a look:
>>>>
>>>> https://gist.github.com/raw/4330223/2f131ee312ee43cb3d8c307a9bf2f454a7edfe57/rbd-hang-1355851498.txt
>>>
>>> This helped a lot.  I updated the bug with a little more info.
>>>
>>>     http://tracker.newdream.net/issues/3519
>>>
>>> I also think I have now found something that could explain what you
>>> are seeing, and am developing a fix.  I'll provide you an update
>>> as soon as I have tested what I come up with, almost certainly
>>> this afternoon.
>>
>> Nick, I have a new branch for you to try with a new fix in place.
>> As you might have predicted, it's named "wip-nick-newest".
>>
>> Please give it a try to see if it resolved the hang you've
>> been seeing and let me know how it goes.  If it continues
>> to hang, please provide the logs as you have before, it's
>> been very helpful.
>>
>> Thanks a lot.
>>
>>                                         -Alex
>>>
>>>                                       -Alex
>>>
>>>> Is it possible that there is some sort of deadlock going on?  We are
>>>> doing the rbd maps (and subsequent filesystem mounts) on the same
>>>> systems which are running the ceph-osd and ceph-mon processes.  To get
>>>> around the 'sync' deadlock problem, we are using a patch from Sage
>>>> which ignores system wide sync's on filesystems mounted with the
>>>> 'mand' option (and we mount the underlying osd filesystems with
>>>> 'mand').  However I am wondering if there is potential for other types
>>>> of deadlocks in this environment.
>>>>
>>>> Also, we recently saw an rbd hang in a much older version, running
>>>> kernel 3.5.3 with only the sync hack patch, along side ceph 0.48.1.
>>>> It's possible that this issue was around for some time, just the
>>>> recent patches made it happen more often (and thus more reproducible)
>>>> for us.
>>>>
>>>>
>>>> On Tue, Dec 18, 2012 at 8:09 AM, Alex Elder <elder@inktank.com> wrote:
>>>>> On 12/17/2012 11:12 AM, Nick Bartos wrote:
>>>>>> Here's a log with the rbd debugging enabled:
>>>>>>
>>>>>> https://gist.github.com/raw/4319962/d9690fd92c169198efc5eecabf275ef1808929d2/rbd-hang-test-1355763470.log
>>>>>>
>>>>>> On Fri, Dec 14, 2012 at 10:03 AM, Alex Elder <elder@inktank.com> wrote:
>>>>>>> On 12/14/2012 10:53 AM, Nick Bartos wrote:
>>>>>>>> Yes I was only enabling debugging for libceph.  I'm adding debugging
>>>>>>>> for rbd as well.  I'll do a repro later today when a test cluster
>>>>>>>> opens up.
>>>>>>>
>>>>>>> Excellent, thank you.   -Alex
>>>>>
>>>>> I looked through these debugging messages.  Looking only at the
>>>>> rbd debugging, what I see seems to indicate that rbd is idle at
>>>>> the point the "hang" seems to start.  This suggests that the hang
>>>>> is not due to rbd itself, but rather whatever it is that might
>>>>> be responsible for using the rbd image once it has been mapped.
>>>>>
>>>>> Is that possible?  I don't know what process you have that is
>>>>> mapping the rbd image, and what is supposed to be the next thing
>>>>> it does.  (I realize this may not make a lot of sense, given
>>>>> a patch in rdb seems to have caused the hang to begin occurring.)
>>>>>
>>>>> Also note that the debugging information available (i.e., the
>>>>> lines in the code that can output debugging information) may
>>>>> well be incomplete.  So if you don't find anything it may be
>>>>> necessary to provide you with another update which might include
>>>>> more debugging.
>>>>>
>>>>> Anyway, could you provide a little more context about what
>>>>> is going on sort of *around* rbd when activity seems to stop?
>>>>>
>>>>> Thanks a lot.
>>>>>
>>>>>                                         -Alex
>>>
>>



* Re: rbd map command hangs for 15 minutes during system start up
  2012-12-20 21:59                                                                                   ` Alex Elder
@ 2012-12-26 17:45                                                                                     ` Nick Bartos
  2012-12-26 17:50                                                                                       ` Alex Elder
  2012-12-26 21:36                                                                                       ` Alex Elder
  0 siblings, 2 replies; 56+ messages in thread
From: Nick Bartos @ 2012-12-26 17:45 UTC (permalink / raw)
  To: Alex Elder
  Cc: Sage Weil, Gregory Farnum, Josh Durgin, Mandell Degerness, ceph-devel

Here's a log with a hang on the updated branch:

https://gist.github.com/raw/4381750/772476e1bae1e6366347a223f34aa6c440b92765/rdb-hang-1356543132.log


On Thu, Dec 20, 2012 at 1:59 PM, Alex Elder <elder@inktank.com> wrote:
> On 12/20/2012 11:48 AM, Nick Bartos wrote:
>> Unfortunately, we still have a hang:
>>
>> https://gist.github.com/4347052/download
>
> The saga continues, and each time we get a little more
> information.  Please try branch: "wip-nick-newerest"
>
> Thank you.
>
>                                         -Alex
>
>
>> On Wed, Dec 19, 2012 at 2:42 PM, Alex Elder <elder@inktank.com> wrote:
>>> On 12/19/2012 03:25 PM, Alex Elder wrote:
>>>> On 12/18/2012 12:05 PM, Nick Bartos wrote:
>>>>> I've added the output of "ps -ef" in addition to triggering a trace
>>>>> when a hang is detected.  Not much is generally running at that point,
>>>>> but you can have a look:
>>>>>
>>>>> https://gist.github.com/raw/4330223/2f131ee312ee43cb3d8c307a9bf2f454a7edfe57/rbd-hang-1355851498.txt
>>>>
>>>> This helped a lot.  I updated the bug with a little more info.
>>>>
>>>>     http://tracker.newdream.net/issues/3519
>>>>
>>>> I also think I have now found something that could explain what you
>>>> are seeing, and am developing a fix.  I'll provide you an update
>>>> as soon as I have tested what I come up with, almost certainly
>>>> this afternoon.
>>>
>>> Nick, I have a new branch for you to try with a new fix in place.
>>> As you might have predicted, it's named "wip-nick-newest".
>>>
>>> Please give it a try to see if it resolved the hang you've
>>> been seeing and let me know how it goes.  If it continues
>>> to hang, please provide the logs as you have before, it's
>>> been very helpful.
>>>
>>> Thanks a lot.
>>>
>>>                                         -Alex
>>>>
>>>>                                       -Alex
>>>>
>>>>> Is it possible that there is some sort of deadlock going on?  We are
>>>>> doing the rbd maps (and subsequent filesystem mounts) on the same
>>>>> systems which are running the ceph-osd and ceph-mon processes.  To get
>>>>> around the 'sync' deadlock problem, we are using a patch from Sage
>>>>> which ignores system wide sync's on filesystems mounted with the
>>>>> 'mand' option (and we mount the underlying osd filesystems with
>>>>> 'mand').  However I am wondering if there is potential for other types
>>>>> of deadlocks in this environment.
>>>>>
>>>>> Also, we recently saw an rbd hang in a much older version, running
>>>>> kernel 3.5.3 with only the sync hack patch, along side ceph 0.48.1.
>>>>> It's possible that this issue was around for some time, just the
>>>>> recent patches made it happen more often (and thus more reproducible)
>>>>> for us.
>>>>>
>>>>>
>>>>> On Tue, Dec 18, 2012 at 8:09 AM, Alex Elder <elder@inktank.com> wrote:
>>>>>> On 12/17/2012 11:12 AM, Nick Bartos wrote:
>>>>>>> Here's a log with the rbd debugging enabled:
>>>>>>>
>>>>>>> https://gist.github.com/raw/4319962/d9690fd92c169198efc5eecabf275ef1808929d2/rbd-hang-test-1355763470.log
>>>>>>>
>>>>>>> On Fri, Dec 14, 2012 at 10:03 AM, Alex Elder <elder@inktank.com> wrote:
>>>>>>>> On 12/14/2012 10:53 AM, Nick Bartos wrote:
>>>>>>>>> Yes I was only enabling debugging for libceph.  I'm adding debugging
>>>>>>>>> for rbd as well.  I'll do a repro later today when a test cluster
>>>>>>>>> opens up.
>>>>>>>>
>>>>>>>> Excellent, thank you.   -Alex
>>>>>>
>>>>>> I looked through these debugging messages.  Looking only at the
>>>>>> rbd debugging, what I see seems to indicate that rbd is idle at
>>>>>> the point the "hang" seems to start.  This suggests that the hang
>>>>>> is not due to rbd itself, but rather whatever it is that might
>>>>>> be responsible for using the rbd image once it has been mapped.
>>>>>>
>>>>>> Is that possible?  I don't know what process you have that is
>>>>>> mapping the rbd image, and what is supposed to be the next thing
>>>>>> it does.  (I realize this may not make a lot of sense, given
>>>>>> a patch in rdb seems to have caused the hang to begin occurring.)
>>>>>>
>>>>>> Also note that the debugging information available (i.e., the
>>>>>> lines in the code that can output debugging information) may
>>>>>> well be incomplete.  So if you don't find anything it may be
>>>>>> necessary to provide you with another update which might include
>>>>>> more debugging.
>>>>>>
>>>>>> Anyway, could you provide a little more context about what
>>>>>> is going on sort of *around* rbd when activity seems to stop?
>>>>>>
>>>>>> Thanks a lot.
>>>>>>
>>>>>>                                         -Alex
>>>>
>>>
>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: rbd map command hangs for 15 minutes during system start up
  2012-12-26 17:45                                                                                     ` Nick Bartos
@ 2012-12-26 17:50                                                                                       ` Alex Elder
  2012-12-26 21:36                                                                                       ` Alex Elder
  1 sibling, 0 replies; 56+ messages in thread
From: Alex Elder @ 2012-12-26 17:50 UTC (permalink / raw)
  To: Nick Bartos
  Cc: Sage Weil, Gregory Farnum, Josh Durgin, Mandell Degerness, ceph-devel

On 12/26/2012 11:45 AM, Nick Bartos wrote:
> Here's a log with a hang on the updated branch:
> 
> https://gist.github.com/raw/4381750/772476e1bae1e6366347a223f34aa6c440b92765/rdb-hang-1356543132.log

I'm starting to look this over.  Thanks a lot for supplying it.
Sorry we still haven't nailed the problem.

					-Alex
> 
> On Thu, Dec 20, 2012 at 1:59 PM, Alex Elder <elder@inktank.com> wrote:
>> On 12/20/2012 11:48 AM, Nick Bartos wrote:
>>> Unfortunately, we still have a hang:
>>>
>>> https://gist.github.com/4347052/download
>>
>> The saga continues, and each time we get a little more
>> information.  Please try branch: "wip-nick-newerest"
>>
>> Thank you.
>>
>>                                         -Alex
>>
>>
>>> On Wed, Dec 19, 2012 at 2:42 PM, Alex Elder <elder@inktank.com> wrote:
>>>> On 12/19/2012 03:25 PM, Alex Elder wrote:
>>>>> On 12/18/2012 12:05 PM, Nick Bartos wrote:
>>>>>> I've added the output of "ps -ef" in addition to triggering a trace
>>>>>> when a hang is detected.  Not much is generally running at that point,
>>>>>> but you can have a look:
>>>>>>
>>>>>> https://gist.github.com/raw/4330223/2f131ee312ee43cb3d8c307a9bf2f454a7edfe57/rbd-hang-1355851498.txt
>>>>>
>>>>> This helped a lot.  I updated the bug with a little more info.
>>>>>
>>>>>     http://tracker.newdream.net/issues/3519
>>>>>
>>>>> I also think I have now found something that could explain what you
>>>>> are seeing, and am developing a fix.  I'll provide you an update
>>>>> as soon as I have tested what I come up with, almost certainly
>>>>> this afternoon.
>>>>
>>>> Nick, I have a new branch for you to try with a new fix in place.
>>>> As you might have predicted, it's named "wip-nick-newest".
>>>>
>>>> Please give it a try to see if it resolved the hang you've
>>>> been seeing and let me know how it goes.  If it continues
>>>> to hang, please provide the logs as you have before, it's
>>>> been very helpful.
>>>>
>>>> Thanks a lot.
>>>>
>>>>                                         -Alex
>>>>>
>>>>>                                       -Alex
>>>>>
>>>>>> Is it possible that there is some sort of deadlock going on?  We are
>>>>>> doing the rbd maps (and subsequent filesystem mounts) on the same
>>>>>> systems which are running the ceph-osd and ceph-mon processes.  To get
>>>>>> around the 'sync' deadlock problem, we are using a patch from Sage
>>>>>> which ignores system wide sync's on filesystems mounted with the
>>>>>> 'mand' option (and we mount the underlying osd filesystems with
>>>>>> 'mand').  However I am wondering if there is potential for other types
>>>>>> of deadlocks in this environment.
>>>>>>
>>>>>> Also, we recently saw an rbd hang in a much older version, running
>>>>>> kernel 3.5.3 with only the sync hack patch, along side ceph 0.48.1.
>>>>>> It's possible that this issue was around for some time, just the
>>>>>> recent patches made it happen more often (and thus more reproducible)
>>>>>> for us.
>>>>>>
>>>>>>
>>>>>> On Tue, Dec 18, 2012 at 8:09 AM, Alex Elder <elder@inktank.com> wrote:
>>>>>>> On 12/17/2012 11:12 AM, Nick Bartos wrote:
>>>>>>>> Here's a log with the rbd debugging enabled:
>>>>>>>>
>>>>>>>> https://gist.github.com/raw/4319962/d9690fd92c169198efc5eecabf275ef1808929d2/rbd-hang-test-1355763470.log
>>>>>>>>
>>>>>>>> On Fri, Dec 14, 2012 at 10:03 AM, Alex Elder <elder@inktank.com> wrote:
>>>>>>>>> On 12/14/2012 10:53 AM, Nick Bartos wrote:
>>>>>>>>>> Yes I was only enabling debugging for libceph.  I'm adding debugging
>>>>>>>>>> for rbd as well.  I'll do a repro later today when a test cluster
>>>>>>>>>> opens up.
>>>>>>>>>
>>>>>>>>> Excellent, thank you.   -Alex
>>>>>>>
>>>>>>> I looked through these debugging messages.  Looking only at the
>>>>>>> rbd debugging, what I see seems to indicate that rbd is idle at
>>>>>>> the point the "hang" seems to start.  This suggests that the hang
>>>>>>> is not due to rbd itself, but rather whatever it is that might
>>>>>>> be responsible for using the rbd image once it has been mapped.
>>>>>>>
>>>>>>> Is that possible?  I don't know what process you have that is
>>>>>>> mapping the rbd image, and what is supposed to be the next thing
>>>>>>> it does.  (I realize this may not make a lot of sense, given
>>>>>>> a patch in rdb seems to have caused the hang to begin occurring.)
>>>>>>>
>>>>>>> Also note that the debugging information available (i.e., the
>>>>>>> lines in the code that can output debugging information) may
>>>>>>> well be incomplete.  So if you don't find anything it may be
>>>>>>> necessary to provide you with another update which might include
>>>>>>> more debugging.
>>>>>>>
>>>>>>> Anyway, could you provide a little more context about what
>>>>>>> is going on sort of *around* rbd when activity seems to stop?
>>>>>>>
>>>>>>> Thanks a lot.
>>>>>>>
>>>>>>>                                         -Alex
>>>>>
>>>>
>>



* Re: rbd map command hangs for 15 minutes during system start up
  2012-12-26 17:45                                                                                     ` Nick Bartos
  2012-12-26 17:50                                                                                       ` Alex Elder
@ 2012-12-26 21:36                                                                                       ` Alex Elder
  2012-12-27 17:33                                                                                         ` Nick Bartos
  2012-12-31 18:22                                                                                         ` Alex Elder
  1 sibling, 2 replies; 56+ messages in thread
From: Alex Elder @ 2012-12-26 21:36 UTC (permalink / raw)
  To: Nick Bartos
  Cc: Sage Weil, Gregory Farnum, Josh Durgin, Mandell Degerness, ceph-devel

On 12/26/2012 11:45 AM, Nick Bartos wrote:
> Here's a log with a hang on the updated branch:
> 
> https://gist.github.com/raw/4381750/772476e1bae1e6366347a223f34aa6c440b92765/rdb-hang-1356543132.log

OK, new naming scheme.  Please try:  wip-nick-1

I added another simple fix, but then collapsed three commits
into one, and added one more (somewhat unrelated).

I've done simple testing with this and will subject it to
more rigorous testing shortly.  I wanted to make it available
to you quickly though.

					-Alex

> 
> On Thu, Dec 20, 2012 at 1:59 PM, Alex Elder <elder@inktank.com> wrote:
>> On 12/20/2012 11:48 AM, Nick Bartos wrote:
>>> Unfortunately, we still have a hang:
>>>
>>> https://gist.github.com/4347052/download
>>
>> The saga continues, and each time we get a little more
>> information.  Please try branch: "wip-nick-newerest"
>>
>> Thank you.
>>
>>                                         -Alex
>>
>>
>>> On Wed, Dec 19, 2012 at 2:42 PM, Alex Elder <elder@inktank.com> wrote:
>>>> On 12/19/2012 03:25 PM, Alex Elder wrote:
>>>>> On 12/18/2012 12:05 PM, Nick Bartos wrote:
>>>>>> I've added the output of "ps -ef" in addition to triggering a trace
>>>>>> when a hang is detected.  Not much is generally running at that point,
>>>>>> but you can have a look:
>>>>>>
>>>>>> https://gist.github.com/raw/4330223/2f131ee312ee43cb3d8c307a9bf2f454a7edfe57/rbd-hang-1355851498.txt
>>>>>
>>>>> This helped a lot.  I updated the bug with a little more info.
>>>>>
>>>>>     http://tracker.newdream.net/issues/3519
>>>>>
>>>>> I also think I have now found something that could explain what you
>>>>> are seeing, and am developing a fix.  I'll provide you an update
>>>>> as soon as I have tested what I come up with, almost certainly
>>>>> this afternoon.
>>>>
>>>> Nick, I have a new branch for you to try with a new fix in place.
>>>> As you might have predicted, it's named "wip-nick-newest".
>>>>
>>>> Please give it a try to see if it resolved the hang you've
>>>> been seeing and let me know how it goes.  If it continues
>>>> to hang, please provide the logs as you have before, it's
>>>> been very helpful.
>>>>
>>>> Thanks a lot.
>>>>
>>>>                                         -Alex
>>>>>
>>>>>                                       -Alex
>>>>>
>>>>>> Is it possible that there is some sort of deadlock going on?  We are
>>>>>> doing the rbd maps (and subsequent filesystem mounts) on the same
>>>>>> systems which are running the ceph-osd and ceph-mon processes.  To get
>>>>>> around the 'sync' deadlock problem, we are using a patch from Sage
>>>>>> which ignores system wide sync's on filesystems mounted with the
>>>>>> 'mand' option (and we mount the underlying osd filesystems with
>>>>>> 'mand').  However I am wondering if there is potential for other types
>>>>>> of deadlocks in this environment.
>>>>>>
>>>>>> Also, we recently saw an rbd hang in a much older version, running
>>>>>> kernel 3.5.3 with only the sync hack patch, along side ceph 0.48.1.
>>>>>> It's possible that this issue was around for some time, just the
>>>>>> recent patches made it happen more often (and thus more reproducible)
>>>>>> for us.
>>>>>>
>>>>>>
>>>>>> On Tue, Dec 18, 2012 at 8:09 AM, Alex Elder <elder@inktank.com> wrote:
>>>>>>> On 12/17/2012 11:12 AM, Nick Bartos wrote:
>>>>>>>> Here's a log with the rbd debugging enabled:
>>>>>>>>
>>>>>>>> https://gist.github.com/raw/4319962/d9690fd92c169198efc5eecabf275ef1808929d2/rbd-hang-test-1355763470.log
>>>>>>>>
>>>>>>>> On Fri, Dec 14, 2012 at 10:03 AM, Alex Elder <elder@inktank.com> wrote:
>>>>>>>>> On 12/14/2012 10:53 AM, Nick Bartos wrote:
>>>>>>>>>> Yes I was only enabling debugging for libceph.  I'm adding debugging
>>>>>>>>>> for rbd as well.  I'll do a repro later today when a test cluster
>>>>>>>>>> opens up.
>>>>>>>>>
>>>>>>>>> Excellent, thank you.   -Alex
>>>>>>>
>>>>>>> I looked through these debugging messages.  Looking only at the
>>>>>>> rbd debugging, what I see seems to indicate that rbd is idle at
>>>>>>> the point the "hang" seems to start.  This suggests that the hang
>>>>>>> is not due to rbd itself, but rather whatever it is that might
>>>>>>> be responsible for using the rbd image once it has been mapped.
>>>>>>>
>>>>>>> Is that possible?  I don't know what process you have that is
>>>>>>> mapping the rbd image, and what is supposed to be the next thing
>>>>>>> it does.  (I realize this may not make a lot of sense, given
>>>>>>> a patch in rdb seems to have caused the hang to begin occurring.)
>>>>>>>
>>>>>>> Also note that the debugging information available (i.e., the
>>>>>>> lines in the code that can output debugging information) may
>>>>>>> well be incomplete.  So if you don't find anything it may be
>>>>>>> necessary to provide you with another update which might include
>>>>>>> more debugging.
>>>>>>>
>>>>>>> Anyway, could you provide a little more context about what
>>>>>>> is going on sort of *around* rbd when activity seems to stop?
>>>>>>>
>>>>>>> Thanks a lot.
>>>>>>>
>>>>>>>                                         -Alex
>>>>>
>>>>
>>



* Re: rbd map command hangs for 15 minutes during system start up
  2012-12-26 21:36                                                                                       ` Alex Elder
@ 2012-12-27 17:33                                                                                         ` Nick Bartos
  2012-12-27 18:43                                                                                           ` Sage Weil
  2012-12-31 18:22                                                                                         ` Alex Elder
  1 sibling, 1 reply; 56+ messages in thread
From: Nick Bartos @ 2012-12-27 17:33 UTC (permalink / raw)
  To: Alex Elder
  Cc: Sage Weil, Gregory Farnum, Josh Durgin, Mandell Degerness, ceph-devel

I have some exciting news.  After 215 test runs, no hung processes
were detected.  I think we may actually have it this time.  Thanks for
all your hard work!

-Nick
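A repro harness of the kind described (215 consecutive runs with no hang detected) might look like the following sketch; the run count, timeout, and pool/volume names are illustrative, and the actual test runs involved a full cluster bring-up rather than a bare map/unmap loop:

```shell
# Repeatedly map and unmap, treating any map that exceeds the timeout
# as a hang.
for i in $(seq 1 215); do
    if ! timeout 300 rbd map rbdvol --pool rbd; then
        echo "hang detected on run $i" >&2
        break
    fi
    rbd unmap /dev/rbd0
done
```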

On Wed, Dec 26, 2012 at 1:36 PM, Alex Elder <elder@inktank.com> wrote:
> On 12/26/2012 11:45 AM, Nick Bartos wrote:
>> Here's a log with a hang on the updated branch:
>>
>> https://gist.github.com/raw/4381750/772476e1bae1e6366347a223f34aa6c440b92765/rdb-hang-1356543132.log
>
> OK, new naming scheme.  Please try:  wip-nick-1
>
> I added another simple fix, but then collapsed three commits
> into one, and added one more (somewhat unrelated).
>
> I've done simple testing with this and will subject it to
> more rigorous testing shortly.  I wanted to make it available
> to you quickly though.
>
>                                         -Alex
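For reference, a repro loop in the spirit of Nick's hang test can be sketched as follows. This is only a sketch: the real harness, image name, timeout, and run count are assumptions, and a harmless stand-in command is used in place of the actual "rbd map" so the fragment runs anywhere.

```shell
# Hypothetical hang-detection loop. In the real test MAP_CMD would be
# something like:  MAP_CMD="rbd map rbd/testvol"
MAP_CMD="sleep 1"   # harmless stand-in for the map command
TIMEOUT=5           # seconds before we declare the map hung
RUNS=3              # the real test ran 215 iterations

i=0
while [ "$i" -lt "$RUNS" ]; do
    i=$((i + 1))
    # timeout(1) exits non-zero (124) if the command exceeds TIMEOUT,
    # which is how a hung "rbd map" would be caught.
    if timeout "$TIMEOUT" $MAP_CMD; then
        echo "run $i: ok"
    else
        echo "run $i: map hung or failed" >&2
        exit 1
    fi
done
echo "all $RUNS runs passed"
```

The same pattern (a bounded number of map/unmap cycles, each guarded by a timeout) is what makes "215 runs with no hung processes" a meaningful result.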


* Re: rbd map command hangs for 15 minutes during system start up
  2012-12-27 17:33                                                                                         ` Nick Bartos
@ 2012-12-27 18:43                                                                                           ` Sage Weil
  2012-12-27 19:41                                                                                             ` Alex Elder
  0 siblings, 1 reply; 56+ messages in thread
From: Sage Weil @ 2012-12-27 18:43 UTC (permalink / raw)
  To: Nick Bartos
  Cc: Alex Elder, Gregory Farnum, Josh Durgin, Mandell Degerness, ceph-devel

On Thu, 27 Dec 2012, Nick Bartos wrote:
> I have some exciting news.  After 215 test runs, no hung processes
> were detected.  I think we may actually have it this time.  Thanks for
> all your hard work!

Sweet!  I think it was the new branch naming scheme that did it.

sage

> 
> -Nick
> 
> On Wed, Dec 26, 2012 at 1:36 PM, Alex Elder <elder@inktank.com> wrote:
> > On 12/26/2012 11:45 AM, Nick Bartos wrote:
> >> Here's a log with a hang on the updated branch:
> >>
> >> https://gist.github.com/raw/4381750/772476e1bae1e6366347a223f34aa6c440b92765/rdb-hang-1356543132.log
> >
> > OK, new naming scheme.  Please try:  wip-nick-1
> >
> > I added another simple fix, but then collapsed three commits
> > into one, and added one more (somewhat unrelated).
> >
> > I've done simple testing with this and will subject it to
> > more rigorous testing shortly.  I wanted to make it available
> > to you quickly though.
> >
> >                                         -Alex
> 
> 


* Re: rbd map command hangs for 15 minutes during system start up
  2012-12-27 18:43                                                                                           ` Sage Weil
@ 2012-12-27 19:41                                                                                             ` Alex Elder
  0 siblings, 0 replies; 56+ messages in thread
From: Alex Elder @ 2012-12-27 19:41 UTC (permalink / raw)
  To: Sage Weil
  Cc: Nick Bartos, Gregory Farnum, Josh Durgin, Mandell Degerness, ceph-devel

On 12/27/2012 12:43 PM, Sage Weil wrote:
> On Thu, 27 Dec 2012, Nick Bartos wrote:
>> I have some exciting news.  After 215 test runs, no hung processes
>> were detected.  I think we may actually have it this time.  Thanks for
>> all your hard work!

This is great news, Nick, and I really appreciate your help testing so
we could get rid of this ugly thing.

> Sweet!  I think it was the new branch naming scheme that did it.

Obviously.

I'm going to blow away all the wip-nick-new* branches.

					-Alex

> 
> sage
> 
>>
>> -Nick
>>
>> On Wed, Dec 26, 2012 at 1:36 PM, Alex Elder <elder@inktank.com> wrote:
>>> On 12/26/2012 11:45 AM, Nick Bartos wrote:
>>>> Here's a log with a hang on the updated branch:
>>>>
>>>> https://gist.github.com/raw/4381750/772476e1bae1e6366347a223f34aa6c440b92765/rdb-hang-1356543132.log
>>>
>>> OK, new naming scheme.  Please try:  wip-nick-1
>>>
>>> I added another simple fix, but then collapsed three commits
>>> into one, and added one more (somewhat unrelated).
>>>
>>> I've done simple testing with this and will subject it to
>>> more rigorous testing shortly.  I wanted to make it available
>>> to you quickly though.
>>>
>>>                                         -Alex
>>
>>



* Re: rbd map command hangs for 15 minutes during system start up
  2012-12-26 21:36                                                                                       ` Alex Elder
  2012-12-27 17:33                                                                                         ` Nick Bartos
@ 2012-12-31 18:22                                                                                         ` Alex Elder
  2013-01-02 15:56                                                                                           ` Nick Bartos
  1 sibling, 1 reply; 56+ messages in thread
From: Alex Elder @ 2012-12-31 18:22 UTC (permalink / raw)
  To: Nick Bartos
  Cc: Sage Weil, Gregory Farnum, Josh Durgin, Mandell Degerness, ceph-devel

On 12/26/2012 03:36 PM, Alex Elder wrote:
> On 12/26/2012 11:45 AM, Nick Bartos wrote:
>> Here's a log with a hang on the updated branch:
>>
>> https://gist.github.com/raw/4381750/772476e1bae1e6366347a223f34aa6c440b92765/rdb-hang-1356543132.log
> 
> OK, new naming scheme.  Please try:  wip-nick-1

Now that we've got this resolved, I've created an updated
"stable" branch with ceph-related bug fixes, based on the
latest 3.5 stable branch, 3.5.7.  It contains a bunch of
other bug fixes that the branch you had been working with
did not have.

I'm starting my own testing with this branch now.  But it
would be great if you'd give it a try as well, since I
know you're a "real" user of this code base.

It's available as branch "linux-3.5.7-ceph" on the
ceph-client git repository.  Thanks a lot.

					-Alex

> 
> I added another simple fix, but then collapsed three commits
> into one, and added one more (somewhat unrelated).
> 
> I've done simple testing with this and will subject it to
> more rigorous testing shortly.  I wanted to make it available
> to you quickly though.
> 
> 					-Alex
> 
>>
>> On Thu, Dec 20, 2012 at 1:59 PM, Alex Elder <elder@inktank.com> wrote:
>>> On 12/20/2012 11:48 AM, Nick Bartos wrote:
>>>> Unfortunately, we still have a hang:
>>>>
>>>> https://gist.github.com/4347052/download
>>>
>>> The saga continues, and each time we get a little more
>>> information.  Please try branch: "wip-nick-newerest"
>>>
>>> Thank you.
>>>
>>>                                         -Alex
>>>
>>>
>>>> On Wed, Dec 19, 2012 at 2:42 PM, Alex Elder <elder@inktank.com> wrote:
>>>>> On 12/19/2012 03:25 PM, Alex Elder wrote:
>>>>>> On 12/18/2012 12:05 PM, Nick Bartos wrote:
>>>>>>> I've added the output of "ps -ef" in addition to triggering a trace
>>>>>>> when a hang is detected.  Not much is generally running at that point,
>>>>>>> but you can have a look:
>>>>>>>
>>>>>>> https://gist.github.com/raw/4330223/2f131ee312ee43cb3d8c307a9bf2f454a7edfe57/rbd-hang-1355851498.txt
>>>>>>
>>>>>> This helped a lot.  I updated the bug with a little more info.
>>>>>>
>>>>>>     http://tracker.newdream.net/issues/3519
>>>>>>
>>>>>> I also think I have now found something that could explain what you
>>>>>> are seeing, and am developing a fix.  I'll provide you an update
>>>>>> as soon as I have tested what I come up with, almost certainly
>>>>>> this afternoon.
>>>>>
>>>>> Nick, I have a new branch for you to try with a new fix in place.
>>>>> As you might have predicted, it's named "wip-nick-newest".
>>>>>
>>>>> Please give it a try to see if it resolved the hang you've
>>>>> been seeing and let me know how it goes.  If it continues
>>>>> to hang, please provide the logs as you have before, it's
>>>>> been very helpful.
>>>>>
>>>>> Thanks a lot.
>>>>>
>>>>>                                         -Alex
>>>>>>
>>>>>>                                       -Alex
>>>>>>
>>>>>>> Is it possible that there is some sort of deadlock going on?  We are
>>>>>>> doing the rbd maps (and subsequent filesystem mounts) on the same
>>>>>>> systems which are running the ceph-osd and ceph-mon processes.  To get
>>>>>>> around the 'sync' deadlock problem, we are using a patch from Sage
>>>>>>> which ignores system-wide syncs on filesystems mounted with the
>>>>>>> 'mand' option (and we mount the underlying osd filesystems with
>>>>>>> 'mand').  However I am wondering if there is potential for other types
>>>>>>> of deadlocks in this environment.
>>>>>>>
>>>>>>> Also, we recently saw an rbd hang in a much older version, running
>>>>>>> kernel 3.5.3 with only the sync hack patch, alongside ceph 0.48.1.
>>>>>>> It's possible that this issue was around for some time, just the
>>>>>>> recent patches made it happen more often (and thus more reproducible)
>>>>>>> for us.
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Dec 18, 2012 at 8:09 AM, Alex Elder <elder@inktank.com> wrote:
>>>>>>>> On 12/17/2012 11:12 AM, Nick Bartos wrote:
>>>>>>>>> Here's a log with the rbd debugging enabled:
>>>>>>>>>
>>>>>>>>> https://gist.github.com/raw/4319962/d9690fd92c169198efc5eecabf275ef1808929d2/rbd-hang-test-1355763470.log
>>>>>>>>>
>>>>>>>>> On Fri, Dec 14, 2012 at 10:03 AM, Alex Elder <elder@inktank.com> wrote:
>>>>>>>>>> On 12/14/2012 10:53 AM, Nick Bartos wrote:
>>>>>>>>>>> Yes I was only enabling debugging for libceph.  I'm adding debugging
>>>>>>>>>>> for rbd as well.  I'll do a repro later today when a test cluster
>>>>>>>>>>> opens up.
>>>>>>>>>>
>>>>>>>>>> Excellent, thank you.   -Alex
>>>>>>>>
>>>>>>>> I looked through these debugging messages.  Looking only at the
>>>>>>>> rbd debugging, what I see seems to indicate that rbd is idle at
>>>>>>>> the point the "hang" seems to start.  This suggests that the hang
>>>>>>>> is not due to rbd itself, but rather whatever it is that might
>>>>>>>> be responsible for using the rbd image once it has been mapped.
>>>>>>>>
>>>>>>>> Is that possible?  I don't know what process you have that is
>>>>>>>> mapping the rbd image, and what is supposed to be the next thing
>>>>>>>> it does.  (I realize this may not make a lot of sense, given
>>>>>>>> a patch in rbd seems to have caused the hang to begin occurring.)
>>>>>>>>
>>>>>>>> Also note that the debugging information available (i.e., the
>>>>>>>> lines in the code that can output debugging information) may
>>>>>>>> well be incomplete.  So if you don't find anything it may be
>>>>>>>> necessary to provide you with another update which might include
>>>>>>>> more debugging.
>>>>>>>>
>>>>>>>> Anyway, could you provide a little more context about what
>>>>>>>> is going on sort of *around* rbd when activity seems to stop?
>>>>>>>>
>>>>>>>> Thanks a lot.
>>>>>>>>
>>>>>>>>                                         -Alex
>>>>>>
>>>>>
>>>
> 



* Re: rbd map command hangs for 15 minutes during system start up
  2012-12-31 18:22                                                                                         ` Alex Elder
@ 2013-01-02 15:56                                                                                           ` Nick Bartos
  0 siblings, 0 replies; 56+ messages in thread
From: Nick Bartos @ 2013-01-02 15:56 UTC (permalink / raw)
  To: Alex Elder
  Cc: Sage Weil, Gregory Farnum, Josh Durgin, Mandell Degerness, ceph-devel

So far basic things are working fine, and my hang test is at 78 passes
and still going good.  I'll let you know if any problems crop up with
it.

On Mon, Dec 31, 2012 at 10:22 AM, Alex Elder <elder@inktank.com> wrote:
> On 12/26/2012 03:36 PM, Alex Elder wrote:
>> On 12/26/2012 11:45 AM, Nick Bartos wrote:
>>> Here's a log with a hang on the updated branch:
>>>
>>> https://gist.github.com/raw/4381750/772476e1bae1e6366347a223f34aa6c440b92765/rdb-hang-1356543132.log
>>
>> OK, new naming scheme.  Please try:  wip-nick-1
>
> Now that we've got this resolved, I've created an updated
> "stable" branch with ceph-related bug fixes, based on the
> latest 3.5 stable branch, 3.5.7.  It contains a bunch of
> other bug fixes that the branch you had been working with
> did not have.
>
> I'm starting my own testing with this branch now.  But it
> would be great if you'd give it a try as well, since I
> know you're a "real" user of this code base.
>
> It's available as branch "linux-3.5.7-ceph" on the
> ceph-client git repository.  Thanks a lot.
>
>                                         -Alex
>
>>
>> I added another simple fix, but then collapsed three commits
>> into one, and added one more (somewhat unrelated).
>>
>> I've done simple testing with this and will subject it to
>> more rigorous testing shortly.  I wanted to make it available
>> to you quickly though.
>>
>>                                       -Alex
>>
>>>
>>> On Thu, Dec 20, 2012 at 1:59 PM, Alex Elder <elder@inktank.com> wrote:
>>>> On 12/20/2012 11:48 AM, Nick Bartos wrote:
>>>>> Unfortunately, we still have a hang:
>>>>>
>>>>> https://gist.github.com/4347052/download
>>>>
>>>> The saga continues, and each time we get a little more
>>>> information.  Please try branch: "wip-nick-newerest"
>>>>
>>>> Thank you.
>>>>
>>>>                                         -Alex
>>>>
>>>>
>>>>> On Wed, Dec 19, 2012 at 2:42 PM, Alex Elder <elder@inktank.com> wrote:
>>>>>> On 12/19/2012 03:25 PM, Alex Elder wrote:
>>>>>>> On 12/18/2012 12:05 PM, Nick Bartos wrote:
>>>>>>>> I've added the output of "ps -ef" in addition to triggering a trace
>>>>>>>> when a hang is detected.  Not much is generally running at that point,
>>>>>>>> but you can have a look:
>>>>>>>>
>>>>>>>> https://gist.github.com/raw/4330223/2f131ee312ee43cb3d8c307a9bf2f454a7edfe57/rbd-hang-1355851498.txt
>>>>>>>
>>>>>>> This helped a lot.  I updated the bug with a little more info.
>>>>>>>
>>>>>>>     http://tracker.newdream.net/issues/3519
>>>>>>>
>>>>>>> I also think I have now found something that could explain what you
>>>>>>> are seeing, and am developing a fix.  I'll provide you an update
>>>>>>> as soon as I have tested what I come up with, almost certainly
>>>>>>> this afternoon.
>>>>>>
>>>>>> Nick, I have a new branch for you to try with a new fix in place.
>>>>>> As you might have predicted, it's named "wip-nick-newest".
>>>>>>
>>>>>> Please give it a try to see if it resolved the hang you've
>>>>>> been seeing and let me know how it goes.  If it continues
>>>>>> to hang, please provide the logs as you have before, it's
>>>>>> been very helpful.
>>>>>>
>>>>>> Thanks a lot.
>>>>>>
>>>>>>                                         -Alex
>>>>>>>
>>>>>>>                                       -Alex
>>>>>>>
>>>>>>>> Is it possible that there is some sort of deadlock going on?  We are
>>>>>>>> doing the rbd maps (and subsequent filesystem mounts) on the same
>>>>>>>> systems which are running the ceph-osd and ceph-mon processes.  To get
>>>>>>>> around the 'sync' deadlock problem, we are using a patch from Sage
>>>>>>>> which ignores system-wide syncs on filesystems mounted with the
>>>>>>>> 'mand' option (and we mount the underlying osd filesystems with
>>>>>>>> 'mand').  However I am wondering if there is potential for other types
>>>>>>>> of deadlocks in this environment.
>>>>>>>>
>>>>>>>> Also, we recently saw an rbd hang in a much older version, running
>>>>>>>> kernel 3.5.3 with only the sync hack patch, alongside ceph 0.48.1.
>>>>>>>> It's possible that this issue was around for some time, just the
>>>>>>>> recent patches made it happen more often (and thus more reproducible)
>>>>>>>> for us.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Dec 18, 2012 at 8:09 AM, Alex Elder <elder@inktank.com> wrote:
>>>>>>>>> On 12/17/2012 11:12 AM, Nick Bartos wrote:
>>>>>>>>>> Here's a log with the rbd debugging enabled:
>>>>>>>>>>
>>>>>>>>>> https://gist.github.com/raw/4319962/d9690fd92c169198efc5eecabf275ef1808929d2/rbd-hang-test-1355763470.log
>>>>>>>>>>
>>>>>>>>>> On Fri, Dec 14, 2012 at 10:03 AM, Alex Elder <elder@inktank.com> wrote:
>>>>>>>>>>> On 12/14/2012 10:53 AM, Nick Bartos wrote:
>>>>>>>>>>>> Yes I was only enabling debugging for libceph.  I'm adding debugging
>>>>>>>>>>>> for rbd as well.  I'll do a repro later today when a test cluster
>>>>>>>>>>>> opens up.
>>>>>>>>>>>
>>>>>>>>>>> Excellent, thank you.   -Alex
>>>>>>>>>
>>>>>>>>> I looked through these debugging messages.  Looking only at the
>>>>>>>>> rbd debugging, what I see seems to indicate that rbd is idle at
>>>>>>>>> the point the "hang" seems to start.  This suggests that the hang
>>>>>>>>> is not due to rbd itself, but rather whatever it is that might
>>>>>>>>> be responsible for using the rbd image once it has been mapped.
>>>>>>>>>
>>>>>>>>> Is that possible?  I don't know what process you have that is
>>>>>>>>> mapping the rbd image, and what is supposed to be the next thing
>>>>>>>>> it does.  (I realize this may not make a lot of sense, given
>>>>>>>>> a patch in rbd seems to have caused the hang to begin occurring.)
>>>>>>>>>
>>>>>>>>> Also note that the debugging information available (i.e., the
>>>>>>>>> lines in the code that can output debugging information) may
>>>>>>>>> well be incomplete.  So if you don't find anything it may be
>>>>>>>>> necessary to provide you with another update which might include
>>>>>>>>> more debugging.
>>>>>>>>>
>>>>>>>>> Anyway, could you provide a little more context about what
>>>>>>>>> is going on sort of *around* rbd when activity seems to stop?
>>>>>>>>>
>>>>>>>>> Thanks a lot.
>>>>>>>>>
>>>>>>>>>                                         -Alex
>>>>>>>
>>>>>>
>>>>
>>
>


end of thread, other threads:[~2013-01-02 15:56 UTC | newest]

Thread overview: 56+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-11-08 22:10 rbd map command hangs for 15 minutes during system start up Mandell Degerness
2012-11-09  1:43 ` Josh Durgin
2012-11-12 22:19   ` Nick Bartos
2012-11-12 23:16     ` Sage Weil
2012-11-16  0:21       ` Nick Bartos
2012-11-16  0:25         ` Sage Weil
2012-11-16 18:36           ` Nick Bartos
2012-11-16 19:16             ` Sage Weil
2012-11-16 22:01               ` Nick Bartos
2012-11-16 22:13                 ` Sage Weil
2012-11-16 22:16                   ` Nick Bartos
2012-11-16 22:21                     ` Sage Weil
2012-11-19 23:04                       ` Nick Bartos
2012-11-19 23:34                         ` Gregory Farnum
2012-11-20 21:53                           ` Nick Bartos
2012-11-21  1:31                             ` Nick Bartos
2012-11-21 16:50                               ` Sage Weil
2012-11-21 17:02                                 ` Nick Bartos
2012-11-21 17:34                                   ` Nick Bartos
2012-11-21 21:41                                     ` Nick Bartos
2012-11-22  4:47                                       ` Sage Weil
2012-11-22  5:49                                         ` Nick Bartos
2012-11-22 18:04                                           ` Nick Bartos
2012-11-29 20:37                                             ` Alex Elder
2012-11-30 18:49                                               ` Nick Bartos
2012-11-30 19:10                                                 ` Alex Elder
2012-11-30 19:31                                                   ` Sage Weil
2012-11-30 23:22                                               ` Alex Elder
2012-12-02  5:34                                                 ` Nick Bartos
2012-12-03  4:43                                                   ` Alex Elder
2012-12-10 21:57                                                     ` Alex Elder
2012-12-11 17:26                                                       ` Nick Bartos
2012-12-11 18:01                                                         ` Alex Elder
2012-12-11 19:44                                                           ` Alex Elder
2012-12-13  0:57                                                             ` Nick Bartos
2012-12-13 19:00                                                               ` Nick Bartos
2012-12-13 19:07                                                                 ` Alex Elder
2012-12-14 16:46                                                                 ` Alex Elder
2012-12-14 16:53                                                                   ` Nick Bartos
2012-12-14 18:03                                                                     ` Alex Elder
2012-12-17 17:12                                                                       ` Nick Bartos
2012-12-18 16:09                                                                         ` Alex Elder
2012-12-18 18:05                                                                           ` Nick Bartos
2012-12-19 21:25                                                                             ` Alex Elder
2012-12-19 22:42                                                                               ` Alex Elder
2012-12-20 17:48                                                                                 ` Nick Bartos
2012-12-20 21:59                                                                                   ` Alex Elder
2012-12-26 17:45                                                                                     ` Nick Bartos
2012-12-26 17:50                                                                                       ` Alex Elder
2012-12-26 21:36                                                                                       ` Alex Elder
2012-12-27 17:33                                                                                         ` Nick Bartos
2012-12-27 18:43                                                                                           ` Sage Weil
2012-12-27 19:41                                                                                             ` Alex Elder
2012-12-31 18:22                                                                                         ` Alex Elder
2013-01-02 15:56                                                                                           ` Nick Bartos
2012-11-16 22:23                     ` Gregory Farnum
