* rbd unmap deadlock
@ 2014-05-02 16:04 Hannes Landeholm
2014-05-02 16:09 ` Hannes Landeholm
2014-05-02 16:09 ` Alex Elder
0 siblings, 2 replies; 7+ messages in thread
From: Hannes Landeholm @ 2014-05-02 16:04 UTC (permalink / raw)
To: Ceph Development
Hi, I just had an rbd unmap operation deadlock on my development
machine. The file system was in heavy use beforehand, but I run a
sync barrier before the umount and unmap, so that shouldn't matter. The
rbd unmap hung in "State: D (disk sleep)". I have so far waited
over 10 minutes; this normally takes < 1 sec.
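For reference, the teardown sequence described above is roughly the following (the helper name, mount point, and device argument are hypothetical; this is a sketch, not the exact script in use):

```shell
# Hypothetical teardown helper mirroring the described sequence;
# defined here but not invoked.
teardown_rbd() {
    mnt="$1"    # e.g. /mnt/image
    dev="$2"    # e.g. /dev/rbd0
    sync                  # flush dirty pages (the "sync barrier")
    umount "$mnt"         # detach the filesystem
    rbd unmap "$dev"      # release the kernel block device; this is the step that hung
}
```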
Here is the /proc/pid/stack output:
[<ffffffff8107e23a>] flush_workqueue+0x11a/0x5a0
[<ffffffffa031b415>] ceph_msgr_flush+0x15/0x20 [libceph]
[<ffffffffa03219c6>] ceph_monc_stop+0x46/0x120 [libceph]
[<ffffffffa031af28>] ceph_destroy_client+0x38/0xa0 [libceph]
[<ffffffffa0359b88>] rbd_client_release+0x68/0xa0 [rbd]
[<ffffffffa0359bec>] rbd_put_client+0x2c/0x30 [rbd]
[<ffffffffa0359c06>] rbd_dev_destroy+0x16/0x30 [rbd]
[<ffffffffa0359c77>] rbd_dev_image_release+0x57/0x60 [rbd]
[<ffffffffa035adc7>] do_rbd_remove.isra.25+0x167/0x1b0 [rbd]
[<ffffffffa035ae54>] rbd_remove+0x24/0x30 [rbd]
[<ffffffff8136ea67>] bus_attr_store+0x27/0x30
[<ffffffff81218d4d>] sysfs_kf_write+0x3d/0x50
[<ffffffff8121c982>] kernfs_fop_write+0xd2/0x140
[<ffffffff811a67fa>] vfs_write+0xba/0x1e0
[<ffffffff811a7206>] SyS_write+0x46/0xc0
[<ffffffff814e66e9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
Unfortunately our rbd.ko does not appear to have any debug symbols.
Other unmaps that have the same parent also hung after this. (We are
using layering.) Linux version: 3.14.1.
Thank you for your time,
--
Hannes Landeholm
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: rbd unmap deadlock
2014-05-02 16:04 rbd unmap deadlock Hannes Landeholm
@ 2014-05-02 16:09 ` Hannes Landeholm
2014-05-02 16:09 ` Alex Elder
1 sibling, 0 replies; 7+ messages in thread
From: Hannes Landeholm @ 2014-05-02 16:09 UTC (permalink / raw)
To: Ceph Development
Correction: I just realized that the other hung unmaps do not have
the same parent. They are actually in different pools and unrelated.
Thank you for your time,
--
Hannes Landeholm
* Re: rbd unmap deadlock
2014-05-02 16:04 rbd unmap deadlock Hannes Landeholm
2014-05-02 16:09 ` Hannes Landeholm
@ 2014-05-02 16:09 ` Alex Elder
2014-05-02 16:23 ` Hannes Landeholm
1 sibling, 1 reply; 7+ messages in thread
From: Alex Elder @ 2014-05-02 16:09 UTC (permalink / raw)
To: Hannes Landeholm, Ceph Development
On 05/02/2014 11:04 AM, Hannes Landeholm wrote:
> Hi, I just had an rbd unmap operation deadlock on my development
> machine. The file system was in heavy use beforehand, but I run a
> sync barrier before the umount and unmap, so that shouldn't matter. The
> rbd unmap hung in "State: D (disk sleep)". I have so far waited
> over 10 minutes; this normally takes < 1 sec.
>
> Here is the /proc/pid/stack output:
>
> [<ffffffff8107e23a>] flush_workqueue+0x11a/0x5a0
> [<ffffffffa031b415>] ceph_msgr_flush+0x15/0x20 [libceph]
> [<ffffffffa03219c6>] ceph_monc_stop+0x46/0x120 [libceph]
> [<ffffffffa031af28>] ceph_destroy_client+0x38/0xa0 [libceph]
> [<ffffffffa0359b88>] rbd_client_release+0x68/0xa0 [rbd]
> [<ffffffffa0359bec>] rbd_put_client+0x2c/0x30 [rbd]
> [<ffffffffa0359c06>] rbd_dev_destroy+0x16/0x30 [rbd]
> [<ffffffffa0359c77>] rbd_dev_image_release+0x57/0x60 [rbd]
> [<ffffffffa035adc7>] do_rbd_remove.isra.25+0x167/0x1b0 [rbd]
> [<ffffffffa035ae54>] rbd_remove+0x24/0x30 [rbd]
> [<ffffffff8136ea67>] bus_attr_store+0x27/0x30
> [<ffffffff81218d4d>] sysfs_kf_write+0x3d/0x50
> [<ffffffff8121c982>] kernfs_fop_write+0xd2/0x140
> [<ffffffff811a67fa>] vfs_write+0xba/0x1e0
> [<ffffffff811a7206>] SyS_write+0x46/0xc0
> [<ffffffff814e66e9>] system_call_fastpath+0x16/0x1b
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> Unfortunately our rbd.ko does not appear to have any debug symbols.
>
> Other unmaps that have the same parent also hung after this. (We are
> using layering.) Linux version: 3.14.1.
Is this "stock" 3.14.1? Can you provide the full output of "uname -a"?
And if possible, either /proc/config.gz or /boot/config-3.14.1 (or
whichever file seems to match the currently-running kernel)?
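For reference, those details can be gathered with something like the following (a sketch; /proc/config.gz is only present when the kernel was built with CONFIG_IKCONFIG_PROC, and the /boot path varies by distribution):

```shell
# Print the running kernel identification
uname -a

# Dump the kernel config, preferring /proc/config.gz when available
if [ -r /proc/config.gz ]; then
    zcat /proc/config.gz
elif [ -r "/boot/config-$(uname -r)" ]; then
    cat "/boot/config-$(uname -r)"
fi
```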
Thank you.
-Alex
>
> Thank you for your time,
> --
> Hannes Landeholm
* Re: rbd unmap deadlock
2014-05-02 16:09 ` Alex Elder
@ 2014-05-02 16:23 ` Hannes Landeholm
2014-05-02 16:30 ` Ilya Dryomov
0 siblings, 1 reply; 7+ messages in thread
From: Hannes Landeholm @ 2014-05-02 16:23 UTC (permalink / raw)
To: Alex Elder; +Cc: Ceph Development
On Fri, May 2, 2014 at 6:09 PM, Alex Elder <elder@ieee.org> wrote:
> On 05/02/2014 11:04 AM, Hannes Landeholm wrote:
>>
>> Hi, I just had an rbd unmap operation deadlock on my development
>> machine. The file system was in heavy use beforehand, but I run a
>> sync barrier before the umount and unmap, so that shouldn't matter. The
>> rbd unmap hung in "State: D (disk sleep)". I have so far waited
>> over 10 minutes; this normally takes < 1 sec.
>>
>> Here is the /proc/pid/stack output:
>>
>> [<ffffffff8107e23a>] flush_workqueue+0x11a/0x5a0
>> [<ffffffffa031b415>] ceph_msgr_flush+0x15/0x20 [libceph]
>> [<ffffffffa03219c6>] ceph_monc_stop+0x46/0x120 [libceph]
>> [<ffffffffa031af28>] ceph_destroy_client+0x38/0xa0 [libceph]
>> [<ffffffffa0359b88>] rbd_client_release+0x68/0xa0 [rbd]
>> [<ffffffffa0359bec>] rbd_put_client+0x2c/0x30 [rbd]
>> [<ffffffffa0359c06>] rbd_dev_destroy+0x16/0x30 [rbd]
>> [<ffffffffa0359c77>] rbd_dev_image_release+0x57/0x60 [rbd]
>> [<ffffffffa035adc7>] do_rbd_remove.isra.25+0x167/0x1b0 [rbd]
>> [<ffffffffa035ae54>] rbd_remove+0x24/0x30 [rbd]
>> [<ffffffff8136ea67>] bus_attr_store+0x27/0x30
>> [<ffffffff81218d4d>] sysfs_kf_write+0x3d/0x50
>> [<ffffffff8121c982>] kernfs_fop_write+0xd2/0x140
>> [<ffffffff811a67fa>] vfs_write+0xba/0x1e0
>> [<ffffffff811a7206>] SyS_write+0x46/0xc0
>> [<ffffffff814e66e9>] system_call_fastpath+0x16/0x1b
>> [<ffffffffffffffff>] 0xffffffffffffffff
>>
>> Unfortunately our rbd.ko does not appear to have any debug symbols.
>>
>> Other unmaps that have the same parent also hung after this. (We are
>> using layering.) Linux version: 3.14.1.
>
> Is this "stock" 3.14.1? Can you provide the full output of "uname -a"?
> And if possible, either /proc/config.gz or /boot/config-3.14.1 (or
> whichever file seems to match the currently-running kernel)?
Yes, this is a "stock" Arch 3.14.1 kernel with no custom patches.
uname: Linux localhost 3.14.1-1-js #1 SMP PREEMPT Tue Apr 15 17:59:05 CEST 2014 x86_64 GNU/Linux
config: http://pastebin.com/unZCzXZZ
Thank you for your time,
--
Hannes Landeholm
* Re: rbd unmap deadlock
2014-05-02 16:23 ` Hannes Landeholm
@ 2014-05-02 16:30 ` Ilya Dryomov
2014-05-02 16:40 ` Hannes Landeholm
0 siblings, 1 reply; 7+ messages in thread
From: Ilya Dryomov @ 2014-05-02 16:30 UTC (permalink / raw)
To: Hannes Landeholm; +Cc: Alex Elder, Ceph Development
Can you successfully map and then unmap a different image? What's the
general state of the cluster, i.e. the output of ceph -s?
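A sanity check along those lines might look like this (the helper names and the image argument are hypothetical, and this assumes a working rbd CLI against the cluster):

```shell
# Hypothetical smoke test: map a scratch image and immediately unmap it;
# defined here but not invoked.
rbd_map_unmap_check() {
    img="$1"                            # e.g. rbd/scratch-test
    dev=$(rbd map "$img") || return 1   # rbd map prints the /dev/rbdX node on success
    rbd unmap "$dev"                    # should normally return in well under a second
}

# Cluster health overview, as requested
ceph_status() { ceph -s; }
```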
Thanks,
Ilya
* Re: rbd unmap deadlock
2014-05-02 16:30 ` Ilya Dryomov
@ 2014-05-02 16:40 ` Hannes Landeholm
2014-05-02 16:52 ` Ilya Dryomov
0 siblings, 1 reply; 7+ messages in thread
From: Hannes Landeholm @ 2014-05-02 16:40 UTC (permalink / raw)
To: Ilya Dryomov; +Cc: Alex Elder, Ceph Development
On Fri, May 2, 2014 at 6:30 PM, Ilya Dryomov <ilya.dryomov@inktank.com> wrote:
> Can you successfully map and then unmap a different image? What's the
> general state of the cluster, i.e. the output of ceph -s?
>
> Thanks,
>
> Ilya
Yes, that was possible; however, as I mentioned, some additional
unmaps also deadlocked (and some succeeded).
Unfortunately I've rebooted the devel machine now (which cleared the
hang). The status of ceph looks pretty much the same as before,
though:
    cluster e1206f49-cc79-436e-b69d-375e0374d7a9
     health HEALTH_WARN
     monmap e1: 1 mons at {localhost=192.168.0.215:6789/0}, election epoch 1, quorum 0 localhost
     osdmap e550: 3 osds: 1 up, 1 in
      pgmap v153419: 892 pgs, 10 pools, 37194 MB data, 10703 objects
            49119 MB used, 299 GB / 349 GB avail
                 892 active+clean

2014-05-02 18:31:57.254360 mon.0 [INF] pgmap v153419: 892 pgs: 892 active+clean; 37194 MB data, 49119 MB used, 299 GB / 349 GB avail
FYI: This machine runs both the ceph cluster and the clients.
Thank you for your time,
--
Hannes Landeholm
* Re: rbd unmap deadlock
2014-05-02 16:40 ` Hannes Landeholm
@ 2014-05-02 16:52 ` Ilya Dryomov
0 siblings, 0 replies; 7+ messages in thread
From: Ilya Dryomov @ 2014-05-02 16:52 UTC (permalink / raw)
To: Hannes Landeholm; +Cc: Alex Elder, Ceph Development
On Fri, May 2, 2014 at 8:40 PM, Hannes Landeholm <hannes@jumpstarter.io> wrote:
> FYI: This machine runs both the ceph cluster and the clients.
I'll file a ticket. I think I saw something similar some time ago, at
least the stack trace looks familiar, and that was also a dev box
running both the servers and the kernel client.
Thanks,
Ilya
end of thread, other threads:[~2014-05-02 16:52 UTC | newest]
Thread overview: 7+ messages
2014-05-02 16:04 rbd unmap deadlock Hannes Landeholm
2014-05-02 16:09 ` Hannes Landeholm
2014-05-02 16:09 ` Alex Elder
2014-05-02 16:23 ` Hannes Landeholm
2014-05-02 16:30 ` Ilya Dryomov
2014-05-02 16:40 ` Hannes Landeholm
2014-05-02 16:52 ` Ilya Dryomov