From: "Yan, Zheng" <ukernel@gmail.com>
To: Barclay Jameson <almightybeeij@gmail.com>
Cc: "Gregory Farnum" <greg@gregs42.com>,
	"ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>,
	严正 <zyan@redhat.com>
Subject: Re: Ceph hard lock Hammer 9.2
Date: Wed, 24 Jun 2015 21:48:51 +0800	[thread overview]
Message-ID: <CAAM7YAnFhtYXvrz+kb6ADGnqoTh7rtHcn7qA5wAivrxkAbyj2A@mail.gmail.com> (raw)
In-Reply-To: <CAMzumdaTn5NX9g3J_ea0wAzPVDt2tsMDFsafjECDuzN9OOWa=w@mail.gmail.com>

Could you please run "echo 1 > /proc/sys/kernel/sysrq; echo t >
/proc/sysrq-trigger" when this warning happens again, then send the
kernel messages to us?
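
(For reference, a minimal capture sequence might look like the
following; the output file path is just an illustrative choice:)

  # enable all sysrq functions (the value is a bitmask; 1 enables everything)
  echo 1 > /proc/sys/kernel/sysrq
  # 't' dumps the state and stack of every task to the kernel log
  echo t > /proc/sysrq-trigger
  # the dump lands in the kernel ring buffer; save it to a file to send
  dmesg > /tmp/sysrq-task-dump.txt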

Regards
Yan, Zheng

On Tue, Jun 23, 2015 at 10:25 PM, Barclay Jameson
<almightybeeij@gmail.com> wrote:
> Sure,
> I guess it's actually a soft kernel lockup, since it's only the
> filesystem that is hung, with high I/O wait.
> The kernel is 4.0.4-1.el6.elrepo.x86_64.
> The Ceph version is 0.94.2 (sorry about the confusion; I missed a 4
> when I typed the subject line).
> I was testing copying 100,000 files from a directory (dir1) to
> (dir1-`hostname`) on three separate hosts.
> Two of the hosts completed the job, and the third one hung with the
> stack trace in /var/log/messages.
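>
> (Roughly, the copy step on each host was equivalent to the following;
> `cp -r` is an assumed form of the command, since this message only
> describes the source and destination names:)
>
>   # assumed reproduction of the test workload, run on each host
>   # against the same CephFS mount; dir1 holds the 100,000 files
>   cp -r dir1 "dir1-$(hostname)"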
>
> On Tue, Jun 23, 2015 at 6:54 AM, Gregory Farnum <greg@gregs42.com> wrote:
>> On Mon, Jun 22, 2015 at 9:45 PM, Barclay Jameson
>> <almightybeeij@gmail.com> wrote:
>>> Has anyone seen this?
>>
>> Can you describe the kernel you're using, the workload you were
>> running, the Ceph cluster you're running against, etc.?
>>
>>>
>>> Jun 22 15:09:27 node kernel: Call Trace:
>>> Jun 22 15:09:27 node kernel: [<ffffffff816803ee>] schedule+0x3e/0x90
>>> Jun 22 15:09:27 node kernel: [<ffffffff8168062e>] schedule_preempt_disabled+0xe/0x10
>>> Jun 22 15:09:27 node kernel: [<ffffffff81681ce3>] __mutex_lock_slowpath+0x93/0x100
>>> Jun 22 15:09:27 node kernel: [<ffffffffa060def8>] ? __cap_is_valid+0x58/0x70 [ceph]
>>> Jun 22 15:09:27 node kernel: [<ffffffff81681d73>] mutex_lock+0x23/0x40
>>> Jun 22 15:09:27 node kernel: [<ffffffffa0610f2d>] ceph_check_caps+0x38d/0x780 [ceph]
>>> Jun 22 15:09:27 node kernel: [<ffffffff812f5a9b>] ? __radix_tree_delete_node+0x7b/0x130
>>> Jun 22 15:09:27 node kernel: [<ffffffffa0612637>] ceph_put_wrbuffer_cap_refs+0xf7/0x240 [ceph]
>>> Jun 22 15:09:27 node kernel: [<ffffffffa060b170>] writepages_finish+0x200/0x290 [ceph]
>>> Jun 22 15:09:27 node kernel: [<ffffffffa05e2731>] handle_reply+0x4f1/0x640 [libceph]
>>> Jun 22 15:09:27 node kernel: [<ffffffffa05e3065>] dispatch+0x85/0xa0 [libceph]
>>> Jun 22 15:09:27 node kernel: [<ffffffffa05d7ceb>] process_message+0xab/0xd0 [libceph]
>>> Jun 22 15:09:27 node kernel: [<ffffffffa05db052>] try_read+0x2d2/0x430 [libceph]
>>> Jun 22 15:09:27 node kernel: [<ffffffffa05db7e8>] con_work+0x78/0x220 [libceph]
>>> Jun 22 15:09:27 node kernel: [<ffffffff8108c475>] process_one_work+0x145/0x460
>>> Jun 22 15:09:27 node kernel: [<ffffffff8108c8b2>] worker_thread+0x122/0x420
>>> Jun 22 15:09:27 node kernel: [<ffffffff8167fdb8>] ? __schedule+0x398/0x840
>>> Jun 22 15:09:27 node kernel: [<ffffffff8108c790>] ? process_one_work+0x460/0x460
>>> Jun 22 15:09:27 node kernel: [<ffffffff8108c790>] ? process_one_work+0x460/0x460
>>> Jun 22 15:09:27 node kernel: [<ffffffff8109170e>] kthread+0xce/0xf0
>>> Jun 22 15:09:27 node kernel: [<ffffffff81091640>] ? kthread_freezable_should_stop+0x70/0x70
>>> Jun 22 15:09:27 node kernel: [<ffffffff81683dd8>] ret_from_fork+0x58/0x90
>>> Jun 22 15:09:27 node kernel: [<ffffffff81091640>] ? kthread_freezable_should_stop+0x70/0x70
>>> Jun 22 15:11:27 node kernel: INFO: task kworker/2:1:40 blocked for more than 120 seconds.
>>> Jun 22 15:11:27 node kernel:      Tainted: G          I    4.0.4-1.el6.elrepo.x86_64 #1
>>> Jun 22 15:11:27 node kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> Jun 22 15:11:27 node kernel: kworker/2:1     D ffff881ff279f7f8     0    40      2 0x00000000
>>> Jun 22 15:11:27 node kernel: Workqueue: ceph-msgr con_work [libceph]
>>> Jun 22 15:11:27 node kernel: ffff881ff279f7f8 ffff881ff261c010 ffff881ff2b67050 ffff88207fd95270
>>> Jun 22 15:11:27 node kernel: ffff881ff279c010 ffff88207fd15200 7fffffffffffffff 0000000000000002
>>> Jun 22 15:11:27 node kernel: ffffffff81680ae0 ffff881ff279f818 ffffffff816803ee ffffffff810ae63b

Thread overview: 5+ messages
2015-06-22 20:45 Ceph hard lock Hammer 9.2 Barclay Jameson
2015-06-23 11:54 ` Gregory Farnum
2015-06-23 14:25   ` Barclay Jameson
2015-06-24 13:48     ` Yan, Zheng [this message]
2015-06-25 14:11       ` Barclay Jameson
