* Any suggestion to deal with slow request?
@ 2016-01-07  4:03 Jevon Qiao
  2016-01-07 16:43 ` [ceph-users] " Robert LeBlanc
  0 siblings, 1 reply; 3+ messages in thread
From: Jevon Qiao @ 2016-01-07  4:03 UTC (permalink / raw)
  To: ceph-users-Qp0mS5GaXlQ, ceph-devel-u79uwXL29TY76Z2rM5mHXA



Hi Cephers,

We have a Ceph cluster running 0.80.9, which consists of 36 OSDs with 3 
replicas. Recently, some OSDs have kept reporting slow requests, and 
cluster performance has degraded.

From the log of one OSD, I can see that all of the slow requests are 
caused by waiting for the replicas to complete. The replica OSDs 
involved are not a specific pair; they can be any two other OSDs.

    2016-01-06 08:17:11.887016 7f175ef25700  0 log [WRN] : slow request
    1.162776 seconds old, received at 2016-01-06 08:17:11.887092:
    osd_op(client.13302933.0:839452
    rbd_data.c2659c728b0ddb.0000000000000024 [stat,set-alloc-hint
    object_size 16777216 write_size 16777216,write 12099584~8192]
    3.abd08522 ack+ondisk+write e4661) v4 currently waiting for subops
    from 24,31

I dumped the historic ops of the OSD and noticed the following:
1) It waits about 8 seconds for the replies from the replica OSDs.
                     { "time": "2016-01-06 08:17:03.879264",
                       "event": "op_applied"},
                     { "time": "2016-01-06 08:17:11.684598",
                       "event": "sub_op_applied_rec"},
                     { "time": "2016-01-06 08:17:11.687016",
                       "event": "sub_op_commit_rec"},

2) It spends more than 3 seconds in the writeq and about 2 seconds writing the journal.
                     { "time": "2016-01-06 08:19:16.887519",
                       "event": "commit_queued_for_journal_write"},
                     { "time": "2016-01-06 08:19:20.109339",
                       "event": "write_thread_in_journal_buffer"},
                     { "time": "2016-01-06 08:19:22.177952",
                       "event": "journaled_completion_queued"},
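
The per-stage delays quoted above can be read off by subtracting consecutive event timestamps. The following is only a minimal sketch: it assumes each event is a dict with "time" and "event" keys, as in the excerpts above, and makes no assumption about how the rest of the historic ops dump is laid out. The helper name event_gaps is hypothetical.

```python
from datetime import datetime

# Timestamp layout used in the event excerpts above.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def event_gaps(events):
    """Return (previous_event, event, seconds_between) for consecutive events."""
    gaps = []
    for prev, cur in zip(events, events[1:]):
        delta = datetime.strptime(cur["time"], FMT) - datetime.strptime(prev["time"], FMT)
        gaps.append((prev["event"], cur["event"], delta.total_seconds()))
    return gaps


# Events copied from the journal excerpt in item 2.
journal_events = [
    {"time": "2016-01-06 08:19:16.887519", "event": "commit_queued_for_journal_write"},
    {"time": "2016-01-06 08:19:20.109339", "event": "write_thread_in_journal_buffer"},
    {"time": "2016-01-06 08:19:22.177952", "event": "journaled_completion_queued"},
]

for before, after, seconds in event_gaps(journal_events):
    print(f"{before} -> {after}: {seconds:.3f}s")
```

For the three journal events this gives roughly 3.2 seconds in the queue and 2.1 seconds for the journal write, which matches the figures in item 2.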

Any ideas or suggestions?

BTW, I checked the underlying network with iperf, and it works fine.

Thanks,
Jevon

