* Slow request warnings on 0.48
From: David Blundell @ 2012-07-04 16:53 UTC (permalink / raw)
  To: ceph-devel

I have three servers running mon and osd on Ubuntu 12.04, which I have been testing with RADOS storing RBD KVM instances.

0.47.3 worked extremely well (once I got over a few btrfs issues).  The same servers running 0.48 give a large number of "[WRN] slow request" messages whenever I generate a lot of random IO in the KVM instances using iozone.  The slow responses eventually lead to disk timeouts in the KVM instances.

I have erased the osds and recreated on new btrfs volumes with the same result.

I have also tried switching to xfs, using mkfs.xfs -n size=64k and mounting with noatime,inode64,delaylog,logbufs=8,logbsize=256k.

xfs gives the same result: the iozone tests run fine until the random IO starts, and then there are lots of slow request warnings.

Does anyone have any ideas about the best place to start troubleshooting / debugging?

Thanks,

David

* Re: Slow request warnings on 0.48
From: Matthew Richardson @ 2012-07-19 10:48 UTC (permalink / raw)
  To: ceph-devel

I'd just like to report the same behaviour on my test cluster with 0.48.

I've set up a single box (SL6.1, kernel 2.6.32-220.23.1) with 1 mds,
mon and osd, and replication set to '1' for both data and metadata.

Having mounted using ceph-fuse, I'm running a simple fio job to create load:

[global]
directory=/mnt/ceph
size=500M
rw=read
ioengine=libaio

[simple]

I'm then watching the latency with ioping.
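
For anyone wanting to repeat this with different access patterns, here is a minimal sketch that renders the job file above once per rw mode. The helper names (render_job, write_jobs) and the output file naming are hypothetical and not part of the original setup; only the job contents come from this message.

```python
import os

# Job file from this message, with the rw mode left as a placeholder.
FIO_TEMPLATE = """[global]
directory={directory}
size={size}
rw={rw}
ioengine=libaio

[simple]
"""

# fio rw modes discussed in this message.
RW_MODES = ("read", "randread", "write", "randwrite")


def render_job(rw, directory="/mnt/ceph", size="500M"):
    """Return the text of the fio job file for one rw mode."""
    return FIO_TEMPLATE.format(directory=directory, size=size, rw=rw)


def write_jobs(out_dir, modes=RW_MODES):
    """Write one job file per rw mode into out_dir and return the paths."""
    paths = []
    for rw in modes:
        # Hypothetical naming scheme: simple-<mode>.fio
        path = os.path.join(out_dir, "simple-%s.fio" % rw)
        with open(path, "w") as fh:
            fh.write(render_job(rw))
        paths.append(path)
    return paths
```

Each generated file can then be passed to fio as its job file argument while ioping runs alongside it.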

With rw=read, rw=randread (random reads) or rw=write (sequential writes)
I see no problems and the latency sits around 1-2ms.  However, with
rw=randwrite (random writes) I see the latency jump to between 5 and 60
seconds, and the following types of warning lines appear:

2012-07-19 10:29:39.417625 osd.0 [WRN] 11 slow requests, 6 included below; oldest blocked for > 54.425766 secs
[WRN] slow request 54.420958 seconds old, received at 2012-07-19 10:28:44.996584: osd_op(client.4113.0:9153 100000003ed.0000003b [write 847872~4096] 0.dc4b476f snapc 1=[]) v4 currently started
2012-07-19 10:29:39.417641 osd.0 [WRN] slow request 54.420587 seconds old, received at 2012-07-19 10:28:44.996955: osd_op(client.4113.0:9154 100000003ed.00000000 [write 1175552~4096] 0.44a7cb80 snapc 1=[]) v4 currently started
[...snip...]
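
To summarise warnings like these, a rough sketch of a parser follows. The regular expression is inferred only from the lines quoted above and is an assumption, not a documented Ceph log format; the function name parse_slow_requests is hypothetical. Whitespace is collapsed first so that lines wrapped by a mail client still match.

```python
import re

# Pattern inferred from the sample warnings in this message (an assumption,
# not an official format): age, receive time, client id, object and operation.
SLOW_RE = re.compile(
    r"slow request (?P<age>\d+(?:\.\d+)?) seconds old, "
    r"received at (?P<received>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+): "
    r"osd_op\((?P<client>\S+) (?P<object>\S+) \[(?P<op>[^\]]*)\]"
)


def parse_slow_requests(text):
    """Return one dict per 'slow request' warning found in text."""
    # Collapse newlines and repeated spaces so wrapped log lines are rejoined.
    flat = " ".join(text.split())
    results = []
    for match in SLOW_RE.finditer(flat):
        entry = match.groupdict()
        entry["age"] = float(entry["age"])  # seconds the request has waited
        results.append(entry)
    return results
```

Sorting the result by the "age" key, or grouping by "object", gives a quick view of which writes are stuck longest. The summary line ("11 slow requests, 6 included below") is deliberately not matched.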


Let me know if there's any more information that I can provide that
might help with diagnosing the problem (also bearing in mind that I'm
new to ceph so might need extra notes on generating tests, dumps etc :) )

Thanks,

Matthew


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


