* Slow request warnings on 0.48
@ 2012-07-04 16:53 David Blundell
2012-07-04 16:58 ` Alexandre DERUMIER
0 siblings, 1 reply; 12+ messages in thread
From: David Blundell @ 2012-07-04 16:53 UTC (permalink / raw)
To: ceph-devel
I have three servers running mon and osd using Ubuntu 12.04 that I have been testing with RADOS storing RBD KVM instances
0.47.3 worked extremely well (once I got over a few btrfs issues). The same servers running 0.48 give a large number of "[WRN] slow request" messages whenever I generate a lot of random IO in the KVM instances using iozone. The slow responses eventually leads to disk timeouts on the KVM instances.
I have erased the osds and recreated on new btrfs volumes with the same result.
I have also tried switching to xfs using mkfs.xfs -n size=64k with noatime, inode64,delaylog,logbufs=8,logbsize=256k
Xfs gives the same result - the iozone tests run fine until the random IO starts and then there are lots of slow request warnings.
Does anyone have any ideas about the best place to start troubleshooting / debugging?
Thanks,
David
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Slow request warnings on 0.48
2012-07-04 16:53 Slow request warnings on 0.48 David Blundell
@ 2012-07-04 16:58 ` Alexandre DERUMIER
2012-07-04 18:59 ` Gregory Farnum
2012-07-05 17:58 ` Mark Nelson
0 siblings, 2 replies; 12+ messages in thread
From: Alexandre DERUMIER @ 2012-07-04 16:58 UTC (permalink / raw)
To: David Blundell; +Cc: ceph-devel
Hi, I see same messages here after upgrade to 0.48.
with random write benchmark.
I have more lags than before with 0.47 (but disks are at 100% usage, so can't tell if it's normal or not)
----- Mail original -----
De: "David Blundell" <David.Blundell@100percentit.com>
À: ceph-devel@vger.kernel.org
Envoyé: Mercredi 4 Juillet 2012 18:53:02
Objet: Slow request warnings on 0.48
I have three servers running mon and osd using Ubuntu 12.04 that I have been testing with RADOS storing RBD KVM instances
0.47.3 worked extremely well (once I got over a few btrfs issues). The same servers running 0.48 give a large number of "[WRN] slow request" messages whenever I generate a lot of random IO in the KVM instances using iozone. The slow responses eventually leads to disk timeouts on the KVM instances.
I have erased the osds and recreated on new btrfs volumes with the same result.
I have also tried switching to xfs using mkfs.xfs -n size=64k with noatime, inode64,delaylog,logbufs=8,logbsize=256k
Xfs gives the same result - the iozone tests run fine until the random IO starts and then there are lots of slow request warnings.
Does anyone have any ideas about the best place to start troubleshooting / debugging?
Thanks,
David
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
--
--
Alexandre D e rumier
Ingénieur Systèmes et Réseaux
Fixe : 03 20 68 88 85
Fax : 03 20 68 90 88
45 Bvd du Général Leclerc 59100 Roubaix
12 rue Marivaux 75002 Paris
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Slow request warnings on 0.48
2012-07-04 16:58 ` Alexandre DERUMIER
@ 2012-07-04 18:59 ` Gregory Farnum
2012-07-04 22:03 ` David Blundell
2012-07-05 17:58 ` Mark Nelson
1 sibling, 1 reply; 12+ messages in thread
From: Gregory Farnum @ 2012-07-04 18:59 UTC (permalink / raw)
To: Alexandre DERUMIER, David Blundell; +Cc: ceph-devel
That's odd — there isn't too much that went into the OSD between 0.47 and 0.48 that I can think of, and most of that only impact OSDs when they go through bootup. What does ceph -s display — are all the PGs healthy?
-Greg
On Wednesday, July 4, 2012 at 9:58 AM, Alexandre DERUMIER wrote:
> Hi, I see same messages here after upgrade to 0.48.
>
> with random write benchmark.
>
> I have more lags than before with 0.47 (but disks are at 100% usage, so can't tell if it's normal or not)
>
>
> ----- Mail original -----
>
> De: "David Blundell" <David.Blundell@100percentit.com (mailto:David.Blundell@100percentit.com)>
> À: ceph-devel@vger.kernel.org (mailto:ceph-devel@vger.kernel.org)
> Envoyé: Mercredi 4 Juillet 2012 18:53:02
> Objet: Slow request warnings on 0.48
>
> I have three servers running mon and osd using Ubuntu 12.04 that I have been testing with RADOS storing RBD KVM instances
>
> 0.47.3 worked extremely well (once I got over a few btrfs issues). The same servers running 0.48 give a large number of "[WRN] slow request" messages whenever I generate a lot of random IO in the KVM instances using iozone. The slow responses eventually leads to disk timeouts on the KVM instances.
>
> I have erased the osds and recreated on new btrfs volumes with the same result.
>
> I have also tried switching to xfs using mkfs.xfs -n size=64k with noatime, inode64,delaylog,logbufs=8,logbsize=256k
>
> Xfs gives the same result - the iozone tests run fine until the random IO starts and then there are lots of slow request warnings.
>
> Does anyone have any ideas about the best place to start troubleshooting / debugging?
>
> Thanks,
>
> David
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org (mailto:majordomo@vger.kernel.org)
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
>
> --
>
> --
>
>
>
>
> Alexandre D e rumier
>
> Ingénieur Systèmes et Réseaux
>
>
> Fixe : 03 20 68 88 85
>
> Fax : 03 20 68 90 88
>
>
> 45 Bvd du Général Leclerc 59100 Roubaix
> 12 rue Marivaux 75002 Paris
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org (mailto:majordomo@vger.kernel.org)
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Slow request warnings on 0.48
2012-07-04 18:59 ` Gregory Farnum
@ 2012-07-04 22:03 ` David Blundell
0 siblings, 0 replies; 12+ messages in thread
From: David Blundell @ 2012-07-04 22:03 UTC (permalink / raw)
To: Gregory Farnum; +Cc: Alexandre DERUMIER, <ceph-devel@vger.kernel.org>
On 4 Jul 2012, at 19:59, Gregory Farnum wrote:
> That's odd — there isn't too much that went into the OSD between 0.47 and 0.48 that I can think of, and most of that only impact OSDs when they go through bootup. What does ceph -s display — are all the PGs healthy?
> -Greg
>
Hi Greg,
The PGs all seem to be healthy:
root@store1:~# ceph -s
health HEALTH_OK
monmap e1: 3 mons at {0=10.0.1.40:6789/0,1=10.0.1.41:6789/0,2=10.0.1.42:6789/0}, election epoch 40, quorum 0,1,2 0,1,2
osdmap e342: 7 osds: 7 up, 7 in
pgmap v5403: 1344 pgs: 1344 active+clean; 4620 MB data, 9617 MB used, 1368 GB / 1377 GB avail
mdsmap e50: 0/0/1 up
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Slow request warnings on 0.48
2012-07-04 16:58 ` Alexandre DERUMIER
2012-07-04 18:59 ` Gregory Farnum
@ 2012-07-05 17:58 ` Mark Nelson
2012-07-05 18:16 ` David Blundell
` (2 more replies)
1 sibling, 3 replies; 12+ messages in thread
From: Mark Nelson @ 2012-07-05 17:58 UTC (permalink / raw)
To: Alexandre DERUMIER; +Cc: David Blundell, ceph-devel
On 07/04/2012 11:58 AM, Alexandre DERUMIER wrote:
> Hi, I see same messages here after upgrade to 0.48.
>
> with random write benchmark.
>
> I have more lags than before with 0.47 (but disks are at 100% usage, so can't tell if it's normal or not)
>
>
> ----- Mail original -----
>
> De: "David Blundell"<David.Blundell@100percentit.com>
> À: ceph-devel@vger.kernel.org
> Envoyé: Mercredi 4 Juillet 2012 18:53:02
> Objet: Slow request warnings on 0.48
>
> I have three servers running mon and osd using Ubuntu 12.04 that I have been testing with RADOS storing RBD KVM instances
>
> 0.47.3 worked extremely well (once I got over a few btrfs issues). The same servers running 0.48 give a large number of "[WRN] slow request" messages whenever I generate a lot of random IO in the KVM instances using iozone. The slow responses eventually leads to disk timeouts on the KVM instances.
>
> I have erased the osds and recreated on new btrfs volumes with the same result.
>
> I have also tried switching to xfs using mkfs.xfs -n size=64k with noatime, inode64,delaylog,logbufs=8,logbsize=256k
>
> Xfs gives the same result - the iozone tests run fine until the random IO starts and then there are lots of slow request warnings.
>
> Does anyone have any ideas about the best place to start troubleshooting / debugging?
>
> Thanks,
>
> David
Hi David and Alexandre,
Does this only happen with random writes or also sequential writes? If
it happens with sequential writes as well, does it happen with rados bench?
--
Mark Nelson
Performance Engineer
Inktank
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 12+ messages in thread
* RE: Slow request warnings on 0.48
2012-07-05 17:58 ` Mark Nelson
@ 2012-07-05 18:16 ` David Blundell
2012-07-05 18:33 ` Alexandre DERUMIER
2012-07-05 18:43 ` David Blundell
2 siblings, 0 replies; 12+ messages in thread
From: David Blundell @ 2012-07-05 18:16 UTC (permalink / raw)
To: Mark Nelson, Alexandre DERUMIER; +Cc: ceph-devel
> Hi David and Alexandre,
>
> Does this only happen with random writes or also sequential writes? If it
> happens with sequential writes as well, does it happen with rados bench?
>
> --
> Mark Nelson
> Performance Engineer
> Inktank
Hi Mark,
I have only ever seen it with random writes. I'll retry rados bench in a few minutes to double check - are there any other tests you would like me to run?
I'm currently waiting for some iozone tests to finish. When the sequential tests are running the logs are fine, as soon as the random writes start, the logs start to fill with messages like:
2012-07-05 19:10:00.599250 osd.6 10.0.1.42:6802/2145 1151 : [WRN] slow request 37.933071 seconds old, received at 2012-07-05 19:09:22.665917: osd_op(client.96416.0:91965 rb.0.1.00000000015f [write 4022272~4096] 2.3777e91a snapc 11=[11,10]) v4 currently waiting for sub ops
2012-07-05 19:10:00.599258 osd.6 10.0.1.42:6802/2145 1152 : [WRN] slow request 37.932836 seconds old, received at 2012-07-05 19:09:22.666152: osd_op(client.96416.0:91966 rb.0.1.00000000015f [write 4034560~4096] 2.3777e91a snapc 11=[11,10]) v4 currently waiting for sub ops
2012-07-05 19:10:03.278141 mon.0 10.0.1.40:6789/0 493 : [INF] pgmap v7564: 1344 pgs: 1344 active+clean; 5183 MB data, 11066 MB used, 1367 GB / 1377 GB avail
2012-07-05 19:09:55.388448 osd.3 10.0.1.41:6802/2540 160 : [WRN] 6 slow requests, 6 included below; oldest blocked for > 32.622016 secs
2012-07-05 19:09:55.388463 osd.3 10.0.1.41:6802/2540 161 : [WRN] slow request 32.622016 seconds old, received at 2012-07-05 19:09:22.766269: osd_op(client.96416.0:92308 rb.0.1.00000000017b [write 4001792~4096] 2.f606a6c6 snapc 11=[11,10]) v4 currently waiting for sub ops
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Slow request warnings on 0.48
2012-07-05 17:58 ` Mark Nelson
2012-07-05 18:16 ` David Blundell
@ 2012-07-05 18:33 ` Alexandre DERUMIER
2012-07-05 18:43 ` David Blundell
2 siblings, 0 replies; 12+ messages in thread
From: Alexandre DERUMIER @ 2012-07-05 18:33 UTC (permalink / raw)
To: Mark Nelson; +Cc: David Blundell, ceph-devel
It was during a random write (fio benchmark).
I can't reproduce it now,I'll try to do tests again this week.
----- Mail original -----
De: "Mark Nelson" <mark.nelson@inktank.com>
À: "Alexandre DERUMIER" <aderumier@odiso.com>
Cc: "David Blundell" <David.Blundell@100percentit.com>, ceph-devel@vger.kernel.org
Envoyé: Jeudi 5 Juillet 2012 19:58:27
Objet: Re: Slow request warnings on 0.48
On 07/04/2012 11:58 AM, Alexandre DERUMIER wrote:
> Hi, I see same messages here after upgrade to 0.48.
>
> with random write benchmark.
>
> I have more lags than before with 0.47 (but disks are at 100% usage, so can't tell if it's normal or not)
>
>
> ----- Mail original -----
>
> De: "David Blundell"<David.Blundell@100percentit.com>
> À: ceph-devel@vger.kernel.org
> Envoyé: Mercredi 4 Juillet 2012 18:53:02
> Objet: Slow request warnings on 0.48
>
> I have three servers running mon and osd using Ubuntu 12.04 that I have been testing with RADOS storing RBD KVM instances
>
> 0.47.3 worked extremely well (once I got over a few btrfs issues). The same servers running 0.48 give a large number of "[WRN] slow request" messages whenever I generate a lot of random IO in the KVM instances using iozone. The slow responses eventually leads to disk timeouts on the KVM instances.
>
> I have erased the osds and recreated on new btrfs volumes with the same result.
>
> I have also tried switching to xfs using mkfs.xfs -n size=64k with noatime, inode64,delaylog,logbufs=8,logbsize=256k
>
> Xfs gives the same result - the iozone tests run fine until the random IO starts and then there are lots of slow request warnings.
>
> Does anyone have any ideas about the best place to start troubleshooting / debugging?
>
> Thanks,
>
> David
Hi David and Alexandre,
Does this only happen with random writes or also sequential writes? If
it happens with sequential writes as well, does it happen with rados bench?
--
Mark Nelson
Performance Engineer
Inktank
--
--
Alexandre D e rumier
Ingénieur Systèmes et Réseaux
Fixe : 03 20 68 88 85
Fax : 03 20 68 90 88
45 Bvd du Général Leclerc 59100 Roubaix
12 rue Marivaux 75002 Paris
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 12+ messages in thread
* RE: Slow request warnings on 0.48
2012-07-05 17:58 ` Mark Nelson
2012-07-05 18:16 ` David Blundell
2012-07-05 18:33 ` Alexandre DERUMIER
@ 2012-07-05 18:43 ` David Blundell
2012-07-05 19:49 ` Mark Nelson
2 siblings, 1 reply; 12+ messages in thread
From: David Blundell @ 2012-07-05 18:43 UTC (permalink / raw)
To: Mark Nelson, Alexandre DERUMIER; +Cc: ceph-devel
> Hi David and Alexandre,
>
> Does this only happen with random writes or also sequential writes? If it
> happens with sequential writes as well, does it happen with rados bench?
>
> --
> Mark Nelson
> Performance Engineer
> Inktank
Hi Mark,
I just ran "rados -p data bench 60 write -t 16" and a few dd tests with no problems at all so at the moment it looks like only random IO triggers the slow writes.
Please do let me know if there are any other tests that I can do to help track down the cause.
David
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Slow request warnings on 0.48
2012-07-05 18:43 ` David Blundell
@ 2012-07-05 19:49 ` Mark Nelson
2012-07-05 20:21 ` Samuel Just
0 siblings, 1 reply; 12+ messages in thread
From: Mark Nelson @ 2012-07-05 19:49 UTC (permalink / raw)
To: David Blundell; +Cc: Alexandre DERUMIER, ceph-devel
On 07/05/2012 01:43 PM, David Blundell wrote:
>> Hi David and Alexandre,
>>
>> Does this only happen with random writes or also sequential writes? If it
>> happens with sequential writes as well, does it happen with rados bench?
>>
>> --
>> Mark Nelson
>> Performance Engineer
>> Inktank
>
> Hi Mark,
>
> I just ran "rados -p data bench 60 write -t 16" and a few dd tests with no problems at all so at the moment it looks like only random IO triggers the slow writes.
>
> Please do let me know if there are any other tests that I can do to help track down the cause.
>
> David
Thanks David! We've got some people internally taking a look at this.
I'll let you guys know if there is anything else we need!
Thanks,
Mark
--
Mark Nelson
Performance Engineer
Inktank
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Slow request warnings on 0.48
2012-07-05 19:49 ` Mark Nelson
@ 2012-07-05 20:21 ` Samuel Just
2012-07-05 22:15 ` David Blundell
0 siblings, 1 reply; 12+ messages in thread
From: Samuel Just @ 2012-07-05 20:21 UTC (permalink / raw)
To: Mark Nelson; +Cc: David Blundell, Alexandre DERUMIER, ceph-devel
David,
Could you try rados -p data bench 60 write -t 16 -b 4096?
rados bench defaults to 4MB objects, this'll give us results for 4k objects.
If you could give me the latency too, that would help.
-Sam
On Thu, Jul 5, 2012 at 12:49 PM, Mark Nelson <mark.nelson@inktank.com> wrote:
> On 07/05/2012 01:43 PM, David Blundell wrote:
>>>
>>> Hi David and Alexandre,
>>>
>>> Does this only happen with random writes or also sequential writes? If
>>> it
>>> happens with sequential writes as well, does it happen with rados bench?
>>>
>>> --
>>> Mark Nelson
>>> Performance Engineer
>>> Inktank
>>
>>
>> Hi Mark,
>>
>> I just ran "rados -p data bench 60 write -t 16" and a few dd tests with no
>> problems at all so at the moment it looks like only random IO triggers the
>> slow writes.
>>
>> Please do let me know if there are any other tests that I can do to help
>> track down the cause.
>>
>> David
>
>
> Thanks David! We've got some people internally taking a look at this. I'll
> let you guys know if there is anything else we need!
>
> Thanks,
> Mark
>
>
> --
> Mark Nelson
> Performance Engineer
> Inktank
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Slow request warnings on 0.48
2012-07-05 20:21 ` Samuel Just
@ 2012-07-05 22:15 ` David Blundell
0 siblings, 0 replies; 12+ messages in thread
From: David Blundell @ 2012-07-05 22:15 UTC (permalink / raw)
To: Samuel Just; +Cc: Mark Nelson, Alexandre DERUMIER, ceph-devel
On 5 Jul 2012, at 21:21, Samuel Just wrote:
> David,
>
> Could you try rados -p data bench 60 write -t 16 -b 4096?
>
> rados bench defaults to 4MB objects, this'll give us results for 4k objects.
>
> If you could give me the latency too, that would help.
> -Sam
Hi Sam,
I first ran this with the standard ceph settings giving http://pastebin.com/MWLxEazS
This did not cause any slow request warnings so I set "filestore queue max ops = 5000" to increase the number of requests in flight. This resulted in http://pastebin.com/yFnALmGW and also a small number of slow request warnings. I ran it again with similar results http://pastebin.com/VnKSVmsq
If there's anything else you need, please let me know.
David
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Slow request warnings on 0.48
@ 2012-07-19 10:48 Matthew Richardson
0 siblings, 0 replies; 12+ messages in thread
From: Matthew Richardson @ 2012-07-19 10:48 UTC (permalink / raw)
To: ceph-devel
[-- Attachment #1: Type: text/plain, Size: 1620 bytes --]
I'd just like to report the same behaviour on my test cluster with 0.48.
I've set up a single box (Sl6.1 - 2.6.32-220.23.1 kernel) with 1 mds,
mon and osd, and replication set to '1' for both data and metadata.
Having mounted using ceph-fuse, I'm running a simple fio job to create load:
[global]
directory=/mnt/ceph
size=500M
rw=read
ioengine=libaio
[simple]
I'm then watching the latency with ioping.
With rw=read, rw=randread (random reads) or rw=write (sequential writes)
I see no problems and the latency sits around 1-2ms. However, with
rw=randwrite (random writes) I see the latency jump to between 5 and 60
seconds, and the following types of warning lines appear:
2012-07-19 10:29:39.417625 osd.0 [WRN] 11 slow requests, 6 included
below; oldest blocked for > 54.425766 secs
[WRN] slow request 54.420958 seconds old, received at 2012-07-19
10:28:44.996584: osd_op(client.4113.0:9153 100000003ed.0000003b [write
847872~4096] 0.dc4b476f snapc 1=[]) v4 currently started
2012-07-19 10:29:39.417641 osd.0 [WRN] slow request 54.420587 seconds
old, received at 2012-07-19 10:28:44.996955: osd_op(client.4113.0:9154
100000003ed.00000000 [write 1175552~4096] 0.44a7cb80 snapc 1=[]) v4
currently started
[...snip...]
Let me know if there's any more information that I can provide that
might help with diagnosing the problem (also bearing in mind that I'm
new to ceph so might need extra notes on generating tests, dumps etc :) )
Thanks,
Matthew
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2012-07-19 11:33 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-04 16:53 Slow request warnings on 0.48 David Blundell
2012-07-04 16:58 ` Alexandre DERUMIER
2012-07-04 18:59 ` Gregory Farnum
2012-07-04 22:03 ` David Blundell
2012-07-05 17:58 ` Mark Nelson
2012-07-05 18:16 ` David Blundell
2012-07-05 18:33 ` Alexandre DERUMIER
2012-07-05 18:43 ` David Blundell
2012-07-05 19:49 ` Mark Nelson
2012-07-05 20:21 ` Samuel Just
2012-07-05 22:15 ` David Blundell
2012-07-19 10:48 Matthew Richardson
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.