* read performance not perfect
@ 2011-07-18  4:51 huang jun
  2011-07-18 17:14 ` Sage Weil
  0 siblings, 1 reply; 19+ messages in thread
From: huang jun @ 2011-07-18  4:51 UTC (permalink / raw)
To: ceph-devel

hi, all
We tested ceph's read performance last week and found something weird.
We use ceph v0.30 on Linux 2.6.37; the back-end platform consists of 2 OSDs, 1 mon, and 1 MDS.

$ mount -t ceph 192.168.1.103:/ /mnt -vv
$ dd if=/dev/zero of=/mnt/test bs=4M count=200
$ cd .. && umount /mnt
$ mount -t ceph 192.168.1.103:/ /mnt -vv
$ dd if=test of=/dev/zero bs=4M
200+0 records in
200+0 records out
838860800 bytes (839 MB) copied, 16.2327 s, 51.7 MB/s

But if we use rados to test it:

$ rados -m 192.168.1.103:6789 -p data bench 60 write
$ rados -m 192.168.1.103:6789 -p data bench 60 seq

the result is:
Total time run:       24.733935
Total reads made:     438
Read size:            4194304
Bandwidth (MB/sec):   70.834

Average Latency:      0.899429
Max latency:          1.85106
Min latency:          0.128017

This phenomenon attracted our attention, so we began to analyze the OSD debug log.
We found that:
1) the kernel client sends READ requests of 1MB at first, and 512KB after that
2) from the rados test command log, the OSD receives READ ops with 4MB of data to handle

We know the ceph developers pay attention to read and write performance, so I just want to confirm:
does the communication between the client and OSD take more time than it should? Can we request a bigger size, such as the default object size of 4MB, for READ operations? Or is this related to OS management, and if so, what can we do to improve the performance?
thanks very much!

^ permalink raw reply	[flat|nested] 19+ messages in thread
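[Editor's note: the two benchmarks above report in different units (dd uses decimal MB/s, rados bench uses MiB/s), so the raw byte counts and times are worth recomputing before comparing them. A minimal sketch (awk is used only for the floating-point math):

```shell
# dd: 838860800 bytes in 16.2327 s, reported as 51.7 MB/s (decimal megabytes)
dd_bw=$(awk 'BEGIN { printf "%.1f", 838860800 / 16.2327 / 1000000 }')

# rados bench seq: 438 reads x 4194304 bytes in 24.733935 s,
# reported as 70.834 "MB/sec" (actually MiB/s, i.e. divided by 2^20)
rados_bw=$(awk 'BEGIN { printf "%.3f", 438 * 4194304 / 24.733935 / 1048576 }')

echo "kernel client: ${dd_bw} MB/s, rados bench: ${rados_bw} MiB/s"
```

Normalizing units makes the gap slightly wider than the headline numbers suggest (51.7 decimal MB/s is about 49.3 MiB/s versus 70.8 MiB/s), so the difference is real and not a units artifact.]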
* Re: read performance not perfect
  2011-07-18  4:51 read performance not perfect huang jun
@ 2011-07-18 17:14 ` Sage Weil
  2011-07-20  0:21   ` huang jun
       [not found]   ` <CABAwU-YKmEC=umFLzDb-ykPbzQ9s3sKoUmQbkumExrXEwyveNA@mail.gmail.com>
  0 siblings, 2 replies; 19+ messages in thread
From: Sage Weil @ 2011-07-18 17:14 UTC (permalink / raw)
To: huang jun; +Cc: ceph-devel

On Mon, 18 Jul 2011, huang jun wrote:
> hi,all
> We test ceph's read performance last week, and find something weird
> we use ceph v0.30 on linux 2.6.37
> mount ceph on back-platform consist of 2 osds \1 mon \1 mds
> $mount -t ceph 192.168.1.103:/ /mnt -vv
> $ dd if=/dev/zero of=/mnt/test bs=4M count=200
> $ cd .. && umount /mnt
> $mount -t ceph 192.168.1.103:/ /mnt -vv
> $dd if=test of=/dev/zero bs=4M
> 200+0 records in
> 200+0 records out
> 838860800 bytes (839 MB) copied, 16.2327 s, 51.7 MB/s
> but if we use rados to test it
> $ rados -m 192.168.1.103:6789 -p data bench 60 write
> $ rados -m 192.168.1.103:6789 -p data bench 60 seq
> the result is:
> Total time run:       24.733935
> Total reads made:     438
> Read size:            4194304
> Bandwidth (MB/sec):   70.834
>
> Average Latency:      0.899429
> Max latency:          1.85106
> Min latency:          0.128017
> this phenomenon attracts our attention, then we begin to analysis the
> osd debug log.
> we find that :
> 1) the kernel client send READ request, at first it requests 1MB, and
> after that it is 512KB
> 2) from rados test cmd log, OSD recept the READ op with 4MB data to handle
> we know the ceph developers pay their attention to read and write
> performance, so i just want to confrim that
> if the communication between the client and OSD spend more time than
> it should be? can we request bigger size, just like default object
> size 4MB, when it occurs to READ operation? or this is related to OS
> management, if so, what can we do to promote the performance?

I think it's related to the way the Linux VFS is doing readahead, and how
the ceph fs code is handling it.  It's issue #1122 in the tracker and I
plan to look at it today or tomorrow!

Thanks-
sage

^ permalink raw reply	[flat|nested] 19+ messages in thread
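[Editor's note: the VFS readahead window Sage refers to can be inspected per backing device under /sys/class/bdi. The commands below are a sketch against a live kernel-client mount; the `ceph-*` BDI name pattern is an assumption about how the client registers its backing device, so check your own /sys tree first.

```shell
# List backing-device-info entries; the ceph kernel client registers one
# per mounted filesystem instance
ls /sys/class/bdi/

# Inspect the current readahead window (value is in KB) for the ceph BDI.
# The ceph-* glob is an assumption; substitute the actual entry name.
cat /sys/class/bdi/ceph-*/read_ahead_kb

# Raise it to 8 MB as an experiment (must resolve to a single entry)
echo 8192 > /sys/class/bdi/ceph-*/read_ahead_kb
```

This only changes what the VFS asks for; how the ceph fs code services those requests is the part issue #1122 addresses.]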
* Re: read performance not perfect
  2011-07-18 17:14 ` Sage Weil
@ 2011-07-20  0:21   ` huang jun
       [not found]   ` <CABAwU-YKmEC=umFLzDb-ykPbzQ9s3sKoUmQbkumExrXEwyveNA@mail.gmail.com>
  1 sibling, 0 replies; 19+ messages in thread
From: huang jun @ 2011-07-20  0:21 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel

thanks for your reply
Now we find two points that confuse us:
1) the kernel client executes sequential reads through the aio_read
function, but from the OSD log, the dispatch_queue length on the OSD is
always 0, which means the OSD can't get the next READ message until the
client sends it. It seems the async read turns into a sync read: the OSD
can't read data in parallel, so it cannot make the most of its resources.
What was the original goal when you designed this part? reliability?
2) In the single-reader case, while the OSD reads data from its disk, it
does nothing but wait for the read to finish. We think this is a result
of 1): the OSD has nothing else to do, so it just waits.

2011/7/19 Sage Weil <sage@newdream.net>:
> On Mon, 18 Jul 2011, huang jun wrote:
>> hi,all
>> We test ceph's read performance last week, and find something weird
>> we use ceph v0.30 on linux 2.6.37
>> mount ceph on back-platform consist of 2 osds \1 mon \1 mds
>> $mount -t ceph 192.168.1.103:/ /mnt -vv
>> $ dd if=/dev/zero of=/mnt/test bs=4M count=200
>> $ cd .. && umount /mnt
>> $mount -t ceph 192.168.1.103:/ /mnt -vv
>> $dd if=test of=/dev/zero bs=4M
>> 200+0 records in
>> 200+0 records out
>> 838860800 bytes (839 MB) copied, 16.2327 s, 51.7 MB/s
>> but if we use rados to test it
>> $ rados -m 192.168.1.103:6789 -p data bench 60 write
>> $ rados -m 192.168.1.103:6789 -p data bench 60 seq
>> the result is:
>> Total time run:       24.733935
>> Total reads made:     438
>> Read size:            4194304
>> Bandwidth (MB/sec):   70.834
>>
>> Average Latency:      0.899429
>> Max latency:          1.85106
>> Min latency:          0.128017
>> this phenomenon attracts our attention, then we begin to analysis the
>> osd debug log.
>> we find that :
>> 1) the kernel client send READ request, at first it requests 1MB, and
>> after that it is 512KB
>> 2) from rados test cmd log, OSD recept the READ op with 4MB data to handle
>> we know the ceph developers pay their attention to read and write
>> performance, so i just want to confrim that
>> if the communication between the client and OSD spend more time than
>> it should be? can we request bigger size, just like default object
>> size 4MB, when it occurs to READ operation? or this is related to OS
>> management, if so, what can we do to promote the performance?
>
> I think it's related to the way the Linux VFS is doing readahead, and how
> the ceph fs code is handling it.  It's issue #1122 in the tracker and I
> plan to look at it today or tomorrow!
>
> Thanks-
> sage
>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread
[parent not found: <CABAwU-YKmEC=umFLzDb-ykPbzQ9s3sKoUmQbkumExrXEwyveNA@mail.gmail.com>]
* Re: read performance not perfect
       [not found]   ` <CABAwU-YKmEC=umFLzDb-ykPbzQ9s3sKoUmQbkumExrXEwyveNA@mail.gmail.com>
@ 2011-08-04 15:51     ` Sage Weil
  2011-08-04 19:36       ` Fyodor Ustinov
  2011-08-09  3:56       ` huang jun
  0 siblings, 2 replies; 19+ messages in thread
From: Sage Weil @ 2011-08-04 15:51 UTC (permalink / raw)
To: huang jun; +Cc: ceph-devel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 3738 bytes --]

Hi,

I've just pushed a wip-readahead branch to ceph-client.git that rewrites
ceph_readpages (used for readahead) to be fully asynchronous.  This should
let us take full advantage of whatever the readahead window is.  I'm still
doing some testing on this end, but things look good so far.

There are two relevant mount options:

  rasize=NN  - max readahead window size (bytes)
  rsize=MM   - max read size

rsize defaults to 0 (no limit), which means it effectively maxes out at
the stripe size (one object, 4MB by default).

rasize now defaults to 8 MB.  This is probably what you'll want to
experiment with.  In practice I think something on the order of 8-12 MB
will be best, as it will start loading things off disk ~2 objects ahead of
the current position.

Can you give it a go and see if this helps in your environment?

Thanks!
sage

On Tue, 19 Jul 2011, huang jun wrote:
> thanks for you reply
> now we find two points confused us:
> 1) the kernel client execute sequence read though aio_read function,
> but from OSD log,
> the dispatch_queue length in OSD is always 0, it means OSD can't
> got next READ message until client send to it. It seems that
> async_read changes to sync_read, OSD can't parallely read data, so can
> not make the most of resources.What are the original purposes when
> you design this part? perfect realiablity?

Right.  The old ceph_readpages was synchronous, which slowed things down
in a couple of different ways.

> 2) In singleness read circumstance,during OSD read data from it disk,
> the OSD doesn't do anything but to wait it finish.We think it was the
> result of 1), OSD have nothing to do,so just to wait.
>
>
> 2011/7/19 Sage Weil <sage@newdream.net>:
> > On Mon, 18 Jul 2011, huang jun wrote:
> >> hi,all
> >> We test ceph's read performance last week, and find something weird
> >> we use ceph v0.30 on linux 2.6.37
> >> mount ceph on back-platform consist of 2 osds \1 mon \1 mds
> >> $mount -t ceph 192.168.1.103:/ /mnt -vv
> >> $ dd if=/dev/zero of=/mnt/test bs=4M count=200
> >> $ cd .. && umount /mnt
> >> $mount -t ceph 192.168.1.103:/ /mnt -vv
> >> $dd if=test of=/dev/zero bs=4M
> >> 200+0 records in
> >> 200+0 records out
> >> 838860800 bytes (839 MB) copied, 16.2327 s, 51.7 MB/s
> >> but if we use rados to test it
> >> $ rados -m 192.168.1.103:6789 -p data bench 60 write
> >> $ rados -m 192.168.1.103:6789 -p data bench 60 seq
> >> the result is:
> >> Total time run:       24.733935
> >> Total reads made:     438
> >> Read size:            4194304
> >> Bandwidth (MB/sec):   70.834
> >>
> >> Average Latency:      0.899429
> >> Max latency:          1.85106
> >> Min latency:          0.128017
> >> this phenomenon attracts our attention, then we begin to analysis the
> >> osd debug log.
> >> we find that :
> >> 1) the kernel client send READ request, at first it requests 1MB, and
> >> after that it is 512KB
> >> 2) from rados test cmd log, OSD recept the READ op with 4MB data to handle
> >> we know the ceph developers pay their attention to read and write
> >> performance, so i just want to confrim that
> >> if the communication between the client and OSD spend more time than
> >> it should be? can we request bigger size, just like default object
> >> size 4MB, when it occurs to READ operation? or this is related to OS
> >> management, if so, what can we do to promote the performance?
> >
> > I think it's related to the way the Linux VFS is doing readahead, and how
> > the ceph fs code is handling it.  It's issue #1122 in the tracker and I
> > plan to look at it today or tomorrow!
> >
> > Thanks-
> > sage
> >

^ permalink raw reply	[flat|nested] 19+ messages in thread
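[Editor's note: Sage's rasize/rsize description translates into mount options like the following sketch. The monitor address is the one from earlier in the thread; the 12 MB value is simply the middle of his suggested 8-12 MB range, not a recommendation from the original message.

```shell
# Compute a 12 MB readahead window (middle of the suggested 8-12 MB range)
rasize=$((12 * 1024 * 1024))
echo "$rasize"    # 12582912

# Then mount with it; rsize=0 means no explicit read-size cap, so reads
# effectively max out at the stripe/object size (4 MB by default):
#   mount -t ceph 192.168.1.103:6789:/ /mnt -o rasize=${rasize},rsize=0
```

Since rasize covers roughly two objects ahead of the current read position, the client can keep requests for the next object in flight while the current one is being returned.]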
* Re: read performance not perfect
  2011-08-04 15:51     ` Sage Weil
@ 2011-08-04 19:36       ` Fyodor Ustinov
  2011-08-04 19:53         ` Sage Weil
  2011-08-09  3:56       ` huang jun
  1 sibling, 1 reply; 19+ messages in thread
From: Fyodor Ustinov @ 2011-08-04 19:36 UTC (permalink / raw)
To: ceph-devel

Sage Weil <sage <at> newdream.net> writes:
>
> Hi,
>
> I've just pushed a wip-readahead branch to ceph-client.git that rewrites
> ceph_readpages (used for readahead) to be fully asynchronous.  This should
> let us take full advantage of whatever the readahead window is.  I'm still
> doing some testing on this end, but things look good so far.

As I understand, it's available only in kernel 3.1?

WBR,
    Fyodor.

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: read performance not perfect
  2011-08-04 19:36       ` Fyodor Ustinov
@ 2011-08-04 19:53         ` Sage Weil
  2011-08-04 23:38           ` Fyodor Ustinov
  0 siblings, 1 reply; 19+ messages in thread
From: Sage Weil @ 2011-08-04 19:53 UTC (permalink / raw)
To: Fyodor Ustinov; +Cc: ceph-devel

On Thu, 4 Aug 2011, Fyodor Ustinov wrote:
> Sage Weil <sage <at> newdream.net> writes:
> >
> > Hi,
> >
> > I've just pushed a wip-readahead branch to ceph-client.git that rewrites
> > ceph_readpages (used for readahead) to be fully asynchronous.  This should
> > let us take full advantage of whatever the readahead window is.  I'm still
> > doing some testing on this end, but things look good so far.
>
> As I understand it's available only in kernel 3.1 ?

The current patches are on top of v3.0, but you should be able to rebase
the readahead stuff on top of anything reasonably recent.

sage

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: read performance not perfect
  2011-08-04 19:53         ` Sage Weil
@ 2011-08-04 23:38           ` Fyodor Ustinov
  2011-08-05  1:26             ` Sage Weil
  0 siblings, 1 reply; 19+ messages in thread
From: Fyodor Ustinov @ 2011-08-04 23:38 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel

On 08/04/2011 10:53 PM, Sage Weil wrote:
>
> The current patches are on top of v3.0, but you should be able to rebase
> the readahead stuff on top of anything reasonably recent.
>
> sage

As usual:
cluster - latest 0.32 from your ubuntu repository.
client - latest git-pulled kernel.

dd a file from the cluster to /dev/null and press ctrl-c. In syslog:

[   12.950114] libceph:  mon0 10.5.51.230:6789 connection failed
[   19.971512] libceph: client4119 fsid af9be081-9777-e2cc-8988-ba02fff0f390
[   19.971845] libceph:  mon0 10.5.51.230:6789 session established
[   92.891202] libceph: try_read bad con->in_tag = -108
[   92.891258] libceph:  osd5 10.5.51.145:6801 protocol error, garbage tag
[  114.508350] libceph: try_read bad con->in_tag = 122
[  114.508406] libceph:  osd1 10.5.51.141:6800 protocol error, garbage tag
[  119.077246] libceph: try_read bad con->in_tag = -39
[  119.077301] libceph:  osd7 10.5.51.147:6801 protocol error, garbage tag

WBR,
    Fyodor.

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: read performance not perfect
  2011-08-04 23:38           ` Fyodor Ustinov
@ 2011-08-05  1:26             ` Sage Weil
  2011-08-05  6:34               ` Fyodor Ustinov
  0 siblings, 1 reply; 19+ messages in thread
From: Sage Weil @ 2011-08-05  1:26 UTC (permalink / raw)
To: Fyodor Ustinov; +Cc: ceph-devel

On Fri, 5 Aug 2011, Fyodor Ustinov wrote:
> On 08/04/2011 10:53 PM, Sage Weil wrote:
> >
> > The current patches are on top of v3.0, but you should be able to rebase
> > the readahead stuff on top of anything reasonably recent.
> >
> > sage
>
> As usual.
> cluster - latest 0.32 from your ubuntu rep.
> client - latest git-pulled kernel.
>
> dd file from cluster to /dev/null and press ctrl-c. In syslog:
>
> [   12.950114] libceph:  mon0 10.5.51.230:6789 connection failed
> [   19.971512] libceph: client4119 fsid af9be081-9777-e2cc-8988-ba02fff0f390
> [   19.971845] libceph:  mon0 10.5.51.230:6789 session established
> [   92.891202] libceph: try_read bad con->in_tag = -108
> [   92.891258] libceph:  osd5 10.5.51.145:6801 protocol error, garbage tag
> [  114.508350] libceph: try_read bad con->in_tag = 122
> [  114.508406] libceph:  osd1 10.5.51.141:6800 protocol error, garbage tag
> [  119.077246] libceph: try_read bad con->in_tag = -39
> [  119.077301] libceph:  osd7 10.5.51.147:6801 protocol error, garbage tag

Hmm, this is something new.  Can you confirm which commit you're running?
Have you seen this before?  It may be in the batch of stuff on top of
3.0.

sage

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: read performance not perfect
  2011-08-05  1:26             ` Sage Weil
@ 2011-08-05  6:34               ` Fyodor Ustinov
  2011-08-05 16:07                 ` Sage Weil
  0 siblings, 1 reply; 19+ messages in thread
From: Fyodor Ustinov @ 2011-08-05 6:34 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel

On 08/05/2011 04:26 AM, Sage Weil wrote:
> On Fri, 5 Aug 2011, Fyodor Ustinov wrote:
>> On 08/04/2011 10:53 PM, Sage Weil wrote:
>>> The current patches are on top of v3.0, but you should be able to rebase
>>> the readahead stuff on top of anything reasonably recent.
>>>
>>> sage
>> As usual.
>> cluster - latest 0.32 from your ubuntu rep.
>> client - latest git-pulled kernel.
>>
>> dd file from cluster to /dev/null and press ctrl-c. In syslog:
>>
>> [   12.950114] libceph:  mon0 10.5.51.230:6789 connection failed
>> [   19.971512] libceph: client4119 fsid af9be081-9777-e2cc-8988-ba02fff0f390
>> [   19.971845] libceph:  mon0 10.5.51.230:6789 session established
>> [   92.891202] libceph: try_read bad con->in_tag = -108
>> [   92.891258] libceph:  osd5 10.5.51.145:6801 protocol error, garbage tag
>> [  114.508350] libceph: try_read bad con->in_tag = 122
>> [  114.508406] libceph:  osd1 10.5.51.141:6800 protocol error, garbage tag
>> [  119.077246] libceph: try_read bad con->in_tag = -39
>> [  119.077301] libceph:  osd7 10.5.51.147:6801 protocol error, garbage tag
> Hmm, this is something new. Can you confirm which commit you're running?

Well, in more detail:

1. Cluster: 8 physical servers with 14 osd servers (fs - xfs) + 1 physical
server with mon+mds. Ceph version - 0.32 from the repository on all servers
and clients.
2. Fresh ceph fs. (Really fresh - I made this fs from scratch.)
3. One client via cfuse slowly fills the cluster with some data (7T). Really
slowly (about 1G per minute).

But we are talking about another client.

The kernel for this client was git pulled from
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git (it's the
latest kernel).

On the client, ceph is mounted via fstab:

10.5.51.230:/dcvolia/bacula /bacula ceph _netdev,rw 0 0

Now watch:

root@amanda:/bacula/archive/zab.servers.dcv# cd /bacula/archive/zab.servers.dcv
root@amanda:/bacula/archive/zab.servers.dcv# ls -alh
total 100G
drwxr-xr-x 1 bacula tape 100G 2011-07-31 00:05 .
drwxr-xr-x 1 bacula tape 253G 2011-07-18 15:21 ..
-rw-r----- 1 bacula tape  23G 2011-08-05 00:40 zab.servers.dcv-daily-20110719-000519
-rw-r----- 1 bacula tape  28G 2011-07-25 00:39 zab.servers.dcv-daily-20110719-003333
-rw-r----- 1 bacula tape  32G 2011-08-01 00:42 zab.servers.dcv-daily-20110726-000515
-rw-r----- 1 bacula tape 6.2G 2011-07-18 12:29 zab.servers.dcv-monthly-20110718-111036
-rw-r----- 1 bacula tape 6.1G 2011-07-24 01:22 zab.servers.dcv-weekly-20110724-000518
-rw-r----- 1 bacula tape 6.1G 2011-07-31 01:22 zab.servers.dcv-weekly-20110731-000522
root@amanda:/bacula/archive/zab.servers.dcv# dd if=zab.servers.dcv-daily-20110719-000519 of=/dev/null bs=8M
^C34+1 records in
34+0 records out
285212672 bytes (285 MB) copied, 5.04607 s, 56.5 MB/s

[24983.180068] libceph: get_reply unknown tid 6215 from osd6

root@amanda:/bacula/archive/zab.servers.dcv# dd if=zab.servers.dcv-daily-20110719-000519 of=/dev/null bs=8M
^C24+1 records in
24+0 records out
201326592 bytes (201 MB) copied, 2.4007 s, 83.9 MB/s

[25035.656266] libceph: get_reply unknown tid 7025 from osd1

root@amanda:/bacula/archive/zab.servers.dcv# dd if=zab.servers.dcv-daily-20110719-000519 of=/dev/null bs=8M
^C130+1 records in
130+0 records out
1090519040 bytes (1.1 GB) copied, 14.9645 s, 72.9 MB/s

[25088.452033] libceph: try_read bad con->in_tag = 106
[25088.452087] libceph:  osd13 10.5.51.146:6800 protocol error, garbage tag

root@amanda:/bacula/archive/zab.servers.dcv# dd if=zab.servers.dcv-daily-20110719-000519 of=/dev/null bs=8M
^C104+1 records in
104+0 records out
872415232 bytes (872 MB) copied, 10.5863 s, 82.4 MB/s

[25166.344264] libceph: try_read bad con->in_tag = 122
[25166.344317] libceph:  osd4 10.5.51.144:6800 protocol error, garbage tag

and so on.

> Have you seen this before?
Never.
> It may be in the batch of stuff on top of
> 3.0.
May be.

BTW, I do not see a dramatic increase in read speed. :(

WBR,
    Fyodor.

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: read performance not perfect
  2011-08-05  6:34               ` Fyodor Ustinov
@ 2011-08-05 16:07                 ` Sage Weil
  2011-08-05 19:30                   ` Fyodor Ustinov
  2011-08-06 11:03                   ` Fyodor Ustinov
  0 siblings, 2 replies; 19+ messages in thread
From: Sage Weil @ 2011-08-05 16:07 UTC (permalink / raw)
To: Fyodor Ustinov; +Cc: ceph-devel

On Fri, 5 Aug 2011, Fyodor Ustinov wrote:
> On 08/05/2011 04:26 AM, Sage Weil wrote:
> > On Fri, 5 Aug 2011, Fyodor Ustinov wrote:
> > > On 08/04/2011 10:53 PM, Sage Weil wrote:
> > > > The current patches are on top of v3.0, but you should be able to rebase
> > > > the readahead stuff on top of anything reasonably recent.
> > > >
> > > > sage
> > > As usual.
> > > cluster - latest 0.32 from your ubuntu rep.
> > > client - latest git-pulled kernel.
> > >
> > > dd file from cluster to /dev/null and press ctrl-c. In syslog:
> > >
> > > [   12.950114] libceph:  mon0 10.5.51.230:6789 connection failed
> > > [   19.971512] libceph: client4119 fsid
> > > af9be081-9777-e2cc-8988-ba02fff0f390
> > > [   19.971845] libceph:  mon0 10.5.51.230:6789 session established
> > > [   92.891202] libceph: try_read bad con->in_tag = -108
> > > [   92.891258] libceph:  osd5 10.5.51.145:6801 protocol error, garbage tag
> > > [  114.508350] libceph: try_read bad con->in_tag = 122
> > > [  114.508406] libceph:  osd1 10.5.51.141:6800 protocol error, garbage tag
> > > [  119.077246] libceph: try_read bad con->in_tag = -39
> > > [  119.077301] libceph:  osd7 10.5.51.147:6801 protocol error, garbage tag
> > Hmm, this is something new. Can you confirm which commit you're running?
> Well. More detailed.
>
> 1. Cluster: 8 physical servers with 14 osd servers (fs - xfs) + 1 physical
> server with mon+mds. Ceph version - 0.32 from repository on all servers and
> clients.
> 2. Fresh ceph fs. (Really fresh - I made this fs from scratch)
> 3. One client via cfuse slowly fills the cluster by some data (7T). Really
> slowly (about 1G in minute).
>
> But we are talking about another client.
>
> Kernel for this client git pulled from
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git (it's latest
> kernel).

This is the problem.  The readahead patches are in the master branch of
git://ceph.newdream.net/git/ceph-client.git.  They're not upstream yet.
Sorry that wasn't clear!

> On client ceph mounted via fstab:
>
> 10.5.51.230:/dcvolia/bacula /bacula ceph _netdev,rw 0 0
>
> Now make show:
>
> root@amanda:/bacula/archive/zab.servers.dcv# cd
> /bacula/archive/zab.servers.dcv
> root@amanda:/bacula/archive/zab.servers.dcv# ls -alh
> total 100G
> drwxr-xr-x 1 bacula tape 100G 2011-07-31 00:05 .
> drwxr-xr-x 1 bacula tape 253G 2011-07-18 15:21 ..
> -rw-r----- 1 bacula tape  23G 2011-08-05 00:40
> zab.servers.dcv-daily-20110719-000519
> -rw-r----- 1 bacula tape  28G 2011-07-25 00:39
> zab.servers.dcv-daily-20110719-003333
> -rw-r----- 1 bacula tape  32G 2011-08-01 00:42
> zab.servers.dcv-daily-20110726-000515
> -rw-r----- 1 bacula tape 6.2G 2011-07-18 12:29
> zab.servers.dcv-monthly-20110718-111036
> -rw-r----- 1 bacula tape 6.1G 2011-07-24 01:22
> zab.servers.dcv-weekly-20110724-000518
> -rw-r----- 1 bacula tape 6.1G 2011-07-31 01:22
> zab.servers.dcv-weekly-20110731-000522
> root@amanda:/bacula/archive/zab.servers.dcv# dd
> if=zab.servers.dcv-daily-20110719-000519 of=/dev/null bs=8M
> ^C34+1 records in
> 34+0 records out
> 285212672 bytes (285 MB) copied, 5.04607 s, 56.5 MB/s
>
> [24983.180068] libceph: get_reply unknown tid 6215 from osd6

This message is normal.  We should probably turn down the debug level,
or try to detect whether it is expected or not.

> root@amanda:/bacula/archive/zab.servers.dcv# dd
> if=zab.servers.dcv-daily-20110719-000519 of=/dev/null bs=8M
> ^C24+1 records in
> 24+0 records out
> 201326592 bytes (201 MB) copied, 2.4007 s, 83.9 MB/s
>
> [25035.656266] libceph: get_reply unknown tid 7025 from osd1
>
> root@amanda:/bacula/archive/zab.servers.dcv# dd
> if=zab.servers.dcv-daily-20110719-000519 of=/dev/null bs=8M
> ^C130+1 records in
> 130+0 records out
> 1090519040 bytes (1.1 GB) copied, 14.9645 s, 72.9 MB/s
>
> root@amanda:/bacula/archive/zab.servers.dcv#
>
> [25088.452033] libceph: try_read bad con->in_tag = 106
> [25088.452087] libceph:  osd13 10.5.51.146:6800 protocol error, garbage tag

This is not.  I'll open a bug and try to track this one down.  It looks
new.

Thanks!
sage

> root@amanda:/bacula/archive/zab.servers.dcv# dd
> if=zab.servers.dcv-daily-20110719-000519 of=/dev/null bs=8M
> ^C104+1 records in
> 104+0 records out
> 872415232 bytes (872 MB) copied, 10.5863 s, 82.4 MB/s
>
> [25166.344264] libceph: try_read bad con->in_tag = 122
> [25166.344317] libceph:  osd4 10.5.51.144:6800 protocol error, garbage tag
>
> and so on.
>
> > Have you seen this before?
> Never.
> > It may be in the batch of stuff on top of
> > 3.0.
> May be.
>
> BTW, dramatically increase read speed I do not see. :(
>
> WBR,
>     Fyodor.

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: read performance not perfect
  2011-08-05 16:07                 ` Sage Weil
@ 2011-08-05 19:30                   ` Fyodor Ustinov
  2011-08-05 19:35                     ` Gregory Farnum
  2011-08-05 20:17                     ` Sage Weil
  1 sibling, 2 replies; 19+ messages in thread
From: Fyodor Ustinov @ 2011-08-05 19:30 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel

On 08/05/2011 07:07 PM, Sage Weil wrote:
>
> This is the problem.  The readahead patches are in the master branch of
> git://ceph.newdream.net/git/ceph-client.git.  They're not upstream yet.
> Sorry that wasn't clear!

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=e9852227431a0ed6ceda064f33e4218757acab6c
- it's not this patch?

WBR,
    Fyodor.

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: read performance not perfect
  2011-08-05 19:30                   ` Fyodor Ustinov
@ 2011-08-05 19:35                     ` Gregory Farnum
  0 siblings, 0 replies; 19+ messages in thread
From: Gregory Farnum @ 2011-08-05 19:35 UTC (permalink / raw)
To: Fyodor Ustinov; +Cc: Sage Weil, ceph-devel

On Fri, Aug 5, 2011 at 12:30 PM, Fyodor Ustinov <ufm@ufm.su> wrote:
> On 08/05/2011 07:07 PM, Sage Weil wrote:
> >
> > This is the problem.  The readahead patches are in the master branch of
> > git://ceph.newdream.net/git/ceph-client.git.  They're not upstream yet.
> > Sorry that wasn't clear!
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=e9852227431a0ed6ceda064f33e4218757acab6c
> - it's not this patch?

Nope, that patch essentially just adjusted a preference setting. The full
set of patches is much more extensive.
-Greg

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: read performance not perfect
  2011-08-05 19:30                   ` Fyodor Ustinov
  2011-08-05 19:35                     ` Gregory Farnum
@ 2011-08-05 20:17                     ` Sage Weil
  2011-08-05 21:12                       ` Fyodor Ustinov
  1 sibling, 1 reply; 19+ messages in thread
From: Sage Weil @ 2011-08-05 20:17 UTC (permalink / raw)
To: Fyodor Ustinov; +Cc: ceph-devel

On Fri, 5 Aug 2011, Fyodor Ustinov wrote:
> On 08/05/2011 07:07 PM, Sage Weil wrote:
> >
> > This is the problem.  The readahead patches are in the master branch of
> > git://ceph.newdream.net/git/ceph-client.git.  They're not upstream yet.
> > Sorry that wasn't clear!
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=e9852227431a0ed6ceda064f33e4218757acab6c
> - it's not this patch?

Nope, it's ebd62c49c0a71a9af6b92b4f0cedfd2b1d46c16e, in ceph-client.git.
Then d0a287e18a81a0314a9aa82b6f54eb7f5ecabd60 bumps up the default rasize
window.

FWIW I saw a big jump in read speed on my cluster (now fully saturates the
client interface).

sage

^ permalink raw reply	[flat|nested] 19+ messages in thread
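[Editor's note: to pick up those commits before they landed upstream, the rough flow would have been as below. The clone URL is the one Sage gives in the thread (long since retired); that the two referenced commits were reachable from the tree's master branch is an assumption based on his earlier message.

```shell
# Clone the out-of-tree ceph kernel client repository Sage references
git clone git://ceph.newdream.net/git/ceph-client.git
cd ceph-client

# Confirm the two commits he names are present:
# the async ceph_readpages rewrite, then the default rasize bump
git show --stat ebd62c49c0a71a9af6b92b4f0cedfd2b1d46c16e
git show --stat d0a287e18a81a0314a9aa82b6f54eb7f5ecabd60

# Then build and boot this tree on the client machine
```
]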
* Re: read performance not perfect
  2011-08-05 20:17                     ` Sage Weil
@ 2011-08-05 21:12                       ` Fyodor Ustinov
  2011-08-08 17:52                         ` Fyodor Ustinov
  0 siblings, 1 reply; 19+ messages in thread
From: Fyodor Ustinov @ 2011-08-05 21:12 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel

On 08/05/2011 11:17 PM, Sage Weil wrote:
> On Fri, 5 Aug 2011, Fyodor Ustinov wrote:
>> On 08/05/2011 07:07 PM, Sage Weil wrote:
>>> This is the problem.  The readahead patches are in the master branch of
>>> git://ceph.newdream.net/git/ceph-client.git.  They're not upstream yet.
>>> Sorry that wasn't clear!
>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=e9852227431a0ed6ceda064f33e4218757acab6c
>> - it's not this patch?
> Nope, it's ebd62c49c0a71a9af6b92b4f0cedfd2b1d46c16e, in ceph-client.git.
> Then d0a287e18a81a0314a9aa82b6f54eb7f5ecabd60 bumps up the default rasize
> window.
>
> FWIW I saw a big jump in read spead on my cluster (now fully saturates the
> client interface).
>
> sage

Well, I'll wait for the next kernel. :)

WBR,
    Fyodor.

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: read performance not perfect
  2011-08-05 21:12                       ` Fyodor Ustinov
@ 2011-08-08 17:52                         ` Fyodor Ustinov
  2011-08-08 19:14                           ` Sage Weil
  0 siblings, 1 reply; 19+ messages in thread
From: Fyodor Ustinov @ 2011-08-08 17:52 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel

On 08/06/2011 12:12 AM, Fyodor Ustinov wrote:
> On 08/05/2011 11:17 PM, Sage Weil wrote:
>> On Fri, 5 Aug 2011, Fyodor Ustinov wrote:
>>> On 08/05/2011 07:07 PM, Sage Weil wrote:
>>>> This is the problem.  The readahead patches are in the master branch of
>>>> git://ceph.newdream.net/git/ceph-client.git.  They're not upstream yet.
>>>> Sorry that wasn't clear!
>>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=e9852227431a0ed6ceda064f33e4218757acab6c
>>> - it's not this patch?
>> Nope, it's ebd62c49c0a71a9af6b92b4f0cedfd2b1d46c16e, in ceph-client.git.
>> Then d0a287e18a81a0314a9aa82b6f54eb7f5ecabd60 bumps up the default rasize
>> window.
>>
>> FWIW I saw a big jump in read spead on my cluster (now fully saturates the
>> client interface).
>>
>> sage
> Well. I wait net kernel. :)

Sage, 3.1-rc1 is released. Does this release have the necessary patches?

WBR,
    Fyodor.

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: read performance not perfect
  2011-08-08 17:52                         ` Fyodor Ustinov
@ 2011-08-08 19:14                           ` Sage Weil
  0 siblings, 0 replies; 19+ messages in thread
From: Sage Weil @ 2011-08-08 19:14 UTC (permalink / raw)
To: Fyodor Ustinov; +Cc: ceph-devel

On Mon, 8 Aug 2011, Fyodor Ustinov wrote:
> On 08/06/2011 12:12 AM, Fyodor Ustinov wrote:
> > On 08/05/2011 11:17 PM, Sage Weil wrote:
> > > On Fri, 5 Aug 2011, Fyodor Ustinov wrote:
> > > > On 08/05/2011 07:07 PM, Sage Weil wrote:
> > > > > This is the problem.  The readahead patches are in the master branch of
> > > > > git://ceph.newdream.net/git/ceph-client.git.  They're not upstream
> > > > > yet.
> > > > > Sorry that wasn't clear!
> > > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=e9852227431a0ed6ceda064f33e4218757acab6c
> > > > - it's not this patch?
> > > Nope, it's ebd62c49c0a71a9af6b92b4f0cedfd2b1d46c16e, in ceph-client.git.
> > > Then d0a287e18a81a0314a9aa82b6f54eb7f5ecabd60 bumps up the default rasize
> > > window.
> > >
> > > FWIW I saw a big jump in read spead on my cluster (now fully saturates the
> > > client interface).
> > >
> > > sage
> > Well. I wait net kernel. :)
> Sage, 3.1-rc1 released. This release has the necessary patches?

Not readahead, no.  I didn't have the patches done and tested in time.
That'll go into 3.2.

sage

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: read performance not perfect
  2011-08-05 16:07                 ` Sage Weil
  2011-08-05 19:30                   ` Fyodor Ustinov
@ 2011-08-06 11:03                   ` Fyodor Ustinov
  2011-08-06 19:08                     ` Sage Weil
  1 sibling, 1 reply; 19+ messages in thread
From: Fyodor Ustinov @ 2011-08-06 11:03 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel

On 08/05/2011 07:07 PM, Sage Weil wrote:
>
> This is not.  I'll open a bug and try to track this one down.  It looks
> new.

With your kernel version I do not see this trouble.

WBR,
    Fyodor.

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: read performance not perfect
  2011-08-06 11:03                   ` Fyodor Ustinov
@ 2011-08-06 19:08                     ` Sage Weil
  0 siblings, 0 replies; 19+ messages in thread
From: Sage Weil @ 2011-08-06 19:08 UTC (permalink / raw)
To: Fyodor Ustinov; +Cc: ceph-devel

On Sat, 6 Aug 2011, Fyodor Ustinov wrote:
> On 08/05/2011 07:07 PM, Sage Weil wrote:
> >
> > This is not.  I'll open a bug and try to track this one down.  It looks
> > new.
> In yours kernel version I do not see this trouble.

Oh, this might have been the bug Jim was seeing a few weeks back, fixed by
0da5d70369e87f80adf794080cfff1ca15a34198 (merged into 3.0-rc1).

In any case, if you see this again with the current code, let us know!

sage

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: read performance not perfect
  2011-08-04 15:51 ` Sage Weil
  2011-08-04 19:36   ` Fyodor Ustinov
@ 2011-08-09  3:56   ` huang jun
  1 sibling, 0 replies; 19+ messages in thread
From: huang jun @ 2011-08-09 3:56 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel

Hi Sage,
We ran a test recently with 5 OSDs on v0.30 (the OS is Linux 2.6.39).
The read speed increased to 79 MB/s on the first read, and the average
rose to 85-90 MB/s, about twice our former result, so this improves read
performance very much. We don't know whether it lives up to your
expectations, though.

2011/8/4 Sage Weil <sage@newdream.net>:
> Hi,
>
> I've just pushed a wip-readahead branch to ceph-client.git that rewrites
> ceph_readpages (used for readahead) to be fully asynchronous.  This
> should let us take full advantage of whatever the readahead window is.
> I'm still doing some testing on this end, but things look good so far.
>
> There are two relevant mount options:
>
>   rasize=NN  - max readahead window size (bytes)
>   rsize=MM   - max read size
>
> rsize defaults to 0 (no limit), which means it effectively maxes out at
> the stripe size (one object, 4MB by default).
>
> rasize now defaults to 8 MB.  This is probably what you'll want to
> experiment with.  In practice I think something on the order of 8-12 MB
> will be best, as it will start loading things off disk ~2 objects ahead
> of the current position.
>
> Can you give it a go and see if this helps in your environment?
>
> Thanks!
> sage
>
>
> On Tue, 19 Jul 2011, huang jun wrote:
>> thanks for your reply
>> now we find two points that confuse us:
>> 1) the kernel client issues sequential reads through the aio_read
>> function, but from the OSD log, the dispatch_queue length on the OSD
>> is always 0. It means the OSD can't get the next READ message until
>> the client sends it. It seems that async_read degrades to sync_read:
>> the OSD can't read data in parallel, so it cannot make the most of its
>> resources. What was the original purpose when you designed this part?
>> perfect reliability?
>
> Right.  The old ceph_readpages was synchronous, which slowed things
> down in a couple of different ways.
>
>> 2) In the single-reader case, while the OSD is reading data from its
>> disk, it does nothing but wait for the read to finish. We think this is
>> a consequence of 1): the OSD has nothing else to do, so it just waits.
>>
>>
>> 2011/7/19 Sage Weil <sage@newdream.net>:
>> > On Mon, 18 Jul 2011, huang jun wrote:
>> >> hi, all
>> >> We tested ceph's read performance last week, and found something
>> >> weird. We use ceph v0.30 on Linux 2.6.37, mounted on a back end
>> >> consisting of 2 OSDs, 1 mon, and 1 mds:
>> >> $ mount -t ceph 192.168.1.103:/ /mnt -vv
>> >> $ dd if=/dev/zero of=/mnt/test bs=4M count=200
>> >> $ cd .. && umount /mnt
>> >> $ mount -t ceph 192.168.1.103:/ /mnt -vv
>> >> $ dd if=test of=/dev/zero bs=4M
>> >> 200+0 records in
>> >> 200+0 records out
>> >> 838860800 bytes (839 MB) copied, 16.2327 s, 51.7 MB/s
>> >> but if we use rados to test it:
>> >> $ rados -m 192.168.1.103:6789 -p data bench 60 write
>> >> $ rados -m 192.168.1.103:6789 -p data bench 60 seq
>> >> the result is:
>> >> Total time run:        24.733935
>> >> Total reads made:      438
>> >> Read size:             4194304
>> >> Bandwidth (MB/sec):    70.834
>> >>
>> >> Average Latency:       0.899429
>> >> Max latency:           1.85106
>> >> Min latency:           0.128017
>> >> this phenomenon attracted our attention, so we began to analyze the
>> >> osd debug log. we found that:
>> >> 1) the kernel client sends READ requests of 1MB at first, and 512KB
>> >> after that
>> >> 2) from the rados test cmd log, the OSD receives READ ops with 4MB
>> >> of data to handle
>> >> we know the ceph developers pay attention to read and write
>> >> performance, so i just want to confirm: does the communication
>> >> between the client and OSD spend more time than it should? can we
>> >> request a bigger size, like the default object size of 4MB, for READ
>> >> operations? or is this related to OS management, and if so, what can
>> >> we do to improve the performance?
>> >
>> > I think it's related to the way the Linux VFS is doing readahead,
>> > and how the ceph fs code is handling it.  It's issue #1122 in the
>> > tracker and I plan to look at it today or tomorrow!
>> >
>> > Thanks-
>> > sage
>> >
>>
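Sage's point that a rasize of ~8-12 MB keeps reads ~2 objects ahead of the current position can be illustrated with a toy pipeline model. Everything below is an illustrative assumption (the 80 ms per-request latency, the idealized "one latency per batch" timing), not a measurement or implementation of Ceph itself:

```python
import math

# Toy timing model of the behaviour discussed above: a client reading
# n_objects sequentially while keeping up to `window` read requests in
# flight. If each request takes `latency` seconds end to end and
# overlapping requests are served concurrently, the total time is one
# latency per batch of `window` requests.

def sequential_read_time(n_objects, latency, window):
    """Total time to read n_objects with up to `window` requests in flight."""
    return latency * math.ceil(n_objects / window)

if __name__ == "__main__":
    n, latency = 200, 0.08  # 200 x 4MB objects, 80 ms per request (assumed)
    sync = sequential_read_time(n, latency, window=1)   # old synchronous path
    ahead = sequential_read_time(n, latency, window=2)  # ~8MB rasize window
    print(f"one request at a time:  {sync:.1f}s")
    print(f"two objects in flight:  {ahead:.1f}s")
```

Under this model, doubling the number of in-flight reads halves the wall-clock time, which is the shape of the improvement huang jun reports (51.7 MB/s rising to 85-90 MB/s); the real gain depends on disk and network behaviour.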
end of thread, other threads:[~2011-08-09 3:56 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-18  4:51 read performance not perfect huang jun
2011-07-18 17:14 ` Sage Weil
2011-07-20  0:21   ` huang jun
     [not found]     ` <CABAwU-YKmEC=umFLzDb-ykPbzQ9s3sKoUmQbkumExrXEwyveNA@mail.gmail.com>
2011-08-04 15:51       ` Sage Weil
2011-08-04 19:36         ` Fyodor Ustinov
2011-08-04 19:53           ` Sage Weil
2011-08-04 23:38             ` Fyodor Ustinov
2011-08-05  1:26               ` Sage Weil
2011-08-05  6:34                 ` Fyodor Ustinov
2011-08-05 16:07                   ` Sage Weil
2011-08-05 19:30                     ` Fyodor Ustinov
2011-08-05 19:35                       ` Gregory Farnum
2011-08-05 20:17                       ` Sage Weil
2011-08-05 21:12                         ` Fyodor Ustinov
2011-08-08 17:52                           ` Fyodor Ustinov
2011-08-08 19:14                             ` Sage Weil
2011-08-06 11:03                     ` Fyodor Ustinov
2011-08-06 19:08                       ` Sage Weil
2011-08-09  3:56       ` huang jun