* Buffered write slowness
From: Jesse Barnes @ 2004-10-26 1:14 UTC
To: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 2140 bytes --]
I've been doing some simple disk I/O benchmarking with an eye towards
improving large, striped volume bandwidth. I ran some tests on individual
disks and filesystems to establish a baseline and found that things generally
scale quite well:
o one thread/disk using O_DIRECT on the block device
read avg: 2784.81 MB/s
write avg: 2585.60 MB/s
o one thread/disk using O_DIRECT + filesystem
read avg: 2635.98 MB/s
write avg: 2573.39 MB/s
o one thread/disk using buffered I/O + filesystem
read w/default (128) block/*/queue/read_ahead_kb avg: 2626.25 MB/s
read w/max (4096) block/*/queue/read_ahead_kb avg: 2652.62 MB/s
write avg: 1394.99 MB/s
Configuration:
o 8p sn2 ia64 box
o 8GB memory
o 58 disks across 16 controllers
(4 disks for 10 of them and 3 for the other 6)
o aggregate I/O bw available is about 2.8GB/s
Test:
o one I/O thread per disk, round-robined across the 8 CPUs
o each thread did ~450MB of I/O depending on the test (ran for 10s)
Note: the total was > 8GB so in the buffered read case not everything
could be cached
As you can see, for a test that does one thread/disk things look really good
(very close to the available bandwidth in the system) with the exception of
buffered writes. I've attached the vmstat and profile from that run in case
anyone's interested. It seems that there was some spinlock contention in
that run that wasn't present in other runs.
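For reference, a minimal sketch of one such per-disk worker -- this is not
the actual test harness; the device path, request size, and buffer alignment
are assumptions based on the description above:

    /* Hypothetical per-disk worker: read one block device with O_DIRECT
     * for ~10s and report bandwidth.  O_DIRECT needs an aligned buffer. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>

    #define REQ_SIZE (1024 * 1024)    /* 1MB per request (assumption) */
    #define RUN_SECS 10               /* per the test description */

    int main(int argc, char **argv)
    {
        const char *dev = argc > 1 ? argv[1] : "/dev/sdb"; /* placeholder */
        void *buf;
        long long bytes = 0;
        time_t end;
        int fd = open(dev, O_RDONLY | O_DIRECT);

        if (fd < 0 || posix_memalign(&buf, 4096, REQ_SIZE))
            return 1;
        end = time(NULL) + RUN_SECS;
        while (time(NULL) < end) {    /* sequential 1MB reads */
            ssize_t n = read(fd, buf, REQ_SIZE);
            if (n <= 0)
                break;
            bytes += n;
        }
        printf("%s: %.2f MB/s\n", dev, bytes / (1024.0 * 1024.0) / RUN_SECS);
        free(buf);
        close(fd);
        return 0;
    }

One instance per disk, pinned to a CPU (e.g. with taskset), approximates the
round-robin placement described above.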
Preliminary runs on a large volume showed that a single thread reading from a
striped volume w/O_DIRECT performed poorly, while a single thread writing to
a volume the same way was able to get slightly over 1GB/s. Using multiple
read threads against the volume increased the bandwidth to near 1GB/s, but
multiple writer threads slightly reduced performance. My tests and the
system configuration have changed a bit since then, though, so don't put much
stock in these numbers until I rerun them (and collect profiles and such).
Thanks,
Jesse
P.S. The 'dev-fs' in the filenames doesn't mean I was using devfs (I wasn't,
not that it should matter), just that I was running per-dev tests with a
filesystem. :)
[-- Attachment #2: profile-buffered-write-dev-fs.txt --]
[-- Type: text/plain, Size: 1711 bytes --]
598157 total 0.1052
132002 _spin_unlock_irq 2062.5312
87219 ia64_pal_call_static 454.2656
72515 default_idle 161.8638
60019 __copy_user 25.3459
37898 ia64_spinlock_contention 394.7708
32167 _spin_unlock_irqrestore 335.0729
15351 kmem_cache_free 39.9766
11585 smp_call_function 10.3438
10047 kmem_cache_alloc 39.2461
10007 ia64_save_scratch_fpregs 156.3594
9994 ia64_load_scratch_fpregs 156.1562
7125 bio_put 24.7396
6540 __end_that_request_first 5.5236
6064 shrink_list 1.3345
5829 buffered_rmqueue 3.1406
5137 mempool_alloc 4.8646
4986 set_bh_page 17.3125
4261 bio_alloc 3.0967
4069 end_bio_bh_io_sync 9.0826
3906 submit_bh 4.3594
3653 wake_up_page 28.5391
3607 drop_buffers 6.6305
3381 __might_sleep 5.5609
3335 free_hot_cold_page 3.2568
3157 writeback_inodes 3.5234
3105 __alloc_pages 1.2937
2533 submit_bio 3.0445
2486 __block_prepare_write 0.9033
2335 mark_buffer_async_write 18.2422
[-- Attachment #3: vmstat-buffered-write-dev-fs.txt --]
[-- Type: text/plain, Size: 6879 bytes --]
[root@junkbond ~]# vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 2544 230240 1040 6981472 0 0 4342 10272 7 77 0 4 93 3
0 0 2544 231776 1040 6981472 0 0 24 16515 8529 143 0 0 100 0
0 0 2544 230880 1040 6981472 0 0 64 276273 10068 96 0 4 96 0
63 0 2544 14176 1040 7196128 0 0 24 303435 13281 216 0 12 88 0
62 1 2544 8736 1104 7187808 0 0 72 755830 51021 5716 0 100 0 0
63 2 2544 5728 656 7182064 0 0 4 1330749 79954 10109 0 100 0 0
61 17 2544 9056 672 7107744 0 0 104 2905708 68129 8461 0 100 0 0
58 23 2544 6368 672 7052016 0 0 64 2432964 66482 6400 0 100 0 0
60 25 2544 14816 848 7025008 0 0 428 1755180 68765 8282 0 100 0 0
57 20 2544 5856 448 7021280 0 0 72 1216548 63333 7905 0 100 0 0
58 17 2544 10272 448 7006832 0 0 32 1097956 61771 8779 0 100 0 0
57 10 2544 8224 624 7010784 0 0 464 1083460 60803 6123 0 100 0 0
62 14 2544 5792 464 7015072 0 0 100 1005260 63960 5773 0 100 0 0
5 1 2544 14912 752 7000336 0 0 620 1060772 60624 5679 0 98 1 0
0 1 2544 14784 976 6998048 0 0 856 63972 16941 3954 1 16 70 13
0 0 2544 14720 976 6998048 0 0 16 4920 14865 1285 0 9 82 9
0 1 2544 14784 976 6998048 0 0 8 23728 13974 40 0 7 93 0
0 1 2544 15040 976 6998048 0 0 0 18432 14097 847 0 8 81 11
63 1 2544 8640 1040 7008304 0 0 64 140656 39666 2799 0 40 52 8
59 1 2544 11328 1008 7004208 0 0 80 915952 64727 8821 0 100 0 0
62 7 2544 9952 448 7015088 0 0 64 2327688 76873 9117 0 100 0 0
60 12 2544 6688 448 7013024 0 0 32 2457992 66550 8099 0 100 0 0
61 15 2544 8032 448 7010960 0 0 80 1851480 66455 7185 0 100 0 0
60 18 2544 6144 448 6994448 0 0 80 1488552 70833 8758 0 100 0 0
60 13 2544 7456 448 6984128 0 0 32 1207904 65113 7116 0 100 0 0
58 10 2544 5664 448 6982064 0 0 32 985696 64674 7964 0 100 0 0
60 21 2544 7840 448 6984128 0 0 64 976540 61481 7015 0 100 0 0
59 15 2544 8736 448 6982064 0 0 88 869564 60383 6613 0 100 0 0
61 11 2544 7616 736 6981776 0 0 592 1183640 63801 7145 0 100 0 0
4 1 2544 85376 896 6971296 0 0 924 193617 22851 4451 1 27 62 11
0 1 2544 11904 896 6971296 0 0 12 0 14137 1325 0 10 79 11
0 1 2544 11840 896 6971296 0 0 0 6 13729 446 0 7 82 11
0 1 2544 11968 896 6971296 0 0 0 2 13719 952 0 7 82 11
0 1 2544 12160 960 6971232 0 0 8 69 12409 858 0 6 82 12
57 1 2544 13312 960 6971232 0 0 96 686610 58996 5417 0 97 3 0
58 1 2544 11008 448 6980000 0 0 104 1973020 81572 12784 0 100 0 0
60 19 2544 9632 448 6977936 0 0 88 2302016 74034 9208 0 100 0 0
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
62 20 2544 11872 448 6984128 0 0 48 1914352 71746 8483 0 100 0 0
56 22 2544 5472 448 6984128 0 0 56 1413780 68497 7051 0 100 0 0
60 12 2544 10912 448 6980000 0 0 112 1076532 67341 7422 0 100 0 0
59 19 2544 5984 448 6984128 0 0 56 1129136 69899 7353 0 100 0 0
55 19 2544 9696 448 6980000 0 0 48 1113548 72296 6531 0 100 0 0
65 15 2544 7584 448 6984128 0 0 32 1028156 66183 8963 0 100 0 0
62 26 2544 9728 784 6977600 0 0 604 997204 69320 10154 0 100 0 0
0 4 2544 29376 864 6963072 0 0 836 49804 17486 5250 1 28 31 40
0 1 2544 29376 864 6963072 0 0 0 24856 15374 952 0 10 76 14
0 1 2544 29376 864 6963072 0 0 0 0 14715 1532 0 9 80 11
0 1 2544 29312 864 6963072 0 0 8 18856 13580 735 0 8 85 7
64 2 2544 8768 928 6981584 0 0 104 422060 54309 2831 0 68 28 4
56 1 2544 6592 576 6986064 0 0 48 1364520 75387 8135 0 100 0 0
63 24 2544 7776 448 6990320 0 0 56 2284556 66785 9020 0 100 0 0
57 33 2544 5536 448 6988256 0 0 80 1774088 62152 9677 0 100 0 0
58 19 2544 8992 448 6988256 0 0 40 1671936 70438 7714 0 100 0 0
58 15 2544 7520 448 6984128 0 0 48 1333652 69927 6837 0 100 0 0
56 10 2544 9824 448 6988256 0 0 16 1095912 73715 7797 0 100 0 0
58 7 2544 7008 448 6982064 0 0 56 1167724 67655 6401 0 100 0 0
57 10 2544 8480 448 6986192 0 0 8 915948 69804 6384 0 100 0 0
56 11 2544 7392 448 6986192 0 0 32 998204 69488 6417 0 100 0 0
59 13 2544 7296 528 6988176 0 0 196 1001173 68362 6354 0 100 0 0
6 2 2544 14528 784 6977600 0 0 440 277287 25760 5077 0 35 44 21
1 1 2544 13184 944 6965056 0 0 844 6335 15959 3621 1 16 70 13
0 1 2544 13120 944 6965056 0 0 0 2 13599 833 0 7 81 11
0 1 2544 13120 944 6965056 0 0 0 11 13051 1347 0 7 82 11
0 1 2544 13056 1008 6964992 0 0 8 80 11969 858 0 5 83 12
59 1 2544 9984 992 6971200 0 0 128 874215 69268 7155 0 83 14 2
61 2 2544 7712 480 6975840 0 0 40 1739593 73008 12486 0 100 0 0
61 18 2544 5536 448 6977936 0 0 128 2176972 67952 9666 0 100 0 0
59 19 2544 5792 448 6973808 0 0 64 2008752 68552 7674 0 100 0 0
57 24 2544 8288 448 6973808 0 0 80 1561368 71155 8361 0 100 0 0
63 14 2544 11744 448 6971744 0 0 96 1349620 71476 7671 0 100 0 0
60 17 2544 5856 448 6967616 0 0 64 958588 66934 5707 0 100 0 0
56 15 2544 6560 448 6967616 0 0 112 1061172 71689 7726 0 100 0 0
57 8 2544 6176 448 6969680 0 0 48 948800 70414 7758 0 100 0 0
62 10 2544 8704 704 6969424 0 0 596 959484 66082 8013 0 100 0 0
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
2 3 2544 23872 656 6957088 0 0 144 613012 47183 8732 0 72 17 11
0 1 2544 112000 896 6956848 0 0 1220 48532 14407 3578 2 14 58 26
0 1 2544 112000 896 6956848 0 0 0 0 13582 995 0 8 81 11
* Re: Buffered I/O slowness
From: Jesse Barnes @ 2004-10-29 17:46 UTC
To: linux-kernel; +Cc: akpm
[-- Attachment #1: Type: text/plain, Size: 3172 bytes --]
On Monday, October 25, 2004 6:14 pm, Jesse Barnes wrote:
> I've been doing some simple disk I/O benchmarking with an eye towards
> improving large, striped volume bandwidth. I ran some tests on individual
> disks and filesystems to establish a baseline and found that things
> generally scale quite well:
>
> o one thread/disk using O_DIRECT on the block device
> read avg: 2784.81 MB/s
> write avg: 2585.60 MB/s
>
> o one thread/disk using O_DIRECT + filesystem
> read avg: 2635.98 MB/s
> write avg: 2573.39 MB/s
>
> o one thread/disk using buffered I/O + filesystem
> read w/default (128) block/*/queue/read_ahead_kb avg: 2626.25 MB/s
> read w/max (4096) block/*/queue/read_ahead_kb avg: 2652.62 MB/s
> write avg: 1394.99 MB/s
>
> Configuration:
> o 8p sn2 ia64 box
> o 8GB memory
> o 58 disks across 16 controllers
> (4 disks for 10 of them and 3 for the other 6)
> o aggregate I/O bw available is about 2.8GB/s
>
> Test:
> o one I/O thread per disk, round-robined across the 8 CPUs
> o each thread did ~450MB of I/O depending on the test (ran for 10s)
> Note: the total was > 8GB so in the buffered read case not everything
> could be cached
More results here. I've run some tests on a large dm striped volume formatted
with XFS. It had 64 disks with a 64k stripe unit (XFS was made aware of this
at format time), and I explicitly set the readahead using blockdev to 524288
blocks. The results aren't as bad as my previous runs, but are still much
slower than I think they ought to be, given the direct I/O results above.
This is after a fresh mount, so the pagecache was empty when I started the
tests.
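(For anyone reproducing the readahead setting: blockdev's readahead knob
counts 512-byte sectors, and the same setting is reachable programmatically
via the BLKRASET/BLKRAGET ioctls. A minimal sketch, with a placeholder
device path:)

    /* Set and read back a block device's readahead, roughly equivalent to
     * "blockdev --setra 524288 /dev/dm-0" (device name is a placeholder).
     * Units are 512-byte sectors, so 524288 sectors = 256MB. */
    #include <fcntl.h>
    #include <linux/fs.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    int main(void)
    {
        unsigned long ra = 524288;              /* 512-byte sectors */
        int fd = open("/dev/dm-0", O_RDONLY);

        if (fd < 0)
            return 1;
        if (ioctl(fd, BLKRASET, ra) < 0)        /* value passed directly */
            perror("BLKRASET");
        if (ioctl(fd, BLKRAGET, &ra) == 0)      /* read back to confirm */
            printf("readahead: %lu sectors\n", ra);
        close(fd);
        return 0;
    }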
o one thread on one large volume using buffered I/O + filesystem
read (1 thread, one volume, 131072 blocks/request) avg: ~931 MB/s
write (1 thread, one volume, 131072 blocks/request) avg: ~908 MB/s
I'm intentionally issuing very large reads and writes here to take advantage
of the striping, but it looks like both the readahead and regular buffered
I/O code will split the I/O into page-sized chunks? The call chain is pretty
long, but it looks to me like do_generic_mapping_read() will split the reads
up by page and issue them independently to the lower levels. In the direct
I/O case, up to 64 pages are issued at a time, which seems like it would help
throughput quite a bit. The profile seems to confirm this. Unfortunately I
didn't save the vmstat output for this run (and now the fc switch is
misbehaving so I have to fix that before I run again), but IIRC the system
time was pretty high given that only one thread was issuing I/O.
So maybe a few things need to be done:
o set readahead to larger values by default for dm volumes at setup time
(the default was very small)
o maybe bypass readahead for very large requests? if the process is doing a
huge request, chances are that readahead won't benefit it as much as a
process doing small requests (see the sketch after this list)
o not sure about writes yet, I haven't looked at that call chain much yet
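(On the readahead-bypass point: a process that already knows readahead won't
help it can opt out per-file from userspace with posix_fadvise(). A minimal
sketch, with a hypothetical file path:)

    /* Hypothetical example: a process doing huge requests tells the kernel
     * not to read ahead on this file.  Note posix_fadvise() returns an
     * errno value directly rather than setting errno. */
    #define _XOPEN_SOURCE 600
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        int fd = open("/mnt/xfs/bigfile", O_RDONLY);  /* path hypothetical */
        int err;

        if (fd < 0)
            return 1;
        err = posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM);
        if (err)
            fprintf(stderr, "posix_fadvise: %s\n", strerror(err));
        /* ... large read()s now proceed without readahead ... */
        return 0;
    }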
Does any of this sound reasonable at all? What else could be done to make the
buffered I/O layer friendlier to large requests?
Thanks,
Jesse
[-- Attachment #2: vol-buffered-read-profile.txt --]
[-- Type: text/plain, Size: 1710 bytes --]
115383 total 0.0203
49642 ia64_pal_call_static 258.5521
42065 default_idle 93.8951
7348 __copy_user 3.1030
5865 ia64_save_scratch_fpregs 91.6406
5766 ia64_load_scratch_fpregs 90.0938
1944 _spin_unlock_irq 30.3750
352 _spin_unlock_irqrestore 3.6667
231 buffered_rmqueue 0.1245
225 kmem_cache_free 0.5859
151 mpage_end_io_read 0.2776
147 __end_that_request_first 0.1242
133 bio_alloc 0.0967
122 smp_call_function 0.1089
102 shrink_list 0.0224
99 unlock_page 0.4420
86 free_hot_cold_page 0.0840
82 kmem_cache_alloc 0.3203
65 __alloc_pages 0.0271
53 do_mpage_readpage 0.0224
53 bio_clone 0.1380
49 __might_sleep 0.0806
44 mpage_readpages 0.0598
43 generic_make_request 0.0345
42 sn_pci_unmap_sg 0.1010
42 sn_dma_flush 0.0597
41 clear_page 0.2562
40 file_read_actor 0.0431
34 mark_page_accessed 0.0966
32 __bio_add_page 0.0278
* Re: Buffered I/O slowness
From: Andrew Morton @ 2004-10-29 23:08 UTC
To: Jesse Barnes; +Cc: linux-kernel
Jesse Barnes <jbarnes@engr.sgi.com> wrote:
>
> ...
> o one thread on one large volume using buffered I/O + filesystem
> read (1 thread, one volume, 131072 blocks/request) avg: ~931 MB/s
> write (1 thread, one volume, 131072 blocks/request) avg: ~908 MB/s
>
> I'm intentionally issuing very large reads and writes here to take advantage
> of the striping, but it looks like both the readahead and regular buffered
> I/O code will split the I/O into page-sized chunks?
No, the readahead code will assemble single BIOs up to the size of the
readahead window. So the single-page-reads in do_generic_mapping_read()
should never happen, because the pages are in cache from the readahead.
> The call chain is pretty
> long, but it looks to me like do_generic_mapping_read() will split the reads
> up by page and issue them independently to the lower levels. In the direct
> I/O case, up to 64 pages are issued at a time, which seems like it would help
> throughput quite a bit. The profile seems to confirm this. Unfortunately I
> didn't save the vmstat output for this run (and now the fc switch is
> misbehaving so I have to fix that before I run again), but IIRC the system
> time was pretty high given that only one thread was issuing I/O.
>
> So maybe a few things need to be done:
> o set readahead to larger values by default for dm volumes at setup time
> (the default was very small)
Well possibly. dm has control of queue->backing_dev_info and is free to
tune the queue's default readahead.
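(As a sketch of that kind of tuning against the 2.6-era structures -- the
helper name and the full-stripe-width sizing policy are assumptions, not
dm's actual code:)

    /* Sketch: give a dm queue a larger default readahead at create time.
     * ra_pages is in page-cache pages, not sectors. */
    static void dm_tune_readahead(request_queue_t *q,
                                  unsigned int stripe_unit_kb,
                                  unsigned int nr_stripes)
    {
        unsigned long ra_pages = (unsigned long)stripe_unit_kb *
                                 nr_stripes * 1024 / PAGE_CACHE_SIZE;

        /* never shrink below the existing global default */
        if (q->backing_dev_info.ra_pages < ra_pages)
            q->backing_dev_info.ra_pages = ra_pages;
    }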
> o maybe bypass readahead for very large requests?
> if the process is doing a huge request, chances are that readahead won't
> benefit it as much as a process doing small requests
Maybe - but bear in mind that this is all pinned memory when the I/O is in
flight, so some upper bound has to remain.
> o not sure about writes yet, I haven't looked at that call chain much yet
>
> Does any of this sound reasonable at all? What else could be done to make the
> buffered I/O layer friendlier to large requests?
I'm not sure that we know what's going on yet. I certainly don't. The
above numbers look good, so what's the problem???
Suggest you get geared up to monitor the BIOs going into submit_bio().
Look at their bi_sector and bi_size. Make sure that buffered I/O is doing
the right thing.
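(As a sketch of that instrumentation against the 2.6-era bio fields -- the
helper and its ratelimiting are assumptions, not existing kernel code -- one
could call something like this at the top of submit_bio():)

    /* Sketch: temporary instrumentation for the top of submit_bio().
     * bi_sector is in 512-byte sectors, bi_size in bytes; the ratelimit
     * guard just keeps the log readable under load. */
    static inline void trace_submit_bio(int rw, struct bio *bio)
    {
        if (printk_ratelimit())
            printk(KERN_DEBUG "submit_bio: %s sector %llu bytes %u\n",
                   (rw & WRITE) ? "W" : "R",
                   (unsigned long long)bio->bi_sector,
                   bio->bi_size);
    }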
* Re: Buffered I/O slowness
From: Jesse Barnes @ 2004-10-30 0:16 UTC
To: Andrew Morton; +Cc: linux-kernel, jeremy
On Friday, October 29, 2004 4:08 pm, Andrew Morton wrote:
> > I'm intentionally issuing very large reads and writes here to take
> > advantage of the striping, but it looks like both the readahead and
> > regular buffered I/O code will split the I/O into page-sized chunks?
>
> No, the readahead code will assemble single BIOs up to the size of the
> readahead window. So the single-page-reads in do_generic_mapping_read()
> should never happen, because the pages are in cache from the readahead.
Yeah, I realized that after I sent the message. The readahead looks like it
might be ok.
> > So maybe a few things need to be done:
> > o set readahead to larger values by default for dm volumes at setup
> > time (the default was very small)
>
> Well possibly. dm has control of queue->backing_dev_info and is free to
> tune the queue's default readahead.
Yep, I'll give that a try and see if I can come up with a reasonable default
(something more like the stripe unit seems like a start).
> > o maybe bypass readahead for very large requests?
> > if the process is doing a huge request, chances are that readahead
> > won't benefit it as much as a process doing small requests
>
> Maybe - but bear in mind that this is all pinned memory when the I/O is in
> flight, so some upper bound has to remain.
Right, for the direct I/O case, it looks like things are limited to 64 pages
at a time.
>
> > o not sure about writes yet, I haven't looked at that call chain much
> > yet
> >
> > Does any of this sound reasonable at all? What else could be done to
> > make the buffered I/O layer friendlier to large requests?
>
> I'm not sure that we know what's going on yet. I certainly don't. The
> above numbers look good, so what's the problem???
The numbers are ~1/3 of what the machine is capable of with direct I/O. That
seems much lower to me than it should be. Cache-cold reads into
the page cache seem like they should be nearly as fast as direct reads (at
least on a CPU where the extra data copying overhead isn't getting in the
way).
> Suggest you get geared up to monitor the BIOs going into submit_bio().
> Look at their bi_sector and bi_size. Make sure that buffered I/O is doing
> the right thing.
Ok, I'll give that a try.
Thanks,
Jesse
* Re: Buffered I/O slowness
From: Andrew Morton @ 2004-10-30 0:30 UTC
To: Jesse Barnes; +Cc: linux-kernel, jeremy
Jesse Barnes <jbarnes@engr.sgi.com> wrote:
>
> > I'm not sure that we know what's going on yet. I certainly don't. The
> > above numbers look good, so what's the problem???
>
> The numbers are ~1/3 of what the machine is capable of with direct I/O.
Are there CPU cycles to spare? If you have just one CPU copying 1GB/sec
out of pagecache, maybe it is pegged?
* Re: Buffered I/O slowness
From: Jesse Barnes @ 2004-11-01 18:26 UTC
To: Andrew Morton; +Cc: linux-kernel, jeremy
On Friday, October 29, 2004 5:30 pm, Andrew Morton wrote:
> Jesse Barnes <jbarnes@engr.sgi.com> wrote:
> > > I'm not sure that we know what's going on yet. I certainly don't. The
> > > above numbers look good, so what's the problem???
> >
> > The numbers are ~1/3 of what the machine is capable of with direct I/O.
>
> Are there CPU cycles to spare? If you have just one CPU copying 1GB/sec
> out of pagecache, maybe it is pegged?
Hm, I thought I had more CPU to spare, but when I set the readahead to a large
value, I'm taking ~100% of the CPU time on the CPU doing the read. ~98% of
that is system time. When I run 8 copies (this is an 8 CPU system), I get
~4GB/s and all the CPUs are near fully busy. I guess things aren't as bad as
I initially thought.
Thanks,
Jesse
* Re: Buffered I/O slowness
From: Jesse Barnes @ 2004-11-01 18:34 UTC
To: Andrew Morton; +Cc: linux-kernel, jeremy
On Monday, November 1, 2004 10:26 am, Jesse Barnes wrote:
> On Friday, October 29, 2004 5:30 pm, Andrew Morton wrote:
> > Jesse Barnes <jbarnes@engr.sgi.com> wrote:
> > > > I'm not sure that we know what's going on yet. I certainly don't.
> > > > The above numbers look good, so what's the problem???
> > >
> > > The numbers are ~1/3 of what the machine is capable of with direct I/O.
> >
> > Are there CPU cycles to spare? If you have just one CPU copying 1GB/sec
> > out of pagecache, maybe it is pegged?
>
> Hm, I thought I had more CPU to spare, but when I set the readahead to a
> large value, I'm taking ~100% of the CPU time on the CPU doing the read.
> ~98% of that is system time. When I run 8 copies (this is an 8 CPU
> system), I get ~4GB/s and all the CPUs are near fully busy. I guess things
> aren't as bad as I initially thought.
OTOH, if I run 8 copies against 8 separate files (the test above was 8 I/O
threads on the same file), I'm seeing ~16% CPU for each CPU in the machine
and only about 700 MB/s of I/O throughput, so this case *does* look like a
problem. Here's the profile (this is 2.6.10-rc1-mm2).
Jesse
mgr Aggregate throughput: 6241.204239 MB in 10.183594s; 612.868541 MB/s
116885 total 0.0162
50577 ia64_pal_call_static 263.4219
42784 default_idle 95.5000
6148 ia64_save_scratch_fpregs 96.0625
5908 ia64_load_scratch_fpregs 92.3125
4738 __copy_user 2.0008
2079 _spin_unlock_irq 12.9938
926 _spin_unlock_irqrestore 4.8229
374 sn_dma_flush 0.2997
192 generic_make_request 0.1250
177 clone_endio 0.2634
149 _read_unlock_irq 0.9313
135 dm_table_unplug_all 0.4688
128 buffered_rmqueue 0.0597
122 mptscsih_io_done 0.0428
117 clear_page 0.7312
96 __end_that_request_first 0.0811
94 _spin_lock_irqsave 0.2670
92 mempool_alloc 0.0927
88 handle_IRQ_event 0.3056
80 _write_unlock_irq 0.3571
80 mpage_end_io_read 0.1471
61 kmem_cache_alloc 0.2383
59 xfs_iomap 0.0181
59 xfs_bmapi 0.0038
59 do_mpage_readpage 0.0249
55 dm_table_any_congested 0.1719
53 pcibr_dma_unmap 0.3312
51 scsi_io_completion 0.0228
47 kmem_cache_free 0.1224