* performance
@ 2003-06-09 16:12 P
  2003-06-09 16:16 ` performance P
  0 siblings, 1 reply; 21+ messages in thread
From: P @ 2003-06-09 16:12 UTC (permalink / raw)
  To: netfilter-devel

Hi,

I'm testing netfilter performance here on
PIII 1.2GHz based systems. With default
kernel configuration, netfilter is able
to process 85,000 pps with 125 rules (all
rules matching).

Note the application is just counting.
There is no transmitting/forwarding.

Also note the nics are e100.

So my simple question: are there any
tips for increasing the performance?
Hmm, actually the performance seems
optimal? Is it only taking 9 instructions
per match? 1.2*10^9/(85000*1500) = 9

thanks,
Pádraig.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: performance
  2003-06-09 16:12 performance P
@ 2003-06-09 16:16 ` P
  2003-06-09 18:27   ` Re[2]: performance Peteris Krumins
  0 siblings, 1 reply; 21+ messages in thread
From: P @ 2003-06-09 16:16 UTC (permalink / raw)
  To: netfilter-devel

P@draigBrady.com wrote:
> Hi,
> 
> I'm testing netfilter performance here on
> PIII 1.2GHz based systems. With default
> kernel configuration, netfilter is able
> to process 85,000 pps with 125 rules (all
> rules matching).
> 
> Note the application is just counting.
> There is no transmitting/forwarding.
> 
> Also note the nics are e100.
> 
> So my simple question are there any
> tips in increasing the performance?
> Hmm actually the performance seems
> optimal? is it only taking 9 instructions
> per match? 1.2*10^9/(85000*1500) = 9

I knew that couldn't be right.
That was tested on a dual 1.2GHz,
so that should be approx:
2*10^9/(85000*125) = 188 instructions per match.

I guess that's pretty optimal?
The best I could hope for after that
would be to increase the rx packet
buffer space so as to handle higher
spikes than this.

cheers,
Pádraig.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re[2]: performance
  2003-06-09 16:16 ` performance P
@ 2003-06-09 18:27   ` Peteris Krumins
  2003-06-10  8:49     ` performance P
  0 siblings, 1 reply; 21+ messages in thread
From: Peteris Krumins @ 2003-06-09 18:27 UTC (permalink / raw)
  To: P; +Cc: netfilter-devel

Monday, June 9, 2003, 7:16:35 PM, you wrote:

Pdc> P@draigBrady.com wrote:
>> Hi,
>> 
>> I'm testing netfilter performance here on
>> PIII 1.2GHz based systems. With default
>> kernel configuration, netfilter is able
>> to process 85,000 pps with 125 rules (all
>> rules matching).
>> 
>> Note the application is just counting.
>> There is no transmitting/forwarding.
>> 
>> Also note the nics are e100.
>> 
>> So my simple question are there any
>> tips in increasing the performance?
>> Hmm actually the performance seems
>> optimal? is it only taking 9 instructions
>> per match? 1.2*10^9/(85000*1500) = 9

Pdc> I knew that couldn't be right.
Pdc> That was tested on a dual 1.2GHz,
Pdc> so that should be approx:
Pdc> 2*10^9/(85000*125) = 188 instructions per match.

I am afraid that is not instructions per match
but ticks per match? An instruction can take more than
one tick (clock cycle).


P.Krumins

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: performance
  2003-06-09 18:27   ` Re[2]: performance Peteris Krumins
@ 2003-06-10  8:49     ` P
  2003-06-11 12:16       ` performance Harald Welte
  0 siblings, 1 reply; 21+ messages in thread
From: P @ 2003-06-10  8:49 UTC (permalink / raw)
  To: Peteris Krumins, netfilter-devel

Peteris Krumins wrote:
> Monday, June 9, 2003, 7:16:35 PM, you wrote:
> 
> Pdc> P@draigBrady.com wrote:
> 
>>>Hi,
>>>
>>>I'm testing netfilter performance here on
>>>PIII 1.2GHz based systems. With default
>>>kernel configuration, netfilter is able
>>>to process 85,000 pps with 125 rules (all
>>>rules matching).
>>>
>>>Note the application is just counting.
>>>There is no transmitting/forwarding.
>>>
>>>Also note the nics are e100.
>>>
>>>So my simple question are there any
>>>tips in increasing the performance?
>>>Hmm actually the performance seems
>>>optimal? is it only taking 9 instructions
>>>per match? 1.2*10^9/(85000*1500) = 9
> 
> 
> Pdc> I knew that couldn't be right.
> Pdc> That was tested on a dual 1.2GHz,
> Pdc> so that should be approx:
> Pdc> 2*10^9/(85000*125) = 188 instructions per match.
> 
> I am afraid that is not instructions per match
> but ticks per match? An instruction can take more than
> one tick (clock cycle).

Yes, true. I was making assumptions.
The following suggests the average cycles-per-instruction
ratio is 1.68, I think:
http://www.cs.berkeley.edu/~pattrsn/252S01/Lec18-dynamic3.pdf

Looking again at the system, it's actually 2 x 1.4GHz,
so the instructions per match are about:
((2.8*10^9)/1.68)/(85000*125) = 156

With the overhead associated with SMP/kernel/...
this suggests a figure closer to 120?
I'm very impressed.

So to scale I would have to organise the rules
into chains that could be bypassed. I wonder, are
there any projects to do this automatically? Hmm..
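
For illustration, the idea would look something like this; a hand-rolled
sketch only, with the chain names, interfaces and ports invented:

  # One cheap test dispatches to a user-defined chain, so a packet only
  # walks the group of rules that can possibly apply to it.
  iptables -N from_lan
  iptables -N from_dmz
  iptables -A FORWARD -i eth1 -j from_lan
  iptables -A FORWARD -i eth2 -j from_dmz
  # The bulk of the per-service rules then live inside the sub-chains.
  iptables -A from_lan -p tcp --dport 25 -j ACCEPT
  iptables -A from_dmz -p tcp --dport 80 -j ACCEPT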

Pádraig.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: performance
  2003-06-10  8:49     ` performance P
@ 2003-06-11 12:16       ` Harald Welte
  2003-06-12 12:04         ` performance P
  0 siblings, 1 reply; 21+ messages in thread
From: Harald Welte @ 2003-06-11 12:16 UTC (permalink / raw)
  To: P; +Cc: Peteris Krumins, netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 1842 bytes --]

On Tue, Jun 10, 2003 at 09:49:54AM +0100, P@draigBrady.com wrote:
 
> yes true. I was making assumptions.
> The following suggests the average cycles:instruction
> ratio is 1.68 I think:
> http://www.cs.berkeley.edu/~pattrsn/252S01/Lec18-dynamic3.pdf
> 
> Looking again at the system it's actually 2 x 1.4GHz
> So the instructions per match are about:
> ((2.8*10^9)/1.68)/(85000*125) = 156
> 
> With the overhead associated with SMP/kernel/...
> this suggests a figure closer to 120?
> I'm very impressed.

I don't know what kind of weird calculation that would be.  First of
all, why the heck are you adding up the clock rates of your CPUs?  Do
you have any idea how SMP systems work?

And then, why do you multiply the pps rate by the size of the packets?
Do you think we process every byte of a packet individually?

Let's make a simple calculation for the UP case:

average clock ticks per processed packet:
1.4*10^9/85000 = 16470

In order to get any idea about how those clock ticks are distributed
among the various parts of kernel networking, you need to use some means
of profiling.
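
For example, the 2.4 in-kernel profiler can be used roughly like this
(a sketch; the System.map path and the sampling window are illustrative):

  # boot with the profiler enabled by appending "profile=2" to the
  # kernel command line, then:
  readprofile -r                            # reset the counters
  # ... run the 85kpps load for a while ...
  readprofile -m /boot/System.map | sort -nr | head -30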

I advise you to dig into profiling, and exploit all general kernel
networking optimizations (like NAPI, ...) before starting to think about
optimizing iptables.

> So to scale I would have to organise the rules
> into chains that could be bypassed. I wonder is
> there any projects to do this automatically? hmm..

no.

> Pádraig.

-- 
- Harald Welte <laforge@netfilter.org>             http://www.netfilter.org/
============================================================================
  "Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed."                    -- Paul Vixie

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: performance
  2003-06-11 12:16       ` performance Harald Welte
@ 2003-06-12 12:04         ` P
  0 siblings, 0 replies; 21+ messages in thread
From: P @ 2003-06-12 12:04 UTC (permalink / raw)
  To: Harald Welte; +Cc: Peteris Krumins, netfilter-devel

Harald Welte wrote:
> In order to get any idea about how those clock ticks are distributed
> among the various parts of kernel networking, you need to use some means
> of profiling.

iptables is built as modules here, so I can't profile it at the moment
(the profiler only covers the statically linked kernel image).
But this is informative. This is with 160 match-any rules in the
mangle PREROUTING chain, and then the packets are just dropped.
The packet rate again is 85Kpps.
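
(A rule set like that can be generated roughly as follows; the commands
here are illustrative, not necessarily the ones used. The resulting
profile is below.)

  iptables -t mangle -F PREROUTING
  # 160 empty match-everything rules that only count...
  for i in `seq 1 160`; do
      iptables -t mangle -A PREROUTING
  done
  # ...and then drop everything
  iptables -t mangle -A PREROUTING -j DROP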

  14754 total                                      0.0096
  11421 default_idle                             142.7625 (71% !!)
    537 handle_IRQ_event                           3.3563
    341 eth_type_trans                             1.7760
    298 ip_rcv                                     0.4233
    238 do_gettimeofday                            1.8594
    224 netif_rx                                   0.5185
    217 add_timer_randomness                       0.9042
    196 skb_release_data                           1.3611
    180 batch_entropy_store                        1.0227
    162 alloc_skb                                  0.3375
    157 process_backlog                            0.5164
    137 kfree                                      0.7784
    126 __kmem_cache_alloc                         0.3937
    109 netif_receive_skb                          0.2004
     85 nf_hook_slow                               0.1771
     66 kmalloc                                    0.8250
     61 kfree_skbmem                               0.4766
     50 __constant_c_and_count_memset              0.3125
     36 __kfree_skb                                0.0978
     28 nf_iterate                                 0.1750
     16 get_sample_stats                           0.1250
     15 add_entropy_words                          0.0852
     14 ip_promisc_rcv_finish                      0.2917
     10 kmem_cache_free                            0.0625
      8 add_interrupt_randomness                   0.1250
      6 __generic_copy_to_user                     0.0625
      4 schedule                                   0.0030
      2 net_rx_action                              0.0057
      2 do_softirq                                 0.0089

The main question is: why all the idle time?
Note that userspace is locked out at this packet rate.

system specs:
    dual 1.4GHz PIII
    e1000 nic
    kernel 2.4.20

Pádraig.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: Performance
@ 2003-03-07 16:36 erik.teose
  0 siblings, 0 replies; 21+ messages in thread
From: erik.teose @ 2003-03-07 16:36 UTC (permalink / raw)
  To: linuxppc-embedded


> Can anybody suggest the best way to calculate the performance of an
> application, running on PPC 440.

1. A stopwatch comes to mind.

2. If it's too slow, the performance is not good enough....

** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Performance
@ 2003-03-07  7:48 Aman
  0 siblings, 0 replies; 21+ messages in thread
From: Aman @ 2003-03-07  7:48 UTC (permalink / raw)
  To: linuxppc embedded


Hi

Can anybody suggest the best way to calculate the performance of an
application running on a PPC 440?

Thanking you in advance
regards
Aman


** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Performance
  2002-08-04  7:39           ` Performance Mukul Kotwani
@ 2002-08-04  8:01             ` Jeremy Higdon
  0 siblings, 0 replies; 21+ messages in thread
From: Jeremy Higdon @ 2002-08-04  8:01 UTC (permalink / raw)
  To: Mukul Kotwani; +Cc: linux-scsi

On Aug 4, 12:39am, Mukul Kotwani wrote:
> 
> Thanks for the reply Jeremy!
> 
> Can you point me to the proper test which I can use to
> test it? I think I have pretty good servers and
> storage. What did u guys use..and was there any tuning
> of the OS required?
> 
> For the IOPs, I guess it must have been the Windows
> cache then, because ths storage used was the same in
> both cases.
> 
> Mukul


Well, a later post from you indicated you were using 13 luns
(I presume that means 13 of these 15K drives).  That would
be 538 IOPS per drive, which sounds a little high (2ms per
I/O would account for 1/2 rotation and no seek/settle time).

The tests we've run were not using stock Linux SCSI, so it
might be hard for you to duplicate.  Have you tried using
the raw driver interface to sd (/dev/raw), or the sg
benchmark tools?  I believe the raw interface should give
you the IOPS and the sg interface would give you the MB/s
(and also the IOPS perhaps).
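
Something along these lines, for instance (device and package names are
examples only):

  # bind a raw device to the disk and read through it; no page cache and
  # no request merging, so this approximates an IOPS test
  # (some dd builds may need a larger bs to satisfy the raw driver's
  # buffer-alignment requirements)
  raw /dev/raw/raw1 /dev/sdb
  dd if=/dev/raw/raw1 of=/dev/null bs=512 count=100000

  # throughput through the sg interface, using sg_dd from sg3_utils
  sg_dd if=/dev/sg1 of=/dev/null bs=512 bpt=128 count=2000000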

A couple of years ago, we had some patches to the raw interface
and block layer that allowed kiobufs to be passed directly to
the SCSI interface.  That was nice because we avoided the
CPU overhead of deconstruction and reconstruction of large
I/O requests.  However, it was messy.  I believe that the
2.5 changes with bio should help a lot in this area, without
the mess.

jeremy

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Performance
  2002-08-04  6:27         ` Performance Jeremy Higdon
@ 2002-08-04  7:39           ` Mukul Kotwani
  2002-08-04  8:01             ` Performance Jeremy Higdon
  0 siblings, 1 reply; 21+ messages in thread
From: Mukul Kotwani @ 2002-08-04  7:39 UTC (permalink / raw)
  To: Jeremy Higdon; +Cc: linux-scsi

Thanks for the reply Jeremy!

Can you point me to the proper test which I can use to
test it? I think I have pretty good servers and
storage. What did you guys use, and was there any tuning
of the OS required?

For the IOPS, I guess it must have been the Windows
cache then, because the storage used was the same in
both cases.

Mukul


--- Jeremy Higdon <jeremy@classic.engr.sgi.com> wrote:
> We've seen about 15000 IOPS from a 2200 and nearly
> 40000 from a 2310.
> With the proper test and proper hardware, you ought
> to see up to 102 MB/s
> on disk reads with a 2200 and 204 MB/s with the 2300
> (10^6 MB).
> 
> However, you may have trouble matching this in the
> Linux block layer.
> Perhaps if you try the sg driver with direct I/O . .
> . .
> 
> I question the 100% random results that you got from
> Windows.  100%
> random implies no cache hits, which would leave you
> with the raw
> IOPS that the drive can supply (260 sounds in the
> right ballpark).
> 
> It would be impossible for the drive to supply 7000
> IOPS in a
> random workload.  If you were supplying out of the
> drive cache or
> Windows cache, then 7000 seems more reasonable
> (though perhaps not
> a very interesting number).
> 
> jeremy
> -
> To unsubscribe from this list: send the line
> "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at 
http://vger.kernel.org/majordomo-info.html


__________________________________________________
Do You Yahoo!?
Yahoo! Health - Feel better, live better
http://health.yahoo.com

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Performance
  2002-08-03 18:16       ` Performance Simon Trimmer
  2002-08-03 20:03         ` Performance Mukul Kotwani
@ 2002-08-04  6:27         ` Jeremy Higdon
  2002-08-04  7:39           ` Performance Mukul Kotwani
  1 sibling, 1 reply; 21+ messages in thread
From: Jeremy Higdon @ 2002-08-04  6:27 UTC (permalink / raw)
  To: Simon Trimmer, Mukul Kotwani; +Cc: Craig Tierney, linux-scsi

We've seen about 15000 IOPS from a 2200 and nearly 40000 from a 2310.
With the proper test and proper hardware, you ought to see up to 102 MB/s
on disk reads with a 2200 and 204 MB/s with the 2300 (10^6 MB).

However, you may have trouble matching this in the Linux block layer.
Perhaps if you try the sg driver with direct I/O . . . .

I question the 100% random results that you got from Windows.  100%
random implies no cache hits, which would leave you with the raw
IOPS that the drive can supply (260 sounds in the right ballpark).

It would be impossible for the drive to supply 7000 IOPS in a
random workload.  If you were supplying out of the drive cache or
Windows cache, then 7000 seems more reasonable (though perhaps not
a very interesting number).

jeremy

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Performance
  2002-08-03 18:16       ` Performance Simon Trimmer
@ 2002-08-03 20:03         ` Mukul Kotwani
  2002-08-04  6:27         ` Performance Jeremy Higdon
  1 sibling, 0 replies; 21+ messages in thread
From: Mukul Kotwani @ 2002-08-03 20:03 UTC (permalink / raw)
  To: Simon Trimmer; +Cc: Craig Tierney, linux-scsi

I do get about 23000 IOPS if I do sequential 512-byte
reads, and there the block driver seems to be merging
requests into one big chunk. That's the reason I tried
random reads instead of sequential, and I have the
QLA2340, which I guess is better than the 220x series.
The sequential performance is not the correct
indicator because of the merging, which is why I
tried random. Did you get 18k IOPS on random or
sequential? And I am using the latest JBODs from IBM
with 15k RPM drives, so the drives are not the reason
for the poor performance.

The load generator is IOMeter/dynamo, which is pretty
much the standard in the Windows world. And the same
card with the same storage using the QLogic drivers
gives 7000 IOPS on random reads to 13 LUNs with 20
outstanding I/Os on Windows, where Linux with the same
settings gives 250! I did try varying the queue depth
of the drivers, and that does not seem to make a
difference. So it's a straight comparison: same card,
same storage, same load generator.

The same goes for the 512k sequential reads: I see 64k
requests on the wire even though I claim to support
1024 sectors and a scatter-gather list of 248!

Any pointers?

Thanks,
Mukul

 
--- Simon Trimmer <simon@urbanmyth.org> wrote:
> Those numbers do seem rather poor, using Matt
> Jacob's linux qlogic driver as
> an initiator I can easily get 18,000 ops/s on 230x
> series isp cards with a
> userspace app. I don't have recent figures for isp
> 220x cards but around 7k
> ops/s matches some old notes and given the right
> workload you can pretty much
> max out the channel.
> 
> Most of the block io experts have steered clear of
> the thread, this might be
> because a lot of performance measurement depends on
> the load generator, your
> scsi targets, the workload actually presented to
> them, whether OS readahead /
> request merging fires etc etc etc. It's not so clear
> cut and simple anymore
> for a straight answer!
> 
> Chances are the qlogic isp cards have more than
> enough grunt for most people.
> If not, they are probably doing something wrong or
> can afford multiport
> variants! :)
> 
> -Simon
> Simon Trimmer <simon@urbanmyth.org>
> 
> 
> On Fri, 2 Aug 2002, Mukul Kotwani wrote:
> > Running IOMeter against the Qlogic with IBM 15kRPM
> > JBODS, Im getting:
> >
> > 1) For 512 byte 100%random reads, I get *just* 260
> > IOPs per sec.On Windows with IOMeter, I get about
> 7000
> > foir the same config.
> >
> >
> > 2) For 512k 100%sequential reads, I see 30MB per
> sec.
> > Windows gives 85MB/sec
> >
> > I see the similar performance with my driver on
> Linux.
> > I dont know whyit is so low, cannot be that low as
> > compared to windws, a few MB/sec or a few hundred
> IOPs
> > per sec difference is OK, but this difference is
> TOO
> > huge. I tried a bunch of diff host template
> params,
> > but doesnt seem to make a difference.
> >
> > Has anyone dont any testing with IOMeter? Or is
> there
> > any other toold I can use to test IOPs/througput?
> >
> >
> > Thanks,
> > Mukul
> >
> > --- Craig Tierney <ctierney@hpti.com> wrote:
> > > For the Qlogic 2200F I am able to push 100 MB/s
> for
> > > reads and writes using multiple threads to a
> > > filesystem
> > > that is striped across mutliple host ports on
> the
> > > SAN.  Backend RAID is a
> > > Data Direct Networks SAN.  For a single thread
> to a
> > > ext3 filesystem
> > > and 2.4.18, I get about 70 MB/s for reads and
> > > writes.
> > >
> > > I have no Iops/sec numbers.
> > >
> > > Craig
> > >
> > >
> > > > Hello!
> > > > Does anyone have any performace numbers of a
> Qlogic
> > > > HBA? No of IOps/sec Max MBs /sec?
> > > >
> > > > Thanks!
> > > > M
> 
> 
> 


__________________________________________________
Do You Yahoo!?
Yahoo! Health - Feel better, live better
http://health.yahoo.com

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Performance
  2002-08-02 23:01     ` Performance Mukul Kotwani
  2002-08-02 23:06       ` Performance Randy.Dunlap
@ 2002-08-03 18:16       ` Simon Trimmer
  2002-08-03 20:03         ` Performance Mukul Kotwani
  2002-08-04  6:27         ` Performance Jeremy Higdon
  1 sibling, 2 replies; 21+ messages in thread
From: Simon Trimmer @ 2002-08-03 18:16 UTC (permalink / raw)
  To: Mukul Kotwani; +Cc: Craig Tierney, linux-scsi

Those numbers do seem rather poor; using Matt Jacob's Linux qlogic driver as
an initiator I can easily get 18,000 ops/s on 230x series isp cards with a
userspace app. I don't have recent figures for isp 220x cards, but around 7k
ops/s matches some old notes, and given the right workload you can pretty much
max out the channel.

Most of the block I/O experts have steered clear of the thread; this might be
because a lot of performance measurement depends on the load generator, your
SCSI targets, the workload actually presented to them, whether OS readahead /
request merging fires, etc.  It's not so clear cut and simple anymore
for a straight answer!

Chances are the qlogic isp cards have more than enough grunt for most people.
If not, they are probably doing something wrong or can afford multiport
variants! :)

-Simon
Simon Trimmer <simon@urbanmyth.org>


On Fri, 2 Aug 2002, Mukul Kotwani wrote:
> Running IOMeter against the Qlogic with IBM 15kRPM
> JBODS, Im getting:
>
> 1) For 512 byte 100%random reads, I get *just* 260
> IOPs per sec.On Windows with IOMeter, I get about 7000
> foir the same config.
>
>
> 2) For 512k 100%sequential reads, I see 30MB per sec.
> Windows gives 85MB/sec
>
> I see the similar performance with my driver on Linux.
> I dont know whyit is so low, cannot be that low as
> compared to windws, a few MB/sec or a few hundred IOPs
> per sec difference is OK, but this difference is TOO
> huge. I tried a bunch of diff host template params,
> but doesnt seem to make a difference.
>
> Has anyone dont any testing with IOMeter? Or is there
> any other toold I can use to test IOPs/througput?
>
>
> Thanks,
> Mukul
>
> --- Craig Tierney <ctierney@hpti.com> wrote:
> > For the Qlogic 2200F I am able to push 100 MB/s for
> > reads and writes using multiple threads to a
> > filesystem
> > that is striped across mutliple host ports on the
> > SAN.  Backend RAID is a
> > Data Direct Networks SAN.  For a single thread to a
> > ext3 filesystem
> > and 2.4.18, I get about 70 MB/s for reads and
> > writes.
> >
> > I have no Iops/sec numbers.
> >
> > Craig
> >
> >
> > > Hello!
> > > Does anyone have any performace numbers of a Qlogic
> > > HBA? No of IOps/sec Max MBs /sec?
> > >
> > > Thanks!
> > > M




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Performance
  2002-08-03  8:26         ` Performance Mukul Kotwani
@ 2002-08-03 16:34           ` Randy.Dunlap
  0 siblings, 0 replies; 21+ messages in thread
From: Randy.Dunlap @ 2002-08-03 16:34 UTC (permalink / raw)
  To: Mukul Kotwani; +Cc: Craig Tierney, linux-scsi

On Sat, 3 Aug 2002, Mukul Kotwani wrote:

| Where is PgMeter available? I dont see a binary or
| source for it anywhere! Can you please point me to
| where it is avilable?

Source code is available only in CVS at
sourceforge.net/projects/pgmeter .
Use cvs to get it, or browse the CVS repository
and download each file.
No binaries available.
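
The usual SourceForge anonymous-CVS steps should work; the module name
below is a guess:

  cvs -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/pgmeter login
  # (just press Enter at the password prompt)
  cvs -z3 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/pgmeter co pgmeter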

| IoMeter source is available on sourceforge.net. I
| compiled the dynamo for Linux from there.You can
| spcify the IP of the machine which will run the GUI as
| a command line param to the dynamo as in:
|
| dynamo IPaddressOfMachineRunningIoMeter
|
| Run the dynamo on the Linux machine as specified
| above, the GUI(Iometer.exe itself!)  on the Windows
| machine, and they should establish a connection. Once
| the GUI and the dynamo are up and connected, you will
| see disks on your Linux machine on the GUI running on
| Windows, and can run tests as you normally do!

Thanks.

~Randy

| --- "Randy.Dunlap" <rddunlap@osdl.org> wrote:
| > On Fri, 2 Aug 2002, Mukul Kotwani wrote:
| >
| > | Running IOMeter against the Qlogic with IBM 15kRPM
| > | JBODS, Im getting:
| > |
| > | 1) For 512 byte 100%random reads, I get *just* 260
| > | IOPs per sec.On Windows with IOMeter, I get about
| > 7000
| > | foir the same config.
| > |
| > | 2) For 512k 100%sequential reads, I see 30MB per
| > sec.
| > | Windows gives 85MB/sec
| > |
| > | I see the similar performance with my driver on
| > Linux.
| > | I dont know whyit is so low, cannot be that low as
| > | compared to windws, a few MB/sec or a few hundred
| > IOPs
| > | per sec difference is OK, but this difference is
| > TOO
| > | huge. I tried a bunch of diff host template
| > params,
| > | but doesnt seem to make a difference.
| > |
| > | Has anyone dont any testing with IOMeter? Or is
| > there
| > | any other toold I can use to test IOPs/througput?
| >
| > Hi,
| >
| > I'm not familiar with people using iometer with
| > Linux.
| > How does someone do that?
| > I thought that it was a Windows client app.
| >
| > A few people do use pgmeter (an iometer clean-room
| > replacement) on Linux.  SGI and IBM presented a
| > paper
| > at USENIX just a few weeks ago in which pgmeter was
| > used.
| > (pgmeter.sf.net).
| >
| > Also, iozone can measure IOs/second or throughput.
| > www.iozone.org
| >
| > ~Randy
| >
| > | Thanks,
| > | Mukul
| > |
| > | --- Craig Tierney <ctierney@hpti.com> wrote:
| > | > For the Qlogic 2200F I am able to push 100 MB/s
| > for
| > | > reads and writes using multiple threads to a
| > | > filesystem
| > | > that is striped across mutliple host ports on
| > the
| > | > SAN.  Backend RAID is a
| > | > Data Direct Networks SAN.  For a single thread
| > to a
| > | > ext3 filesystem
| > | > and 2.4.18, I get about 70 MB/s for reads and
| > | > writes.
| > | >
| > | > I have no Iops/sec numbers.
| > | >
| > | > Craig
| > | >
| > | >
| > | > > Hello!
| > | > > Does anyone have any performace numbers of a
| > | > Qlogic
| > | > > HBA? No of IOps/sec Max MBs /sec?
| > | > >
| > | > > Thanks!
| > | > > M
| > | > >
| > __________________________________________________
| > | > --
| > | > Craig Tierney (ctierney@hpti.com)
| > | > -
| >
| > --
| > ~Randy


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Performance
  2002-08-02 23:06       ` Performance Randy.Dunlap
@ 2002-08-03  8:26         ` Mukul Kotwani
  2002-08-03 16:34           ` Performance Randy.Dunlap
  0 siblings, 1 reply; 21+ messages in thread
From: Mukul Kotwani @ 2002-08-03  8:26 UTC (permalink / raw)
  To: Randy.Dunlap; +Cc: Craig Tierney, linux-scsi

Where is PgMeter available? I don't see a binary or
source for it anywhere! Can you please point me to
where it is available?

IoMeter source is available on sourceforge.net. I
compiled the dynamo for Linux from there. You can
specify the IP of the machine which will run the GUI as
a command-line param to dynamo, as in:

dynamo IPaddressOfMachineRunningIoMeter

Run the dynamo on the Linux machine as specified
above, the GUI (Iometer.exe itself!) on the Windows
machine, and they should establish a connection. Once
the GUI and the dynamo are up and connected, you will
see disks on your Linux machine on the GUI running on
Windows, and can run tests as you normally do!

--- "Randy.Dunlap" <rddunlap@osdl.org> wrote:
> On Fri, 2 Aug 2002, Mukul Kotwani wrote:
> 
> | Running IOMeter against the Qlogic with IBM 15kRPM
> | JBODS, Im getting:
> |
> | 1) For 512 byte 100%random reads, I get *just* 260
> | IOPs per sec.On Windows with IOMeter, I get about
> 7000
> | foir the same config.
> |
> | 2) For 512k 100%sequential reads, I see 30MB per
> sec.
> | Windows gives 85MB/sec
> |
> | I see the similar performance with my driver on
> Linux.
> | I dont know whyit is so low, cannot be that low as
> | compared to windws, a few MB/sec or a few hundred
> IOPs
> | per sec difference is OK, but this difference is
> TOO
> | huge. I tried a bunch of diff host template
> params,
> | but doesnt seem to make a difference.
> |
> | Has anyone dont any testing with IOMeter? Or is
> there
> | any other toold I can use to test IOPs/througput?
> 
> Hi,
> 
> I'm not familiar with people using iometer with
> Linux.
> How does someone do that?
> I thought that it was a Windows client app.
> 
> A few people do use pgmeter (an iometer clean-room
> replacement) on Linux.  SGI and IBM presented a
> paper
> at USENIX just a few weeks ago in which pgmeter was
> used.
> (pgmeter.sf.net).
> 
> Also, iozone can measure IOs/second or throughput.
> www.iozone.org
> 
> ~Randy
> 
> | Thanks,
> | Mukul
> |
> | --- Craig Tierney <ctierney@hpti.com> wrote:
> | > For the Qlogic 2200F I am able to push 100 MB/s
> for
> | > reads and writes using multiple threads to a
> | > filesystem
> | > that is striped across mutliple host ports on
> the
> | > SAN.  Backend RAID is a
> | > Data Direct Networks SAN.  For a single thread
> to a
> | > ext3 filesystem
> | > and 2.4.18, I get about 70 MB/s for reads and
> | > writes.
> | >
> | > I have no Iops/sec numbers.
> | >
> | > Craig
> | >
> | >
> | > > Hello!
> | > > Does anyone have any performace numbers of a
> | > Qlogic
> | > > HBA? No of IOps/sec Max MBs /sec?
> | > >
> | > > Thanks!
> | > > M
> | > >
> __________________________________________________
> | > --
> | > Craig Tierney (ctierney@hpti.com)
> | > -
> 
> -- 
> ~Randy
> 


__________________________________________________
Do You Yahoo!?
Yahoo! Health - Feel better, live better
http://health.yahoo.com

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Performance
  2002-08-02 23:01     ` Performance Mukul Kotwani
@ 2002-08-02 23:06       ` Randy.Dunlap
  2002-08-03  8:26         ` Performance Mukul Kotwani
  2002-08-03 18:16       ` Performance Simon Trimmer
  1 sibling, 1 reply; 21+ messages in thread
From: Randy.Dunlap @ 2002-08-02 23:06 UTC (permalink / raw)
  To: Mukul Kotwani; +Cc: Craig Tierney, linux-scsi

On Fri, 2 Aug 2002, Mukul Kotwani wrote:

| Running IOMeter against the Qlogic with IBM 15kRPM
| JBODS, Im getting:
|
| 1) For 512 byte 100%random reads, I get *just* 260
| IOPs per sec.On Windows with IOMeter, I get about 7000
| foir the same config.
|
| 2) For 512k 100%sequential reads, I see 30MB per sec.
| Windows gives 85MB/sec
|
| I see the similar performance with my driver on Linux.
| I dont know whyit is so low, cannot be that low as
| compared to windws, a few MB/sec or a few hundred IOPs
| per sec difference is OK, but this difference is TOO
| huge. I tried a bunch of diff host template params,
| but doesnt seem to make a difference.
|
| Has anyone dont any testing with IOMeter? Or is there
| any other toold I can use to test IOPs/througput?

Hi,

I'm not familiar with people using iometer with Linux.
How does someone do that?
I thought that it was a Windows client app.

A few people do use pgmeter (an iometer clean-room
replacement) on Linux.  SGI and IBM presented a paper
at USENIX just a few weeks ago in which pgmeter was used.
(pgmeter.sf.net).

Also, iozone can measure IOs/second or throughput.
www.iozone.org
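
For example (the flags are from memory, so check the iozone documentation):

  # 4k random reads/writes, results reported in operations per second (-O)
  iozone -i 0 -i 2 -r 4k -s 1g -O -f /mnt/test/iozone.tmp
  # sequential read/write throughput with 512k records
  iozone -i 0 -i 1 -r 512k -s 1g -f /mnt/test/iozone.tmp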

~Randy

| Thanks,
| Mukul
|
| --- Craig Tierney <ctierney@hpti.com> wrote:
| > For the Qlogic 2200F I am able to push 100 MB/s for
| > reads and writes using multiple threads to a
| > filesystem
| > that is striped across mutliple host ports on the
| > SAN.  Backend RAID is a
| > Data Direct Networks SAN.  For a single thread to a
| > ext3 filesystem
| > and 2.4.18, I get about 70 MB/s for reads and
| > writes.
| >
| > I have no Iops/sec numbers.
| >
| > Craig
| >
| >
| > > Hello!
| > > Does anyone have any performace numbers of a
| > Qlogic
| > > HBA? No of IOps/sec Max MBs /sec?
| > >
| > > Thanks!
| > > M
| > > __________________________________________________
| > --
| > Craig Tierney (ctierney@hpti.com)
| > -

-- 
~Randy


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Performance
  2002-08-02 14:16   ` Performance Craig Tierney
@ 2002-08-02 23:01     ` Mukul Kotwani
  2002-08-02 23:06       ` Performance Randy.Dunlap
  2002-08-03 18:16       ` Performance Simon Trimmer
  0 siblings, 2 replies; 21+ messages in thread
From: Mukul Kotwani @ 2002-08-02 23:01 UTC (permalink / raw)
  To: Craig Tierney; +Cc: linux-scsi

Running IOMeter against the Qlogic with IBM 15kRPM
JBODs, I'm getting:

1) For 512-byte 100% random reads, I get *just* 260
IOPS per sec. On Windows with IOMeter, I get about
7000 for the same config.

2) For 512k 100% sequential reads, I see 30MB per sec.
Windows gives 85MB/sec.

I see similar performance with my driver on Linux.
I don't know why it is so low; it cannot be that low
compared to Windows. A few MB/sec or a few hundred
IOPS per sec difference is OK, but this difference is
TOO huge. I tried a bunch of different host template
params, but it doesn't seem to make a difference.

Has anyone done any testing with IOMeter? Or is there
any other tool I can use to test IOPS/throughput?


Thanks,
Mukul

--- Craig Tierney <ctierney@hpti.com> wrote:
> For the Qlogic 2200F I am able to push 100 MB/s for
> reads and writes using multiple threads to a
> filesystem
> that is striped across mutliple host ports on the
> SAN.  Backend RAID is a 
> Data Direct Networks SAN.  For a single thread to a
> ext3 filesystem
> and 2.4.18, I get about 70 MB/s for reads and
> writes.
> 
> I have no Iops/sec numbers.
> 
> Craig
> 
> 
> > Hello!
> > Does anyone have any performace numbers of a
> Qlogic
> > HBA? No of IOps/sec Max MBs /sec?
> > 
> > Thanks!
> > M
> > 
> > 
> > __________________________________________________
> > Do You Yahoo!?
> > Yahoo! Health - Feel better, live better
> > http://health.yahoo.com
> > -
> > To unsubscribe from this list: send the line
> "unsubscribe linux-scsi" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at 
> http://vger.kernel.org/majordomo-info.html
> 
> -- 
> Craig Tierney (ctierney@hpti.com)
> -
> To unsubscribe from this list: send the line
> "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at 
http://vger.kernel.org/majordomo-info.html


__________________________________________________
Do You Yahoo!?
Yahoo! Health - Feel better, live better
http://health.yahoo.com

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Performance
  2002-08-02  7:50 ` Performance Mukul Kotwani
  2002-08-02  9:16   ` Performance Fabien Salvi
@ 2002-08-02 14:16   ` Craig Tierney
  2002-08-02 23:01     ` Performance Mukul Kotwani
  1 sibling, 1 reply; 21+ messages in thread
From: Craig Tierney @ 2002-08-02 14:16 UTC (permalink / raw)
  To: Mukul Kotwani; +Cc: linux-scsi

For the Qlogic 2200F I am able to push 100 MB/s for
reads and writes using multiple threads to a filesystem
that is striped across multiple host ports on the SAN.  Backend RAID is a
Data Direct Networks SAN.  For a single thread to an ext3 filesystem
and 2.4.18, I get about 70 MB/s for reads and writes.

I have no Iops/sec numbers.

Craig


> Hello!
> Does anyone have any performace numbers of a Qlogic
> HBA? No of IOps/sec Max MBs /sec?
> 
> Thanks!
> M
> 
> 
> __________________________________________________
> Do You Yahoo!?
> Yahoo! Health - Feel better, live better
> http://health.yahoo.com
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Craig Tierney (ctierney@hpti.com)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Performance
  2002-08-02  7:50 ` Performance Mukul Kotwani
@ 2002-08-02  9:16   ` Fabien Salvi
  2002-08-02 14:16   ` Performance Craig Tierney
  1 sibling, 0 replies; 21+ messages in thread
From: Fabien Salvi @ 2002-08-02  9:16 UTC (permalink / raw)
  To: Mukul Kotwani; +Cc: linux-scsi

Mukul Kotwani wrote:
> 
> Hello!
> Does anyone have any performace numbers of a Qlogic
> HBA? No of IOps/sec Max MBs /sec?

I think it depends more on your controllers than on the HBA...

In our tests with a CMD 7240 controller, we have approx. 40 MB/s for
large file transfers.
I don't remember IO/s values...

-------------
Fabien SALVI      Centre de Ressources Informatiques
                  Archamps, France -- http://www.cri74.org
                  PingOO GNU/linux distribution : http://www.pingoo.org

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Performance
  2002-08-02  6:55 Max IO size Mukul Kotwani
@ 2002-08-02  7:50 ` Mukul Kotwani
  2002-08-02  9:16   ` Performance Fabien Salvi
  2002-08-02 14:16   ` Performance Craig Tierney
  0 siblings, 2 replies; 21+ messages in thread
From: Mukul Kotwani @ 2002-08-02  7:50 UTC (permalink / raw)
  To: linux-scsi

Hello!
Does anyone have any performance numbers for a Qlogic
HBA? Number of IOPS/sec? Max MB/sec?

Thanks!
M


__________________________________________________
Do You Yahoo!?
Yahoo! Health - Feel better, live better
http://health.yahoo.com

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Performance
@ 2002-04-22 15:15 Gregor Pavlin
  0 siblings, 0 replies; 21+ messages in thread
From: Gregor Pavlin @ 2002-04-22 15:15 UTC (permalink / raw)
  To: linux-config

Hi,


I'm trying to configure a small, embedded Linux system (Pentium III
1000MHz) that supports Java + X Windows. The Java application
runs very slowly. Even worse, the same application runs twice as fast
on a Linux system with a Pentium III 450MHz.

Initially I thought it was an X server configuration problem, but now I
guess that it has to do with scheduling. Namely, I checked the
processes with "top" on both systems and got interesting results:


System with (Pentium III 1000MHz):

Java stuff was assigned less than 30% of the CPU
X windows used more than 60% of the CPU resources.


System with (Pentium III 450MHz):

Java more than 80% of the CPU
X windows few % of the CPU resources.


I think the X server is not responsible for the poor performance.
Namely, on the Pentium III 1000MHz I can plug in a hard drive
with a full Linux (SuSE) using the same XFree86 server as in the embedded
Linux, and the results are quite interesting. When I start the Java
application, it is very slow. But when I then close and open the KDE toolbar
on the desktop, the Java just takes off. I checked this
process also with "top"; before that action 70% of the CPU was used
by X Windows and 20% by the Java application. However, as soon as I
hid and opened the KDE toolbar, the ratio was Java 70% / X Windows 10%. If
the problem were the X server, then this should not happen on the same
computer using the same X configuration, right?

In order to influence the CPU sharing on the "slow" computer (Pentium III
1000MHz) I tried "nice --20 /$PATH/java-application". However, this doesn't
result in a significant improvement in performance.



Therefore I have the following questions:


1.) Can this be a scheduler problem? Could this result from the dynamic
scheduling associated with interactive applications handling different
input devices?

2.) If so, are there any other ways I can influence the scheduler, e.g. the
policy, without recompiling the kernel?

3.) I'm using read-only mounted file systems with a RAM disk of about 10 MB
and no swapping. Could these features be a problem in the context of threads
and X Windows?




Thank you in advance.

Regards,
Gregor

----------------------------

Gregor Pavlin

email: g.pavlin@gmx.at
Tel.:++43 316 244233
     ++43 316 289402
Fax.:++43 316 244243

Kalsdorferstr. 41,
A-8073, Feldkirchen bei Graz
Austria


^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2003-06-12 12:04 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-06-09 16:12 performance P
2003-06-09 16:16 ` performance P
2003-06-09 18:27   ` Re[2]: performance Peteris Krumins
2003-06-10  8:49     ` performance P
2003-06-11 12:16       ` performance Harald Welte
2003-06-12 12:04         ` performance P
  -- strict thread matches above, loose matches on Subject: below --
2003-03-07 16:36 Performance erik.teose
2003-03-07  7:48 Performance Aman
2002-08-02  6:55 Max IO size Mukul Kotwani
2002-08-02  7:50 ` Performance Mukul Kotwani
2002-08-02  9:16   ` Performance Fabien Salvi
2002-08-02 14:16   ` Performance Craig Tierney
2002-08-02 23:01     ` Performance Mukul Kotwani
2002-08-02 23:06       ` Performance Randy.Dunlap
2002-08-03  8:26         ` Performance Mukul Kotwani
2002-08-03 16:34           ` Performance Randy.Dunlap
2002-08-03 18:16       ` Performance Simon Trimmer
2002-08-03 20:03         ` Performance Mukul Kotwani
2002-08-04  6:27         ` Performance Jeremy Higdon
2002-08-04  7:39           ` Performance Mukul Kotwani
2002-08-04  8:01             ` Performance Jeremy Higdon
2002-04-22 15:15 Performance Gregor Pavlin
