* very unstable IOPS in the same test on the same machine
@ 2014-01-02  9:58 tech8891
  2014-01-02 15:40 ` David Nellans
  0 siblings, 1 reply; 5+ messages in thread
From: tech8891 @ 2014-01-02  9:58 UTC (permalink / raw)
  To: fio

Hi,
  Please let me know if this is not the right place to ask this question; this is the first time I have used a mailing list.

Problem summary:
 IOPS became very unstable after I changed the number of jobs from 2 to 4. Even after I changed it back, the IOPS did not recover.
# cat 1.fio
[global]
rw=randread
size=128m

[job1]

[job2]

When I run fio 1.fio, the IOPS is around 31k. Then I add the following 2 entries:
[job3]

[job4]

The IOPS drops to around 1k.

Even if I remove these 2 jobs, the IOPS stays around 1k.

Only if I remove all the jobn.n.0 files and re-run with the 2-job setting does the IOPS return to 31k.
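
If it helps, I can also re-run the same jobs with direct=1 so the OS page cache is out of the picture; something like this (not tried yet):

# cat 1-direct.fio
; untested - same jobs as 1.fio plus direct=1
[global]
rw=randread
size=128m
direct=1

[job1]

[job2]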

My Env:
# fio --version
fio-2.1.4
# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 6.3 (Santiago)      
# cat /sys/block/sda/queue/scheduler   // I tried both cfq/deadline, no difference
noop anticipatory [deadline] cfq 

# bash blkinfo.sh  /dev/sda 
Vendor     : LSI     
Model      : MR9260-8i       
Nr_request : 128
rotational : 1

Thank you very much for your time

zhifan

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: very unstable IOPS in the same test on the same machine
  2014-01-02  9:58 very unstable IOPS in the same test on the same machine tech8891
@ 2014-01-02 15:40 ` David Nellans
  2014-01-02 17:04   ` Roger Sibert
  2014-01-03  3:14   ` tech8891
  0 siblings, 2 replies; 5+ messages in thread
From: David Nellans @ 2014-01-02 15:40 UTC (permalink / raw)
  To: tech8891, fio


> Problem summary:
>   IOPS became very unstable after I changed the number of jobs from 2 to 4. Even after I changed it back, the IOPS did not recover.
> # cat 1.fio
> [global]
> rw=randread
> size=128m
>
> [job1]
>
> [job2]
>
> When I run fio 1.fio, the IOPS is around 31k. Then I add the following 2 entries:
> [job3]
>
> [job4]
>
> The IOPS drops to around 1k.
>
> Even if I remove these 2 jobs, the IOPS stays around 1k.
>
> Only if I remove all the jobn.n.0 files and re-run with the 2-job setting does the IOPS return to 31k.

> # bash blkinfo.sh  /dev/sda
> Vendor     : LSI
> Model      : MR9260-8i
> Nr_request : 128
> rotational : 1

It looks like you're testing against an LSI MegaRAID SAS controller,
which presumably has magnetic drives attached.  When you add more jobs
to your config, it's going to cause the heads on the drives (you don't
say how many you have) to thrash more as they try to interleave requests
that land on different portions of the disk.  So it's not surprising
that you'll see IOPS drop off.

A lot of how and where the IOPS drop off is going to depend on the
RAID config of the drives you have attached to the controller, however.
Generally speaking, 31k IOPS at 128MB I/Os (which will typically be
split into something smaller, like 1MB) is well beyond what you should
expect 8 HDDs to do unless you're getting lots of hits in the DRAM
buffer on the RAID controller. Enterprise HDDs (even 15k-rpm ones) can
generally only sustain <= 250 random read IOPS, so even with perfect
interleaving on an 8-drive RAID-0, 31k seems suspicious; 1k seems
perfectly realistic, however!
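
Rough math on that ceiling (assuming 8 spindles, which is a guess since
you don't say how many you have):

# echo "$((8 * 250)) IOPS"
2000 IOPS

Still a long way short of 31k, which is why the controller's DRAM cache
is the prime suspect.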

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: very unstable IOPS in the same test on the same machine
  2014-01-02 15:40 ` David Nellans
@ 2014-01-02 17:04   ` Roger Sibert
  2014-01-03  3:14   ` tech8891
  1 sibling, 0 replies; 5+ messages in thread
From: Roger Sibert @ 2014-01-02 17:04 UTC (permalink / raw)
  To: FIO

On Thu, Jan 2, 2014 at 10:40 AM, David Nellans <david@nellans.org> wrote:
>
>> Problem summary:
>>   IOPS became very unstable after I changed the number of jobs from 2 to
>> 4. Even after I changed it back, the IOPS did not recover.
>> # cat 1.fio
>> [global]
>> rw=randread
>> size=128m
>>
>> [job1]
>>
>> [job2]
>>
>> When I run fio 1.fio, the IOPS is around 31k. Then I add the
>> following 2 entries:
>> [job3]
>>
>> [job4]
>>
>> The IOPS drops to around 1k.
>>
>> Even if I remove these 2 jobs, the IOPS stays around 1k.
>>
>> Only if I remove all the jobn.n.0 files and re-run with the 2-job
>> setting does the IOPS return to 31k.
>
>
>> # bash blkinfo.sh  /dev/sda
>> Vendor     : LSI
>> Model      : MR9260-8i
>> Nr_request : 128
>> rotational : 1
>
>
> It looks like you're testing against an LSI MegaRAID SAS controller, which
> presumably has magnetic drives attached.  When you add more jobs to your
> config, it's going to cause the heads on the drives (you don't say how many
> you have) to thrash more as they try to interleave requests that land on
> different portions of the disk.  So it's not surprising that you'll see
> IOPS drop off.
>
> A lot of how and where the IOPS drop off is going to depend on the RAID
> config of the drives you have attached to the controller, however. Generally
> speaking, 31k IOPS at 128MB I/Os (which will typically be split into
> something smaller, like 1MB) is well beyond what you should expect 8 HDDs
> to do unless you're getting lots of hits in the DRAM buffer on the RAID
> controller. Enterprise HDDs (even 15k-rpm ones) can generally only sustain
> <= 250 random read IOPS, so even with perfect interleaving on an 8-drive
> RAID-0, 31k seems suspicious; 1k seems perfectly realistic, however!


Just a point of observation: if we are talking about a RAID device, which
the MR9260 does appear to be, then you open up a very large set of
permutations/combinations of settings that will impact performance.

In general, if you're talking about 128M for one job, then that job can
in theory fit into the cache of the RAID controller, and performance
there can be nice and snappy.  The second you go beyond what fits in the
cache on the RAID controller, performance starts dropping rapidly.  By
going to multiple jobs doing random I/O you pretty much run the risk of
negating the RAID cache altogether, which may be what's causing your
sudden drop-off.

A useful starting point may be to disable the read and write cache on
your arrays and re-run the test so you get a baseline of what your disks
can do, then turn caching back on, re-run the tests, and compare them.
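
If I remember the MegaCli syntax correctly (please double-check against
your version before running anything), something along these lines
should switch the virtual drive to write-through, turn off read-ahead,
and disable the physical disk caches:

MegaCli64 -LDSetProp WT -LAll -aAll            # write-through instead of write-back
MegaCli64 -LDSetProp NORA -LAll -aAll          # disable read-ahead
MegaCli64 -LDSetProp -DisDskCache -LAll -aAll  # disable the physical disk caches

and WB / RA / -EnDskCache should put things back the way they were.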

Here's a list of things I can think of that drive the number of
permutations/combinations:
- number of disks involved (do you have enough to saturate the PCI lanes?)
- number of disks on each expander on the RAID adapter (do you have
  enough disks to saturate the expander on the card, assuming the card
  has one expander per channel?)
- SAS vs SATA (there is an obvious performance difference between the
  devices, and SATA really isn't as fast)
- chunk/stripe size (you should tailor this to match the data transfer
  sizes, but sometimes the RAID code just works better for one vs another)
- disk cache enabled vs disabled (if you're running RAID you should have
  the disk cache disabled, even though that normally makes SATA
  performance tank; you disable it because during a power outage the
  RAID cache code can't tell whether the data made it to the media or
  not, i.e. a data corruption issue)
- RAID cache size, read and/or write cache enabled vs disabled (more
  cache is usually better, and turning it on for reads and writes
  usually helps, but RAID code can have goofy default values if you
  don't have a battery installed)
- RAID type (some RAID types lend themselves to better performance than
  others; RAID 0 is usually the fastest)
- transfer size of the data (sending down 512-byte chunks is a lot more
  work than 16k etc.; there's usually a sweet spot for IOPS vs transfer
  size, see the sketch after this list)
- read vs write (reads tend to be quicker than writes, though if you're
  dealing strictly with RAM that changes the difference)
- random vs sequential (sequential is usually faster by a long shot,
  though as you increase the number of jobs you run the risk of making
  the RAID code think the data is random)
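
For the transfer-size sweet spot, a rough fio sketch along these lines
would walk a few block sizes back to back (the sizes and runtime are
just placeholders, adjust to taste; stonewall makes each job wait for
the previous one):

# cat bs-sweep.fio
; rough sketch only - tune size/runtime for your setup
[global]
rw=randread
direct=1
size=1g
runtime=60
time_based

[bs-4k]
bs=4k
stonewall

[bs-64k]
bs=64k
stonewall

[bs-1m]
bs=1m
stonewall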

Peace,
Roger

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re:Re: very unstable IOPS in the same test on the same machine
  2014-01-02 15:40 ` David Nellans
  2014-01-02 17:04   ` Roger Sibert
@ 2014-01-03  3:14   ` tech8891
  2014-01-03  3:37     ` David Nellans
  1 sibling, 1 reply; 5+ messages in thread
From: tech8891 @ 2014-01-03  3:14 UTC (permalink / raw)
  To: David Nellans; +Cc: fio

Thanks for your answer, Nellans!  I think the 31k IOPS must be because of the RAID card cache. It seems we have 394MB of cache on this controller, which is enough to hold the 2 * 128M of data.

# MegaCli64 -AdpAllInfo -aALL | grep -i cache
Cache Flush Interval             : 4s
Max Configurable CacheCade Size: 0 GB
Current Size of CacheCade      : 0 GB
Current Size of FW Cache       : 394 MB
Block SSD Write Disk Cache Change: No
Disk Cache Policy    : Yes
Cache When BBU Bad               : Disabled
Cached IO                        : No
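
So roughly, if I did the math right:

# echo "$((2*128))M vs $((4*128))M against 394M of FW cache"
256M vs 512M against 394M of FW cache

i.e. the 2-job working set fits in the controller cache but the 4-job
one does not.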

During my test, fio first created the two 128M files and then should have cleared the file system cache, but the data would still be in the controller cache.  I tried again with 2 jobs and a 1280M size, and the IOPS is only around 400...

But I am quite poor at understanding many hardware terms, and I can't tell you the RAID level or how many disks are in this RAID group, so I have pasted the MegaCli output for the physical drives, the virtual drive, and the RAID controller below.

Can you tell me what an "enclosure device" is?  How many different types of enclosure does a RAID controller usually have?  (I thought the hardware would just be a RAID controller with many slots, and every disk would be installed in a slot.)

And from the following output, how many disks do I have, and what type of RAID am I using?
Does "RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0" mean RAID 1?
And does "raid level        : primary-1, secondary-3, raid level qualifier-0" mean RAID 10?  What does the "-3" mean?

Thank you again for your time.



[root@lvs2b1c-93cb linuxTool]# MegaCli64 -PDList -aALL
                                     
Adapter #0




Enclosure Device ID: 252
Slot Number: 0
Drive's postion: DiskGroup: 0, Span: 0, Arm: 0
Enclosure position: N/A
Device Id: 11
WWN: 5000C5003A25E5B0
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS




Raw Size: 279.396 GB [0x22ecb25c Sectors]
Non Coerced Size: 278.896 GB [0x22dcb25c Sectors]
Coerced Size: 278.875 GB [0x22dc0000 Sectors]
Firmware state: Online, Spun Up
Device Firmware Level: FS64
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x5000c5003a25e5b1
SAS Address(1): 0x0
Connected Port Number: 3(path0) 
Inquiry Data: SEAGATE ST9300603SS     FS646SE3T05T            
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None 
Device Speed: 6.0Gb/s 
Link Speed: 6.0Gb/s 
Media Type: Hard Disk Device
Drive Temperature :28C (82.40 F)
PI Eligibility:  No 
Drive is formatted for PI information:  No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s 
Port-1 :
Port status: Active
Port's Linkspeed: Unknown 
Drive has flagged a S.M.A.R.T alert : No

Enclosure Device ID: 252
Slot Number: 1
Drive's postion: DiskGroup: 0, Span: 0, Arm: 1
Enclosure position: N/A
Device Id: 10
WWN: 5000C5003A438B50
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 279.396 GB [0x22ecb25c Sectors]
Non Coerced Size: 278.896 GB [0x22dcb25c Sectors]
Coerced Size: 278.875 GB [0x22dc0000 Sectors]
Firmware state: Online, Spun Up
Device Firmware Level: FS64
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x5000c5003a438b51
SAS Address(1): 0x0
Connected Port Number: 2(path0) 
Inquiry Data: SEAGATE ST9300603SS     FS646SE3VC6A            
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None 
Device Speed: 6.0Gb/s 
Link Speed: 6.0Gb/s 
Media Type: Hard Disk Device
Drive Temperature :26C (78.80 F)
PI Eligibility:  No 
Drive is formatted for PI information:  No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s 
Port-1 :
Port status: Active
Port's Linkspeed: Unknown 
Drive has flagged a S.M.A.R.T alert : No

Enclosure Device ID: 252
Slot Number: 2
Drive's postion: DiskGroup: 0, Span: 1, Arm: 0
Enclosure position: N/A
Device Id: 9
WWN: 5000C5003A433BEC
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS

Raw Size: 279.396 GB [0x22ecb25c Sectors]
Non Coerced Size: 278.896 GB [0x22dcb25c Sectors]
Coerced Size: 278.875 GB [0x22dc0000 Sectors]
Firmware state: Online, Spun Up
Device Firmware Level: FS64
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x5000c5003a433bed
SAS Address(1): 0x0
Connected Port Number: 1(path0) 
Inquiry Data: SEAGATE ST9300603SS     FS646SE3VD8G            
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None 
Device Speed: 6.0Gb/s 
Link Speed: 6.0Gb/s 
Media Type: Hard Disk Device
Drive Temperature :25C (77.00 F)
PI Eligibility:  No 
Drive is formatted for PI information:  No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s 
Port-1 :
Port status: Active
Port's Linkspeed: Unknown 
Drive has flagged a S.M.A.R.T alert : No

Enclosure Device ID: 252
Slot Number: 3
Drive's postion: DiskGroup: 0, Span: 1, Arm: 1
Enclosure position: N/A
Device Id: 8
WWN: 5000C5003A2D1BDC
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS

Raw Size: 279.396 GB [0x22ecb25c Sectors]
Non Coerced Size: 278.896 GB [0x22dcb25c Sectors]
Coerced Size: 278.875 GB [0x22dc0000 Sectors]
Firmware state: Online, Spun Up
Device Firmware Level: FS64
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x5000c5003a2d1bdd
SAS Address(1): 0x0
Connected Port Number: 0(path0) 
Inquiry Data: SEAGATE ST9300603SS     FS646SE3TTYY            
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None 
Device Speed: 6.0Gb/s 
Link Speed: 6.0Gb/s 
Media Type: Hard Disk Device
Drive Temperature :25C (77.00 F)
PI Eligibility:  No 
Drive is formatted for PI information:  No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s 
Port-1 :
Port status: Active
Port's Linkspeed: Unknown 
Drive has flagged a S.M.A.R.T alert : No


Exit Code: 0x00
[root@lvs2b1c-93cb linuxTool]# MegaCli64 -LDInfo -Lall -aALL

Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 557.75 GB
Mirror Data         : 557.75 GB
State               : Optimal
Strip Size          : 64 KB
Number Of Drives per span:2
Span Depth          : 2
Default Cache Policy: WriteBack, ReadAhead, Direct, Write Cache OK if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Direct, Write Cache OK if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Enabled
Encryption Type     : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: Yes
LD has drives that support T10 power conditions: Yes
LD's IO profile supports MAX power savings with cached writes: No
Is VD Cached: No

Exit Code: 0x00
[root@lvs2b1c-93cb linuxTool]# MegaCli64 -AdpAllInfo -aALL
                                     
Adapter #0

==============================================================================
                    Versions
                ================
Product Name    : LSI MegaRAID SAS 9260-8i
Serial No       : SV11811701
FW Package Build: 12.12.0-0048




                    Mfg. Data
                ================
Mfg. Date       : 04/26/11
Rework Date     : 00/00/00
Revision No     : 60A
Battery FRU     : N/A




                Image Versions in Flash:
                ================
FW Version         : 2.120.63-1242
BIOS Version       : 3.22.00_4.11.05.00_0x05020000
Preboot CLI Version: 04.04-017:#%00008
WebBIOS Version    : 6.0-34-e_29-Rel
NVDATA Version     : 2.09.03-0013
Boot Block Version : 2.02.00.00-0000
BOOT Version       : 09.250.01.219




                Pending Images in Flash
                ================
None




                PCI Info
                ================
Controller Id   : 0000
Vendor Id       : 1000
Device Id       : 0079
SubVendorId     : 1000
SubDeviceId     : 9261




Host Interface  : PCIE




ChipRevision    : B4




Number of Frontend Port: 0 
Device Interface  : PCIE




Number of Backend Port: 8 
Port  :  Address
0        5000c5003a2d1bdd 
1        5000c5003a433bed 
2        5000c5003a438b51 
3        5000c5003a25e5b1 
4        0000000000000000 
5        0000000000000000 
6        0000000000000000 
7        0000000000000000 




                HW Configuration
                ================
SAS Address      : 500605b0033062e0
BBU              : Absent
Alarm            : Present
NVRAM            : Present
Serial Debugger  : Present
Memory           : Present
Flash            : Present
Memory Size      : 512MB
TPM              : Absent
On board Expander: Absent
Upgrade Key      : Absent
Temperature sensor for ROC    : Absent
Temperature sensor for controller    : Absent








                Settings
                ================
Current Time                     : 19:53:33 1/2, 2014
Predictive Fail Poll Interval    : 300sec
Interrupt Throttle Active Count  : 16
Interrupt Throttle Completion    : 50us
Rebuild Rate                     : 30%
PR Rate                          : 30%
BGI Rate                         : 30%
Check Consistency Rate           : 30%
Reconstruction Rate              : 30%
Cache Flush Interval             : 4s
Max Drives to Spinup at One Time : 4
Delay Among Spinup Groups        : 2s
Physical Drive Coercion Mode     : Disabled
Cluster Mode                     : Disabled
Alarm                            : Enabled
Auto Rebuild                     : Enabled
Battery Warning                  : Disabled
Ecc Bucket Size                  : 15
Ecc Bucket Leak Rate             : 1440 Minutes
Restore HotSpare on Insertion    : Disabled
Expose Enclosure Devices         : Enabled
Maintain PD Fail History         : Enabled
Host Request Reordering          : Enabled
Auto Detect BackPlane Enabled    : SGPIO/i2c SEP
Load Balance Mode                : Auto
Use FDE Only                     : No
Security Key Assigned            : No
Security Key Failed              : No
Security Key Not Backedup        : No
Default LD PowerSave Policy      : Controller Defined
Maximum number of direct attached drives to spin up in 1 min : 120 
Auto Enhanced Import             : No
Any Offline VD Cache Preserved   : No
Allow Boot with Preserved Cache  : No
Disable Online Controller Reset  : No
PFK in NVRAM                     : No
Use disk activity for locate     : No
POST delay                       : 90 seconds




                Capabilities
                ================
RAID Level Supported             : RAID0, RAID1, RAID5, RAID6, RAID00, RAID10, RAID50, RAID60, PRL 11, PRL 11 with spanning, SRL 3 supported, PRL11-RLQ0 DDF layout with no span, PRL11-RLQ0 DDF layout with span
Supported Drives                 : SAS, SATA




Allowed Mixing:




Mix in Enclosure Allowed
Mix of SAS/SATA of HDD type in VD Allowed




                Status
                ================
ECC Bucket Count                 : 0




                Limitations
                ================
Max Arms Per VD          : 32 
Max Spans Per VD         : 8 
Max Arrays               : 128 
Max Number of VDs        : 64 
Max Parallel Commands    : 1008 
Max SGE Count            : 80 
Max Data Transfer Size   : 8192 sectors 
Max Strips PerIO         : 42 
Max LD per array         : 16 
Min Strip Size           : 8 KB
Max Strip Size           : 1.0 MB
Max Configurable CacheCade Size: 0 GB
Current Size of CacheCade      : 0 GB
Current Size of FW Cache       : 394 MB




                Device Present
                ================
Virtual Drives    : 1 
  Degraded        : 0 
  Offline         : 0 
Physical Devices  : 5 
  Disks           : 4 
  Critical Disks  : 0 
  Failed Disks    : 0 




                Supported Adapter Operations
                ================
Rebuild Rate                    : Yes
CC Rate                         : Yes
BGI Rate                        : Yes
Reconstruct Rate                : Yes
Patrol Read Rate                : Yes
Alarm Control                   : Yes
Cluster Support                 : No
BBU                             : Yes
Spanning                        : Yes
Dedicated Hot Spare             : Yes
Revertible Hot Spares           : Yes
Foreign Config Import           : Yes
Self Diagnostic                 : Yes
Allow Mixed Redundancy on Array : No
Global Hot Spares               : Yes
Deny SCSI Passthrough           : No
Deny SMP Passthrough            : No
Deny STP Passthrough            : No
Support Security                : No
Snapshot Enabled                : No
Support the OCE without adding drives : Yes
Support PFK                     : Yes
Support PI                      : No
Support Boot Time PFK Change    : No
Disable Online PFK Change       : No
PFK TrailTime Remaining         : 0 days 0 hours
Support Shield State            : No
Block SSD Write Disk Cache Change: No




                Supported VD Operations
                ================
Read Policy          : Yes
Write Policy         : Yes
IO Policy            : Yes
Access Policy        : Yes
Disk Cache Policy    : Yes
Reconstruction       : Yes
Deny Locate          : No
Deny CC              : No
Allow Ctrl Encryption: No
Enable LDBBM         : No
Support Breakmirror  : No
Power Savings        : Yes




                Supported PD Operations
                ================
Force Online                            : Yes
Force Offline                           : Yes
Force Rebuild                           : Yes
Deny Force Failed                       : No
Deny Force Good/Bad                     : No
Deny Missing Replace                    : No
Deny Clear                              : No
Deny Locate                             : No
Support Temperature                     : Yes
Disable Copyback                        : No
Enable JBOD                             : No
Enable Copyback on SMART                : No
Enable Copyback to SSD on SMART Error   : Yes
Enable SSD Patrol Read                  : No
PR Correct Unconfigured Areas           : Yes
Enable Spin Down of UnConfigured Drives : Yes
Disable Spin Down of hot spares         : No
Spin Down time                          : 30 
T10 Power State                         : Yes
                Error Counters
                ================
Memory Correctable Errors   : 0 
Memory Uncorrectable Errors : 0 




                Cluster Information
                ================
Cluster Permitted     : No
Cluster Active        : No




                Default Settings
                ================
Phy Polarity                     : 0 
Phy PolaritySplit                : 0 
Background Rate                  : 30 
Strip Size                       : 64kB
Flush Time                       : 4 seconds
Write Policy                     : WB
Read Policy                      : Adaptive
Cache When BBU Bad               : Disabled
Cached IO                        : No
SMART Mode                       : Mode 6
Alarm Disable                    : Yes
Coercion Mode                    : None
ZCR Config                       : Unknown
Dirty LED Shows Drive Activity   : No
BIOS Continue on Error           : No
Spin Down Mode                   : None
Allowed Device Type              : SAS/SATA Mix
Allow Mix in Enclosure           : Yes
Allow HDD SAS/SATA Mix in VD     : Yes
Allow SSD SAS/SATA Mix in VD     : No
Allow HDD/SSD Mix in VD          : No
Allow SATA in Cluster            : No
Max Chained Enclosures           : 16 
Disable Ctrl-R                   : Yes
Enable Web BIOS                  : Yes
Direct PD Mapping                : No
BIOS Enumerate VDs               : Yes
Restore Hot Spare on Insertion   : No
Expose Enclosure Devices         : Yes
Maintain PD Fail History         : Yes
Disable Puncturing               : No
Zero Based Enclosure Enumeration : No
PreBoot CLI Enabled              : Yes
LED Show Drive Activity          : Yes
Cluster Disable                  : Yes
SAS Disable                      : No
Auto Detect BackPlane Enable     : SGPIO/i2c SEP
Use FDE Only                     : No
Enable Led Header                : Yes
Delay during POST                : 0 
EnableCrashDump                  : No
Disable Online Controller Reset  : No
EnableLDBBM                      : No
Un-Certified Hard Disk Drives    : Allow
Treat Single span R1E as R10     : No
Max LD per array                 : 16
Power Saving option              : Don't Auto spin down Configured Drives
Max power savings option is  not allowed for LDs. Only T10 power conditions are to be used.
Default spin down time in minutes: 30 
Enable JBOD                      : No
TTY Log In Flash                 : No
Auto Enhanced Import             : No
BreakMirror RAID Support         : No
Disable Join Mirror              : No
Enable Shield State              : No
Time taken to detect CME         : 60s




Exit Code: 0x00



At 2014-01-02 23:40:24,"David Nellans" <david@nellans.org> wrote:
>
>> Problem summary:
>>   IOPS became very unstable after I changed the number of jobs from 2 to 4. Even after I changed it back, the IOPS did not recover.
>> # cat 1.fio
>> [global]
>> rw=randread
>> size=128m
>>
>> [job1]
>>
>> [job2]
>>
>> When I run fio 1.fio, the IOPS is around 31k. Then I add the following 2 entries:
>> [job3]
>>
>> [job4]
>>
>> The IOPS drops to around 1k.
>>
>> Even if I remove these 2 jobs, the IOPS stays around 1k.
>>
>> Only if I remove all the jobn.n.0 files and re-run with the 2-job setting does the IOPS return to 31k.
>
>> # bash blkinfo.sh  /dev/sda
>> Vendor     : LSI
>> Model      : MR9260-8i
>> Nr_request : 128
>> rotational : 1
>
>It looks like you're testing against an LSI MegaRAID SAS controller,
>which presumably has magnetic drives attached.  When you add more jobs
>to your config, it's going to cause the heads on the drives (you don't
>say how many you have) to thrash more as they try to interleave requests
>that land on different portions of the disk.  So it's not surprising
>that you'll see IOPS drop off.
>
>A lot of how and where the IOPS drop off is going to depend on the
>RAID config of the drives you have attached to the controller, however.
>Generally speaking, 31k IOPS at 128MB I/Os (which will typically be
>split into something smaller, like 1MB) is well beyond what you should
>expect 8 HDDs to do unless you're getting lots of hits in the DRAM
>buffer on the RAID controller. Enterprise HDDs (even 15k-rpm ones) can
>generally only sustain <= 250 random read IOPS, so even with perfect
>interleaving on an 8-drive RAID-0, 31k seems suspicious; 1k seems
>perfectly realistic, however!

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: very unstable IOPS in the same test on the same machine
  2014-01-03  3:14   ` tech8891
@ 2014-01-03  3:37     ` David Nellans
  0 siblings, 0 replies; 5+ messages in thread
From: David Nellans @ 2014-01-03  3:37 UTC (permalink / raw)
  To: tech8891; +Cc: fio

On 01/02/2014 09:14 PM, tech8891 wrote:
> Thanks for your answer, Nellans!  I think the 31k IOPS must be because of the RAID card cache. It seems we have 394MB of cache on this controller, which is enough to hold the 2 * 128M of data.

While I'm not a MegaRAID expert, it looks like you're using 4 Seagate
10k rpm drives with a ~3ms random read time (I looked up the specs from
Seagate) in a RAID 10 with a stripe size of 64KB.  RAID 10 performance
effects aside, you essentially have a RAID-0 of two drives.  At absolute
best, each of your drives might be able to do ~333 IOPS at a small
random I/O size, and 100% RAID-0 efficiency would give you ~666 IOPS
peak.  So getting 400 IOPS with random I/O within a 1.2G range (where
the disk heads don't have to move much, which speeds up seek time) seems
very reasonable.  Nothing seems obviously wrong to me with the
performance you're seeing.
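
Back of the envelope, reusing that ~3ms figure (integer shell
arithmetic, so it rounds down):

# echo "$((1000/3)) IOPS per drive, $((2*1000/3)) IOPS across the two spans"
333 IOPS per drive, 666 IOPS across the two spans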

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2014-01-03  3:37 UTC | newest]

Thread overview: 5+ messages
2014-01-02  9:58 very unstable IOPS in the same test on the same machine tech8891
2014-01-02 15:40 ` David Nellans
2014-01-02 17:04   ` Roger Sibert
2014-01-03  3:14   ` tech8891
2014-01-03  3:37     ` David Nellans
