* HBA Adaptor advice
@ 2011-05-19 12:26 Ed W
  2011-05-19 12:36 ` Roman Mamedov
                   ` (4 more replies)
  0 siblings, 5 replies; 60+ messages in thread
From: Ed W @ 2011-05-19 12:26 UTC (permalink / raw)
  To: linux-raid

Hi, following on from a recent thread, can folks with decent multi-port
HBA adaptors please chime in with some model numbers of known decent
adaptors please?

The required use is to grow from currently 8 ish drives to perhaps 12-24
drives per machine. (It partitions out as: one or more RAID6 arrays for
data, plus a couple of backup drives)

Ideally I would like a controller with writeback cache and BBU since
whilst this office machine is likely quite underused, for any sensible
amount of IO (some of the other machines we might upgrade) this seems to
give a 10-100x increase in IOPs?  For the moment it's just a nice to
have though

I only intend to use linux software raid, so any onboard raid
functionality is just a liability.  Budget is either low £100 ish for
multi-port HBAs without cache, up to £1000 ish for 16-24 port high
performance cache controllers:


So far I saw recommendations for:

- LSI 1068E (SuperMicro 3081E)  (8 port 3Gb)
- LSI 9211-8i (8 port 6Gb)

And to avoid:
- Marvell controllers?
- Areca with Marvell controllers?
- AOC-SASLP-MV8

Are these any good?
- LSI MegaRAID 9280-24i4e
- Areca ARC-1880ix-24


I'm completely ignorant of the current state of adaptors:

- Are there any bargains to be had in the lower end 8-24 port category
(i.e. ones that come up frequently as eBay specials and aren't locked to
special Dell-only disks, etc.)?

- Cable management. Are there any backplanes for retrofitting into
desktop chassis (5.25" bays, say) which take single (SFF-8087?) connectors?


At the moment I just need to refresh our office server (10-12 disks
including backup drives) and we need something compact and quiet, so I'm
looking at compact tower chassis options.  I'm also looking at adding
more storage to our datacenter racks though, so I'm interested in a
shopping list of reliable higher-performance options.

Please add suggestions for good value, reliable controllers known to
work well with linux

Thanks

Ed W


* Re: HBA Adaptor advice
  2011-05-19 12:26 HBA Adaptor advice Ed W
@ 2011-05-19 12:36 ` Roman Mamedov
  2011-05-19 12:43   ` Mathias Burén
  2011-05-19 14:06 ` Michael Sallaway
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 60+ messages in thread
From: Roman Mamedov @ 2011-05-19 12:36 UTC (permalink / raw)
  To: Ed W; +Cc: linux-raid


On Thu, 19 May 2011 13:26:49 +0100
Ed W <lists@wildgooses.com> wrote:

> Hi, following on from a recent thread, can folks with decent multi-port
> HBA adaptors please chime in with some model numbers of known decent
> adaptors please?

Here is a useful link for you: http://blog.zorinaq.com/?e=10

> And to avoid:
> - Marvel controllers?

There are two kinds (families?) of Marvell SATA chips, and those using the
"sata_mv" module do work fine. It seems like all the complaints are directed
at the other kind, supported via "mvsas".

-- 
With respect,
Roman


* Re: HBA Adaptor advice
  2011-05-19 12:36 ` Roman Mamedov
@ 2011-05-19 12:43   ` Mathias Burén
  0 siblings, 0 replies; 60+ messages in thread
From: Mathias Burén @ 2011-05-19 12:43 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: Ed W, linux-raid

On 19 May 2011 13:36, Roman Mamedov <rm@romanrm.ru> wrote:
> On Thu, 19 May 2011 13:26:49 +0100
> Ed W <lists@wildgooses.com> wrote:
>
>> Hi, following on from a recent thread, can folks with decent multi-port
>> HBA adaptors please chime in with some model numbers of known decent
>> adaptors please?
>
> Here is a useful link for you: http://blog.zorinaq.com/?e=10
>
>> And to avoid:
>> - Marvel controllers?
>
> There are two kinds (families?) of Marvell SATA chips, and those using the
> "sata_mv" module do work fine. It seems like all the complaints are directed
> at the other kind, supported via "mvsas".
>
> --
> With respect,
> Roman
>

I have to chime in; I do have this one:

05:00.0 SCSI storage controller: HighPoint Technologies, Inc.
RocketRAID 230x 4 Port SATA-II Controller (rev 02)

Which uses the sata_mv module. From dmesg:

[    1.062151] sata_mv: Highpoint RocketRAID BIOS CORRUPTS DATA on all
attached drives, regardless of if/how they are configured. BEWARE!
[    1.062156] sata_mv: For data safety, do not use sectors 8-9 on
"Legacy" drives, and avoid the final two gigabytes on all RocketRAID
BIOS initialized drives.

So stay away from RocketRAID  :-) (I only use it as an HBA)

Cheers,
/M


* Re: HBA Adaptor advice
  2011-05-19 12:26 HBA Adaptor advice Ed W
  2011-05-19 12:36 ` Roman Mamedov
@ 2011-05-19 14:06 ` Michael Sallaway
  2011-05-19 19:10 ` Thomas Harold
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 60+ messages in thread
From: Michael Sallaway @ 2011-05-19 14:06 UTC (permalink / raw)
  To: Ed W; +Cc: linux-raid


On 19/05/2011 10:26 PM, Ed W wrote:
> So far I saw recommendations for:
>
> - LSI 1068E (SuperMicro 3081E)  (8 port 3Gb)

I'm using one of these, and it's working great. It's a Supermicro 
AOC-USASLP-L8i. I'm using all 8 ports, and another 4 drives on the 
onboard controller.

Note, however, that I was first running a slightly older kernel, and it 
didn't work (bus errors, lots of weird stuff happening), so I 
temporarily gave up on it -- I believe it was either 2.6.31 or 2.6.32 
(Ubuntu 10.04 LTS).  However, due to a need for something else, I 
upgraded to a 2.6.35 kernel, and it started working fine with that 
kernel. Been using it for ~9 months perfectly fine.

Cheers,
Michael




* Re: HBA Adaptor advice
  2011-05-19 12:26 HBA Adaptor advice Ed W
  2011-05-19 12:36 ` Roman Mamedov
  2011-05-19 14:06 ` Michael Sallaway
@ 2011-05-19 19:10 ` Thomas Harold
  2011-05-19 21:12   ` Rudy Zijlstra
  2011-05-19 21:07 ` Brad Campbell
  2011-05-20  2:08 ` Andy Smith
  4 siblings, 1 reply; 60+ messages in thread
From: Thomas Harold @ 2011-05-19 19:10 UTC (permalink / raw)
  To: Ed W; +Cc: linux-raid

On 5/19/2011 8:26 AM, Ed W wrote:
> Hi, following on from a recent thread, can folks with decent
> multi-port HBA adaptors please chime in with some model numbers of
> known decent adaptors please?
>
> The required use is to grow from currently 8 ish drives to perhaps
> 12-24 drives per machine. (It partitions out as: one or more RAID6
> arrays for data, plus a couple of backup drives)
>
> Ideally I would like a controller with writeback cache and BBU since
> whilst this office machine is likely quite underused, for any
> sensible amount of IO (some of the other machines we might upgrade)
> this seems to give a 10-100x increase in IOPs? For the moment it's
> just a nice to have though
>
> I only intend to use linux software raid, so any onboard raid
> functionality is just a liability. Budget is either low £100 ish for
> multi-port HBAs without cache, up to £1000 ish for 16-24 port high
> performance cache controllers:

I've been using a SuperMicro AOC-SASLP-MV8 (which is on your avoid 
list), which reports itself as:

class: SCSI
bus: PCI
detached: 0
driver: mvsas
desc: "Marvell Technology Group Ltd. MV64460/64461/64462 System 
Controller, Revision B"
vendorId: 11ab
deviceId: 6485
subVendorId: 15d9
subDeviceId: 0500

I've had it about 6 months at this point with SATA drives hooked up to 
it.  The issues that I've had with it dropping disks from the 6-disk 
RAID-10 array on CentOS 5.5 / 5.6 can probably be traced to:

Not using enterprise grade SATA disks (as the consumer brand takes too 
long to timeout on a bad seek, and mdadm dropped it from the array). 
Possibly combined with using a really inexpensive set of removable drive 
trays.  There were a lot of times after the weekly resync where the 
entire array went offline due to multiple drives being dropped.
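
(For anyone hitting the same thing: assuming the drives support SCT ERC
at all - many desktop/"green" models don't - the usual mitigation is to
cap the drive's internal error recovery and/or raise the kernel's command
timeout, roughly along these lines, with sdX as a placeholder:

  smartctl -l scterc,70,70 /dev/sdX          # give up on a bad sector after ~7s
  echo 180 > /sys/block/sdX/device/timeout   # or give the kernel more patience

Neither is a substitute for enterprise drives, but it usually stops md
kicking members out during a resync.)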

Under normal operation it reads/writes to the disks fine and works fine 
as a controller.  Since this is my own personal server, I have not 
tested it with good SAS disks or enterprise SATAs and good drive 
enclosures.  I've since switched over to just hooking up a pair of RAID1 
arrays to it with a direct connect from the card to the drives (no 
removable trays), but I don't have enough time on the new setup to say 
that the problem is permanently fixed yet.

The card is inexpensive, which is a plus.  It's a PCIe x4 card.  I don't 
know whether it would be better behaved with a better class of disks / 
enclosures.


* Re: HBA Adaptor advice
  2011-05-19 12:26 HBA Adaptor advice Ed W
                   ` (2 preceding siblings ...)
  2011-05-19 19:10 ` Thomas Harold
@ 2011-05-19 21:07 ` Brad Campbell
  2011-05-20 20:58   ` Tobias McNulty
  2011-05-20  2:08 ` Andy Smith
  4 siblings, 1 reply; 60+ messages in thread
From: Brad Campbell @ 2011-05-19 21:07 UTC (permalink / raw)
  To: RAID Linux

On 19/05/11 20:26, Ed W wrote:

> Please add suggestions for good value, reliable controllers known to
> work well with linux

I have three of these :

http://www.startech.com/product/PEXSATA24E-2-Port-eSATA-4-Port-SATA-PCI-Express-x4-SATA-Controller-Adapter-Card-PCIe

and 4 of these :

http://www.ebay.com.au/itm/IBM-M1015-46M0861-ServeRAID-M1015-SAS-SATA-Controller-/280655527117?pt=AU_Server_Accessories_Parts&hash=item41585f7ccd

All of which I can't recommend highly enough.

I got the Startech ones cheap from a dodgy shop about 4 years ago. They cost me about $30 each.
I got the IBM (really LSI) ones cheap from ebay at about $110 each at Christmas.

The Startech cards use the sata_mv driver and are solid, the LSI cards use the megaraid_sas driver 
and are solid. As a bonus of having SAS ports, I picked up 4 Seagate Cheetah 15k.5 SAS drives for a 
wicked fast RAID10 array.

Regards,
Brad


* Re: HBA Adaptor advice
  2011-05-19 19:10 ` Thomas Harold
@ 2011-05-19 21:12   ` Rudy Zijlstra
  0 siblings, 0 replies; 60+ messages in thread
From: Rudy Zijlstra @ 2011-05-19 21:12 UTC (permalink / raw)
  To: Thomas Harold; +Cc: Ed W, linux-raid

On Thu, 2011-05-19 at 15:10 -0400, Thomas Harold wrote:
> On 5/19/2011 8:26 AM, Ed W wrote:
> > Hi, following on from a recent thread, can folks with decent
> > multi-port HBA adaptors please chime in with some model numbers of
> > known decent adaptors please?
> >
> > The required use is to grow from currently 8 ish drives to perhaps
> > 12-24 drives per machine. (It partitions out as: one or more RAID6
> > arrays for data, plus a couple of backup drives)
> >
> > Ideally I would like a controller with writeback cache and BBU since
> > whilst this office machine is likely quite underused, for any
> > sensible amount of IO (some of the other machines we might upgrade)
> > this seems to give a 10-100x increase in IOPs? For the moment it's
> > just a nice to have though
> >
> > I only intend to use linux software raid, so any onboard raid
> > functionality is just a liability. Budget is either low £100 ish for
> > multi-port HBAs without cache, up to £1000 ish for 16-24 port high
> > performance cache controllers:
> 
> I've been using a SuperMicro AOC-SASLP-MV8 (which is on your avoid 
> list), which reports itself as:
> 
> class: SCSI
> bus: PCI
> detached: 0
> driver: mvsas
> desc: "Marvell Technology Group Ltd. MV64460/64461/64462 System 
> Controller, Revision B"
> vendorId: 11ab
> deviceId: 6485
> subVendorId: 15d9
> subDeviceId: 0500
> 
> I've had it about 6 months at this point with SATA drives hooked up to 
> it.  The issues that I've had with it dropping disks from the 6-disk 
> RAID-10 array on CentOS 5.5 / 5.6 can probably be traced to:
> 
> Not using enterprise grade SATA disks (as the consumer brand takes too 
> long to timeout on a bad seek, and mdadm dropped it from the array). 
> Possibly combined with using a really inexpensive set of removable drive 
> trays.  There were a lot of times after the weekly resync where the 
> entire array went offline due to multiple drives being dropped.
> 
> Under normal operation it reads/writes to the disks fine and works fine 
> as a controller.  Since this is my own personal server, I have not 
> tested it with good SAS disks or enterprise SATAs and good drive 
> enclosures.  I've since switched over to just hooking up a pair of RAID1 
> arrays to it with a direct connect from the card to the drives (no 
> removable trays), but I don't have enough time on the new setup to say 
> that the problem is permanently fixed yet.
> 
> The card is inexpensive, which is a plus.  It's a PCIe x4 card.  I don't 
> know whether it would be better behaved with a better class of disks / 
> enclosures.
It's inexpensive, and unfortunately you are describing symptoms that
belong to the chipset.


It remains firmly on my avoidance list, and I have one...

Rudy


* Re: HBA Adaptor advice
  2011-05-19 12:26 HBA Adaptor advice Ed W
                   ` (3 preceding siblings ...)
  2011-05-19 21:07 ` Brad Campbell
@ 2011-05-20  2:08 ` Andy Smith
  2011-05-20  5:30   ` Stan Hoeppner
  2011-05-20  7:33   ` Ed W
  4 siblings, 2 replies; 60+ messages in thread
From: Andy Smith @ 2011-05-20  2:08 UTC (permalink / raw)
  To: linux-raid

Hi Ed,

On Thu, May 19, 2011 at 01:26:49PM +0100, Ed W wrote:
> Ideally I would like a controller with writeback cache and BBU since
> whilst this office machine is likely quite underused, for any sensible
> amount of IO (some of the other machines we might upgrade) this seems to
> give a 10-100x increase in IOPs?  For the moment it's just a nice to
> have though

Are there actually any HBAs that have BBU without using their RAID
features?

I'd like to stop using hardware RAID but I can't give up the BBU and
write cache.

Cheers,
Andy


* Re: HBA Adaptor advice
  2011-05-20  2:08 ` Andy Smith
@ 2011-05-20  5:30   ` Stan Hoeppner
  2011-05-21  9:52     ` Ed W
  2011-05-20  7:33   ` Ed W
  1 sibling, 1 reply; 60+ messages in thread
From: Stan Hoeppner @ 2011-05-20  5:30 UTC (permalink / raw)
  To: linux-raid

On 5/19/2011 9:08 PM, Andy Smith wrote:
> Hi Ed,
> 
> On Thu, May 19, 2011 at 01:26:49PM +0100, Ed W wrote:
>> Ideally I would like a controller with writeback cache and BBU since
>> whilst this office machine is likely quite underused, for any sensible
>> amount of IO (some of the other machines we might upgrade) this seems to
>> give a 10-100x increase in IOPs?  For the moment it's just a nice to
>> have though
> 
> Are there actually any HBAs that have BBU without using their RAID
> features?

AFAIK the LSI real RAID cards allow this.  To get them into a JBOD mode
you have to create a single drive RAID 0 of each disk and export it.  By
doing so the RAID firmware is actually active, though not really doing
anything, so you get the cache and BBU benefit of the controller.  One
of the XFS developers, Dave Chinner, posted this to the XFS list quite
some time ago when we discussed hardware vs software RAID setups.

-- 
Stan


* Re: HBA Adaptor advice
  2011-05-20  2:08 ` Andy Smith
  2011-05-20  5:30   ` Stan Hoeppner
@ 2011-05-20  7:33   ` Ed W
  2011-05-20 10:21     ` Stan Hoeppner
  2011-05-20 12:18     ` Joe Landman
  1 sibling, 2 replies; 60+ messages in thread
From: Ed W @ 2011-05-20  7:33 UTC (permalink / raw)
  To: linux-raid

On 20/05/2011 03:08, Andy Smith wrote:
> Are there actually any HBAs that have BBU without using their RAID
> features?
> 
> I'd like to stop using hardware RAID but I can't give up the BBU and
> write cache.


This is a very interesting question. Does anyone know whether, say, the Areca
ARC-1880ix-24 can be used in the same way, i.e. a battery-backed JBOD-type mode?

I received a recommendation off-list that the various 3Ware SAS 9750-xx
cards can be used easily as a bunch of single drives; however, comparing
the photos of these with the LSI MegaRAID 9280-xx, they seem identical
(presumed to be identical?).  Anyone know why LSI still sells an identical
card under the 3ware brand?  Curiously, I see the LSI generally selling a
little cheaper than the 3ware in the UK... (weird)

Are there any cards to avoid because they *can't* be used in this way?
E.g. Dell PERC6 cards seem to come up cheaply on eBay - can these be used
as BBU-backed single-drive JBOD controllers?

I guess the limitation is that some of these cards can only create a
small number of arrays and/or they don't use their writeback cache
efficiently in the case of multiple arrays?

Thanks for any education here.  (I found a cheap Areca on eBay, plus I've
been eyeing up the various cheap Dell PERC cards...)

Ed W


* Re: HBA Adaptor advice
  2011-05-20  7:33   ` Ed W
@ 2011-05-20 10:21     ` Stan Hoeppner
  2011-05-21 11:17       ` Ed W
  2011-05-20 12:18     ` Joe Landman
  1 sibling, 1 reply; 60+ messages in thread
From: Stan Hoeppner @ 2011-05-20 10:21 UTC (permalink / raw)
  To: Ed W; +Cc: linux-raid

On 5/20/2011 2:33 AM, Ed W wrote:
> On 20/05/2011 03:08, Andy Smith wrote:
>> Are there actually any HBAs that have BBU without using their RAID
>> features?
>>
>> I'd like to stop using hardware RAID but I can't give up the BBU and
>> write cache.

I'm curious why you are convinced that you need BBWC, or even simply WC,
on an HBA used for md RAID.  I'm also curious as to why you are so
adamant about _not_ using the RAID ASIC on an HBA, given that it will
take much greater advantage of the BBWC than md RAID will.  You may be
interested to know:

1.  When BBWC is enabled, all internal drive caches must be disabled.
    Otherwise you eliminate the design benefit of the BBU, and may as
    well not have one.
2.  w/md RAID on an HBA, if you have a good UPS and don't suffer
    kernel panics, crashes, etc, you can disable barrier support in
    your FS and you can use the drive caches (example commands below)
3.  The elevator will perform well directly on drives with large cache
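
Roughly, points 1 and 2 translate into commands like these (device,
filesystem and mount point are examples only):

  hdparm -W0 /dev/sdX                    # point 1: BBWC on the card, drive caches off
  hdparm -W1 /dev/sdX                    # point 2: md on a plain HBA, drive caches on
  mount -o nobarrier /dev/md0 /data      # ...and barriers off (XFS)
  mount -o barrier=0 /dev/md0 /data      # ...or for ext3/ext4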

Most good higher end RAID cards have 512MB to 1GB of cache.  w/12 2TB
drives you'll have a combined cache of 768MB, as most drives of this
size have a 64MB cache.  So there's not much difference in total cache
size.  And the drive firmware will usually make better decisions WRT
cache use optimization than an upstream RAID card BIOS that has disabled
the drive caches.

For a stable system with good UPS and auto shutdown configured, BBWC is
totally overrated.  If the system never takes a nose dive from power
drop, and doesn't crash due to software or hardware failure, then BBWC
is a useless $200-1000 option.  Some hardware RAID cards require a
functional BBU before they will allow you to enable write caching.  In
that case BBU is needed.  In most other cases it's not.

If your current reasoning for wanting write cache on the HBA is
performance, then forget about the write cache as you don't need it with
md RAID.  If you want the BBWC combo for safety as your system isn't
stable or you have a crappy or no UPS, then forgo md RAID and use the
hardware RAID and BBWC combo.

One last point:  If you're bargain hunting, especially if looking at
used gear on eBay, that mindset is antithetical to proper system
integration, especially when talking about a RAID card BBU.  If you buy
a used card, the first thing you must do is chuck the BBU and order a new
one, because the used battery can't be trusted--you have no idea how
much life is left in it.  For your data to be safe, you need a new
battery.  Buying a brand new card w/bundled BBU may cost you the same or
less than a used card and a new battery from the manufacturer.

The following would be a darn good fit for your md RAID office server
setup, given your criteria, WRT the HBA, hot swap cages, drives, and
cables.  Drop the LSI SAS HBA into a PCIe 2.0 x8 slot.  Drop the Intel
24 port SAS expander into an x4/x8 slot, or mount it to the side or
floor of the chassis and power it via the 4 pin Molex plug.  Connect the
8087/8087 cable from the LSI card to the first port on the Intel SAS
Expander.  Mount the 5 IcyDock 4 x 2.5" SAS hot swap backplane cages in
5 x 5.25" externally accessible drive bays.  Connect each of the five
8087 breakout cables from the remaining 5 ports on the Intel Expander to
each of the hot swap backplanes--one cable per backplane--label which
drive connects to which port on the Intel expander so you can properly
identify failed drives!  Mount each Seagate Enterprise 2.5" 1TB drive in
a tray and insert the trays into the backplanes--fill each quad bay
before putting drives in the next bay.  After booting the machine hop
into the LSI BIOS and configure for JBOD.  You should know how to do the
rest.

This setup gives you 12 enterprise 2.5" SAS 7.2K RPM 1TB drives--not
cheap SATA drives not fit for RAID--12TB raw total, in only three 5.25"
bays, and drawing much less power than equivalent 3.5" drives.  You will
have 8 free hot swap bays for future expansion, 20TB total if acquiring
the same drives.  Controller to drive aggregate bandwidth is 2.4GB/s,
4.8GB/s full duplex, HBA to host b/w is 4/8 GB/s, likely far more than
you need.
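
(Rough arithmetic, for anyone who wants to check it: the single x4 SAS2
wide link from the HBA to the expander is 4 lanes x 6Gb/s = 24Gb/s raw,
about 2.4GB/s after 8b/10b encoding, and SAS is full duplex, hence
4.8GB/s aggregate; the PCIe 2.0 x8 host interface is 8 lanes x 500MB/s =
about 4GB/s per direction, 8GB/s bidirectional.)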

The parts list.  Total cost from NewEgg in the US is ~$3800 with ~$3000
of that being the 12 drives at $250 each.  The HBA + expander are only $470.

Buy 1:
http://www.lsi.com/channel/products/megaraid/sassata/9240-4i/index.html

Buy 1:
http://www.intel.com/Products/Server/RAID-controllers/re-res2sv240/RES2SV240-Overview.htm

Buy 5:
http://www.icydock.com/goods.php?id=114

Buy 12:
http://www.seagate.com/ww/v/index.jsp?name=st91000640ss-constellation2-6gbs-sas-1-tb-hd&vgnextoid=ff13c5b2933d9210VgnVCM1000001a48090aRCRD&vgnextchannel=f424072516d8c010VgnVCM100000dd04090aRCRD&locale=en-US&reqPage=Support#tTabContentSpecifications

Buy 5 (or local equivalent):
http://www.newegg.com/Product/Product.aspx?Item=N82E16816116098&cm_re=cable-_-16-116-098-_-Product

Buy 1 (or local equivalent):
http://www.newegg.com/Product/Product.aspx?Item=N82E16816116093&cm_re=cable-_-16-116-093-_-Product

Food for thought.  Hope it's useful as I killed over an hour putting
this together for you. :)

-- 
Stan


* Re: HBA Adaptor advice
  2011-05-20  7:33   ` Ed W
  2011-05-20 10:21     ` Stan Hoeppner
@ 2011-05-20 12:18     ` Joe Landman
  2011-05-20 12:34       ` Roman Mamedov
                         ` (2 more replies)
  1 sibling, 3 replies; 60+ messages in thread
From: Joe Landman @ 2011-05-20 12:18 UTC (permalink / raw)
  To: Ed W; +Cc: linux-raid

On 05/20/2011 03:33 AM, Ed W wrote:
> On 20/05/2011 03:08, Andy Smith wrote:
>> Are there actually any HBAs that have BBU without using their RAID
>> features?
>>
>> I'd like to stop using hardware RAID but I can't give up the BBU and
>> write cache.

HBAs don't have BBU or write cache.  Only RAIDs do.  While you can run 
the RAID in JBOD mode, you effectively lose the cache (and BBU) aspect 
by doing so.

More in a moment.

> This is a very interesting question. Does anyone know if say the Areca
> ARC-1880ix-24 can be used in the same way, ie battery backed JBOD type mode?

If you absolutely insist on using a large expensive RAID card as a JBOD 
card, yeah, there are things you *can* do to keep access to the cache 
and BBU, though they are counter-intuitive.

First off, the LSI 920x series has a 16 port HBA.  You can look it up on 
their site.  SAS+SATA HBA I think.  LSI likes adorning some of their 
HBAs with some inherent RAID capability (their IR mode).  I personally 
prefer the IT mode, but its sometimes hard/impossible to make the switch 
(this is usually for motherboard mounted 'RAID' units). HBAs can be used 
as RAIDs, though the performance is abysmal (c.f. PERC*, lower end LSI 
... which PERC are rebranded versions of, ...)

Second off, you can turn any of the expensive RAID cards into an 'JBOD' 
by doing something like this:

1) have the unit configured in RAID mode

2) build virtual disks out of single drives, as RAID0.

3) iterate 2 until you exhaust your drives.

4) make sure you prevent these drives from messing with your boot drive 
order ... some bioses "helpfully" reorganize new drives for you by 
messing with this list.

Once the drive is a 1 disk RAID0, you get the cache, and the BBU for the 
cache.  Yeah, its a little weird.  But it does work (we've done this 
with some LSI8888's).
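
On the LSI/MegaRAID family, steps 1-3 look roughly like the following
with the MegaCli utility (the enclosure:slot numbers are examples, and
option spellings vary a bit between MegaCli versions, so treat this as a
sketch and check MegaCli -h first):

  MegaCli -PDList -aALL                      # find enclosure:slot of each drive
  MegaCli -CfgLdAdd -r0 [252:0] WB RA -a0    # single-drive RAID0, writeback cache
  MegaCli -CfgLdAdd -r0 [252:1] WB RA -a0    # ...and so on for each drive

Areca and 3ware have equivalent CLI/BIOS options for single-drive arrays.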

When you do this, then use mdadm atop this.  We've found, generally, by 
doing this, we can build much faster RAIDs than the LSI 8888 units, and 
comparable to the 9260's in terms of performance across the same number 
of disks, at a lower price.  E.g. mdadm and the MD RAID stack are quite 
good.
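
The md layer on top is then completely ordinary; if the single-drive
RAID0 volumes show up as, say, /dev/sdb through /dev/sdi (device names
will differ), it's just:

  mdadm --create /dev/md0 --level=6 --raid-devices=8 /dev/sd[b-i]

and the controller's cache and BBU sit transparently underneath the md
array.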

[...]

> I guess the limitation is that some of these cards can only create a
> small number of arrays and/or they don't use their writeback cache
> efficiently in the case of multiple arrays?

These are the issues.  Most RAID cards aren't thinking they'll be used 
on more than a few LUNs/RAIDs at a time, so they might not scale well 
here, with 16 or 24 single drive RAID0's.

The additional cache doesn't buy you much for this arrangement. Might 
work against you if the card CPU is slow (as most of the hardware RAID 
chips are).



* Re: HBA Adaptor advice
  2011-05-20 12:18     ` Joe Landman
@ 2011-05-20 12:34       ` Roman Mamedov
  2011-05-20 12:36         ` Mathias Burén
  2011-05-20 12:48         ` Joe Landman
  2011-05-20 13:21       ` Ed W
  2011-05-20 20:01       ` Andy Smith
  2 siblings, 2 replies; 60+ messages in thread
From: Roman Mamedov @ 2011-05-20 12:34 UTC (permalink / raw)
  To: Joe Landman; +Cc: Ed W, linux-raid


On Fri, 20 May 2011 08:18:32 -0400
Joe Landman <joe.landman@gmail.com> wrote:

> Second off, you can turn any of the expensive RAID cards into an 'JBOD' 
> by doing something like this:
> 
> 1) have the unit configured in RAID mode
> 
> 2) build virtual disks out of single drives, as RAID0.
> 
> 3) iterate 2 until you exhaust your drives.
> 
> 4) make sure you prevent these drives from messing with your boot drive 
> order ... some bioses "helpfully" reorganize new drives for you by 
> messing with this list.
> 
> Once the drive is a 1 disk RAID0, you get the cache, and the BBU for the 
> cache.  Yeah, its a little weird.  But it does work (we've done this 
> with some LSI8888's).

But can you then access SMART of the individual drives?
Or will you see only some bogus block devices which do not accept SMART
commands, do not return real drive identity, and present themselves as RAID0
#1, RAID0 #2 etc. instead?

-- 
With respect,
Roman


* Re: HBA Adaptor advice
  2011-05-20 12:34       ` Roman Mamedov
@ 2011-05-20 12:36         ` Mathias Burén
  2011-05-20 12:48         ` Joe Landman
  1 sibling, 0 replies; 60+ messages in thread
From: Mathias Burén @ 2011-05-20 12:36 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: Joe Landman, Ed W, linux-raid

On 20 May 2011 13:34, Roman Mamedov <rm@romanrm.ru> wrote:
> On Fri, 20 May 2011 08:18:32 -0400
> Joe Landman <joe.landman@gmail.com> wrote:
>
>> Second off, you can turn any of the expensive RAID cards into an 'JBOD'
>> by doing something like this:
>>
>> 1) have the unit configured in RAID mode
>>
>> 2) build virtual disks out of single drives, as RAID0.
>>
>> 3) iterate 2 until you exhaust your drives.
>>
>> 4) make sure you prevent these drives from messing with your boot drive
>> order ... some bioses "helpfully" reorganize new drives for you by
>> messing with this list.
>>
>> Once the drive is a 1 disk RAID0, you get the cache, and the BBU for the
>> cache.  Yeah, its a little weird.  But it does work (we've done this
>> with some LSI8888's).
>
> But can you then access SMART of the individual drives?
> Or will you see only some bogus block devices which do not accept SMART
> commands, do not return real drive identity, and present themselves as RAID0
> #1, RAID0 #2 etc. instead?
>
> --
> With respect,
> Roman
>

Depends on the controller; e.g.

smartctl -A -d 3ware,$I /dev/twa0
smartctl -A -d megaraid,$I /dev/sda

(where $I is the port on the controller)
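
To sweep every drive behind a MegaRAID-based card you can loop over the
ports, e.g. (0-7 is just an example range; use however many ports your
card has):

  for I in $(seq 0 7); do smartctl -i -d megaraid,$I /dev/sda; done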

/M

* Re: HBA Adaptor advice
  2011-05-20 12:34       ` Roman Mamedov
  2011-05-20 12:36         ` Mathias Burén
@ 2011-05-20 12:48         ` Joe Landman
  1 sibling, 0 replies; 60+ messages in thread
From: Joe Landman @ 2011-05-20 12:48 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: Ed W, linux-raid

On 05/20/2011 08:34 AM, Roman Mamedov wrote:
> On Fri, 20 May 2011 08:18:32 -0400
> Joe Landman<joe.landman@gmail.com>  wrote:
>
>> Second off, you can turn any of the expensive RAID cards into an 'JBOD'
>> by doing something like this:
>>
>> 1) have the unit configured in RAID mode
>>
>> 2) build virtual disks out of single drives, as RAID0.
>>
>> 3) iterate 2 until you exhaust your drives.
>>
>> 4) make sure you prevent these drives from messing with your boot drive
>> order ... some bioses "helpfully" reorganize new drives for you by
>> messing with this list.
>>
>> Once the drive is a 1 disk RAID0, you get the cache, and the BBU for the
>> cache.  Yeah, its a little weird.  But it does work (we've done this
>> with some LSI8888's).
>
> But can you then access SMART of the individual drives?

I don't view the loss of direct SMART access as a bad thing ... most of 
the RAID cards will give you CLI access to this data, if in a convoluted 
manner.  SMART's utility is generally pretty questionable (see the 
Google paper for a discussion on the profound lack of correlation of 
SMART parameters with actual failure rates).  But it's there if you want it.

> Or will you see only some bogus block devices which do not accept SMART
> commands, do not return real drive identity, and present themselves as RAID0
> #1, RAID0 #2 etc. instead?

The RAID will provide you an abstraction (e.g. a layer you have to walk 
through) to your disks.  Seeing what composes the RAID is generally not 
hard, though you might need to write a quick and dirty parser for this.

The block devices are not bogus.  They are logical block devices.

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: landman@scalableinformatics.com
web  : http://scalableinformatics.com
        http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615


* Re: HBA Adaptor advice
  2011-05-20 12:18     ` Joe Landman
  2011-05-20 12:34       ` Roman Mamedov
@ 2011-05-20 13:21       ` Ed W
  2011-05-20 14:23         ` Joe Landman
  2011-05-20 20:01       ` Andy Smith
  2 siblings, 1 reply; 60+ messages in thread
From: Ed W @ 2011-05-20 13:21 UTC (permalink / raw)
  To: Joe Landman; +Cc: linux-raid

Hi

> If you absolutely insist on using a large expensive RAID card as a JBOD
> card, yeah, there are things you *can* do to keep access to the cache
> and BBU, though they are counter-intuitive.

The main issue with hardware cards is that really you need at least two
of them... At the most inopportune moment the only single one you own
will break and then your entire dataset becomes unavailable...

For sure, anyone with moderate or larger budgets, or a pool of similar
hardware, this becomes a case of simply buying an extra one and stashing
it.  Or at least keeping an eye on when it becomes end of line and
unavailable to buy a new one...


> First off, the LSI 920x series has a 16 port HBA.  You can look it up on
> their site.  SAS+SATA HBA I think.  LSI likes adorning some of their
> HBAs with some inherent RAID capability (their IR mode).  I personally
> prefer the IT mode, but its sometimes hard/impossible to make the switch
> (this is usually for motherboard mounted 'RAID' units). HBAs can be used
> as RAIDs, though the performance is abysmal (c.f. PERC*, lower end LSI
> ... which PERC are rebranded versions of, ...)

This sounds helpful, but I'm not understanding it?

Are you describing the reverse, ie taking a straight HBA card and asking
it to do "hardware raid" of multiple disks?

Or do you mean that performance is dismal even if you make X arrays of 1
disk each in order to access their BB cache?

Or to be really clear - can I take a cheapo PERC6 from ebay, and make it
run 8x disks completely under linux MD Raid, with smartctl access to the
individual disks and BB cache on the card - *with* high performance...
(phew...)



> When you do this, then use mdadm atop this.  We've found, generally, by
> doing this, we can build much faster RAIDs than the LSI 8888 units, and
> comparible to the 9260's in terms of performance across the same number
> of disks, at a lower price.  E.g. mdadm and the MD RAID stack are quite
> good.

What do you think stops the MD Stack being *better* than a 9260?  Also
in very round terms what kind of performance drop do you see from going
to linux MD raid versus a 9260?


> The additional cache doesn't buy you much for this arrangement. Might
> work against you if the card CPU is slow (as most of the hardware RAID
> chips are).

Hopefully not a silly question, but surely the CPU would have to be
extremely slow indeed not to keep up with a sorted bunch of writes that
are being issued to spinning rust drives with multi-ms seek latencies?
Are they really that slow..?

Thanks for your very helpful feedback - much appreciated

Ed W


* Re: HBA Adaptor advice
  2011-05-20 13:21       ` Ed W
@ 2011-05-20 14:23         ` Joe Landman
  0 siblings, 0 replies; 60+ messages in thread
From: Joe Landman @ 2011-05-20 14:23 UTC (permalink / raw)
  To: Ed W; +Cc: linux-raid

On 05/20/2011 09:21 AM, Ed W wrote:
> Hi
>
>> If you absolutely insist on using a large expensive RAID card as a JBOD
>> card, yeah, there are things you *can* do to keep access to the cache
>> and BBU, though they are counter-intuitive.
>
> The main issue with hardware cards is that really you need at least two
> of them... At the most inopportune moment the only single one you own
> will break and then your entire dataset becomes unavailable...

That is a risk with any proprietary design (a point we refer to in our 
marketing, relative to completely closed designs).  This said, the issue 
on the RAID side isn't all that terrible.  RAID cards, individually, 
aren't that expensive.  You can buy replacements on ebay, or from 
various used machine resellers.  That is, your data really isn't at an 
unmitigable risk, but it is at risk.

Put another way, yeah, having a spare RAID card around isn't a bad idea. 
  In most cases they don't burn out (we've seen 4 failed RAID cards in 
our time in the field, 2 of which were ... er ... customer initiated 
burnouts ... due to bad grounding).

> For sure, anyone with moderate or larger budgets, or a pool of similar
> hardware, this becomes a case of simply buying an extra one and stashing
> it.  Or at least keeping an eye on when it becomes end of line and
> unavailable to buy a new one...

And in the case of the businesses/researchers, the cost of the 
additional card in spares stock locally is (in most cases) in the noise 
level as compared to the actual cost of the gear.

That is, it's not a terrible thing to do this.  If you are a home user, 
it's another issue entirely.  A 1000 EUR card might cost as much as the rest 
of your system.  So you want to mitigate that risk, and not have to pay 
that cost.  That decision to mitigate, by using MD RAID, will come at 
some cost, though we see MD RAID very much as the future of RAID 
systems.  It's all about refresh rates and economies of scale.

>> First off, the LSI 920x series has a 16 port HBA.  You can look it up on
>> their site.  SAS+SATA HBA I think.  LSI likes adorning some of their
>> HBAs with some inherent RAID capability (their IR mode).  I personally
>> prefer the IT mode, but its sometimes hard/impossible to make the switch
>> (this is usually for motherboard mounted 'RAID' units). HBAs can be used
>> as RAIDs, though the performance is abysmal (c.f. PERC*, lower end LSI
>> ... which PERC are rebranded versions of, ...)
>
> This sounds helpful, but I'm not understanding it?

The 16 port card is mostly HBA, with a little onboard logic for RAID0, 
RAID1, RAID10.

>
> Are you describing the reverse, ie taking a straight HBA card and asking
> it to do "hardware raid" of multiple disks?


LSI's HBAs have some of this capability, though we do not recommend 
using this.  We prefer to use them as straight HBAs.

>
> Or do you mean that performance is dismal even if you make X arrays of 1
> disk each in order to access their BB cache?

No ... we haven't looked into that performance as much, as this is a 
very difficult to use model, and honestly, there are no real benefits to 
this.

>
> Or to be really clear - can I take a cheapo PERC6 from ebay, and make it
> run 8x disks completely under linux MD Raid, with smartctl access to the
> individual disks and BB cache on the card - *with* high performance...
> (phew...)

I am going to pull a Clinton here, and ask you to define "high 
performance" :)  More seriously, performance is in the eye of the 
beholder ... what does it mean to you, and where do you need to be in 
performance ... and from that, you can see if MD RAID will get you there.

>> When you do this, then use mdadm atop this.  We've found, generally, by
>> doing this, we can build much faster RAIDs than the LSI 8888 units, and
>> comparible to the 9260's in terms of performance across the same number
>> of disks, at a lower price.  E.g. mdadm and the MD RAID stack are quite
>> good.
>
> What do you think stops the MD Stack being *better* than a 9260?  Also
> in very round terms what kind of performance drop do you see from going
> to linux MD raid versus a 9260?

Very little on the read side.  MD raid is as fast, if not faster than 
the 9260 on reads.  The 9260 isn't a bad card mind you, it is roughly 
midrange in LSI's lineup.  The write side ... I think the 9260 has a 
deeply pipelined XOR engine you need for the GF(256) calculations.  So 
we see about a 2x better write performance on the 9260 than we do on the 
MD raid.


>> The additional cache doesn't buy you much for this arrangement. Might
>> work against you if the card CPU is slow (as most of the hardware RAID
>> chips are).
>
> Hopefully not a silly question, but surely the CPU would have to be
> extremely slow indeed not to keep up with a sorted bunch of writes that
> are being issued to spinning rust drives with multi-ms seek latencies?
> Are they really that slow..?

Many of the low end cards run processors at 200-800 MHz.  Yeah ... some 
of them are really ... really ... slow.  MD RAID runs circles around 
them.  And soon, I think it will be running circles around the midrange 
(and probably higher end cards as well).

Regards,

Joe

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman@scalableinformatics.com
web  : http://scalableinformatics.com
        http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615


* Re: HBA Adaptor advice
  2011-05-20 12:18     ` Joe Landman
  2011-05-20 12:34       ` Roman Mamedov
  2011-05-20 13:21       ` Ed W
@ 2011-05-20 20:01       ` Andy Smith
  2011-05-20 20:12         ` Stan Hoeppner
  2011-05-20 20:24         ` Drew
  2 siblings, 2 replies; 60+ messages in thread
From: Andy Smith @ 2011-05-20 20:01 UTC (permalink / raw)
  To: linux-raid


Hi Joe,

On Fri, May 20, 2011 at 08:18:32AM -0400, Joe Landman wrote:
>> On 20/05/2011 03:08, Andy Smith wrote:
>>> Are there actually any HBAs that have BBU without using their RAID
>>> features?
>>>
>>> I'd like to stop using hardware RAID but I can't give up the BBU and
>>> write cache.
>
> HBAs don't have BBU or write cache.  Only RAIDs do.  While you can run  
> the RAID in JBOD mode, you effectively lose the cache (and BBU) aspect by 
> doing so.

That's what I thought, thanks.

It's a shame; maybe there will be disks with battery-backed cache
one day.

Cheers,
Andy


* Re: HBA Adaptor advice
  2011-05-20 20:01       ` Andy Smith
@ 2011-05-20 20:12         ` Stan Hoeppner
  2011-05-20 20:24         ` Drew
  1 sibling, 0 replies; 60+ messages in thread
From: Stan Hoeppner @ 2011-05-20 20:12 UTC (permalink / raw)
  To: linux-raid

On 5/20/2011 3:01 PM, Andy Smith wrote:

> It's a shame; maybe there will be disks with battery-backed cache
> one day.

You'll never see a cache DRAM BBU built into a drive.  If this *concept*
were to be implemented it would be done with flash and a capacitor
instead of a BBU.  The capacitor would be sized to hold just enough
juice to power the ASIC, flash chip, and related circuitry, and write
the cache DRAM contents to the flash chip after sensing power to the
card has been lost.

Many higher end RAID cards already have flash backup of the cache DRAM
in addition to, or instead of, a BBU.

-- 
Stan


* Re: HBA Adaptor advice
  2011-05-20 20:01       ` Andy Smith
  2011-05-20 20:12         ` Stan Hoeppner
@ 2011-05-20 20:24         ` Drew
  2011-05-20 20:58           ` Stan Hoeppner
  1 sibling, 1 reply; 60+ messages in thread
From: Drew @ 2011-05-20 20:24 UTC (permalink / raw)
  To: linux-raid

> It's a shame; maybe there will be disks with battery-backed cache
> one day.

There's already hybrid drives which pack a small SSD onboard to act as
a large cache.


-- 
Drew


* Re: HBA Adaptor advice
  2011-05-20 20:24         ` Drew
@ 2011-05-20 20:58           ` Stan Hoeppner
       [not found]             ` <4DD7A100.2010807@wildgooses.com>
  0 siblings, 1 reply; 60+ messages in thread
From: Stan Hoeppner @ 2011-05-20 20:58 UTC (permalink / raw)
  To: Drew; +Cc: linux-raid

On 5/20/2011 3:24 PM, Drew wrote:
>> It's a shame; maybe there will be disks with battery-backed cache
>> one day.
> 
> There's already hybrid drives which pack a small SSD onboard to act as
> a large cache.

These hybrid drives still have a small 32-64MB cache DRAM, in front of
the SSD.  The DRAM loses its contents when the power goes out.  The on
board SSD doesn't prevent this cache data loss.

It may be worth noting that most, if not all, pure SSDs also have cache
DRAM in front of the flash array, and thus will lose data in the cache
when the power fails.  Some models have what has been termed "super
capacitors" on board to power the device long enough to flush pending
writes in cache to the flash cells, but few, if any, of the
manufacturers advertise that their drives have this feature, or even
bother to put it on the spec sheet.  So there's no easy/consistent way,
at present, to really know if your SSD has this feature or not.

As always, a good data persistence strategy starts with a good UPS.
Laptop users have an advantage as they get a free built-in UPS and,
typically, good software integration to automatically and safely
shut down when the battery is about out of juice.

-- 
Stan


* Re: HBA Adaptor advice
  2011-05-19 21:07 ` Brad Campbell
@ 2011-05-20 20:58   ` Tobias McNulty
  2011-05-20 21:23     ` Brad Campbell
  0 siblings, 1 reply; 60+ messages in thread
From: Tobias McNulty @ 2011-05-20 20:58 UTC (permalink / raw)
  To: Brad Campbell; +Cc: RAID Linux

On Thu, May 19, 2011 at 5:07 PM, Brad Campbell
<lists2009@fnarfbargle.com> wrote:
>
> On 19/05/11 20:26, Ed W wrote:
>
>> Please add suggestions for good value, reliable controllers known to
>> work well with linux
>
> I have three of these :
>
> http://www.startech.com/product/PEXSATA24E-2-Port-eSATA-4-Port-SATA-PCI-Express-x4-SATA-Controller-Adapter-Card-PCIe
>
> and 4 of these :
>
> http://www.ebay.com.au/itm/IBM-M1015-46M0861-ServeRAID-M1015-SAS-SATA-Controller-/280655527117?pt=AU_Server_Accessories_Parts&hash=item41585f7ccd
>
> All of which I can't recommend highly enough.
>
> I got the Startech ones cheap from a dodgy shop about 4 years ago. They cost me about $30 each.
> I got the IBM (really LSI) ones cheap from ebay at about $110 each at Christmas.
>
> The Startech cards use the sata_mv driver and are solid, the LSI cards use the megaraid_sas driver and are solid. As a bonus of having SAS ports, I picked up 4 Seagate Cheetah 15k.5 SAS drives for a wicked fast RAID10 array.

So, I've been through 3 cards in my current NAS, none of which fit my
needs for one reason or another, and I had given up until this thread
reignited my interest in having more than 6 available SATA ports in the
box. The cards I've tried are:

* SYBA SD-PEX40031 (Pericom PI7C9X111 + Silicon Image Sil3124
chipset) - lots of errors during heavy I/O such as resyncing
* 3ware 9650SE-8LPML-Sgl - I thought money would solve the problem,
but I didn't realize that you can't use an expensive RAID card to
access existing data on the disk
* Supermicro AOC-SASLP-MV8 - I thought it would be a perfect match
given my Supermicro motherboard, but also gave me lots of errors
during heavy I/O and this experience seems to be confirmed by other
users in this thread

In all cases switching back to the onboard SATA ports resulted in
seamless operation (same drives, cables, etc.).

In light of what I've learned in this thread I just ordered the
Rosewill RC-218 SATA card, which has the same Marvell 88SX7042 chipset
as the Startech link above, but runs only $80 on Newegg [1] and seems
to have good reviews from a few Linux users.  I'll report back after I
get it installed next week.

Cheers,
Tobias

[1] http://www.newegg.com/Product/Product.aspx?Item=N82E16816132018
--
Tobias McNulty, Managing Member
Caktus Consulting Group, LLC
http://www.caktusgroup.com

* Re: HBA Adaptor advice
  2011-05-20 20:58   ` Tobias McNulty
@ 2011-05-20 21:23     ` Brad Campbell
  0 siblings, 0 replies; 60+ messages in thread
From: Brad Campbell @ 2011-05-20 21:23 UTC (permalink / raw)
  To: RAID Linux

On 21/05/11 04:58, Tobias McNulty wrote:
>
> * SYBA SD-PEX40031 (Pericom PI7C9X111 + Silicone Image Sil3124
> chipset) - lots of errors during heavy I/O such as resyc'ing

Oh dear.. Gee, another item of useless anecdotal evidence directly attributable to cheaply 
manufactured cards from a third world country and no direct correlation to a flaky chipset design.

> * 3ware 9650SE-8LPML-Sgl - I thought money would solve the problem,
> but I didn't realize that you can't use an expensive RAID card to
> access existing data on the disk

Ahh..

> * Supermicro AOC-Saslp-MV8 - I thought it would be a perfect match
> given my Supermicro motherboard, but also gave me lots of errors
> during heavy I/O and this experience seems to be confirmed by other
> users in this thread

Oh dear.. Yes, the mvsas driver has been noted to be somewhat problematic still.

I hope you're not a betting man. 3 for 3 is not a great record thus far. On the up-side the 7042's 
have been as solid as a rock.. and _fast_ for the last couple of years. Marvell worked with Mark 
Lord and the result was a workable version of the sata_mv driver. Shame they don't do the same with 
the mvsas code.

Additionally, I migrated two arrays from the Marvell 7042 controllers onto the LSI-based "IBM" 
controllers configured as JBOD and they just worked. No initialisation or reconfiguration 
required at all.

Brad


* Re: HBA Adaptor advice
  2011-05-20  5:30   ` Stan Hoeppner
@ 2011-05-21  9:52     ` Ed W
  0 siblings, 0 replies; 60+ messages in thread
From: Ed W @ 2011-05-21  9:52 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: linux-raid

On 20/05/2011 06:30, Stan Hoeppner wrote:
>> Are there actually any HBAs that have BBU without using their RAID
>> features?
> 
> AFAIK the LSI real RAID cards allow this.  To get them into a JBOD mode
> you have to create a single drive RAID 0 of each disk and export it.  By
> doing so the RAID firmware is actually active, though not really doing
> anything, so you get the cache and BBU benefit of the controller.  One
> of the XFS developers, Dave Chinner, posted this to the XFS list quite
> some time ago when we discussed hardware vs software RAID setups.
> 


Something I didn't consider:

When you set up most hardware RAID cards to have a whole bunch of RAID0
arrays (that are then assembled as software RAID), *can* I swap that
hardware RAID card for a different make/model or even a non-RAID HBA?

If I can't do this then I don't want such a card... The entire point of
avoiding hardware RAID was simply to avoid the proprietary lock-in...

So, to be specific, given one of the LSI/3Ware/Areca fast hardware RAID
cards mentioned in this thread, and assuming that I have created a bunch
of RAID0 arrays, each containing 1 drive, can anyone confirm/deny whether
those single disks can then be moved to a) a non-RAID HBA controller, or
b) another hardware RAID controller as a new RAID0 array?

I'm rather expecting that a) might be possible if the HBA can ignore the
proprietary bit and see a raw partition, but b) seems highly unlikely
since the new controller presumably wants to reformat the array in its
own format before it will use it?

If neither is possible then there seems little advantage in using a
hardware RAID card as a write-caching HBA (unless the card is so
underpowered that it's a bottleneck)

Thanks

Ed W


* Re: HBA Adaptor advice
  2011-05-20 10:21     ` Stan Hoeppner
@ 2011-05-21 11:17       ` Ed W
  2011-05-21 11:29         ` Rudy Zijlstra
  2011-05-22  9:04         ` Stan Hoeppner
  0 siblings, 2 replies; 60+ messages in thread
From: Ed W @ 2011-05-21 11:17 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: linux-raid

Hi Stan

Thanks for taking the time to compose your reply

> I'm curious why you are convinced that you need BBWC, or even simply WC,
> on an HBA used for md RAID.

In the past I have used battery backed cards and where the write speed
is "fsync constrained" the writeback cache makes the app performance fly
at perhaps 10-100x the speed

So for example postfix delivery speeds and mysql write performance are
examples of applications which generate regular fsyncs.  The whole app
pauses for basically the seek time of the drive head and performance is
bounded by seek time (assuming spinning media).
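
(An easy way to see the effect, if anyone wants to reproduce it, is an
fsync-per-write benchmark with fio; the directory and sizes here are just
examples:

  fio --name=fsync-iops --directory=/srv/test --size=256m \
      --rw=randwrite --bs=4k --ioengine=sync --fsync=1 \
      --runtime=30 --time_based

On a bare 7200rpm drive that reports on the order of 100-200 IOPS; with a
battery-backed writeback cache absorbing the fsyncs it jumps by one to
two orders of magnitude.)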

If we add a writeback cache then it would appear that you take a couple
of "green" 2TB drives and suddenly your desktop server acquires short
term performance which matches a bunch of high end drives? (noted only
in bursts, after some seconds you catch up with the drives IOPs).  For
my basically "small server" requirements this gives me a big boost in
the feeling of interactivity for perhaps less than the price of a couple
of those high end drives


>  I'm also curious as to why you are so
> adamant about _not_ using the RAID ASIC on an HBA, given that it will
> take much greater advantage of the BBWC than md RAID will.

Only for a single reason: It's a small office server and I want the
flexibility to move the drives to a different card (e.g. failed server,
failed card or something else).  Buying a spare card changes the
dynamics quite a bit when the whole server (sans RAID card) only costs
£1,000 ish?


  You may be
> interested to know:
> 
> 1.  When BBWC is enabled, all internal drive caches must be disabled.
>     Otherwise you eliminate the design benefit of the BBU, and may as
>     well not have one.

Yes, I hadn't thought of that.  Good point!

> 2.  w/md RAID on an HBA, if you have a good UPS and don't suffer
>     kernel panics, crashes, etc, you can disable barrier support in
>     your FS and you can use the drive caches.

I don't buy this...

Note we are discussing "long tail events" here, i.e. catastrophic events
which occur very infrequently. At this point experience is everything,
and I concede limited experience (you likely have more), but I'm going to
claim that these events are sufficiently rare that your experience
probably still isn't sufficient to draw proper conclusions...

In my limited experience hardware is pretty reliable and goes bad
rarely.  However, my estimate is that power cables fall out, PSUs fail
and UPSs go bad at least as often as the power fails?

Obviously it's application-dependent; some may tolerate small data loss
in the event of power-down, but I should think most people want a
guarantee that the system is "recoverable" in the event of a sudden
power-down.

I think disabling barriers might not be the best way to avoid fsync
delays, compared with the incremental cost of adding BBU writeback
cache? (basically the same thing, but smaller chance of failure)


> For a stable system with good UPS and auto shutdown configured, BBWC is
> totally overrated.  If the system never takes a nose dive from power
> drop, and doesn't crash due to software or hardware failure, then BBWC
> is a useless $200-1000 option.

It depends on the application, but I claim that there is a fairly
significant chance of a hard unexpected power-down even with a good UPS.
You are still at risk from cables getting pulled, UPSs failing, etc.

I think in a properly set-up datacenter (racked) environment it's
easier to control these accidents.  Cables can be tied in, layers of
power backup can be managed, it becomes efficient to add quality
surge/lightning protection, etc.  However, there is a large proportion
of the market that have a few machines in an office and now it's much
harder to stop the cleaner tripping over the UPS, or hiding it under
boxes of paper until it melts due to overheating...


> If your current reasoning for wanting write cache on the HBA is
> performance, then forget about the write cache as you don't need it with
> md RAID.  If you want the BBWC combo for safety as your system isn't
> stable or you have a crappy or no UPS, then forgo md RAID and use the
> hardware RAID and BBWC combo.

I want BB writeback cache purely to get the performance of effectively
disabling fsync, but without the loss of protection which occurs if you
do so.


> One last point:  If you're bargain hunting, especially if looking at
> used gear on Ebay, that mindset is antithetical to proper system
> integration, especially when talking about a RAID card BBU.  

I think there are few businesses who actually don't care about budget.
Everything is about optimisation of cost vs performance vs reliability.
 Like everything else, my question is really about the tradeoff of a
small incremental spend, which in turn might generate a substantial
performance increase for certain classes of application.  Largely I'm
thinking about performance tradeoffs for small office servers priced in
the £500-3,000 kind of range (not "proper" high end storage devices)

I think at that kind of level it makes sense to look for bargains,
especially if you are adding servers in small quantities, eg singles or
pairs.


> If you buy
> a used card, the first thing you must do is chuck the BBU and order a new
> one,

Agreed


> Buy 12:
> http://www.seagate.com/ww/v/index.jsp?name=st91000640ss-constellation2-6gbs-sas-1-tb-hd&vgnextoid=ff13c5b2933d9210VgnVCM1000001a48090aRCRD&vgnextchannel=f424072516d8c010VgnVCM100000dd04090aRCRD&locale=en-US&reqPage=Support#tTabContentSpecifications

Out of curiosity I checked the power consumption and reliability numbers
of the 3.5" "Green" drives, and it's not so clear-cut that the 2.5"
drives outperform them.


Thanks for your thoughts - I think this thread has been very
constructive - still very interested to hear good/bad reports of
specific cards - perhaps someone might archive it into some kind of list?

Cheers

Ed W

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: HBA Adaptor advice
  2011-05-21 11:17       ` Ed W
@ 2011-05-21 11:29         ` Rudy Zijlstra
  2011-05-21 11:54           ` Ed W
  2011-05-21 17:05           ` Leslie Rhorer
  2011-05-22  9:04         ` Stan Hoeppner
  1 sibling, 2 replies; 60+ messages in thread
From: Rudy Zijlstra @ 2011-05-21 11:29 UTC (permalink / raw)
  To: Ed W; +Cc: Stan Hoeppner, linux-raid

Hi Ed,

I understand your thinking. There is one big cost not mentioned in this 
calculation though:
- what is the cost if the data is lost/corrupt?

compared to that cost, how relevant is the cost of a proper card?

I am getting the feeling of "penny wise, pound foolish"

Now that mindset, of course, describes many a business....

Cheers,


Rudy

On 05/21/2011 01:17 PM, Ed W wrote:
> Hi Stan
>
> Thanks for the time in composing your reply
>
>    
>> I'm curious why you are convinced that you need BBWC, or even simply WC,
>> on an HBA used for md RAID.
>>      
> In the past I have used battery backed cards and where the write speed
> is "fsync constrained" the writeback cache makes the app performance fly
> at perhaps 10-100x the speed
>
> So for example postfix delivery speeds and mysql write performance are
> examples of applications which generate regular fsyncs.  The whole app
> pauses for basically the seek time of the drive head and performance is
> bounded by seek time (assuming spinning media).
>
> If we add a writeback cache then it would appear that you take a couple
> of "green" 2TB drives and suddenly your desktop server acquires short
> term performance which matches a bunch of high end drives? (noted only
> in bursts, after some seconds you catch up with the drives IOPs).  For
> my basically "small server" requirements this gives me a big boost in
> the feeling of interactivity for perhaps less than the price of a couple
> of those high end drives
>
>
>    
>>   I'm also curious as to why you are so
>> adamant about _not_ using the RAID ASIC on an HBA, given that it will
>> take much greater advantage of the BBWC than md RAID will.
>>      
> Only for a single reason: Its a small office server and I want the
> flexibility to move the drives to a different card (eg failed server,
> failed card or something else).  Buying a spare card changes the
> dynamics quite a bit when the whole server (sans raid card) only costs
> £1,000 ish?
>
>
>    You may be
>    
>> interested to know:
>>
>> 1.  When BBWC is enabled, all internal drive caches must be disabled.
>>      Otherwise you eliminate the design benefit of the BBU, and may as
>>      well not have one.
>>      
> Yes, I hadn't thought of that.  Good point!
>
>    
>> 2.  w/md RAID on an HBA, if you have a good UPS and don't suffer
>>      kernel panics, crashes, etc, you can disable barrier support in
>>      your FS and you can use the drive caches.
>>      
> I don't buy this...
>
> Note we are discussing "long tail events" here. ie catastrophic events
> which occur very infrequently. At this point experience is everything
> and I concede limited experience, you likely have more, but I'm going to
> claim that these events are sufficiently rare that your experience
> probably still isn't sufficient to draw proper conclusions...
>
> In my limited experience hardware is pretty reliable and goes bad
> rarely.  However, my estimate is that power cables fall out, PSUs fail
> and UPSs go bad at least as often as the power fails?
>
> Obviously it's application dependent, some may tolerate small dataloss
> in the event of powerdown, but I should think most people want a
> guarantee that the system is "recoverable" in the event of sudden
> powerdown.
>
> I think disabling barriers might not be the best way to avoid fsync
> delays, compared with the incremental cost of adding BBU writeback
> cache? (basically the same thing, but smaller chance of failure)
>
>
>    
>> For a stable system with good UPS and auto shutdown configured, BBWC is
>> totally overrated.  If the system never takes a nose dive from power
>> drop, and doesn't crash due to software or hardware failure, then BBWC
>> is a useless $200-1000 option.
>>      
> It depends on the application, but I claim that there is a fairly
> significant chance of hard unexpected powerdown even with a good UPS.
> You still are at risk from cables getting pulled, UPSs failing, etc
>
> I think in a properly setup datacenter (racked) environment then it's
> easier to control these accidents.  Cables can be tied in, layers of
> power backup can be managed, it becomes efficient to add quality
> surge/lightning protection, etc.  However, there is a large proportion
> of the market that have a few machines in an office and now it's much
> harder to stop the cleaner tripping over the UPS, or hiding it under
> boxes of paper until it melts due to overheating...
>
>
>    
>> If your current reasoning for wanting write cache on the HBA is
>> performance, then forget about the write cache as you don't need it with
>> md RAID.  If you want the BBWC combo for safety as your system isn't
>> stable or you have a crappy or no UPS, then forgo md RAID and use the
>> hardware RAID and BBWC combo.
>>      
> I want BB writeback cache purely to get the performance of effectively
> disabling fsync, but without the loss of protection which occurs if you
> do so.
>
>
>    
>> One last point:  If you're bargain hunting, especially if looking at
>> used gear on Ebay, that mindset is antithetical to proper system
>> integration, especially when talking about a RAID card BBU.
>>      
> I think there are few businesses who actually don't care about budget.
> Everything is about optimisation of cost vs performance vs reliability.
>   Like everything else, my question is really about the tradeoff of a
> small incremental spend, which in turn might generate a substantial
> performance increase for certain classes of application.  Largely I'm
> thinking about performance tradeoffs for small office servers priced in
> the £500-3,000 kind of range (not "proper" high end storage devices)
>
> I think at that kind of level it makes sense to look for bargains,
> especially if you are adding servers in small quantities, eg singles or
> pairs.
>
>
>    
>> If you buy
>> a used card, the first thing you must do is chuck the BBU and order a new
>> one,
>>      
> Agreed
>
>
>    
>> Buy 12:
>> http://www.seagate.com/ww/v/index.jsp?name=st91000640ss-constellation2-6gbs-sas-1-tb-hd&vgnextoid=ff13c5b2933d9210VgnVCM1000001a48090aRCRD&vgnextchannel=f424072516d8c010VgnVCM100000dd04090aRCRD&locale=en-US&reqPage=Support#tTabContentSpecifications
>>      
> Out of curiosity I checked the power consumption and reliability numbers
> of the 3.5" "Green" drives, and it's not so clear-cut that the 2.5"
> drives outperform them.
>
>
> Thanks for your thoughts - I think this thread has been very
> constructive - still very interested to hear good/bad reports of
> specific cards - perhaps someone might archive it into some kind of list?
>
> Cheers
>
> Ed W
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>    

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: HBA Adaptor advice
  2011-05-21 11:29         ` Rudy Zijlstra
@ 2011-05-21 11:54           ` Ed W
  2011-05-21 17:37             ` Leslie Rhorer
  2011-05-22  9:41             ` Stan Hoeppner
  2011-05-21 17:05           ` Leslie Rhorer
  1 sibling, 2 replies; 60+ messages in thread
From: Ed W @ 2011-05-21 11:54 UTC (permalink / raw)
  To: Rudy Zijlstra; +Cc: Stan Hoeppner, linux-raid

Hi

> I understand your thinking. There is one big cost not mentioned in this
> calculation though:
> - what is the cost if the data is lost/corrupt?

I think it's fair to say that the loss of all your business data is the
loss of your entire business?!

That said, if you are Skype you don't spend $8.5 billion on RAID cards;
instead you choose a layered approach to availability, which normally
trades speed of restore against cost.

E.g. one might specify RAID 6, dual mirrored servers, backed up to some
spare disks, Blu-ray and some offsite storage service.  This would give
resilience to various types of disaster without spending the entire
budget on a fancy RAID card?

In fact if you go back to my question, the *entire* point is that I
don't want the choice of card to be a point of failure, i.e. my specific
aim is to purchase a card such that it can be swapped out for nearly any
other card in the event of failure.

> compared to that cost, how relevant is the cost of a proper card?

See the point above.  I don't get a strong feeling that a "proper card" is
any more reliable and resilient than a well-chosen cheap card?  If that
theory is correct, then the ability to swap in another cheap card in the
event of disaster is valuable and eliminates a point of failure for
little cost?

> I am getting the feeling of "penny wise, pound foolish"

I don't see how your logic leads to that conclusion?

There is a clear definition of good/bad here.  The only acceptable
performance is that all reads/writes are accurate and completed.  No
data should be lost or corrupted.  Assuming that the market can be
partitioned into good/bad cards based on the definition above, then if
we select from only "good" cards, then price appears to only buy me
performance, nothing else?

So my question is how to choose, from all the "good" cards, the best bang
for the buck.  I don't see any reason not to buy a cheaper card that
performs well, subject to it being reliable and not losing data.

Does someone have a claim that data loss is actually on a curve, and that
more expensive cards corrupt less data and cheaper cards corrupt more
data...  That doesn't seem to fit with expectation... (I expect either
working cards that lose nothing, or bad cards that lose some data.
Black and white.)


> Now that mind set, of course, describes many a business....

I think this is a silly line of argument.  All you can ever do is buy
"insurance" against low-probability events occurring.  Annoyingly the
"insurance" in this scenario doesn't always pay out, and so the question
is how much to spend on orthogonal types of insurance to increase the
chance of a payout in the case of disaster...

It's always easy in the event of some disaster to point out how you
should have bought some different type of "insurance", but equally it's
also dead money that a business could spend to generate income...
Balancing funding between profitable activities and insurance is a fine
line (especially since you are insuring against infrequent events)

As engineers, yes it's always easy to prefer to spend money on technical
"insurance", but accept also that there are competing demands on where
cash gets deployed to earn a return?

Cheers

Ed W

^ permalink raw reply	[flat|nested] 60+ messages in thread

* RE: HBA Adaptor advice
  2011-05-21 11:29         ` Rudy Zijlstra
  2011-05-21 11:54           ` Ed W
@ 2011-05-21 17:05           ` Leslie Rhorer
  1 sibling, 0 replies; 60+ messages in thread
From: Leslie Rhorer @ 2011-05-21 17:05 UTC (permalink / raw)
  To: 'Rudy Zijlstra', 'Ed W'
  Cc: 'Stan Hoeppner', linux-raid



> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
> owner@vger.kernel.org] On Behalf Of Rudy Zijlstra
> Sent: Saturday, May 21, 2011 6:29 AM
> To: Ed W
> Cc: Stan Hoeppner; linux-raid@vger.kernel.org
> Subject: Re: HBA Adaptor advice
> 
> Hi Ed,
> 
> I understand your thinking. There is one big cost not mentioned in this
> calculation though:
> - what is the cost if the data is lost/corrupt?
> 
> compared to that cost, how relevant is the cost of a proper card?
> 
> I am getting the feeling of "penny wise, pound foolish"

	Well, not necessarily.  Your point is taken, but some data is simply
not critical.  A backup system, for example, may not be as critical as a
main system.  There are also some cases where availability is quite properly
deemed more important than reliability.


^ permalink raw reply	[flat|nested] 60+ messages in thread

* RE: HBA Adaptor advice
  2011-05-21 11:54           ` Ed W
@ 2011-05-21 17:37             ` Leslie Rhorer
  2011-05-22  9:41             ` Stan Hoeppner
  1 sibling, 0 replies; 60+ messages in thread
From: Leslie Rhorer @ 2011-05-21 17:37 UTC (permalink / raw)
  To: 'Ed W', 'Rudy Zijlstra'
  Cc: 'Stan Hoeppner', linux-raid



> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
> owner@vger.kernel.org] On Behalf Of Ed W
> Sent: Saturday, May 21, 2011 6:55 AM
> To: Rudy Zijlstra
> Cc: Stan Hoeppner; linux-raid@vger.kernel.org
> Subject: Re: HBA Adaptor advice
> 
> Hi
> 
> > I understand your thinking. There is one big cost not mentioned in this
> > calculation though:
> > - what is the cost if the data is lost/corrupt?
> 
> I think it's fair to say that the loss of all your business data is the
> loss of your entire business?!

	If a single device failure results in the loss of all one's business
data, then one has done a completely incompetent job of building a data
system.  In a properly designed system, the question is how much time and
effort will be spent over the life of the system in recovering data, or more
properly the cost of those resources, vs. the capital outlay for more
expensive hardware.  If any loss of productivity for the company as a whole
is involved with a hardware failure, then that should be taken into account,
as well.

> That said if you are Skype you don't spend 8.5 billion on raid cards,
> instead you choose a layered approach to availability which normally
> trades speed of restore time vs cost.
> 
> Eg one might specify raid 6, dual mirrored servers, backed up to some
> spare disks, Blu-ray and some offsite storage service.  This would give
> resilience to various types of disaster without spending the entire
> budget on a fancy raid card?
> 
> In fact if you go back to my question, the *entire* point is that I
> don't want the choice of card to be a point of failure, ie it's my
> specific point to purchase a card such that it can be swapped out for
> near any other card in the event of failure.
> 
> > compared to that cost, how relevant is the cost of a proper card?
> 
> See point above.  I don't get a strong feeling that a "proper card" is
> any more reliable and resilient than a well-chosen cheap card?  If that
> theory is correct then the ability to swap in another cheap card in the
> event of disaster is valuable and eliminates a point of failure for
> little cost?

	Well, yes, and no.  First of all, a "bad" card is often the result
of random factors unmitigated by any process.  The most expensive card on
Earth may have accidentally been exposed to ESD at some point in its
existence, for example.  OTOH, an inexpensive card may not necessarily be of
poor quality or design.  That said, I think there is a reasonable
expectation that a higher cost card should be the result of careful
engineering, high quality production methods, and extensive QC procedures,
all of which may be somewhat less likely with a lower-cost card.

	I think on average one may expect lower failure rates on higher cost
devices.  The thing is, a statistical average has nothing to do with any
given failure.  A customer does not care if he is the only client who has
ever lost any data out of hundreds of clients.  He only cares that he has
lost his data.

> > I am getting the feeling of "penny wise, pound foolish"
> 
> I don't see that your logic leads here?
> 
> There is a clear definition of good/bad here.  The only acceptable
> performance is that all reads/writes are accurate and completed.  No
> data should be lost or corrupted.  Assuming that the market can be
> partitioned into good/bad cards based on the definition above, then if
> we select from only "good" cards, then price appears to only buy me
> performance, nothing else?

	I would say there are exceptions, but in general, yes.  More to the
point - and I think this is your point - relying upon quality hardware to
prevent failures is a much poorer approach than developing a strategy that
mitigates the impact of failures.  Put more simply, a proper backup strategy
is a must.

> So my question is how to choose from all the "good" cards, the best bang
> for buck.  I don't see any reason not to buy a cheaper card that
> performs well, subject to it being reliable and not losing data.
> 
> Does someone have a claim that data loss is actually on a curve and that
> more expensive cards corrupt less data and cheaper cards corrupt more
> data...  That doesn't seem to fit with expectation... (I expect either
> working cards that lose nothing, or bad cards that lose some data.
> Black and white)

	Well, yes, but there is a (fairly minor, I think) statistical
correlation between cost and failure rates.

> > Now that mind set, of course, describes many a business....
> 
> I think this is a silly line of argument.  All you can ever do is buy
> "insurance" against low probability events occuring.  Annoyingly the
> "insurance" in this scenario doesn't always pay out and so the question
> is how much to spend on orthogonal types of insurance to increase the
> chance of a payout in the case of disaster...
> 
> It's always easy in the event of some disaster to point out how you
> should have bought some different type of "insurance", but equally it's
> also dead money that a business could spend to generate income...
> Balancing funding between profitable activities and insurance is a fine
> line (especially since you are insuring against infrequent events)
> 
> As engineers, yes it's always easy to prefer to spend money on technical
> "insurance", but accept also that there are competing demands on where
> cash gets deployed to earn a return?

	Well, there is a big difference between "insurance" in the ordinary
sense, which is to say a recurring premium paid ad infinitum, and a one-time
capital outlay that offers greater reliability for the indefinite future.
In addition, depending upon the application, performance does have value.
All that said, I agree that as long as the performance is acceptable, and as
long as the average reliability is reasonable, the lower cost solution
coupled with a solid backup strategy is the better choice.


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: HBA Adaptor advice
       [not found]             ` <4DD7A100.2010807@wildgooses.com>
@ 2011-05-22  8:13               ` Stan Hoeppner
  0 siblings, 0 replies; 60+ messages in thread
From: Stan Hoeppner @ 2011-05-22  8:13 UTC (permalink / raw)
  To: linux-raid

On 5/21/2011 6:24 AM, Ed W wrote:
> On 20/05/2011 21:58, Stan Hoeppner wrote:
> 
>> As always, a good data persistence strategy starts with a good UPS.
> 
> I'm sure you are going to tell me that my APCs aren't good UPSs, but I

In my experience APC make good UPSes.  Interestingly, I have an APC
SU1400RMNET manufactured in *1997*, powering my home office rack.  I've
replaced the batteries 4 times, but the UPS itself is like the Energizer
Bunny.  14 years and still going strong.

> have something like 5 APCs and 4 have failed in odd ways due to the
> battery dying, inside of around 2 years from new.  Sure you replace the

Then I'd guess you're not performing proper UPS maintenance.  Once
yearly you need to perform a deep-cycle self-test, which can notify you
of marginal batteries at a much earlier stage.  Your APC manual has
instructions for performing this test, or you can download the manual
from their site if it's been lost.

All APCs inform you when the battery needs to be replaced, via front
panel LED and via software or network notification (Email/SNMP).  But
you don't want to wait for that.  Do the deep self test.
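
If you happen to use apcupsd for the monitoring and auto shutdown (an
assumption on my part), the relevant knobs live in
/etc/apcupsd/apcupsd.conf; a sketch with example values only:

  UPSCABLE usb
  UPSTYPE usb
  BATTERYLEVEL 20    # shut down when 20% charge remains
  MINUTES 5          # ...or when estimated runtime drops to 5 minutes
  TIMEOUT 0          # 0 = rely on BATTERYLEVEL/MINUTES, not a fixed delay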

> battery, but failure modes each time caused a sudden power failure.  In
> nearly all cases the UPS failed before I would have had a sudden power
> loss for other reasons...

Yep.  Lack of proper UPS maintenance and monitoring.

> So, I'm not convinced that UPSs dramatically raise the uptime, and where

Without a UPS in Missouri, USA, your servers will go down from power loss
at *minimum* 50-100 times per year due to electrical storms, high winds,
power line maintenance, brown-outs and sags caused by all manner of
things, a truck hitting a power pole, etc, etc.

> they do it's in well designed, racked, datacenter environments where
> "accidents" don't dominate the downtime risk?

Doesn't matter if it's a corporate datacenter, your rack in the
basement, or an office pedestal server.  What counts is proper design
and installation.  It's trivially simple in an office environment to
route all cables in a manner that they won't be tripped over.  I'm truly
shocked that cable tripping could be an issue for anyone in 2011, let
alone 1999.  Get a rack cabinet and stick it in a corner.  Here, get this:

http://cgi.ebay.co.uk/COMPAQ-42U-SERVER-RACK-CABINET-ENCLOSURE-/150608001643?pt=UK_Computing_Networking_SM&hash=item2310efba6b

and 2 or 3 of these, since all your servers are non-rack boxes:

http://cgi.ebay.co.uk/StarTech-Adjustable-Depth-Fixed-Server-Rack-Cabinet-She-/320618338412?pt=UK_Computing_ComputerComponents_Monitors&hash=item4aa657986c

Problem solved.

-- 
Stan

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: HBA Adaptor advice
  2011-05-21 11:17       ` Ed W
  2011-05-21 11:29         ` Rudy Zijlstra
@ 2011-05-22  9:04         ` Stan Hoeppner
  2011-05-22 10:09           ` Brad Campbell
  1 sibling, 1 reply; 60+ messages in thread
From: Stan Hoeppner @ 2011-05-22  9:04 UTC (permalink / raw)
  To: Ed W; +Cc: linux-raid

On 5/21/2011 6:17 AM, Ed W wrote:
> Hi Stan
> 
> Thanks for the time in composing your reply

Heh, I'm TheHardwareFreak, whaddya expect? ;)  Note the domain in my
email addy.

>> I'm curious why you are convinced that you need BBWC, or even simply WC,
>> on an HBA used for md RAID.
> 
> In the past I have used battery backed cards and where the write speed
> is "fsync constrained" the writeback cache makes the app performance fly
> at perhaps 10-100x the speed

Ok, now that makes more sense.  This is usually the case with a hardware
RAID card or SAN controller, though it depends on the
vendor/implementation.  I've never run one in 'JBOD' mode with the write
cache enabled, so I can't say whether fsync behavior will be the same
using md RAID or not.  Maybe someone else has tested this.
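
A crude way for anyone to test it: time synchronous writes with and
without the cache enabled.  A sketch (the target path is an example, and
the numbers below are ballpark expectations, not measurements):

  $ dd if=/dev/zero of=/srv/data/fsync-test bs=4k count=1000 oflag=dsync

Against bare 7.2k spindles this tends to land in the low hundreds of
synchronous writes/sec; with a BBWC absorbing the flushes it can be the
10-100x figure mentioned earlier in the thread.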

> Only for a single reason: Its a small office server and I want the
> flexibility to move the drives to a different card (eg failed server,
> failed card or something else).  Buying a spare card changes the
> dynamics quite a bit when the whole server (sans raid card) only costs
> £1,000 ish?

If adding a real hardware RAID card and enterprise drives to a 'base'
server, the storage will nearly always cost more than the server,
especially with HP Proliant 1U quad core rack servers going for well
less than $1000 USD.  This has been reality for a few years now.

>   You may be
>> interested to know:
>>
>> 1.  When BBWC is enabled, all internal drive caches must be disabled.
>>     Otherwise you eliminate the design benefit of the BBU, and may as
>>     well not have one.
> 
> Yes, I hadn't thought of that.  Good point!
> 
>> 2.  w/md RAID on an HBA, if you have a good UPS and don't suffer
>>     kernel panics, crashes, etc, you can disable barrier support in
>>     your FS and you can use the drive caches.
> 
> I don't buy this...

Well, take into consideration that the vast majority of people running
md RAID arrays, including most, if not all, on this list, aren't using
hardware writeback cache.  They're using plain-Jane SAS/SATA HBAs.  Some
are using hybrid hardware arrays stitched together with md RAID striping
or concatenation.  But in those cases we're talking multiple tens of
thousands of dollars per system.

> In my limited experience hardware is pretty reliable and goes bad
> rarely.  However, my estimate is that powercables fall out, PSUs fail
> and UPSs go bad at least as often as the power fails?

*Quality* hardware today is very reliable.
Power cords *never* come loose in my experience; I don't allow it.
PSUs and UPSes fail at about the same rate as RAID cards, IME--*rarely*.
Apparently Britain has a far better power grid than the States.

> Obviously it's application dependent, some may tolerate small dataloss
> in the event of powerdown, but I should think most people want a
> guarantee that the system is "recoverable" in the event of sudden
> powerdown.

There is always a tradeoff here between performance, resilience,
flexibility, and cost.  You currently have conflicting criteria in this
regard.  If you can afford all that you want, pick that which is most
important to eliminate the conflicts.  Then implement it.

> I think disabling barriers might not be the best way to avoid fsync
> delays, compared with the incremental cost of adding BBU writeback
> cache? (basically the same thing, but smaller chance of failure)

On the type of small office server you described, it's difficult to
grasp how performance is so critical.  You sound like a candidate for a
mixed SSD + SAS/SATA RAID setup.  Put things that require low latency,
such as the Postfix spool, Dovecot indexes, and MySQL tables on SSD, and
put user data, such as IMAP mail directories, home directory files, etc,
on spinning RAID.  This way you get high performance and low cost.
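
As a sketch of the split I mean (paths and the MySQL/Dovecot/Postfix
directives are illustrative only, check your own configs before copying
anything):

  # /etc/fstab: small SSD for latency-sensitive data, md RAID for bulk
  /dev/sdb1  /ssd   ext4  defaults,noatime  0  2
  /dev/md0   /srv   xfs   defaults,noatime  0  2

  # postfix main.cf
  queue_directory = /ssd/spool/postfix

  # my.cnf
  datadir = /ssd/mysql

  # dovecot.conf: indexes on the SSD, the mail itself on the array
  mail_location = maildir:/srv/mail/%u:INDEX=/ssd/dovecot-index/%u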

> It depends on the application, but I claim that there is a fairly
> significant chance of hard unexpected powerdown even with a good UPS.
> You still are at risk from cables getting pulled, UPSs failing, etc

If cables getting yanked is a concern, you have human issues that must
be solved long before the technical aspects of system resiliency.  I've
not built/installed/used/serviced a pedestal server in over a decade.

> I think in a properly setup datacenter (racked) environment then it's
> easier to control these accidents.  

We don't have "accidents" in our datacenters, not the Homo
sapiens-initiated type you refer to.

> Cables can be tied in, layers of
> power backup can be managed, it becomes efficient to add quality
> surge/lightning protection, etc.  However, there is a large proportion
> of the market that have a few machines in an office and now it's much
> harder to stop the cleaner tripping over the UPS, or hiding it under
> boxes of paper until it melts due to overheating...

Again, these types of problems can't be solved with technological means.

> I want BB writeback cache purely to get the performance of effectively
> disabling fsync, but without the loss of protection which occurs if you
> do so.

You can have it with some cards.  But, you will lose your ability to
swap the drives to a different make/model of HBA in the future.

> Everything is about optimisation of cost vs performance vs reliability.

Yep.

>  Like everything else, my question is really about the tradeoff of a
> small incremental spend, which in turn might generate a substantial
> performance increase for certain classes of application.  Largely I'm
> thinking about performance tradeoffs for small office servers priced in
> the £500-3,000 kind of range (not "proper" high end storage devices)

'Proper' need not be 'high end' nor expensive.

> I think at that kind of level it makes sense to look for bargains,
> especially if you are adding servers in small quantities, eg singles or
> pairs.

Again, that's exactly what the parts I posted give you.

>> Buy 12:
>> http://www.seagate.com/ww/v/index.jsp?name=st91000640ss-constellation2-6gbs-sas-1-tb-hd&vgnextoid=ff13c5b2933d9210VgnVCM1000001a48090aRCRD&vgnextchannel=f424072516d8c010VgnVCM100000dd04090aRCRD&locale=en-US&reqPage=Support#tTabContentSpecifications
> 
> Out of curiosity I check the power consumption and reliability numbers
> of the 3.5" "Green" drives and it's not so clear cut that the 2.5"
> drives outperform?

WD's Green drives have a 5400 rpm 'variable' spindle speed.  The Seagate
2.5" SAS drive has a 7.2k spindle speed.

It's difficult to align partitions properly on the Green drives due to
native 4K sectors translated by drive firmware to 512B sectors.  The
Seagate SAS drive has native 512B sectors.

The Green drives have aggressive power saving firmware not suitable for
business use as the heads are auto parked every 8 seconds or so.  IIRC
the drive goes into sleep mode after a short period of inactivity on the
host interface.  In short, these drives are designed optimally for the
"is not running" case rather than the "running" case.  Hence the name
"Green".  How do you save power?  Turn off the drive.  And that's
exactly what these drives are designed to do.

The Seagate 2.5" SAS drive has TLER support, the Green doesn't.  If you
go hardware RAID, you need TLER.  It's good to have for md RAID as well
but not a requirement.
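
For what it's worth, on drives whose firmware exposes SCT ERC you can
inspect or set the error recovery timeout from Linux with a reasonably
recent smartmontools; a sketch (7.0 seconds shown, the device name is an
example, and many desktop drives simply reject the command):

  $ smartctl -l scterc /dev/sda          # show current read/write ERC timers
  $ smartctl -l scterc,70,70 /dev/sda    # set both to 7.0 seconds (units of 0.1s)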

Check the warranty difference between the Seagate SAS drive and the WD
Green.  Also note WD's 'RAID use' policy.

> Thanks for your thoughts - I think this thread has been very
> constructive - still very interested to hear good/bad reports of
> specific cards - perhaps someone might archive it into some kind of list?

I see RAID card shootouts now and then.  Google should find you
something.  Though you won't see anyone testing Linux md RAID on a
hardware RAID card in JBOD mode.

-- 
Stan
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: HBA Adaptor advice
  2011-05-21 11:54           ` Ed W
  2011-05-21 17:37             ` Leslie Rhorer
@ 2011-05-22  9:41             ` Stan Hoeppner
  2011-05-22 10:03               ` Rudy Zijlstra
  1 sibling, 1 reply; 60+ messages in thread
From: Stan Hoeppner @ 2011-05-22  9:41 UTC (permalink / raw)
  To: linux-raid

On 5/21/2011 6:54 AM, Ed W wrote:

> In fact if you go back to my question, the *entire* point is that I
> don't want the choice of card to be a point of failure, ie it's my
> specific point to purchase a card such that it can be swapped out for
> near any other card in the event of failure.

You've given about 3 or 4 conflicting requirements now WRT your 'perfect'
HBA.

What HBAs are you currently using?  How many of your stated requirements
over the past few days do your current HBAs fulfill?

Do you have a tape or D2D backup system in place?

There is no guarantee that you can swap one dead HBA for another brand
with a different chipset on board and have it work without issue.  If
you are that concerned you need to buy two identical cheap HBAs so you
have a spare.  But wait!  You must have hardware write cache for md RAID
as well.  But if you do that, you're locked into that vendor's cards.
And on, and on...

I've never seen nor heard of a real SA in a business environment
vacillate like this over a simple RAID/HBA acquisition, as if the
company's entire 1st quarter net profit was being wrapped up in this HBA
purchase.  And I've never heard of an SA being concerned about cable
tripping, of all damn things, taking down a server.

Something in this whole thread just doesn't jibe...

-- 
Stan



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: HBA Adaptor advice
  2011-05-22  9:41             ` Stan Hoeppner
@ 2011-05-22 10:03               ` Rudy Zijlstra
  2011-05-23  9:32                 ` Ed W
  0 siblings, 1 reply; 60+ messages in thread
From: Rudy Zijlstra @ 2011-05-22 10:03 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: linux-raid

On 05/22/2011 11:41 AM, Stan Hoeppner wrote:
> On 5/21/2011 6:54 AM, Ed W wrote:
>
>    
>> In fact if you go back to my question, the *entire* point is that I
>> don't want the choice of card to be a point of failure, ie it's my
>> specific point to purchase a card such that it can be swapped out for
>> near any other card in the event of failure.
>>      
> You're given about 3 or 4 conflicting requirement now WRT your 'perfect'
> HBA.
>
> What HBAs are you currently using?  How many of your stated requirements
> over the past few days do your current HBAs fulfill?
>
> Do you have a tape or D2D backup system in place?
>
> There is no guarantee that you can swap one dead HBA for another brand
> with a different chipset on board and have it work without issue.  If
> you are that concerned you need to buy two identical cheap HBAs so you
> have a spare.  But wait!  You must have hardware write cache for md RAID
> as well.  But if you do that, you're locked into that vendor's cards.
> And on, and on...
>
> I've never seen nor heard of a real SA in a business environment
> vacillate like this over a simple RAID/HBA acquisition, as if the
> company's entire 1st quarter net profit was being wrapped up in this HBA
> purchase.  And I've never head of an SA being concerned about cable
> tripping of all damn things taking down a server.
>
> Something in this whole thread just doesn't jibe...
>
>    
The amount of money that his time has cost discussing this & thinking
about it is most likely already noticeably more than the cost of a
mid-range RAID card.

My approach (and I have my own small company):
- use HW RAID on the system disks (RAID5) (and have a spare controller
of the same type ready)
- use MD RAID on big storage with cheap disks (and have spare disks
lying ready)
- have a nightly automated backup to a different system, with versioning
and the ability to recover the state of half a year ago

That other system is in a different building.
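
For anyone wanting the same effect, one common way to get nightly
versioned copies is rsync with --link-dest, roughly like this (paths and
the date are examples, not necessarily my exact setup):

  $ rsync -a --delete --link-dest=/backup/latest  server:/srv/data/  /backup/2011-05-22/
  $ ln -snf /backup/2011-05-22 /backup/latest

Unchanged files are hard-linked against the previous snapshot, so each
night only costs the space of what actually changed.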

As I do not upgrade the servers that often, this ensures:
- I do not need to spend a long time on getting the system back up, if a
system disk goes bye-bye
- no need to think long on how grub/lilo was supposed to be working for
multiple disks
- no need to remember to re-install the bootloader on all related disks (so
I safeguard against my own mistakes. Takes some money, yes, but I am
willing to pay that part of the insurance quite willingly. I am aware I make
mistakes, especially when time pressure is high)

- a backup in place for the usual stupid accidental deletes.


Yes, I keep spare controllers. Do I need them? Not really... so far I
have had only 1 RAID card die on me in the 10+ years I have been using
them. I've had many disks go to bit hell, and some mobos. Not RAID cards
though.

My main issue with this discussion is that it assumes:
- no time pressure when the shit hits the fan
- the system maintainer does not make mistakes

Both of these fail in real life, especially with the small businesses
where this discussion is relevant for cost reasons. Thus my stated
feeling of "penny wise, pound foolish". Murphy's law being what it is,
things usually fail when you need to have your attention on something
else. That means there is great opportunity at such a time to make
mistakes. Thus the setup of such systems needs to take the human aspect
into account. As far as I can see, the setup he is defining is simply
too complex for the situation.

I've had things fail on me when I needed to leave in 2 hours, as I had a
flight to catch. I also needed that server to be running...


Cheers,


Rudy

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: HBA Adaptor advice
  2011-05-22  9:04         ` Stan Hoeppner
@ 2011-05-22 10:09           ` Brad Campbell
  2011-05-22 19:25             ` Stan Hoeppner
  0 siblings, 1 reply; 60+ messages in thread
From: Brad Campbell @ 2011-05-22 10:09 UTC (permalink / raw)
  To: linux-raid

On 22/05/11 17:04, Stan Hoeppner wrote:

> WD's Green drives have a 5400 rpm 'variable' spindle speed.  The Seagate
> 2.5" SAS drive has a 7.2k spindle speed.

Actually, I'm pretty sure the WD drives have a 5400 rpm spindle speed,
period. I've got 15 of them here and I have no evidence of any form of
spindle speed variation. They say the drives have a spindle speed of
"IntelliPower", which is marketspeak for slow enough to save a few
watts, but fast enough to do the job.

> It's difficult to align partitions properly on the Green drives due to
> native 4K sectors translated by drive firmware to 512B sectors.  The
> Seagate SAS drive has native 512B sectors.

Actually it's not difficult at all. You just make sure all your 
partitions start on an even multiple of 8 sectors. No magic in it. Just 
the same as all my SSD partitions start on 512k boundaries.
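
For example, with a reasonably recent parted (the device is an example;
a 2048-sector / 1 MiB start is a multiple of 8 sectors and also suits
SSD erase blocks):

  $ parted /dev/sdb unit s print                          # check existing start sectors
  $ parted -a optimal /dev/sdb mkpart primary 2048s 100%  # create an aligned partition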

> The Green drives have aggressive power saving firmware not suitable for
> business use as the heads are auto parked every 8 seconds or so.  IIRC
> the drive goes into sleep mode after a short period of inactivity on the
> host interface.  In short, these drives are designed optimally for the
> "is not running" case rather than the "running" case.  Hence the name
> "Green".  How do you save power?  Turn off the drive.  And that's
> exactly what these drives are designed to do.

You can turn off the aggressive head parking with a little DOS utility, 
and they don't go to sleep at all unless you tell them to. They will 
happily keep spinning just the same as any other disk.
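
You can also check whether the parking is actually hurting before
bothering with the utility, by watching SMART attribute 193 (device name
is an example, and the wdidle3 switches below are from memory, so check
its own help text rather than trusting me):

  $ smartctl -A /dev/sdb | grep -i load_cycle    # rapidly climbing count = aggressive parking

wdidle3 is commonly documented with /R to report the idle timer,
/S<seconds> to change it, and /D to disable it entirely.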

I'm running them in a couple of large(ish) RAID arrays. I'm not saying 
it's a good idea, it's just been my experience with ultra-cheap drives 
that if you burn in the drives to weed out the early failures, and you 
keep them running 24/7 in a nice environment they tend to last long 
enough to do the job. I tend to replace my drives at around ~30,000 
hours, so these have a long way to go yet.

On the other hand, I have my company data on Seagate Cheetah SAS drives 
in RAID-10, but I back up to the large WD Green arrays.


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: HBA Adaptor advice
  2011-05-22 10:09           ` Brad Campbell
@ 2011-05-22 19:25             ` Stan Hoeppner
  2011-05-22 20:57               ` Tobias McNulty
                                 ` (2 more replies)
  0 siblings, 3 replies; 60+ messages in thread
From: Stan Hoeppner @ 2011-05-22 19:25 UTC (permalink / raw)
  To: Brad Campbell; +Cc: linux-raid

On 5/22/2011 5:09 AM, Brad Campbell wrote:
> On 22/05/11 17:04, Stan Hoeppner wrote:
> 
>> WD's Green drives have a 5400 rpm 'variable' spindle speed.  The Seagate
>> 2.5" SAS drive has a 7.2k spindle speed.
> 
> Actually, I'm pretty sure the WD drives have a 5400 rpm spindle speed
> period. I've got 15 of them here and I have no evidence of any form of
> spindle speed variation. They say the drives have spindle speed :
> "intellipower" which is marketspeak for slow enough to save a few watts,
> but fast enough to do the job.

From:  http://www.anandtech.com/show/2385/2

The Western Digital drive's IntelliPower algorithm, which varies the
rotational speed between 5400RPM and 7200RPM, dictates the Western
Digital's rotational speed.

>> It's difficult to align partitions properly on the Green drives due to
>> native 4K sectors translated by drive firmware to 512B sectors.  The
>> Seagate SAS drive has native 512B sectors.
> 
> Actually it's not difficult at all. You just make sure all your
> partitions start on an even multiple of 8 sectors. No magic in it. Just
> the same as all my SSD partitions start on 512k boundaries.

IIRC from discussions here, mdadm has alignment issues with hybrid
sector size drives when assembling raw disks.  Not everyone assembles
their md devices from partitions.  Many assemble raw devices.

>> The Green drives have aggressive power saving firmware not suitable for
>> business use as the heads are auto parked every 8 seconds or so.  IIRC
>> the drive goes into sleep mode after a short period of inactivity on the
>> host interface.  In short, these drives are designed optimally for the
>> "is not running" case rather than the "running" case.  Hence the name
>> "Green".  How do you save power?  Turn off the drive.  And that's
>> exactly what these drives are designed to do.
> 
> You can turn off the aggressive head parking with a little DOS utility,
> and they don't go to sleep at all unless you tell them to. They will
> happily keep spinning just the same as any other disk.

You must boot your server with MS-DOS or FreeDOS and run wdidle3 once
for each Green drive in the system.  But, IIRC, if the drives are
connected via SAS expander or SATA PMP, this will not work.  A direct
connection to the HBA is required.

Once one accounts for all the necessary labor and configuration
contortions one must put himself through to make a Green drive into a
'regular' drive, it is often far more cost effective to buy 'regular'
drives to begin with.  This saves on labor $$ which is usually greater,
from a total life cycle perspective, than the drive acquisition savings.
 The drives you end up with are already designed and tuned for the
application.  Reiterating Rudy's earlier point, using the Green drives
in arrays is "penny wise, pound foolish".

Google WD20EARS and you'll find a 100:1 or more post ratio of problems
vs praise for this drive.  This is the original 2TB model which has
shipped in much greater numbers into the marketplace than all other
Green drives.  Heck, simply search the archives of this list.

> I'm running them in a couple of large(ish) RAID arrays. I'm not saying
> it's a good idea, it's just been my experience with ultra-cheap drives
> that if you burn in the drives to weed out the early failures, and you
> keep them running 24/7 in a nice environment they tend to last long
> enough to do the job. I tend to replace my drives at around ~30,000
> hours, so these have a long way to go yet.

You're one out of 100.  Congratulations. :)

> On the other hand, I have my company data on Seagate Cheetah SAS drives
> in RAID-10, but I back up to the large WD Green arrays.

And that backup array may fail you when you need it most:  during a
restore.  Search the XFS archives for the horrific tale at University of
California Santa Cruz.  The SA lost ~7TB of doctoral student research
data due to multiple WD20EARS drives in his primary storage arrays *and*
his D2D backup array dying in quick succession.  IIRC multiple grad
students were forced to attend another semester to redo their
experiments and field work to recreate the lost data, so they could then
submit their theses.

How much did this incident cost the university and the Ph.D. students
in real money and lost time?  I'm sure some actuaries might be able to
tell you, and the real cost is likely hundreds of thousands of times the
cost savings of using these crap drives, especially when you figure in
the lost salaries for 6 months of these Ph.D. students.  Depending on
their field this could be over $100k per student.  If 10 such students
were affected, that's potentially $1 million in lost earnings alone.

Spending an additional $10-20K on proper disk drives would have saved an
enormous amount in this case, and not just purely money.  If you were
one of the students who was told you had to repeat a semester because a
computer lost all of your research data, how would you digest and cope
with that?  I'd bet at least one, if not more, lawsuits/settlements will
result from this.

Given that things like this can, and DO, happen when banking on cheap
consumer drives in a production environment, why would anyone ever take
such a chance?

-- 
Stan

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: HBA Adaptor advice
  2011-05-22 19:25             ` Stan Hoeppner
@ 2011-05-22 20:57               ` Tobias McNulty
  2011-05-22 21:13                 ` Johannes Truschnigg
  2011-05-22 23:19               ` Brad Campbell
  2011-05-22 23:44               ` Brad Campbell
  2 siblings, 1 reply; 60+ messages in thread
From: Tobias McNulty @ 2011-05-22 20:57 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: Brad Campbell, linux-raid

On Sun, May 22, 2011 at 3:25 PM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
>
> On 5/22/2011 5:09 AM, Brad Campbell wrote:
> > On 22/05/11 17:04, Stan Hoeppner wrote:
> >> It's difficult to align partitions properly on the Green drives due to
> >> native 4K sectors translated by drive firmware to 512B sectors.  The
> >> Seagate SAS drive has native 512B sectors.
> >
> > Actually it's not difficult at all. You just make sure all your
> > partitions start on an even multiple of 8 sectors. No magic in it. Just
> > the same as all my SSD partitions start on 512k boundaries.
>
> IIRC from discussions here, mdadm has alignment issues with hybrid
> sector size drives when assembling raw disks.  Not everyone assembles
> their md devices from partitions.  Many assemble raw devices.

Case in point: I have 4 of these 2TB Green drives in a RAID5 array.  I
assembled them from the raw devices (no partition table) without any
special precautions.  Am I in trouble?  The array seems to be working
fine...

Tobias
--
Tobias McNulty, Managing Member
Caktus Consulting Group, LLC
http://www.caktusgroup.com
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: HBA Adaptor advice
  2011-05-22 20:57               ` Tobias McNulty
@ 2011-05-22 21:13                 ` Johannes Truschnigg
  2011-05-23  9:48                   ` Ed W
  0 siblings, 1 reply; 60+ messages in thread
From: Johannes Truschnigg @ 2011-05-22 21:13 UTC (permalink / raw)
  To: Tobias McNulty; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 1157 bytes --]

On 05/22/2011 10:57 PM, Tobias McNulty wrote:
> Case in point: I have 4 of these 2TB Green drives in a RAID5 array.  I
> assembled them from the raw devices (no partition table) without any
> special precautions.  Am I in trouble?  The array seems to be working
> fine...

No, you aren't. If you don't create a partition table in the first
place, there's no possibility for partition boundaries to be misaligned
with respect to the physical sector or erase-block size of the underlying
block device. You could probably still get it wrong if you chose (if
that's even possible, I don't know for sure off-hand) a very weird
non-power-of-two chunk size that happens to interfere with the sector
size of your disks in a bad way, but since md's default chunk sizes are
rather large powers of two, you'd have to put some effort into screwing
up (if that is at all possible, as I mentioned before) ;)
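
For the record, a typical creation line on raw devices sticks to
power-of-two defaults anyway; a sketch (device names and chunk size are
examples):

  $ mdadm --create /dev/md0 --level=5 --raid-devices=4 --chunk=512 /dev/sd[bcde]

512 KiB is the current mdadm default chunk, and any power-of-two chunk
is trivially a multiple of the 4 KiB physical sector.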


-- 
with best regards:
- Johannes Truschnigg ( johannes@truschnigg.info )

www: http://johannes.truschnigg.info/
phone: +43 650 2 133337
xmpp: johannes@truschnigg.info

Please do not bother me with HTML-eMail or attachments. Thank you.


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 261 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: HBA Adaptor advice
  2011-05-22 19:25             ` Stan Hoeppner
  2011-05-22 20:57               ` Tobias McNulty
@ 2011-05-22 23:19               ` Brad Campbell
  2011-05-23  4:09                 ` Roman Mamedov
  2011-05-23  6:54                 ` Stan Hoeppner
  2011-05-22 23:44               ` Brad Campbell
  2 siblings, 2 replies; 60+ messages in thread
From: Brad Campbell @ 2011-05-22 23:19 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: linux-raid

On 23/05/11 03:25, Stan Hoeppner wrote:
> On 5/22/2011 5:09 AM, Brad Campbell wrote:
>> On 22/05/11 17:04, Stan Hoeppner wrote:
>>
>>> WD's Green drives have a 5400 rpm 'variable' spindle speed.  The Seagate
>>> 2.5" SAS drive has a 7.2k spindle speed.
>>
>> Actually, I'm pretty sure the WD drives have a 5400 rpm spindle speed
>> period. I've got 15 of them here and I have no evidence of any form of
>> spindle speed variation. They say the drives have spindle speed :
>> "intellipower" which is marketspeak for slow enough to save a few watts,
>> but fast enough to do the job.
>
> From:  http://www.anandtech.com/show/2385/2
>
> The Western Digital drive's IntelliPower algorithm, which varies the
> rotational speed between 5400RPM and 7200RPM, dictates the Western
> Digital's rotational speed.

"In 2007 Western Digital announced the WD GP drive touting rotational 
speed "between 7200 and 5400 rpm", which, if potentially misleading, is 
technically correct; the drive spins at 5405 rpm, and the Green Power 
spin speed is not variable.[citation needed]"

http://en.wikipedia.org/wiki/Western_Digital

They're not variable. Or to put it another way, if they _can_ vary the 
spindle speed none of mine ever do.

Can you imagine the potential vibration nightmare as 10 drives vary 
their spindle speed up and down? Not to mention the extra load on the 
+12V rail and the delays while waiting for the platters to reach servo lock?

> IIRC from discussions here, mdadm has alignment issues with hybrid
> sector size drives when assembling raw disks.  Not everyone assembles
> their md devices from partitions.  Many assemble raw devices.

Which means the data starts at sector 0. That's an even multiple of 8. 
Job done. (Mine are all assembled raw also).

> You must boot your server with MD-DOS or FreeDOS and run wdidle3 once
> for each Green drive in the system.  But, IIRC, if the drives are
> connected via SAS expander or SATA PMP, this will not work.  A direct
> connection to the HBA is required.

Indeed. In my workshop I have an old machine with 3 SATA hotswap bays 
that allowed me to do 3 at once, booting off a USB key into DOS.

> Once one accounts for all the necessary labor and configuration
> contortions one must put himself through to make a Green drive into a
> 'regular' drive, it is often far more cost effective to buy 'regular'
> drives to begin with.  This saves on labor $$ which is usually greater,
> from a total life cycle perspective, than the drive acquisition savings.
>  The drives you end up with are already designed and tuned for the
> application.  Reiterating Rudy's earlier point, using the Green drives
> in arrays is "penny wise, pound foolish".
>

I agree with you. If I were doing it again I'd spend some extra $$$ on 
better drives, but I've already outlaid the cash and have a working array.

> Google WD20EARS and you'll find a 100:1 or more post ratio of problems
> vs praise for this drive.  This is the original 2TB model which has
> shipped in much greater numbers into the marketplace than all other
> Green drives.  Heck, simply search the archives of this list.

Indeed, but the same follows for almost any drive. People are quick to 
voice their discontent but not so quick to praise something that does 
what it says on the tin.

> And that backup array may fail you when you need it most:  during a
> restore.  Search the XFS archives for the horrific tale at University of
> California Santa Cruz.  The SA lost ~7TB of doctoral student research
> data due to multiple WD20EARS drives in his primary storage arrays *and*
> his D2D backup array dying in quick succession.  IIRC multiple grad
> students were forced to attend another semester to redo their
> experiments and field work to recreate the lost data, so they could then
> submit their theses.
>

Perhaps. Mine get a SMART short test every morning, a long test every
Sunday, and a complete array scrub every other Sunday. My critical backups are
also replicated to a WD World Edition Mybook that lives in another building.
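
A schedule like that is just cron plus sysfs; a rough sketch (device
names, md number and timings are examples rather than exactly what I
run, and smartd can handle the SMART side for you instead):

  # /etc/cron.d/disk-checks
  30 5 * * *   root  for d in /dev/sd[a-d]; do smartctl -t short $d; done
  0  3 * * 0   root  for d in /dev/sd[a-d]; do smartctl -t long  $d; done
  0 13 * * 0   root  echo check > /sys/block/md0/md/sync_action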

I've had quite a few large arrays over the years, all composed of the
cheapest available storage at the time. I've had drives fail, but aside
from a Sil3124-controller-induced array failure I've never lost data
because of a cheap hard disk, and I've saved many, many, many $$$ on drives.

I'm not arguing the penny wise, pound foolish sentiment. I'm just 
stating my personal experience has been otherwise with drives.


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: HBA Adaptor advice
  2011-05-22 19:25             ` Stan Hoeppner
  2011-05-22 20:57               ` Tobias McNulty
  2011-05-22 23:19               ` Brad Campbell
@ 2011-05-22 23:44               ` Brad Campbell
  2011-05-23  0:07                 ` Brad Campbell
  2011-05-23  9:58                 ` Stan Hoeppner
  2 siblings, 2 replies; 60+ messages in thread
From: Brad Campbell @ 2011-05-22 23:44 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: Brad Campbell, linux-raid

On 23/05/11 03:25, Stan Hoeppner wrote:
>
> And that backup array may fail you when you need it most:  during a
> restore.  Search the XFS archives for the horrific tale at University of
> California Santa Cruz.  The SA lost ~7TB of doctoral student research
> data due to multiple WD20EARS drives in his primary storage arrays *and*
> his D2D backup array dying in quick succession.  IIRC multiple grad
> students were forced to attend another semester to redo their
> experiments and field work to recreate the lost data, so they could then
> submit their theses.

So I "googled" that thread, and after I picked my way past all the top rating hits which appear to 
be you telling people to google that thread I found the real problem.

He used WD commodity drives in a "hardware" RAID enclosure that needed TLER. The RAID-5 kicked out 4
drives in a short period of time, so he power-cycled it and re-initialised the array, and it came up
fine, but blank (as it would, since he re-initialised it).

Sorry Stan, that's not a failure of the drives. He lost the data due to limitations in his RAID 
configuration and bad management.



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: HBA Adaptor advice
  2011-05-22 23:44               ` Brad Campbell
@ 2011-05-23  0:07                 ` Brad Campbell
  2011-05-23  5:30                   ` Stefan /*St0fF*/ Hübner
  2011-05-23  9:58                 ` Stan Hoeppner
  1 sibling, 1 reply; 60+ messages in thread
From: Brad Campbell @ 2011-05-23  0:07 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: linux-raid

On 23/05/11 07:44, Brad Campbell wrote:
> He used WD commodity drives on a "hardware" RAID enclosure that needed TLER. The RAID-5 kicked out 4
> drives in a short period of time, so he power cycled it and re-initialised the array and it came up
> fine, but blank (as it would as he re-initialised it).
>

Just to clarify, as that was somewhat muddled. The initial failure was on an unspecified array
with unspecified drives and resulted in a blank array. The backup failure was TLER-related, using WD
GP drives on a hardware array, and was left unresolved.

That's still not concrete evidence of those drives failing; it's just using the wrong tool for the
job.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: HBA Adaptor advice
  2011-05-22 23:19               ` Brad Campbell
@ 2011-05-23  4:09                 ` Roman Mamedov
  2011-05-23  5:54                   ` Brad Campbell
  2011-05-23  6:54                 ` Stan Hoeppner
  1 sibling, 1 reply; 60+ messages in thread
From: Roman Mamedov @ 2011-05-23  4:09 UTC (permalink / raw)
  To: Brad Campbell; +Cc: Stan Hoeppner, linux-raid

[-- Attachment #1: Type: text/plain, Size: 3571 bytes --]

On Mon, 23 May 2011 07:19:50 +0800
Brad Campbell <lists2009@fnarfbargle.com> wrote:

> On 23/05/11 03:25, Stan Hoeppner wrote:
> >> Actually, I'm pretty sure the WD drives have a 5400 rpm spindle speed
> >> period. I've got 15 of them here and I have no evidence of any form of
> >> spindle speed variation. They say the drives have spindle speed :
> >> "intellipower" which is marketspeak for slow enough to save a few watts,
> >> but fast enough to do the job.
> >
> > From:  http://www.anandtech.com/show/2385/2
> >
> > The Western Digital drive's IntelliPower algorithm, which varies the
> > rotational speed between 5400RPM and 7200RPM, dictates the Western
> > Digital's rotational speed.

You can find hundreds or thousands of articles reiterating the manufacturer's
marketing materials without much thought or experimentation.

On the other hand, there have been some tests of these drives involving a
microphone and a rotational-noise-frequency-to-RPM calculation which show the
speed never varies. I can't dig those up off-hand, though; for the
'next-best' proof, see the HddRpmEst results at the Japanese link below.

> "In 2007 Western Digital announced the WD GP drive touting rotational 
> speed "between 7200 and 5400 rpm", which, if potentially misleading, is 
> technically correct; the drive spins at 5405 rpm, and the Green Power 
> spin speed is not variable.[citation needed]"

And here is a couple of [citations]:
http://www.ciol.com/News/News-Reports/Seagate-targets-Western-Digitals-IntelliPower/131009126262/0/
http://www.storagereview.com/1000.sr

By the way, 5400-7200 isn't even true in any sense of the word.
There are some models of WD20EARS (e.g. 00MVWB0, maybe others) which spin at a
constant 5000 RPM instead: http://club.coneco.net/user/10682/review/37049/

> > IIRC from discussions here, mdadm has alignment issues with hybrid
> > sector size drives when assembling raw disks.  Not everyone assembles
> > their md devices from partitions.  Many assemble raw devices.
> 
> Which means the data starts at sector 0. That's an even multiple of 8. 
> Job done. (Mine are all assembled raw also).

The data does not necessarily start at sector 0. However, it is still most
likely to be fine:

$ sudo mdadm --examine /dev/sdb3 | grep Offset
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
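
A rough way to double-check your own members - the device pattern is just a
placeholder for whatever drives are in the array:

$ sudo mdadm --examine /dev/sd[b-e] | grep -i 'data offset'
$ echo $(( 2048 * 512 % 4096 ))
0

A 2048-sector data offset is 1 MiB, which divides evenly by 4096, so raw-disk
members with that offset start their data on a 4K boundary.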

> > Google WD20EARS and you'll find a 100:1 or more post ratio of problems
> > vs praise for this drive.  This is the original 2TB model which has
> > shipped in much greater numbers into the marketplace than all other
> > Green drives.  Heck, simply search the archives of this list.
> 
> Indeed, but the same follows for almost any drive. People are quick to 
> voice their discontent but not so quick to praise something that does 
> what it says on the tin.

Personally, so far I have been successful in avoiding the "Advanced Format" crap
that WD and others are pushing down customers' throats; I have none of those
drives. It *is* possible to make a non-AF 2TB drive, even a 3-platter one. And
this one is, in my opinion, the ideal Green drive to buy today: it has the
advantages of being non-AF and at the same time still in production (maybe
not for too long, with WD buying Hitachi :-/ ):

  Hitachi 5K3000 HDS5C3020ALA632
  http://www.newegg.com/Product/Product.aspx?Item=N82E16822145475

I also had only the best experiences with Hitachi HDDs, and it looks like I am
not alone:
http://www.tomshardware.com/reviews/hdd-reliability-storelab,2681-2.html

-- 
With respect,
Roman

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: HBA Adaptor advice
  2011-05-23  0:07                 ` Brad Campbell
@ 2011-05-23  5:30                   ` Stefan /*St0fF*/ Hübner
  2011-05-23 10:18                     ` Ed W
  0 siblings, 1 reply; 60+ messages in thread
From: Stefan /*St0fF*/ Hübner @ 2011-05-23  5:30 UTC (permalink / raw)
  To: Brad Campbell; +Cc: Stan Hoeppner, linux-raid

Am 23.05.2011 02:07, schrieb Brad Campbell:
> On 23/05/11 07:44, Brad Campbell wrote:
>> He used WD commodity drives on a "hardware" RAID enclosure that needed
>> TLER. The RAID-5 kicked out 4
>> drives in a short period of time, so he power cycled it and
>> re-initialised the array and it came up
>> fine, but blank (as it would as he re-initialised it).
>>
> 
> Just to clarify that as it was somewhat muddled. The initial failure was
> on an unspecified array with unspecified drives and resulted in a blank
> array. The backup failure was TLER related using WD GP drives on a
> hardware array and was left unresolved.
> 
> That's still not concrete evidence of those drives failing, it's just
> using the wrong tool for the wrong job.

Just to clarify a bit more: the older WD20EADS (notice the 'D') worked
very well, and until Nov 2009 they were TLER capable (which means the ERC
timeouts could be set to non-zero and the setting was preserved across
power cycles).  Shortly after the firmware patch that removed the
TLER capability, the WD20EARS (notice the 'R') appeared.  From that moment on
our WD failure rates started to climb noticeably and have not fallen
again since.
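
(For reference: on drives that still support it, those ERC timeouts can be read
and set from Linux with smartmontools - a rough sketch, /dev/sdX being a
placeholder and the values being tenths of a second:

$ sudo smartctl -l scterc /dev/sdX          # show the current SCT ERC read/write timeouts
$ sudo smartctl -l scterc,70,70 /dev/sdX    # request 7.0 second read/write error recovery

Drives whose firmware dropped the capability simply report that SCT ERC is
unsupported, and on many models the setting does not survive a power cycle, so
it has to be reapplied at every boot.)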

We also noticed that WD started to make the same mistake (customer-wise)
as Seagate.  More than 50% of their "certified repaired" disks (the ones
you get back after sending in defective drives for RMA) died soon
after being put back into service.  I will not comment further on this
statement.

We sell 200+ drives a week from our "preferred at the time"
manufacturer.  That was WD from 2008 till the beginning of 2010.  But
since the climb in WD failure rates we're with Hitachi, and have seen
astounding failure rates of less than one percent.  I hope this will
stay the case even after WD bought Hitachi GST...

Conclusion about this university storage failure: wrong drives for the
scenario.  It would've been OK to use the cheap WDs for backup (if the
backup was at least RAID6 and sent error mails to the admin).  But for the
primary storage it was a big fail.  You do not use this kind of storage for
data which is worth a lot of time (and thereby a lot of money).

Just a few pro-cents,
Stefan

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: HBA Adaptor advice
  2011-05-23  4:09                 ` Roman Mamedov
@ 2011-05-23  5:54                   ` Brad Campbell
  2011-05-23  6:08                     ` Roman Mamedov
  2011-05-23 10:42                     ` Stan Hoeppner
  0 siblings, 2 replies; 60+ messages in thread
From: Brad Campbell @ 2011-05-23  5:54 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: Stan Hoeppner, linux-raid

On 23/05/11 12:09, Roman Mamedov wrote:

> Personally so far I have been sucessful in avoiding the "Advanced Format" crap
> that WD and others are pushing down customers' throats; I have none of such
> drives. It *is* possible to make a non-AF 2TB drive, even a 3-platter one. And
> this one in my opinion the ideal Green drive to buy today, which has the
> advantages of being non-AF and at the same time still in production (maybe
> not for too long, with WD buying Hitachi :-/ ):

I think the term "Advanced Format" crap is a bit harsh.
The reality is that for drives > 2TB it is simply inevitable that bigger 
sectors will be required.

Most sane operating systems use cluster sizes of 4k or larger and have 
done for years, so I really don't see what all the fuss is about.

People's inability to properly align the data on their disks can be read
either as a failing of the technology (the partitioning applications
have not caught up yet) or simply as a lack of understanding of how to
apply the technology.

Don't blame the drive manufacturers; this should have happened _years_ ago.


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: HBA Adaptor advice
  2011-05-23  5:54                   ` Brad Campbell
@ 2011-05-23  6:08                     ` Roman Mamedov
  2011-05-23 10:42                     ` Stan Hoeppner
  1 sibling, 0 replies; 60+ messages in thread
From: Roman Mamedov @ 2011-05-23  6:08 UTC (permalink / raw)
  To: Brad Campbell; +Cc: Stan Hoeppner, linux-raid

[-- Attachment #1: Type: text/plain, Size: 794 bytes --]

On Mon, 23 May 2011 13:54:58 +0800
Brad Campbell <lists2009@fnarfbargle.com> wrote:

> I think the term "Advanced Format" crap is a bit harsh.
> The reality is that for drives > 2TB it is simply inevitable that bigger 
> sectors will be required.

For >2TB maybe, but not for 2TB.

> Most sane operating systems use cluster sizes of 4k or larger and have 
> done for years, so I really don't see what all the fuss is about.

4K drives shouldn't lie that they have 512-byte sectors, pretending all is
fine while doing that horrendous read-modify-write translation under the hood;
at least they should be switchable (perhaps by a jumper?) into a 4K-native mode.
There is no way to improperly align anything if a drive honestly reports that
it has 4K/4K logical/physical sectors.
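
One quick way to see what a drive admits to is to compare the logical and
physical block sizes the kernel reports - a small sketch, with sdX as a
placeholder:

$ cat /sys/block/sdX/queue/logical_block_size
512
$ cat /sys/block/sdX/queue/physical_block_size
4096

A 512/4096 pair is one of these hybrid drives; 4096/4096 would be an honest
native-4K drive. (Some early AF models are said to report 512 for both, so a
clean answer from an older drive is not proof by itself.)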

-- 
With respect,
Roman

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: HBA Adaptor advice
  2011-05-22 23:19               ` Brad Campbell
  2011-05-23  4:09                 ` Roman Mamedov
@ 2011-05-23  6:54                 ` Stan Hoeppner
  2011-05-23  7:23                   ` Brad Campbell
  1 sibling, 1 reply; 60+ messages in thread
From: Stan Hoeppner @ 2011-05-23  6:54 UTC (permalink / raw)
  To: Brad Campbell; +Cc: linux-raid

On 5/22/2011 6:19 PM, Brad Campbell wrote:

> They're not variable. Or to put it another way, if they _can_ vary the
> spindle speed none of mine ever do.

There's too little official info from WD on what exactly the variable
speed IntelliPower actually means.

> Can you imagine the potential vibration nightmare as 10 drives vary
> their spindle speed up and down? Not to mention the extra load on the
> +12V rail and the delays while waiting for the platters to reach servo
> lock?

WD sells this drive for *consumer* use, meaning one to a few drives, where
multi-drive oscillation isn't going to be an issue.  Given that fact,
it's not hard for WD or Seagate et al. in 2011 to build such a drive
with a variable spindle speed.  Apparently WD has done just that.

And btw, HDDs pull the bulk of their power from the 5 volt rail, not the
12 volt rail.  This is the main differentiating factor between a server
PSU and a PC PSU--much more current available on the 5V rail.  Hop on
NewEgg and compare the 12V and 5V rail current of a PC SLI PSU and a
server PSU.

>> IIRC from discussions here, mdadm has alignment issues with hybrid
>> sector size drives when assembling raw disks.  Not everyone assembles
>> their md devices from partitions.  Many assemble raw devices.
> 
> Which means the data starts at sector 0. That's an even multiple of 8.
> Job done. (Mine are all assembled raw also).

I had it slightly backwards.  Thanks for the correction.  The problem
case is building arrays from partitions created using the defaults of
many/most partitioning tools.

>> You must boot your server with MS-DOS or FreeDOS and run wdidle3 once
>> for each Green drive in the system.  But, IIRC, if the drives are
>> connected via SAS expander or SATA PMP, this will not work.  A direct
>> connection to the HBA is required.
> 
> Indeed. In my workshop I have an old machine with 3 SATA hotswap bays
> that allowed me to do 3 at once, booting off a USB key into DOS.
> 
>> Once one accounts for all the necessary labor and configuration
>> contortions one must put himself through to make a Green drive into a
>> 'regular' drive, it is often far more cost effective to buy 'regular'
>> drives to begin with.  This saves on labor $$ which is usually greater,
>> from a total life cycle perspective, than the drive acquisition savings.
>>  The drives you end up with are already designed and tuned for the
>> application.  Reiterating Rudy's earlier point, using the Green drives
>> in arrays is "penny wise, pound foolish".
> 
> I agree with you. If I were doing it again I'd spend some extra $$$ on
> better drives, but I've already outlaid the cash and have a working array.

I think a lot of people fell into this 'trap' due to the super low price/GB
of the Green drives, simply not realizing we now have boutique hard
drives and a variety of application-tailored drives, just as we have 31
flavors of ice cream.

>> Google WD20EARS and you'll find a 100:1 or more post ratio of problems
>> vs praise for this drive.  This is the original 2TB model which has
>> shipped in much greater numbers into the marketplace than all other
>> Green drives.  Heck, simply search the archives of this list.
> 
> Indeed, but the same follows for almost any drive. People are quick to
> voice their discontent but not so quick to praise something that does
> what it says on the tin.

The WD20EARS was far worse than the typical scenario you describe.
Interestingly, though, the drive itself is not at fault.  The two
problems associated with the drive are:

1.  Deploying it in the wrong application--primary RAID arrays
2.  The Linux partitioning tools lack(ed) support for 512B/4KB hybrids

Desktop MS Windows users seem to love these drives.  They're using them
as intended, go figure...

>> And that backup array may fail you when you need it most:  during a
>> restore.  Search the XFS archives for the horrific tale at University of
>> California Santa Cruz.  The SA lost ~7TB of doctoral student research
>> data due to multiple WD20EARS drives in his primary storage arrays *and*
>> his D2D backup array dying in quick succession.  IIRC multiple grad
>> students were forced to attend another semester to redo their
>> experiments and field work to recreate the lost data, so they could then
>> submit their theses.
> 
> Perhaps. Mine get a SMART short test every morning, a LONG every Sunday
> and a complete array scrub every other Sunday. My critical backups are
> also replicated to a WD World Edition Mybook that lives in another
> building.

I don't like disparaging other SAs, so I didn't go into that aspect of
the tale.  In summary, the SA tasked with managing that system had zero
monitoring in place, no proactive testing, nothing.  He was flying
blind.  When XFS "dropped" 12TB of the 60TB filesystem it took this SA
over a day to realize an entire RAID chassis had gone offline due to
multiple drive failures.  It took him almost a week, with lots of XFS
mailing list expertise, to save the intact 4/5ths of the filesystem.  If
he'd used LVM or md striping instead of concatenation he'd have
lost the entire 60TB filesystem.  He had a backup on a D2D server which
was also built of the 2TB green drives.  It turns out that system already
had 2 of its RAID6 drives down, and a 3rd failed while he was
troubleshooting the file server problem.  He discovered this fact when he
decided to attempt a restore of the lost 12TB.

> I've had quite a few large arrays over the years, all comprised of the
> cheapest available storage at the time. I've had drives fail, but aside
> from a Sil3124 controller induced array failure I've never lost data
> because of a cheap hard disk and I've saved many, many, many $$$ on drives.

The problem I see most often, and have experienced first hand, isn't
losing data due to drive failure once in production.  The problem is
usually getting arrays stable when 'pounding them on the test bench'.
I've used hardware RAID far more often than md RAID over the years, and
some/many hardware RAID cards are just really damn picky about which
drives they'll work reliably with.  md RAID is more forgiving in this
regard, one of its many benefits.

> I'm not arguing the penny wise, pound foolish sentiment. I'm just
> stating my personal experience has been otherwise with drives.

One shoe won't fit every foot.  Going the cheap route is typically more
labor intensive.  If proper procedures are used to monitor and replace
before the sky falls, this solution can work in many environments.  In
other environments, drive failure notification must be automatic,
management software and light path diagnostics must clearly show which
drive has failed, all so a $15/hour low-skilled datacenter technician
can walk down the rack aisle, find the dead drive, pull and replace it,
without system administrator intervention.  The SA will simply launch
his management console and make sure the array is auto-rebuilding.

-- 
Stan

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: HBA Adaptor advice
  2011-05-23  6:54                 ` Stan Hoeppner
@ 2011-05-23  7:23                   ` Brad Campbell
  0 siblings, 0 replies; 60+ messages in thread
From: Brad Campbell @ 2011-05-23  7:23 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: linux-raid

On 23/05/11 14:54, Stan Hoeppner wrote:
> On 5/22/2011 6:19 PM, Brad Campbell wrote:
>
>> They're not variable. Or to put it another way, if they _can_ vary the
>> spindle speed none of mine ever do.
>
> There's too little official info from WD on what exactly the variable
> speed IntelliPower actually means.
>

I can actually answer that one with certainty.

Intellipower is a combination of a fixed (but mostly non-standard) spin 
speed, clever cache usage and a variable speed seek.

It's the variable speed seek that trips people up. The drive knows where 
it is in its rotational cycle, and it knows how far it has to move to 
read the next block. If its rotational latency (and remember it's 
_slow_) is going to be greater than its seek time, it slows the seek 
down to save power. No point snapping the head to the next block if it 
is just going to have to wait for the data to arrive.

Unfortunately, early on a couple of "benchmark" sites got it horribly
wrong; it then got picked up by sites that are normally pretty reliable
(Anandtech, for example) and it just propagated from there.

If anyone would like concrete proof, I'm willing to sacrifice a 1TB 
drive I have here by popping the top and putting an optical tacho on it. 
Just give me a couple of weeks to get it out of service and get an 
optical tacho approved by the war office ;)

> And btw, HDDs pull the bulk of their power from the 5 volt rail, not the
> 12 volt rail.  This is the main differentiating factor between a server
> PSU and a PC PSU--much more current available on the 5v rail.  Hop on
> NewEgg and compare the 12v and 5v rail current of an PC SLI PSU and a
> server PSU.

Have another look at the drive data sheets. The bulk of the load on the 
+12V rail is the drive motor while spinning up. Even my Cheetahs detail
this accurately in their data sheets. The logic runs on +5V, but the magnetics
tend to lean on the +12V rail pretty heavily.

> I think a lot of people fell into this 'trap' due the super low price/GB
> of the Green drives, and simply not realizing we now have boutique hard
> drives and a variety of application tailored drives, just as we have 31
> flavors of ice cream.

I kinda knew what I was getting myself into. Here's my reasoning.

The bulk of my storage is what you would call nearline. Write once, read 
lots but not very often. Media, movies, source trees, backups.

My storage used to be spread across 2 servers (30 drives in total), and
one of those was composed of 15 250G Maxtors. I did the power
calculations and figured I could justify replacing 10 1TB drives in
Server A with 10 WD GP 2TB drives and decommissioning Server B.

This will pay for itself in 14 months.

I did a lot of research on the GP drives and figured that for my use
pattern they'd be OK. Remember, I keep the server up 24/7. It's on an
APC Smart UPS (boost/buck + UPS) and it gets rebooted for kernel updates
every 200 days or so, but never powered down.

My experience has been that even with the cheapest consumer drives, if 
you keep them spinning and keep them warm they'll go the distance after 
weeding out the early life failures. Now, I might have a time bomb 
sitting there and suffer massive drive failure next week totalling my 
array, but then I knew what I was getting into before I started.

Realizing that for an extra $150 overall I could have had Hitachi 7200 
RPM drives was a bit of a DOH! moment, but then the power savings did 
not stack up as well.

You pays your money, you takes your chances.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: HBA Adaptor advice
  2011-05-22 10:03               ` Rudy Zijlstra
@ 2011-05-23  9:32                 ` Ed W
  0 siblings, 0 replies; 60+ messages in thread
From: Ed W @ 2011-05-23  9:32 UTC (permalink / raw)
  To: Rudy Zijlstra; +Cc: linux-raid

On 22/05/2011 11:03, Rudy Zijlstra wrote:
> The amount of money that his time has cost discussing this & thinking
> about it, is most likely already noticeably more then the cost of a
> mid-range RAID card.

But hopefully that cost is shared across plenty of people who are now
more educated about the state of Linux RAID?

Please don't get bogged down with this - what is obvious to someone who
hangs around here a lot is not necessarily obvious to those of us
with less experience.

Kind regards

Ed W

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: HBA Adaptor advice
  2011-05-22 21:13                 ` Johannes Truschnigg
@ 2011-05-23  9:48                   ` Ed W
  2011-05-23 10:44                     ` John Robinson
  0 siblings, 1 reply; 60+ messages in thread
From: Ed W @ 2011-05-23  9:48 UTC (permalink / raw)
  To: Johannes Truschnigg; +Cc: Tobias McNulty, linux-raid

On 22/05/2011 22:13, Johannes Truschnigg wrote:
> On 05/22/2011 10:57 PM, Tobias McNulty wrote:
>> Case in point: I have 4 of these 2TB Green drives in a RAID5 array.  I
>> assembled them from the raw devices (no partition table) without any
>> special precautions.  Am I in trouble?  The array seems to be working
>> fine...
> 
> No, you aren't. If you don't create a partition table in the first
> place, there's no possibility for partition boundaries to be mis-aligned
> in regard to the physical sector or erase block size of the underlying
> blockdevice. You could probably still get it wrong if you chose (if

Pardon what is probably a very ignorant question, but someone earlier in
this thread claimed that some adaptors report the size of the disk
slightly differently?  Wouldn't this potentially cause problems if you
needed to move the disks to a different controller?

Additionally, if you needed to replace a disk, a new batch might be a few
sectors smaller?  This seems to be the biggest reason for wanting to add a
partition table and then deliberately partition a few tens of MB smaller?
(I think I saw this exact problem come up several times in the last few
weeks alone?)

Cheers

Ed W

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: HBA Adaptor advice
  2011-05-22 23:44               ` Brad Campbell
  2011-05-23  0:07                 ` Brad Campbell
@ 2011-05-23  9:58                 ` Stan Hoeppner
  2011-05-23 10:33                   ` Ed W
  1 sibling, 1 reply; 60+ messages in thread
From: Stan Hoeppner @ 2011-05-23  9:58 UTC (permalink / raw)
  To: Brad Campbell; +Cc: linux-raid

On 5/22/2011 6:44 PM, Brad Campbell wrote:

> He used WD commodity drives on a "hardware" RAID enclosure that needed
> TLER. The RAID-5 kicked out 4 drives in a short period of time, so he
> power cycled it and re-initialised the array and it came up fine, but
> blank (as it would as he re-initialised it).

It was/is an Excel Meridian Data SecurStor Astra ES with 4 expansion
chassis.  Low end, many would call it junk (myself included).  Excel
Meridian Data, formerly known as Excel CD-ROM, has been re-badging cheap
Taiwanese, and now Chinese, junk since their inception in 1992.

Eli stated this gear was sold to UCSC as a packaged, warrantied storage
system by an unnamed local SoCal vendor.  I took the man at his word.
We can speculate that he lied about it and tossed it together himself
from a catalog, which seems plausible given EMD's business model, but
that makes no material difference to the point of this discussion, which
is that WD Green drives are not appropriate for RAID arrays, regardless of
who selected the components and whose hands assembled the hardware.

> Sorry Stan, that's not a failure of the drives. He lost the data due to
> limitations in his RAID configuration and bad management.

On the contrary.  It *is* a failure of the drives.  They failed to
perform properly in the chosen application environment, because the
vendor/end user put them in an unsupported environment.  That's the
whole theme of this thread, and precisely why I encouraged people to
read about this fiasco and the potential costs of using these drives in
RAIDs.

Whether the spindle motors quit, the PCBs failed, or they were merely
kicked offline due to any of a half dozen reasons, these are all drive
failures.  When a drive goes offline, do you call it "success"?  No.
What's the opposite of success?  Failure.

The Astra ES is almost certainly running embedded Linux + md RAID, given
its price point.  I can't locate the EMD website or the PDF for this
Astra unit because every Excel Meridian domain that Googling returned is
currently squatted.  They may have gone belly up.

If indeed that box uses embedded Linux + md RAID, TLER wouldn't have
been the problem.  Multiple drives definitely went offline, but I doubt
it's due to a real RAID ASIC with custom firmware and a TLER issue.
More likely, given the price, it was a backplane signal quality problem,
for which cheap backplanes are notorious.  Either way, cheap
not-fit-for-RAID drives were stuffed into a cheap RAID box and disaster
was the result.  People buying these drives for array use aren't
dropping them into quality backplanes, but cheap ones.  The entire
ecosystem of components used to build a WD Green drive array are
typically of much lower quality than the drives themselves.  Cheap
backplane + cheap drives + cheap HBAs = high probability of disaster.

In summary, very few people are going to successfully build reliable
arrays from these drives.  I've seen too many horror stories, the UCSC
fiasco being the most severe.  I'm simply trying to prevent others
from suffering similar disasters.  I think that's a worthy cause.  WDC
itself says not to use the Green drives in RAID arrays.  I'm supplying
examples of real-world disasters to support WDC's disclaimer, and to
prevent some heartache.

-- 
Stan

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: HBA Adaptor advice
  2011-05-23  5:30                   ` Stefan /*St0fF*/ Hübner
@ 2011-05-23 10:18                     ` Ed W
  0 siblings, 0 replies; 60+ messages in thread
From: Ed W @ 2011-05-23 10:18 UTC (permalink / raw)
  To: stefan.huebner; +Cc: linux-raid

On 23/05/2011 06:30, Stefan /*St0fF*/ Hübner wrote:

> We sell 200+ drives a week from our "at that time preferred"
> Manufacturer.  That was WD from 2008 till the beginning of 2010.  

Just to clarify - are all those "failures" basically attributable to
drives with unreadable sectors which then drop out of arrays due to lack
of TLER?

i.e. if the drive DID have TLER then it would likely not have been
reported as a failed drive? (But presumably SMART might report a
reallocated sector, and you might lose that sector's data?)



> But
> since the climb of wd failure rates we're at "Hitachi" and have
> astounding failure rates of less than one percent.  I hope this will
> stay the case even after WD bought Hitachi GST...

Likewise, is this because the Hitachi drives appear more reliable, or
because they incorporate some kind of TLER which keeps them running in
the face of reallocated sectors?

Does your conclusion extend to the desktop Hitachi drives also? Do
these also suffer lower failure rates? Can they be made "TLER" compatible?


> Conclusion about this university-storage-failure: wrong drives for this
> scenario.  It would've been OK to use the cheap WDs for backup (if the
> backup was at least RAID6 and sends error-mails to the admin).  But the
> primary storage was a big fail.  You do not use this kind of storage for
> data which is worth much time (and by that much money).

How do folks here react to Google's paper stating that, by and large, they
find little difference in reliability between "RAID drives" and consumer
drives?

Granted, it's a problem if a drive pops out of an array because it has a
reallocated sector, but a) do folks with TLER drives immediately replace
the drive when they see a reallocated sector? b) do those without TLER run
badblocks and put the drive back into the array? c) can md RAID work
around the limitations of consumer drives lacking TLER?
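
For (c), the only workaround I've seen suggested - a sketch only, with sdX as
a placeholder - is to raise the kernel's command timeout well above a desktop
drive's worst-case internal retry time, so a slow sector stalls the array
rather than getting the drive ejected:

$ cat /sys/block/sdX/device/timeout     # SCSI-layer command timeout, default 30 seconds
30
# echo 180 > /sys/block/sdX/device/timeout

That doesn't turn a desktop drive into a TLER drive, it just trades a long
stall for an ejection, and it has to be reapplied per device at every boot.
Does that actually hold up in practice?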

Thanks

Ed W

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: HBA Adaptor advice
  2011-05-23  9:58                 ` Stan Hoeppner
@ 2011-05-23 10:33                   ` Ed W
  2011-05-23 11:21                     ` Stan Hoeppner
  0 siblings, 1 reply; 60+ messages in thread
From: Ed W @ 2011-05-23 10:33 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: Brad Campbell, linux-raid

On 23/05/2011 10:58, Stan Hoeppner wrote:
> Multiple drives definitely went offline, but I doubt
> it's due to a real RAID ASIC with custom firmware and a TLER issue.
> More likely, given the price, it was a backplane signal quality problem,
> for which cheap backplanes are notorious.

I think we shouldn't bang this one to death any further, but your
statement above could be interpreted as meaning that, whilst the drives may
not be ideal, they likely weren't the issue in this case?

If the cheap WD drives weren't the main issue, then perhaps this case at
least shouldn't be used as an example of why NOT to use those drives?


> Either way, cheap
> not-fit-for-RAID drives were stuffed into a cheap RAID box and disaster
> was the result.

But what you seem to be guessing is that it likely boils down to "cables
falling out"?


> WDC
> itself says not to use the Green drives in RAID arrays. 

The problem with taking the manufacturer's word on this is that they
provide two products, claim one is "good enough" and that the other
"lasts way longer", and then price them quite significantly differently.

Now, without even looking inside the two identical metal chassis, you
have to admit: a) there is an incentive for them to tell fibs here in order
to gain a price premium, and b) given that the "reliable" drives are roughly
twice the cost, there should be sufficient extra engineering in
there that we can look to third-party documentation, patents and other
supplemental information to learn more about what that engineering is
and gain confidence that the money is well spent?

I guess you have two near-equally-priced options:

a) a 12-disk RAID6 using "enterprise" drives
b) 12x 2-disk RAID1 pairs, plus a 12-way RAID6 on top of that (basically
some variant of RAID61)

Does having twice the number of "cheap" drives make the thing more or
less reliable?  (More drives = a higher probability of an individual drive
failing, but the additional redundancy decreases the chance of total loss.)
I need to crank some numbers in Excel to try and get my head around which
is better for a given failure probability.
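
As a very crude first pass - assuming every drive fails independently within
the period with probability p (the 5% here is an arbitrary illustration) and
ignoring rebuild windows, correlated batch failures and unrecoverable read
errors, which are exactly the effects that bite in practice:

$ awk -v p=0.05 '
    function C(n,k,  r,i) { r=1; for (i=1; i<=k; i++) r *= (n-i+1)/i; return r }
    BEGIN {
      # a plain 12-drive RAID6 is lost once 3 or more drives fail
      for (k=3; k<=12; k++) a += C(12,k) * p^k * (1-p)^(12-k)
      # 12 mirrored pairs under a RAID6: a "member" is lost only if both halves fail
      q = p*p
      for (k=3; k<=12; k++) b += C(12,k) * q^k * (1-q)^(12-k)
      printf "RAID6 of 12 drives:    %.3e\nRAID6 over 12 mirrors: %.3e\n", a, b
    }'

Under those generous assumptions the layered option comes out orders of
magnitude less likely to lose data, at the cost of twice as many drives for
the same usable capacity.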

Cheers

Ed W

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: HBA Adaptor advice
  2011-05-23  5:54                   ` Brad Campbell
  2011-05-23  6:08                     ` Roman Mamedov
@ 2011-05-23 10:42                     ` Stan Hoeppner
  2011-05-23 11:35                       ` David Brown
  1 sibling, 1 reply; 60+ messages in thread
From: Stan Hoeppner @ 2011-05-23 10:42 UTC (permalink / raw)
  To: Brad Campbell; +Cc: Roman Mamedov, linux-raid

On 5/23/2011 12:54 AM, Brad Campbell wrote:

> Most sane operating systems use cluster sizes of 4k or larger and have
> done for years, so I really don't see what all the fuss is about.
> 
> Peoples inability to properly align the data on their disks can be read
> either as a failing in the technology (the partitioning applications
> have not caught up yet) or simply a lack of understanding on how to
> apply the technology.
> 
> Don't blame the drive manufacturers, this should have happened _years_ ago.

I don't think anyone has an issue w/native 4KB sectors and operating
system support for it.  That would have been the big win.  What folks
have issue with is the hybrid 512/4096 drives, which have created the
alignment offset problems.

The industry (BIOS/firmware), commercial and FOSS OSes, should have
worked together to migrate directly to 4KB native sectors.  I don't know
why this didn't happen, usual suspects I guess.  It seems, from my
limited POV, that the Linux partition tool people and kernel folks
simply don't care at this point.

I've not paid recent attention.  Have fdisk, cfdisk, parted, etc, all
come up to speed now, and automatically handle offsets correctly for
hybrid sector size disks?

-- 
Stan

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: HBA Adaptor advice
  2011-05-23  9:48                   ` Ed W
@ 2011-05-23 10:44                     ` John Robinson
  0 siblings, 0 replies; 60+ messages in thread
From: John Robinson @ 2011-05-23 10:44 UTC (permalink / raw)
  To: Ed W; +Cc: linux-raid

On 23/05/2011 10:48, Ed W wrote:
[...]
> Pardon what is probably a very ignorant question, but someone earlier in
> this thread claimed that some adaptors report the size of the disk
> slightly differently?  Wouldn't this potentially cause problems if you
> needed to move the disks to a different controller?

Yup. RAID cards will use some of the disc for their own metadata. The 
amount used, and the location of it, is probably different for different 
controllers. This would be one reason why using a RAID controller with 
BBWC and exporting the drives as single-drive RAID0 volumes is a bit 
icky, and liable to tie you to one manufacturer.

There is a possibility (handwaving here) that using a RAID controller in 
JBOD mode would be similar. You may need to flash your controller to 
non-RAID firmware to avoid it, at which point you probably ought to have 
bought an HBA in the first place.

There is a similar problem with some OEMs' BIOSes, which will set a
"host-protected area" that reduces the visible size of drives.

> Additionally if you needed to replace the disk then some new batch might
> be some few sectors smaller?  This seems to be the biggest reason for
> wanting to add a partition table and then deliberately partition some
> 10s MB smaller? (Think I saw this exact problem come up several times in
> the last few weeks alone?)

For spinning rust discs this hasn't been the case for several years 
since we passed about 160GB; all the manufacturers signed up to an 
industry standard[1] making all their discs a consistent number of 
sectors for any given marketing size.

It's probably a problem again now with SSDs, though.

Cheers,

John.

[1] I can't remember what the standard or standards group is, and I 
can't be bothered looking it up. But of course it's a standard. We love 
standards, that's why we have so many of them![2]

[2] Sorry if I'm a bit grumpy this morning. Too many standards and not 
enough coffee make John a grumpy boy.


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: HBA Adaptor advice
  2011-05-23 10:33                   ` Ed W
@ 2011-05-23 11:21                     ` Stan Hoeppner
  0 siblings, 0 replies; 60+ messages in thread
From: Stan Hoeppner @ 2011-05-23 11:21 UTC (permalink / raw)
  To: Ed W; +Cc: Brad Campbell, linux-raid

On 5/23/2011 5:33 AM, Ed W wrote:

> If the cheap WD drives weren't the main issue then perhaps at least this
> example shouldn't be used as an example of why NOT to use those drives?

On the contrary, it's the *perfect* example.  A system's reliability is
no greater than that of the least reliable component.  The fact that WD
Green drives are so cheap dictates that anyone using them in a RAID
setup is going to use a cheap backplane or individual hot swap carriers.
 There is zero doubt here.  No one will buy a $1500 *empty* HP hot swap
JBOD chassis/backplane and then slap 14 x $80 = $1120 of WD 2TB Green
drives in it.

The soundness of design, manufacturing, and QC of cheap backplanes and
drive carriers is quite low in many/most cases.  Traces not routed
properly on a backplane PCB can generate timing skew or not reject
noise.  This can cause excessive CRC errors on the link, causing the
drive to be kicked off line.  This but one possible flaw that shows up
regularly in cheap backplanes and individual carriers.  The absolute
worst are cheap *active* backplanes.  These are the units with SAS/SATA
expanders on the PCB and/or I2C chips.  If you go cheap on your
backplane, you probably want to make sure you get a passive unit.

>> Either way, cheap
>> not-fit-for-RAID drives were stuffed into a cheap RAID box and disaster
>> was the result.
> 
> But likely due to what boils down to "cables falling out" is what you
> seem to be guessing?

No.  I'm saying he purchased a low-ball solution and got low quality.
Some component puked momentarily and he lost a lot of data because, on
top of that, he didn't know wtf he was doing and wiped the disks that
were actually OK.  We don't know exactly what caused the problem.  Root
cause analysis was never performed, or, if it was, it was never made
public.  I'm guessing the former.  The type of people who do root cause
analysis typically don't buy low-ball gear.

>> WDC
>> itself says not to use the Green drives in RAID arrays. 
> 
> The problem with taking the manufacturers word on this is that they
> provide two products and claim one is "good enough" and that the other
> "lasts way longer", and then price them quite significantly differently

Welcome to the real world.  Been this way a long time.  Why do you think
health care in America is so expensive?  Because the same probe that is
sold to vets to be stuck up a horse's ass is the same one sold for human
application.  Horses aren't litigious, but humans are.  That's one
reason why the same ass probe sold for human use costs 300 times more.
One is charged for the intended use of many products today, not
necessarily the capabilities of the product.  AMD builds both the Phenom
and Opteron on the same line, both chips are identical until the last
phase of production.  There, one or two (I can't recall exactly) of the HT
links are disabled to make a Phenom, and you pay twice as much for the
Opteron.  Same chip, different "intended uses".  They gouge the business
customer because they know they can.  I'm a huge AMD fan, so please
don't think I'm down on them.  EVERYONE in business does this, Intel,
WDC, Seagate, the lot of them.

> Now, without even looking inside the two identical metal chassis, you
> have to admit: a) there is incentive for them to tell fibs here in order
> to gain a price premium and b) given the "reliable" drives are roughly
> twice the cost then there should be sufficient extra engineering in
> there that we can look for third party documentation, patents and other
> supplemental information to learn more about what that engineering is
> and gain confidence that the money is well spent?

I don't think they tout them necessarily as being more reliable, but
more capable or "compatible" in certain applications.  Drives used in
hardware RAIDs need TLER and some other specific firmware tweaks.  The
mechanicals of most "enterprise" SATA drives are shared with a number of
consumer counterparts.  Just different firmware and a different sticker
color on the top plate.

-- 
Stan

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: HBA Adaptor advice
  2011-05-23 10:42                     ` Stan Hoeppner
@ 2011-05-23 11:35                       ` David Brown
  0 siblings, 0 replies; 60+ messages in thread
From: David Brown @ 2011-05-23 11:35 UTC (permalink / raw)
  To: linux-raid

On 23/05/2011 12:42, Stan Hoeppner wrote:
> On 5/23/2011 12:54 AM, Brad Campbell wrote:
>
>> Most sane operating systems use cluster sizes of 4k or larger and have
>> done for years, so I really don't see what all the fuss is about.
>>
>> Peoples inability to properly align the data on their disks can be read
>> either as a failing in the technology (the partitioning applications
>> have not caught up yet) or simply a lack of understanding on how to
>> apply the technology.
>>
>> Don't blame the drive manufacturers, this should have happened _years_ ago.
>
> I don't think anyone has an issue w/native 4KB sectors and operating
> system support for it.  That would have been the big win.  What folks
> have issue with is the hybrid 512/4096 drives which has created the
> alignment offset problems.
>
> The industry (BIOS/firmware), commercial and FOSS OSes, should have
> worked together to migrate directly to 4KB native sectors.  I don't know
> why this didn't happen, usual suspects I guess.  It seems, from my
> limited POV, that the Linux partition tool people and kernel folks
> simply don't care at this point.
>

The problem is Windows XP - neither more nor less.  XP only supports
512-byte sectors, and the installed base is so large that manufacturers
can't ignore it.  Linux has been happy with 4K sectors for many years -
the problem only came when WD produced disks that had 4K sectors but
claimed to have 512-byte ones, so that they could work with XP.

The worst case is hybrid disks that offset the sector number, so that 
512-byte "sector" number 63 is at the beginning of a 4K native sector. 
The idea is that these will be fast with XP and other systems that have 
the first partition starting at sector 63 - but it screws up everything 
else.

> I've not paid recent attention.  Have fdisk, cfdisk, parted, etc, all
> come up to speed now, and automatically handle offsets correctly for
> hybrid sector size disks?
>

Modern versions should automatically align partitions appropriately. 
Typically you use 1 MB boundaries - that works well with all sorts of 
disks (SSDs prefer alignment of perhaps 64K or 128K for erase blocks), 
and should work well for future disks.
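
As a hedged example of what a modern tool does, with parted you can simply
give the start in MiB and then ask it to verify - sdX is a placeholder, the
ext4 argument is only a type hint, and mklabel will of course wipe the
existing partition table:

# parted -s /dev/sdX mklabel gpt
# parted -s /dev/sdX mkpart primary ext4 1MiB 100%
# parted /dev/sdX align-check optimal 1
1 aligned

A 1 MiB start is a multiple of every common physical sector and erase block
size, which is why it has become the de facto default.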



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: HBA Adaptor Advice
  2011-05-23 11:14 HBA Adaptor Advice Ed W
@ 2011-05-23 11:55 ` Joe Landman
  0 siblings, 0 replies; 60+ messages in thread
From: Joe Landman @ 2011-05-23 11:55 UTC (permalink / raw)
  To: Ed W; +Cc: linux-raid

On 05/23/2011 07:14 AM, Ed W wrote:
> Getting back on track of specific adaptor advice:
>
> To recap: I am looking for ideas on what to buy to upgrade our small
> office servers (not really stretched, just adding more backup disks and
> similar). My main requirement is to be able to buy equipment in single
> lots (one server at a time) and so I require the ability to take an
> array from one machine and use it in another machine using a different
> adaptor - therefore the previous thread has dissuaded me from looking at
> adaptors with writeback cache (and also hardware raid controllers)
>
> Therefore can I see a show of hands for "good value" HBA adaptors with
> 8, 12 and 24 ports? Ideally using fewer PCI slots is preferred and
> onboard expanders rather than separate expanders are preferred

Be aware that this won't be cheap.  Also be aware that many (most)
expander designs are performance-limited by their implementations.
We see significant contention from bandwidth oversubscription in everyday
situations, regardless of where the expander is.

On the HBA side

http://www.lsi.com/storage_home/products_home/host_bus_adapters/sas_hbas/internal/sas9201-16i/index.html 


On the hardware RAID side of this:

http://www.lsi.com/storage_home/products_home/internal_raid/megaraid_sas/value_line/megaraid_sas_9260-16i/index.html

LSI controllers as HBAs are reasonably good.  Make sure you update your 
drivers and firmware to late revisions.

> Seems that previously we discovered that most LSI RAID cards were well
> supported and Marvel cards were frequently not.  Does this
> generalisation persist with pure HBA cards also?
>
> I can see the list of LSI HBA cards on their site, but any pointers for
> good value HBA adaptors appreciated? (Current chassis will be a tower
> chassic, but future upgrades are expected to be Supermicro/Norco 3/4U
> rack boxes)
> (is there a page on the wiki already covering any of this that we could
> try and distil this wisdom to?)

Don't skimp on the power supply or power distribution.  RAIDs hate that.

Don't skimp on cooling.  Drives hate that.
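
(One cheap way to keep an eye on the cooling is the drives' own sensors - a
sketch, sdX being a placeholder and the attribute name varying a bit between
vendors:

$ sudo smartctl -A /dev/sdX | grep -i temperature
194 Temperature_Celsius     0x0022   116   103   000    Old_age   Always       -       31

smartd can also warn or mail when a drive crosses a temperature threshold;
see the -W directive in smartd.conf.)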

Regards,

Joe

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: landman@scalableinformatics.com
web  : http://scalableinformatics.com
        http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615

^ permalink raw reply	[flat|nested] 60+ messages in thread

* HBA Adaptor Advice
@ 2011-05-23 11:14 Ed W
  2011-05-23 11:55 ` Joe Landman
  0 siblings, 1 reply; 60+ messages in thread
From: Ed W @ 2011-05-23 11:14 UTC (permalink / raw)
  To: linux-raid

Getting back on track of specific adaptor advice:

To recap: I am looking for ideas on what to buy to upgrade our small
office servers (not really stretched, just adding more backup disks and
similar). My main requirement is to be able to buy equipment in single
lots (one server at a time) and so I require the ability to take an
array from one machine and use it in another machine using a different
adaptor - therefore the previous thread has dissuaded me from looking at
adaptors with writeback cache (and also hardware raid controllers)

Therefore can I see a show of hands for "good value" HBA adaptors with
8, 12 and 24 ports? Ideally using fewer PCI slots is preferred and
onboard expanders rather than separate expanders are preferred

Seems that previously we discovered that most LSI RAID cards were well
supported and Marvel cards were frequently not.  Does this
generalisation persist with pure HBA cards also?

I can see the list of LSI HBA cards on their site, but any pointers for
good value HBA adaptors appreciated? (Current chassis will be a tower
chassis, but future upgrades are expected to be Supermicro/Norco 3/4U 
rack boxes)
(is there a page on the wiki already covering any of this that we could
try and distil this wisdom to?)


Thanks

Ed W

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: HBA Adaptor advice
  2011-05-23  3:39 ` Tobias McNulty
@ 2011-05-23 10:42   ` Ed W
  0 siblings, 0 replies; 60+ messages in thread
From: Ed W @ 2011-05-23 10:42 UTC (permalink / raw)
  To: Tobias McNulty; +Cc: Jim Schatzman, linux-raid

On 23/05/2011 04:39, Tobias McNulty wrote:
> One odd statistical fluke regarding quality control on the large Green
> drives:  I ordered 3 of the drives on Amazon and 3 on Newegg, and all
> 3 of the Newegg drives failed very quickly (within a couple weeks),
> while all 3 of the Amazon drives are still going strong (5 months old
> and on 24/7).  I'm not sure if it was a packaging issue or an issue
> with that particular set of drives that Newegg had in stock, but it
> left me wondering.


I had a similar experience with some Samsung F3 drives recently.  Not
as clear-cut as your example, but I have found other examples in the
archives here which suggest that drive failure might correlate with
batch number?  I'm sure I have seen others suggest trying to build
arrays out of mixed-batch (or perhaps mixed-brand?) drives.

Going back some 10-15 years, when I used to put paired RAID1 drives into
our office servers, every failure I ever had appeared to affect both
drives within a few hours of each other... (less than 48 hours, say).
There are plenty of external reasons to explain that (besides the drives
reaching end of life), such as power fluctuations, temperature, etc, but
the punchline remains that mirroring didn't buy me much protection...

Tricky to make this stuff reliable.. Small probabilities and
catastrophic scenarios are hard to value...

Ed W

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: HBA Adaptor advice
  2011-05-23  2:11 Jim Schatzman
@ 2011-05-23  3:39 ` Tobias McNulty
  2011-05-23 10:42   ` Ed W
  0 siblings, 1 reply; 60+ messages in thread
From: Tobias McNulty @ 2011-05-23  3:39 UTC (permalink / raw)
  To: Jim Schatzman; +Cc: linux-raid

On Sun, May 22, 2011 at 10:11 PM, Jim Schatzman
<James.Schatzman@fulab.com> wrote:
> One more input re. using cheap drives.
>
> I have been running about 20 Western Digital "Green" and "Enterprise" drives (half are 1.5 TB and half are 2 TB) for several years in Raid-5 and Raid-6 configurations (all linux md). They are up 24x7. When they first came on the market, about 30% of my new drives failed within 3 months of operation (about equal fractions of Green and Enterprise). Overall, 50% of the drives eventually failed - 35% of the Green drives and 100% of the Enterprise drives. In the past 18 months, one has failed (an Enterprise drive). That drive was a warranty replacement for an earlier Enterprise drive failure.
>
> My impression is that Western Digital was having some quality control issues with the 2TB drives - both Green and Enterprise. This was very annoying. It appears that quality has improved. I never lost any data nor ever had to restore from backup, because I was always able to replace the bad drive and rebuild the RAID without running into any difficulty that I could not get solved through this forum.
>
> My experience suggests that the WD Enterprise class drives were an unnecessary expense, at least as far as reliability is concerned.
>
> Would I recommend cheap SATA drives for mission critical data?  Absolutely not. I wouldn't recommend  any  SATA drives. Go with the most  expensive SAS drives available. For that matter, I have loads of SCSI drives that are still going fine after 5 to 10 years of 365x24x7 operation.
>
> If you are going to build a RAID from cheap drives, expect that part of your hardware savings will be compensated for by labor costs. Run smart checks often. Also, and I cannot emphasize this enough, make certain that everything attached to the RAID is plugged into a high quality UPS. Otherwise, you are just asking for a power spike to take out multiple drives and/or the controller and to lose data.

One odd statistical fluke regarding quality control on the large Green
drives:  I ordered 3 of the drives on Amazon and 3 on Newegg, and all
3 of the Newegg drives failed very quickly (within a couple weeks),
while all 3 of the Amazon drives are still going strong (5 months old
and on 24/7).  I'm not sure if it was a packaging issue or an issue
with that particular set of drives that Newegg had in stock, but it
left me wondering.

Tobias
-- 
Tobias McNulty, Managing Member
Caktus Consulting Group, LLC
http://www.caktusgroup.com
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: HBA Adaptor advice
@ 2011-05-23  2:11 Jim Schatzman
  2011-05-23  3:39 ` Tobias McNulty
  0 siblings, 1 reply; 60+ messages in thread
From: Jim Schatzman @ 2011-05-23  2:11 UTC (permalink / raw)
  To: linux-raid

One more input re. using cheap drives. 

I have been running about 20 Western Digital "Green" and "Enterprise" drives (half are 1.5 TB and half are 2 TB) for several years in Raid-5 and Raid-6 configurations (all linux md). They are up 24x7. When they first came on the market, about 30% of my new drives failed within 3 months of operation (about equal fractions of Green and Enterprise). Overall, 50% of the drives eventually failed - 35% of the Green drives and 100% of the Enterprise drives. In the past 18 months, one has failed (an Enterprise drive). That drive was a warranty replacement for an earlier Enterprise drive failure.

My impression is that Western Digital was having some quality control issues with the 2TB drives - both Green and Enterprise. This was very annoying. It appears that quality has improved. I never lost any data nor ever had to restore from backup, because I was always able to replace the bad drive and rebuild the RAID without running into any difficulty that I could not get solved through this forum. 

My experience suggests that the WD Enterprise class drives were an unnecessary expense, at least as far as reliability is concerned.

Would I recommend cheap SATA drives for mission critical data?  Absolutely not. I wouldn't recommend  any  SATA drives. Go with the most  expensive SAS drives available. For that matter, I have loads of SCSI drives that are still going fine after 5 to 10 years of 365x24x7 operation.

If you are going to build a RAID from cheap drives, expect that part of your hardware savings will be offset by labor costs. Run SMART checks often. Also, and I cannot emphasize this enough, make certain that everything attached to the RAID is plugged into a high-quality UPS. Otherwise, you are just asking for a power spike to take out multiple drives and/or the controller and lose data.
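
For reference, a minimal sketch of how the regular checks can be automated on
Linux - the device names, schedule, address and md device are placeholders,
not a recommendation for any particular setup:

  # /etc/smartd.conf: short self-test daily at 02:00, long self-test Sundays
  # at 03:00, and mail a warning on failures
  /dev/sda -a -s (S/../.././02|L/../../7/03) -m admin@example.com

  # kick off an md consistency check ("scrub"), e.g. from a monthly cron job
  echo check > /sys/block/md0/md/sync_action

Many distributions already ship a cron job for the md check, but none of this
happens by itself.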

Jim


^ permalink raw reply	[flat|nested] 60+ messages in thread

end of thread, other threads:[~2011-05-23 11:55 UTC | newest]

Thread overview: 60+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-05-19 12:26 HBA Adaptor advice Ed W
2011-05-19 12:36 ` Roman Mamedov
2011-05-19 12:43   ` Mathias Burén
2011-05-19 14:06 ` Michael Sallaway
2011-05-19 19:10 ` Thomas Harold
2011-05-19 21:12   ` Rudy Zijlstra
2011-05-19 21:07 ` Brad Campbell
2011-05-20 20:58   ` Tobias McNulty
2011-05-20 21:23     ` Brad Campbell
2011-05-20  2:08 ` Andy Smith
2011-05-20  5:30   ` Stan Hoeppner
2011-05-21  9:52     ` Ed W
2011-05-20  7:33   ` Ed W
2011-05-20 10:21     ` Stan Hoeppner
2011-05-21 11:17       ` Ed W
2011-05-21 11:29         ` Rudy Zijlstra
2011-05-21 11:54           ` Ed W
2011-05-21 17:37             ` Leslie Rhorer
2011-05-22  9:41             ` Stan Hoeppner
2011-05-22 10:03               ` Rudy Zijlstra
2011-05-23  9:32                 ` Ed W
2011-05-21 17:05           ` Leslie Rhorer
2011-05-22  9:04         ` Stan Hoeppner
2011-05-22 10:09           ` Brad Campbell
2011-05-22 19:25             ` Stan Hoeppner
2011-05-22 20:57               ` Tobias McNulty
2011-05-22 21:13                 ` Johannes Truschnigg
2011-05-23  9:48                   ` Ed W
2011-05-23 10:44                     ` John Robinson
2011-05-22 23:19               ` Brad Campbell
2011-05-23  4:09                 ` Roman Mamedov
2011-05-23  5:54                   ` Brad Campbell
2011-05-23  6:08                     ` Roman Mamedov
2011-05-23 10:42                     ` Stan Hoeppner
2011-05-23 11:35                       ` David Brown
2011-05-23  6:54                 ` Stan Hoeppner
2011-05-23  7:23                   ` Brad Campbell
2011-05-22 23:44               ` Brad Campbell
2011-05-23  0:07                 ` Brad Campbell
2011-05-23  5:30                   ` Stefan /*St0fF*/ Hübner
2011-05-23 10:18                     ` Ed W
2011-05-23  9:58                 ` Stan Hoeppner
2011-05-23 10:33                   ` Ed W
2011-05-23 11:21                     ` Stan Hoeppner
2011-05-20 12:18     ` Joe Landman
2011-05-20 12:34       ` Roman Mamedov
2011-05-20 12:36         ` Mathias Burén
2011-05-20 12:48         ` Joe Landman
2011-05-20 13:21       ` Ed W
2011-05-20 14:23         ` Joe Landman
2011-05-20 20:01       ` Andy Smith
2011-05-20 20:12         ` Stan Hoeppner
2011-05-20 20:24         ` Drew
2011-05-20 20:58           ` Stan Hoeppner
     [not found]             ` <4DD7A100.2010807@wildgooses.com>
2011-05-22  8:13               ` Stan Hoeppner
2011-05-23  2:11 Jim Schatzman
2011-05-23  3:39 ` Tobias McNulty
2011-05-23 10:42   ` Ed W
2011-05-23 11:14 HBA Adaptor Advice Ed W
2011-05-23 11:55 ` Joe Landman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.