* 3T drives and RAID
@ 2010-10-30  3:36 Leslie Rhorer
  2010-10-30  8:33 ` Johannes Truschnigg
  0 siblings, 1 reply; 5+ messages in thread
From: Leslie Rhorer @ 2010-10-30  3:36 UTC (permalink / raw)
  To: linux-raid

	WD has finally released a 3T drive employing 4K sectors and Advanced
Format.  I have some questions about Linux / RAID compatibility.  The
compatibility chart says using the drive with Linux requires employing the
included HBA.  Is the HBA really required if the system will
not be booting from the drive?  Why?  What is the minimum kernel version
required?  I want to use these in a RAID6 array with no drive
partitioning underneath the array (raw disks) using md / mdadm.  I'm
using a pair of PM-based eSATA RAID enclosures to house the drives, so
if I need to replace my adapters, the replacements will need to have
eSATA ports (at least 4 per card) compatible with PM enclosures, which the
included HBAs do not have.  I'm not sure if the little Sil3124 adapters I
currently have will fill the bill.  Performance is not a big issue.

	In addition, I am going to need to add capacity to my existing
arrays well before I have the money to replace them completely with 3T
drives.  I'm going to need six 3T drives per array to completely replace the
1T and 1.5T drives currently in use.  In the meantime, I could upgrade the
existing arrays with 1T and 1.5T drives, but that money would be more or
less wasted when the drives are replaced in a few months.  I would much
rather buy a couple of 3T drives and move them over to the new array when it
gets built.  Will there be an issue with attempting to add a 4K sector drive
to a RAID6 array built out of 512-byte-sector drives?


* Re: 3T drives and RAID
  2010-10-30  3:36 3T drives and RAID Leslie Rhorer
@ 2010-10-30  8:33 ` Johannes Truschnigg
  2010-10-30  9:21   ` Leslie Rhorer
  0 siblings, 1 reply; 5+ messages in thread
From: Johannes Truschnigg @ 2010-10-30  8:33 UTC (permalink / raw)
  To: Leslie Rhorer; +Cc: linux-raid


Hi Leslie,

Please note: I don't have any hands-on experience with drives over 2TB
in size, but I'm interested in the subject as well, and so I will reply
with what (I think) I know nonetheless ;)

On Fri, 29 Oct 2010 22:36:27 -0500
"Leslie Rhorer" <lrhorer@satx.rr.com> wrote:

> 	WD has finally released a 3T drive employing 4K sectors and
> Advanced Format.  I have some questions about Linux / RAID
> compatibility.  The compatibility chart says using the drive with
> Linux requires employing the included HBA.  Is the HBA really
> required if the system will not be booting from the drive?  Why?

I don't think that's really the case. The real troublemaker for
common SOHO computer setups with drives that big is not the hardware or
the ATA protocol (48-bit LBA, introduced in ATA-6 - the revision that
also brought UDMA/100 for EIDE/PATA drives - theoretically allows drives
as large as 128 PiB), but simply MBR partitioning. So if your goal isn't
booting off of an MBR partition that's as large as your whole drive (or
larger than 2TB, for that matter), you shouldn't run into any problems.
Advanced partitioning schemes (like GPT) or LVM on GNU/Linux, for
example, should be able to deal with partition sizes above 2TB perfectly
well.
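
For what it's worth, a rough sketch of what that would look like -
untested by me on a >2TB drive, and assuming the disk shows up as
/dev/sdd (for md you can of course skip partitioning entirely):

  # put a GPT label on the drive and create one partition spanning it
  parted -s /dev/sdd mklabel gpt
  parted -s /dev/sdd mkpart primary 1MiB 100%
  # confirm the kernel sees the full capacity
  blockdev --getsize64 /dev/sdd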

> > What is the minimum kernel version required?  I want to use
> > these in a RAID6 array with no drive partitioning underneath the
> > array (raw disks) using md / mdadm.  I'm using a pair of PM-based
> eSATA RAID enclosures to house the drives, so if I need to replace my
> adapters, the replacements will need to have eSATA ports (at least 4
> per card) compatible with PM enclosures, which the included HBAs do
> not have.  I'm not sure if the little Sil3124 adapters I currently
> have will fill the bill.  Performance is not a big issue.

I don't think you'll run into problems with that kind of setup, either.
Support for SATA Port Multipliers is a matter of the controller and its
driver (check the libata docs to see whether your combination is
supported - if memory serves, Sil3xxx _do_ support PMs), and as I said
before, the ATA
standard - and therefore, compliant devices - should cope with drives >
2TB quite well. Your enclosures presumably are dumb devices anyway,
and don't contain any ICs or logic that would interfere with the data on
the link between the HBA's SATA port and your drives.
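
If you want to double-check that before buying anything (just a sketch -
the exact driver name and log messages depend on your card and kernel),
the kernel log should show the controller driver coming up and any port
multiplier detected behind the eSATA link:

  # which libata driver bound to the Sil3124 (sata_sil24 handles it)
  dmesg | grep -i sata_sil24
  # whether a port multiplier was detected on the link
  dmesg | grep -i 'port multiplier'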

> 	In addition, I am going to need to add capacity to my existing
> arrays well before I have the money to replace them completely with 3T
> drives.  I'm going to need six 3T drives per array to completely
> > replace the 1T and 1.5T drives currently in use.  In the meantime, I
> could upgrade the existing arrays with 1T and 1.5T drives, but that
> money would be more or less wasted when the drives are replaced in a
> few months.  I would much rather buy a couple of 3T drives and move
> them over to the new array when it gets built.  Will there be an
> issue with attempting to add a 4K sector drive to a RAID6 array built
> > out of 512-byte-sector drives?

I don't think so, since you can treat "Advanced Format" drives as
though they were 512-byte-sector drives, with the drawback of severely
degraded write performance for misaligned writes. With md's common
values for chunk-size (64k and more), you should not see the huge write
penalty that results when writes are not aligned to the 4K sector
boundaries. If you're not using partitions but whole drives as the
building blocks of your array anyway, and your chunk size is an integer
multiple of the drives' 4k sector size, you should be fine.

Corrections to the above are, of course, very welcome. Hth!

-- 
with best regards:
- Johannes Truschnigg ( johannes@truschnigg.info )

www: http://johannes.truschnigg.info/
phone: +43 650 2 133337
xmpp: johannes@truschnigg.info

Please do not bother me with HTML-eMail or attachments. Thank you.


* RE: 3T drives and RAID
  2010-10-30  8:33 ` Johannes Truschnigg
@ 2010-10-30  9:21   ` Leslie Rhorer
  2010-10-31 15:30     ` Neil Brown
  0 siblings, 1 reply; 5+ messages in thread
From: Leslie Rhorer @ 2010-10-30  9:21 UTC (permalink / raw)
  To: 'Johannes Truschnigg'; +Cc: linux-raid

> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
> owner@vger.kernel.org] On Behalf Of Johannes Truschnigg
> Sent: Saturday, October 30, 2010 3:33 AM
> To: Leslie Rhorer
> Cc: linux-raid@vger.kernel.org
> Subject: Re: 3T drives and RAID
> 
> Hi Leslie,
> 
> Please note: I don't have any hands-on experience with drives over 2TB
> in size, but I'm interested in the subject as well, and so I will reply
> with what (I think) I know nonetheless ;)
> 
> On Fri, 29 Oct 2010 22:36:27 -0500
> "Leslie Rhorer" <lrhorer@satx.rr.com> wrote:
> 
> > 	WD has finally released a 3T drive employing 4K sectors and
> > Advanced Format.  I have some questions about Linux / RAID
> > compatibility.  The compatibility chart says using the drive with
> > Linux requires employing the included HBA.  Is the HBA really
> > required if the system will not be booting from the drive?  Why?
> 
> I don't think that's really the case.

	I didn't see how it would be, myself.  I can understand how it would
be needed in order for the drive to be bootable, since the BIOS can't handle
the drive, but I was scratching my head over why it might be required
otherwise.  I know the ECC and physical layout for drives with 4K sectors are
quite different from those of legacy drive formats, but it seemed to me as
long as the OS can properly calculate the sector numbers, the controller
should be happy.
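
If I do put one behind the Sil3124, I suppose the quick sanity check is
simply whether the full capacity shows up (sdX being whatever the drive
enumerates as - just an illustration):

  # does the controller/driver present the whole 3TB?
  blockdev --getsize64 /dev/sdX
  grep sdX /proc/partitions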

> The real troublemaker for
> common SOHO computer setups with drives that big is not the hardware or
> the ATA protocol (48-bit LBA, introduced in ATA-6 - the revision that
> also brought UDMA/100 for EIDE/PATA drives - theoretically allows drives
> as large as 128 PiB), but simply MBR partitioning. So if your goal isn't
> booting off of an MBR partition that's as large as your whole drive (or
> larger than 2TB, for that matter), you shouldn't run into any problems.
> Advanced partitioning schemes (like GPT) or LVM on GNU/Linux, for
> example, should be able to deal with partition sizes above 2TB perfectly
> well.

	I won't be partitioning at all.  The file system (XFS) is written
directly to the raw array, and the array is assembled from raw drives.  I
have relatively small (500G) drives for booting, partitioned into /, /boot,
and swap, but the data is all kept on a single file system formatted on a
simple RAID6 volume.
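
Roughly like this, for what it's worth (device names and the drive count
are placeholders):

  # whole-disk RAID6 - no partition table on the member drives
  mdadm --create /dev/md0 --level=6 --raid-devices=6 /dev/sd[b-g]
  # XFS written directly onto the raw array
  mkfs.xfs /dev/md0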

> > What is the minimum kernel version required?  I want to use
> > these in a RAID6 array with no drive partitioning underneath the
> > array (raw disks) using md / mdadm.  I'm using a pair of PM-based
> > eSATA RAID enclosures to house the drives, so if I need to replace my
> > adapters, the replacements will need to have eSATA ports (at least 4
> > per card) compatible with PM enclosures, which the included HBAs do
> > not have.  I'm not sure if the little Sil3124 adapters I currently
> > have will fill the bill.  Performance is not a big issue.
> 
> I don't think you'll run into problems with that kind of setup, either.
> Support for SATA Port Multipliers is a matter of the controller and its
> driver (check the libata docs to see whether your combination is
> supported - if memory serves, Sil3xxx _do_ support PMs), and as I said
> before, the ATA

	Yes, they do.  That's really not the question.  I know my Sil3124
adapters support PMs: I'm using them right now.

> standard - and therefore, compliant devices - should cope with drives >
> 2TB quite well. Your enclosures presumably are dumb devices anyway,
> and don't contain any ICs or logic that would interfere with the data on
> the link between the HBA's SATA port and your drives.

	Well, they have Port Multipliers, but the PMs should not care about
the drive format.  I didn't think the SATA controller would, either, but I
wasn't certain.

> > 	In addition, I am going to need to add capacity to my existing
> > arrays well before I have the money to replace them completely with 3T
> > drives.  I'm going to need six 3T drives per array to completely
> > replace the 1T and 1.5T drives currently in use.  In the meantime, I
> > could upgrade the existing arrays with 1T and 1.5T drives, but that
> > money would be more or less wasted when the drives are replaced in a
> > few months.  I would much rather buy a couple of 3T drives and move
> > them over to the new array when it gets built.  Will there be an
> > issue with attempting to add a 4K sector drive to a RAID6 array built
> > out of 512-byte-sector drives?
> 
> I don't think so, since you can treat "Advanced Format" drives as
> though they were 512-byte-sector drives, with the drawback of severely
> degraded write performance for misaligned writes. With md's common
> values for chunk-size (64k and more), you should not see the huge write
> penalty that results when writes are not aligned to the 4K sector
> boundaries. If you're not using partitions but whole drives as the
> building blocks of your array anyway, and your chunk size is an integer
> multiple of the drives' 4k sector size, you should be fine.

	Will md automatically treat the oddball drive as if it had 512-byte
sectors, or does one need to tell md (or the kernel) to do so?


* Re: 3T drives and RAID
  2010-10-30  9:21   ` Leslie Rhorer
@ 2010-10-31 15:30     ` Neil Brown
  2010-10-31 19:05       ` Bernd Schubert
  0 siblings, 1 reply; 5+ messages in thread
From: Neil Brown @ 2010-10-31 15:30 UTC (permalink / raw)
  To: Leslie Rhorer; +Cc: 'Johannes Truschnigg', linux-raid

On Sat, 30 Oct 2010 04:21:16 -0500
"Leslie Rhorer" <lrhorer@satx.rr.com> wrote:

> 	Will md automatically treat the oddball drive as if it had 512-byte
> sectors, or does one need to tell md (or the kernel) to do so?
>

You don't need to tell the kernel to do anything special - it should just
work.

md/raid5 (and raid6) do all writes as 4K blocks, 4K aligned (as the
stripe-cache is made of pages which are 4K).  So that fits perfectly with the
new drives.
If your filesystem issued a non-aligned read, then it could get down to the
device as a non-aligned read, but there is little performance penalty for
reads, only writes.
And XFS almost certainly does all IO in 4K multiples, so you should be fine.

In short: I can see no reason why it shouldn't work smoothly.
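
If you want to double-check what the block layer ends up advertising for
the array and what XFS is actually using (md device name and mount point
are just examples):

  # what md exposes to the filesystem
  cat /sys/block/md0/queue/logical_block_size
  cat /sys/block/md0/queue/minimum_io_size
  cat /sys/block/md0/queue/optimal_io_size
  # XFS block size in use (normally 4096)
  xfs_info /mnt/array | grep bsize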

NeilBrown


* Re: 3T drives and RAID
  2010-10-31 15:30     ` Neil Brown
@ 2010-10-31 19:05       ` Bernd Schubert
  0 siblings, 0 replies; 5+ messages in thread
From: Bernd Schubert @ 2010-10-31 19:05 UTC (permalink / raw)
  To: Neil Brown; +Cc: Leslie Rhorer, 'Johannes Truschnigg', linux-raid

On Sunday, October 31, 2010, Neil Brown wrote:
> On Sat, 30 Oct 2010 04:21:16 -0500
> 
> "Leslie Rhorer" <lrhorer@satx.rr.com> wrote:
> > 	Will md automatically treat the oddball drive as if it had 512-byte
> > sectors, or does one need to tell md (or the kernel) to do so?
> 
> You don't need to tell the kernel to do anything special - it should just
> work.
> 
> md/raid5 (and raid6) do all writes as 4K blocks, 4K aligned (as the
> stripe-cache is made of pages which are 4K).  So that fits perfectly with
> the new drives.
> If your filesystem issued a non-aligned read, then it could get down to the
> device as a non-aligned read, but there is little performance penalty for
> reads, only writes.
> And XFS almost certainly does all IO in 4K multiples, so you should be
> fine.
> 
> In short: I can see no reason why it shouldn't work smoothly.

Well, I think alignment on a larger scale is something we need to discuss.
I have a modified blkiomon here, which shows IO sizes (I will send the
patches to the corresponding list once I find the time to finalize them).

In one shell:
bathl:~# dd if=/dev/md5 of=/dev/null bs=1M iflag=direct

In another shell:

bathl:~# blktrace -d /dev/sdc -d /dev/sdd -a issue -a complete -o - |
/tmpa/devel/blktrace/blktrace-1.0.1/blkiomon -I10 -h -

sizes histogram (kiB):
           32:   470
          124:  3096
          496:   166


(I modified blkiomon not to print the histogram based on doubled IO sizes,
but to print multiples of 4K and to skip sizes with zero requests.)

Well, I think I need to add an option to print it on the basis of 512B, but
the present output already shows rather poor IO request sizes. One thing I
have learned during my work at DDN is that good performance numbers can only
be achieved if large IO requests come in. Now a DDN hardware RAID is certainly
not comparable with Linux software RAID, but if the local disk can do 512KB
requests and gets them with direct IO, Linux md should do the same.

The same for a read from sdc:

bathl:~# dd if=/dev/sdc of=/dev/null bs=1M iflag=direct

blktrace -d /dev/sdc -d /dev/sdd -a issue -a complete -o - |
/tmpa/devel/blktrace/blktrace-1.0.1/blkiomon -I10 -h -

sizes histogram (kiB):
          512:  1874


md5 : active raid10 sdc[0] sdd[1]
      976760832 blocks super 1.2 1024K chunks 2 offset-copies [2/2] [UU]
      bitmap: 0/15 pages [0KB], 32768KB chunk
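
(For comparison - a quick way to see what request sizes the member disks
and the md device will accept at all; device names as above, and the
sysfs attributes assume a reasonably recent kernel:)

  cat /sys/block/sdc/queue/max_hw_sectors_kb
  cat /sys/block/sdc/queue/max_sectors_kb
  cat /sys/block/md5/queue/max_sectors_kb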


Cheers,
Bernd



