* Correct RAID options
@ 2014-08-19 18:38 Chris Knipe
  2014-08-19 23:28 ` Craig Curtin
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Chris Knipe @ 2014-08-19 18:38 UTC (permalink / raw)
  To: linux-raid

Hi All,

I'm sitting with a bit of a catch-22 and need some feedback / input please.
This isn't strictly md related, as all servers have MegaRAID SAS controllers
with BBUs and I am running hardware raid.  So my apologies for the off
topic posting, but the theory remains the same, I presume.  All the servers
store millions of small (< 2MB) files in a structured directory layout
to keep the number of files per directory in check.

Firstly, I have a bunch (3) of front end servers, all configured in RAID10
and each consisting of 8 x 4TB SATA-III drives.  Up to now they have performed
very well, with roughly 30% reads and 70% writes.  This is absolutely fine,
as RAID10 does give much better write performance and we expect this.  I
can't recall what the benchmarks said when I tested this many months ago,
but they were good, and IO wait even under very heavy usage is minimal...

The problem now is that the servers are reaching their capacity
and the arrays are starting to fill up.  Deleting files isn't really an
option for me, as I want to keep them as long as possible.  So, let's get a
server to archive data on.

So, a new server: again 15 x 4TB SATA-III drives, on a MegaRAID controller.
With the understanding that the "archives" will be read more than written to
(we only write here once we move data from the RAID10 arrays), I opted for
RAID5 instead.  The higher spindle count surely should count for something.
Well.  The server was configured, the array initialised, and tests show more
than 1GB/s in write speed - faster than the RAID10 arrays.  I am pleased!

What's the problem?  Well, the front end servers do an enormous amount of
random reads/writes (30/70 split), 24x7.  Some 3 million files are added
(written) per day, of which roughly 30% are read again.  So, the majority of
the IO activity is writing to disk.  With all the writing going on, there is
effectively zero IO left for reading data.  I can't read (or should we say
"move") data off the server faster than it is being written.  The
moment I start doing any significant amount of read requests, the IO wait
jumps through the roof and the write speeds obviously also crawl to a halt.
I suspect this is due to the seek time on the spindles, which does make
sense and all of that.  So there still isn't really any problem here that
we don't know about already.

Now, I realise that this is a really, really open question in terms of
interpretation, but which raid levels with high spindle counts (say 8, 12 or
15 or so) will provide the best "overall" and balanced read/write
performance in terms of random IO?  I do not necessarily need blistering
performance in terms of throughput due to the small file sizes, but I do need
blistering fast performance in terms of IOPS and random reads/writes...  All
file systems are currently EXT4, and all raid arrays run with a 64K block
size.

Many thanks, and once again my apologies for the theoretical rather
than md-specific question.

--
Chris.







^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: Correct RAID options
  2014-08-19 18:38 Correct RAID options Chris Knipe
@ 2014-08-19 23:28 ` Craig Curtin
  2014-08-19 23:42 ` Roger Heflin
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Craig Curtin @ 2014-08-19 23:28 UTC (permalink / raw)
  To: linux-raid

It is off topic for this list - but the one question I would ask is what happens if you lose the new RAID5 setup?  If you lose a single disk, the rebuild time on that amount of data is going to be enormous - I hope you are at least using enterprise-level SATA drives.  I would definitely be looking at RAID6 for a solution like this.

As for more throughput - surely it would be better to expand outwards, i.e. add more servers to distribute the write workload?

You have not stated what the server specs are or how their processors are coping with the throughput requirements.

The other alternative, which sounds like it would be a smarter long term option, would be to look at a SAN solution - the Dell EqualLogic systems offer tiered storage for workloads exactly as you describe and can front-end the write process with SSD storage.

Craig



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Correct RAID options
  2014-08-19 18:38 Correct RAID options Chris Knipe
  2014-08-19 23:28 ` Craig Curtin
@ 2014-08-19 23:42 ` Roger Heflin
  2014-08-20  0:22 ` David Brown
  2014-08-20  5:58 ` Chris Schanzle
  3 siblings, 0 replies; 11+ messages in thread
From: Roger Heflin @ 2014-08-19 23:42 UTC (permalink / raw)
  To: savage; +Cc: Linux RAID

For reads, bigger block sizes are better (file size does not appear to matter).

Depending on alignment, an 8k file could cause 2 blocks to be read off
2 separate disks, using more spindles and taking longer.

The bigger the block IO size on the raid disk, the less likely it is
that a file of a given size has to be read from 2 different disks.

When I tested random read rates I found that, no matter what the file
size was, a bigger block size on the raid was always better.  It
appears to come down to minimizing the number of disk spindles that
have to be used for each file read.

So my practice was always to set the block size as big as possible;
I believe some of the controllers I was using could go as high as
1 MB.
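
For illustration, the equivalent knob in md raid is the chunk size (on
a MegaRAID it is the stripe size chosen when the virtual drive is
created).  A rough sketch, with placeholder device names:

  # 8-disk md RAID10 with a 1 MiB chunk (mdadm takes the value in KiB)
  mdadm --create /dev/md0 --level=10 --raid-devices=8 --chunk=1024 \
      /dev/sd[b-i]

  # confirm what the array is actually using
  mdadm --detail /dev/md0 | grep -i chunk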


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Correct RAID options
  2014-08-19 18:38 Correct RAID options Chris Knipe
  2014-08-19 23:28 ` Craig Curtin
  2014-08-19 23:42 ` Roger Heflin
@ 2014-08-20  0:22 ` David Brown
  2014-08-20  1:24   ` Chris Knipe
  2014-08-20  5:58 ` Chris Schanzle
  3 siblings, 1 reply; 11+ messages in thread
From: David Brown @ 2014-08-20  0:22 UTC (permalink / raw)
  To: savage, linux-raid


Hi,

This mailing list is for raid on Linux.  While it is dominated by md 
raid, it covers hardware raid too.

In general, a 15 disk raid5 array is asking for trouble.  At least make 
it raid6.

However, when I hear of multiple parallel accesses to lots of small
files, I think of XFS over a linear concat.  If Stan Hoeppner is
following at the moment, I'm sure he can help here - he is an expert on
this sort of thing.

But the general idea is to have a set of raid1 mirrors (or possibly
Linux md raid10,far2 pairs if the traffic is read-heavy), and then tie
them all together using a linear concatenation rather than raid0
stripes.  When you have XFS on this, it divides up the disk space into
blocks that can be accessed independently.  Thus it can access both the
data and metadata relating to a file within a single raid1 pair - and
simultaneously access other files on other pairs.  The block
partitioning is done by directory, so it only works well if the
parallel accesses are spread across a range of different directories.
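
To sketch the shape of it (placeholder names, assuming the raid1 pairs
appear to the OS as /dev/md1 and /dev/md2, or as hardware virtual
drives):

  # concatenate the pairs end-to-end - no striping across them
  mdadm --create /dev/md10 --level=linear --raid-devices=2 \
      /dev/md1 /dev/md2
  mkfs.xfs /dev/md10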

I am assuming your files are fairly small - if your reads or writes are 
often smaller than a full stripe of raid10 or raid5, performance will 
suffer greatly compared to XFS on a linear concat.

mvh.,

David




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Correct RAID options
  2014-08-20  0:22 ` David Brown
@ 2014-08-20  1:24   ` Chris Knipe
  2014-08-20  2:38     ` Craig Curtin
  2014-08-20  7:32     ` David Brown
  0 siblings, 2 replies; 11+ messages in thread
From: Chris Knipe @ 2014-08-20  1:24 UTC (permalink / raw)
  To: David Brown; +Cc: linux-raid

On Wed, Aug 20, 2014 at 2:22 AM, David Brown <david.brown@hesbynett.no> wrote:

> In general, a 15 disk raid5 array is asking for trouble.  At least make it
> raid6.

At this stage the IO load on the archiver with the 15 disk RAID5 is
-very- minimal.  It's not even writing 8MB/s currently, as the front
end RAID10 servers are obviously severely hampered whilst doing the
concurrent read/write requests.  Now that it is our peak time, load
averages shoot up to over 80 due to IO wait from time to time, so this
is kind of critical for me right now :-(

Just a bit more background, as was asked for in the other replies...
Front end servers are Dell PowerEdge R720 DX150s (8 x 4TB SATA-III,
64GB RAM, and dual Xeon E5-2620 hex-core @ 2.00GHz).
The archiver is custom built (no brand name) and consists of 15 x
4TB SATA-II drives, 32GB RAM, and a single Xeon E3-1245 quad-core @
3.3GHz.

Now the archiver we added is new - so I can't really comment at this
stage on how it is performing, as it is not getting any real work from
the front ends.  During our standard benchmarking (hdparm / dd / bonnie)
with no IO load on the archiver, performance was more than
adequate.

In terms of the front-ends, with our "normal" load distribution of a
70/30 split between writes/reads, there are no serious performance
problems.  With over 500 concurrent application threads per server
accessing the files on the disks, load averages are generally in
the 3 to 5 range, with very minimal IO wait.  Munin reports "disk
utilization" between 20% and 30%, "disk latency" sub 100ms, and "disk
throughput" at about 30MB/s, if I have to average all of this out.

Since we've now started to move data from the front ends to the
archiver, we have obviously thrown the 70/30 split out of the window,
and all stats are basically now off the charts.  "Disk utilization" is
averaging between 90% and 100%.  Reading the data off the front
end servers is obviously causing a bottleneck, and I can confirm this:
as soon as we stop the archiving process that reads the data on the
front ends and writes it to the archiver, the load on the servers
returns to normal.

In terms of adding more front end servers - it is definitely an option,
yes.  Being brand name servers they do come at a premium, however, so I
would ideally like to keep this as a last resort.  The premium cost,
together with the limited storage capacity, basically made us opt to
try and offload some of the storage requirements to cheaper
alternatives (more than double the capacity - even at RAID10 - for less
than half the price; realistically, we will be more than happy with
half the performance as well, so I'm not expecting miracles either).

RAID rebuilds are already problematic on the front end servers (RAID10
over 8 x 4TB): a single drive failure whilst the server is
under load takes approximately 8 odd hours to rebuild, if memory serves
me correctly.  We've had a few failures in the past (even a double
drive failure at the same time), but nothing recent that I can recall
accurately.

I was never aware that bigger block sizes would increase read
performance though - this is interesting and something I can
definitely explore.  I am talking under correction, but I believe the
MegaRAIDs we're using can go even bigger than 1 MB blocks.  I'll
have to check on this.  Bigger blocks do mean wasting more space,
though, if the files written are smaller and can't necessarily fill up
an entire block, right?  I suppose when you start talking about 12TB
and 50TB arrays, the amount of wasted space really becomes
insignificant, or am I mistaken?

SANs unfortunately are out of the question, as this is hosted
infrastructure at a provider that does not offer SANs as part of their
product offerings.


> But the general idea is to have a set of raid1 mirrors (or possible Linux md
> raid10,far2 pairs if the traffic is read-heavy), and then tie them all
> together using a linear concatenation rather than raid0 stripes.  When you

Can I perhaps ask that you elaborate a bit on what you mean by
linear concatenation?  I am presuming you are not referring to RAID10
'per se' here, given your comment to use this rather than RAID0
stripes.  XFS by itself is also a good option - I honestly do not
know why this wasn't given consideration when we initially set the
machines up.  By the sound of it, all of them are now going to be
facing a rebuild.

> I am assuming your files are fairly small - if your reads or writes are
> often smaller than a full stripe of raid10 or raid5, performance will suffer
> greatly compared to XFS on a linear concat.

The files are VERY evenly distributed using md5 hashes.  We have 16
top level directories, 255 second level directories, and 4094 third
level directories.  Each third level directory currently holds between
4K and 4.5K files (the archiver servers should have roughly three or
four times that amount once the disks are full).
Files are generally between 250KB and 750KB, a small percentage are a
bit larger, in the 1.5MB range, and I can almost guarantee that not one
single file will exceed the 5MB range.  I'm not sure what the stripe
size is at this stage, but it is more than likely whatever the default
is for the controller (64KB?)
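
For illustration, the path is derived from the hash prefix roughly
along these lines (the exact split here is an assumption based on the
directory counts above, and /data and $file are placeholders):

  # illustrative sketch only - the real mapping may differ
  h=$(md5sum "$file" | cut -c1-32)
  dir="/data/${h:0:1}/${h:1:2}/${h:3:3}"   # ~16 / ~256 / ~4096 buckets
  mkdir -p "$dir" && mv "$file" "$dir/"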

I think exploring XFS needs to be my first port of call here:
take one of the front ends out of production tomorrow when load has
quieted down, trash it, and rebuild it.  Then we'll more than likely
need 2 or 3 weeks for the disks to fill up again with files before
we really get to see how it compares.

If I can perhaps just get some clarity in terms of the physical disk
layouts / configurations that you would recommend, I would appreciate
it greatly.  You're obviously not talking about a simple RAID10
array here, even though I think just XFS over EXT4 would already do us
wonders.

Many thanks for all the responses!

--
Chris.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: Correct RAID options
  2014-08-20  1:24   ` Chris Knipe
@ 2014-08-20  2:38     ` Craig Curtin
  2014-08-20  3:05       ` Chris Knipe
  2014-08-20  7:32     ` David Brown
  1 sibling, 1 reply; 11+ messages in thread
From: Craig Curtin @ 2014-08-20  2:38 UTC (permalink / raw)
  To: Chris Knipe; +Cc: linux-raid

Chris,

I assume that your application will handle it OK if the archive is offline?  I understand you have throughput issues now, but if you lose the RAID5 setup it is going to take a long time to recover - you really need to rebuild that as RAID6 ASAP, particularly while it is only under light load.

Now that you have clarified what your servers are and how they are performing - I would suggest another option would be an external SATA storage system; for cost you cannot go past the Promise equipment.  You could whack additional SATA drives into one of these on your front end servers (they have models with different connection options - eSATA, SAS, FC, etc.) and this would give you more space on the front end as well as more spindles to handle the writing - it would also give you the ability to experiment with different file systems.

These Promise systems have inbuilt RAID controllers, so you can format as appropriate and present them to the system as whatever disks you wish (i.e. multiple logical disks, multiple LUNs, etc.).

Craig



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Correct RAID options
  2014-08-20  2:38     ` Craig Curtin
@ 2014-08-20  3:05       ` Chris Knipe
  2014-08-20  3:37         ` Craig Curtin
  0 siblings, 1 reply; 11+ messages in thread
From: Chris Knipe @ 2014-08-20  3:05 UTC (permalink / raw)
  To: Craig Curtin; +Cc: linux-raid

On Wed, Aug 20, 2014 at 4:38 AM, Craig Curtin <craigc@prosis.com.au> wrote:

> I assume that your application will handle it OK if the Archive is offline ? I understand you have throughput issues now, but if you lose the RAID5 setup it is going to take a long time to recover really need to rebuild that as RAID6 ASAP - particularly if it is only under light load now.

Correct.  I will however lose data, as I would need to delete old data
off the front end servers instead of archiving it in order not to run
out of disk space (which would be catastrophic).  It's not an ideal
situation, but it's not a major loss either at this point in time.  Up
to a day or three ago, we deleted anyway.  So yes, I am more than
likely trashing the archive server and rebuilding it (or not using it
at all and rather expanding the front end servers - against my
better judgement).


> Now that you have clarified what your servers are and how they are performing - I would suggest another option would be an external SATA storage system - for cost you can not go past the Promise equipment. You could whack additional SATA drives in one of these on your front end servers (they have models with different connection options - ESATA, SAS, FC etc etc) and this would give you more space on the front end as well as more spindles to handle the writing etc - it would also give you the ability to mess around with different file systems etc.

I really don't want to be a PITA, but these are leased servers (sorry
if I haven't been clear).  We do not even have physical access to the
servers; frankly, they are half way around the world from where I am
:-)  Hardware changes in any way or form are not going to happen.  The
provider is simply not going to play - we've tried before.  We've had
such bad experiences that it was easier / quicker to cancel a server
and order a new one instead of having them fix something.  Yes, it's
sad, but that's what is there and that's unfortunately the way it
currently is.

I really do not need a 'blue sky' here.  I know what I am doing is
-insanely- resource intensive and, frankly speaking, I am -amazed- at
how well these front end servers have coped up to now.  There are
limits, and sure, we are hitting those limits.  The question really is:
what can be done to achieve the best performance with what we have?
RAID10/EXT4 isn't working now that we are reading and writing
extensively at the same time - it was more than likely a bad /
uneducated / inexperienced choice.  I understand that, I accept that.

Will XFS give a reasonable enough increase in performance to
justify a format / reinstall (let's forget about backups and downtime
for now), or would the only viable option be the expensive one, to get
more servers (touch wood - the load balancers thankfully are nowhere
near max capacity)?  I don't have the experience or the tools to
properly bench servers simulating our current IO load, so I'm really
hoping that someone more experienced will be able to chip in here with
some definitive answers (if that is even possible).  If what I have
currently is as good as it is reasonably going to get, then so be it
as well.  If that's the case I have no option but to increase the
number of front end servers, and I can close my eyes and accept that
too.

--
Chris.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: Correct RAID options
  2014-08-20  3:05       ` Chris Knipe
@ 2014-08-20  3:37         ` Craig Curtin
  0 siblings, 0 replies; 11+ messages in thread
From: Craig Curtin @ 2014-08-20  3:37 UTC (permalink / raw)
  To: Chris Knipe; +Cc: linux-raid

OK - sorry but I will have to get someone else to chip in re the XFS situation.

I would say though that it sounds like you are painting yourself into a very tight corner, and you might want to find a hosting provider that offers more options than you appear to have available.

Craig



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Correct RAID options
  2014-08-19 18:38 Correct RAID options Chris Knipe
                   ` (2 preceding siblings ...)
  2014-08-20  0:22 ` David Brown
@ 2014-08-20  5:58 ` Chris Schanzle
  3 siblings, 0 replies; 11+ messages in thread
From: Chris Schanzle @ 2014-08-20  5:58 UTC (permalink / raw)
  To: savage, linux-raid

On 08/19/2014 02:38 PM, Chris Knipe wrote:
> All the servers
> store millions of small (< 2mb) files, in a structured directory structure
> to keep the amount of files per directory in check.

How many millions?
If you ever have to run xfs_repair, the RAM requirements get pretty substantial with many files.  It shouldn't be a problem on 64GB servers, but not long ago, with 4-8GB boxes, this was an issue.  Be sure to test and monitor with 'top'.


 >Bigger blocks does mean wasting more space though if the files written are smaller and can't necessarily fill up an entire block, right?

No, it means you have to read more "junk" to rewrite the whole stripe (think read:modify:write).  [Perhaps you are thinking of filesystem block size affecting "internal fragmentation".]


 >load averages shoot up to over 80 due to IO wait from time to time,

Just a friendly reminder that your processors are not "busy" with that high load; you just have many processes WAITING on disk I/O.  Many people incorrectly equate load average with CPU utilization.


 >Files are generally between 250kb and 750kb, a small percentage are a
 >bit larger to the 1.5mb range, and I can almost guarantee that not one
 >single file will exceed the 5mb range.

With so many of your files being similarly sized, a 1MB stripe size should be optimal for parallel random read/write usage.


When archiving, your disk waits are probably caused by disk seeking.  If you can cache more (hopefully *all*!) inode/dnode data, you will reduce disk seeking tremendously by not having to seek for metadata (possibly requiring multiple seeks for each file read); instead, just seeking to the actual file data and reading a whole stripe greatly reduces disk head thrashing and waits.  I know it is counter-intuitive, but the last thing you want your file server to cache is file data (unless the same file data is repeatedly read).  Metadata is king.  [I'm still waiting for a filesystem that supports storing metadata on a separate device, like RAID1 SSD.]

Try tuning /proc/sys/vm/vfs_cache_pressure to low values, preferably 0, to (theoretically) never flush inode/dnode data (though in practice it can still drop inode data to cache file data).  Watch /proc/meminfo Slab (mostly inode/dnodes) grow in balance to Cached (file data).  With small (4GB) systems, I have seen kernel hangs under memory pressure from heavy disk writing.  To help with this, tell the kernel to start asynchronously flushing dirty file data earlier by reducing /proc/sys/vm/dirty_background_ratio down to, say, 5 or 2 (% of RAM).  Set /proc/sys/vm/dirty_ratio (% of RAM at which processes get blocked to flush dirty data to disk) to a realistic value, so as not to let the dirty data cache wipe out your Slab, but still handle bursts of writes without putting processes into a blocked disk wait state.  Use "time du -shx" to load the Slab with all your inode/dnode data.  Repeat.  Compare those times and monitor with 'iostat -x' to see how much disk I/O it takes (if none, it will only take seconds, even for millions of files, rather than tens of minutes).

You never mentioned how much RAM your processes take, but be sure to leave room for those as well when coming up with a more appropriate dirty_ratio value.  Swapping out infrequently used pages from long-running processes is not necessarily a bad thing (especially if swap is on other spindles), and tuning /proc/sys/vm/swappiness *up* a bit can encourage the kernel to do just that.
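
As a starting point, something like the following (the numbers are only
illustrative - measure and adjust for your own workload; /data is a
placeholder path):

  sysctl -w vm.vfs_cache_pressure=1       # strongly prefer keeping inode/dentry data
  sysctl -w vm.dirty_background_ratio=2   # start async writeback early
  sysctl -w vm.dirty_ratio=20             # point at which writers block to flush
  sysctl -w vm.swappiness=70              # let idle process pages swap out

  # warm the metadata (Slab) cache and time it; repeat and compare
  time du -shx /data
  grep -E 'Slab|Cached' /proc/meminfo
  iostat -x 5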


With RAM being *relatively* inexpensive, if you use it effectively, it can greatly reduce your disk seeking and thus, waiting.


Of course, be sure to consider file system mount options to reduce seeking/writing:  mount with noatime, nobarrier.  And for ext3/4, consider using "tune2fs -o journal_data_writeback".  Be sure to understand what you're giving up in exchange for what you're getting.
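
For example (with /dev/sdb1 and /srv/data as placeholders - and only drop barriers if the controller cache really is battery/flash backed):

  tune2fs -o journal_data_writeback /dev/sdb1
  mount -o noatime,nobarrier,data=writeback /dev/sdb1 /srv/data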

Crazy as it may be, this is interesting stuff to me.  Feel free to drop me a line (on or off list) if you try this and it helps (or hurts), or if you have other ideas.

Regards,
Chris Schanzle



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Correct RAID options
  2014-08-20  1:24   ` Chris Knipe
  2014-08-20  2:38     ` Craig Curtin
@ 2014-08-20  7:32     ` David Brown
  2014-08-20 14:45       ` Chris Knipe
  1 sibling, 1 reply; 11+ messages in thread
From: David Brown @ 2014-08-20  7:32 UTC (permalink / raw)
  To: Chris Knipe; +Cc: linux-raid

On 20/08/14 03:24, Chris Knipe wrote:
> On Wed, Aug 20, 2014 at 2:22 AM, David Brown <david.brown@hesbynett.no> wrote:
> 

Let me first add a disclaimer - my comments here are based mainly on
theory, much of it gained from discussions on this list over the years.
 I haven't built servers like this, and I haven't used XFS with linear
concatenation - I have just heard many nice things about it, and it
sounds to me like a good fit for your needs.  But you are doing the
right thing with your testing and benchmarking - it's the only way to
find out if the raid/filesystem setup matches /your/ loads.

>> In general, a 15 disk raid5 array is asking for trouble.  At least make it
>> raid6.
> 
> At this stage the IO load on the archiver with the 15 disk RAID5 is
> -very- minimal.  It's not even writing 8MB/s currently as the front
> end RAID10 servers are obviously severely hampered whilst doing the
> concurrent read/write requests. Now that it is our peak times, load
> averages shoot up to over 80 due to IO wait from time to time, so this
> is kinda critical for me right now :-(

I wasn't thinking so much about the load here as about
safety/reliability.  With a 15 disk system under heavy load, you /will/
get double failures, such as a disk failing plus an unrecoverable read
error during the rebuild.  Raid6 will make it orders of magnitude more
reliable.

Regarding performance, striping (with raid0, raid5, raid6) across a
large number of disks (or raid1 pairs for raid10) works well for large
reads and writes, but for smaller accesses you get lots of partial
stripe writes (which have to be turned into full stripe writes, or RMW
accesses) and lots of head movement for each access.  Stripe caches
help, of course, but with hardware raid cards the stripe cache is
limited by the hardware raid card rather than main memory.

> 
> Just a bit more background as was asked in the other replies...
> Front end Servers are Dell PowerEdge R720 DX150s (8 x 4TB SATA-III,
> 64GB Ram, and Dual Xeon E5-2620 Hex-Core @ 2.00GHz)
> The archiver is custom built (no brand name) and consists of the 15 x
> 4TB SATA-II drives, 32GB Ram, and a single Xeon E3-1245 Quad-Core @
> 3.3Ghz
> 
> Now the archiver we added is new - so I can't really comment at this
> stage on how it is performing as it is not getting any real work from
> the front ends.  During our standard benching (hdparm / dd / bonnie)
> with no load on the archiver in terms of IO, performance was more than
> adequate.
> 
> In terms of the front-ends with our "normal" load distribution of a
> 70/30 split between writes/reads, there's no serious performance
> problems.  With over 500 concurrent application threads per server
> accessing the files on the disks, load averages are generally around
> the 3 to 5 range, with very minimal IO wait.  Munin reports "disk
> utilization" between 20% and 30%, "disk latency" sub 100ms, and "disk
> throughput" at about 30MB/s if I have to average all of this out.
> 
> Since we've now started to move data from the front ends to the
> archiver, we have obviously thrown the 70/30 split out of the window,
> and all stats are basically now off the charts. "disk utilization" is
> averaging between 90% to 100%. The reading of the data from the front
> end servers is obviously causing a bottleneck, and I can confirm this
> seeing that as soon as we stop the archiving process that reads the
> data on the front ends and writes it to the archiver, the load on the
> servers return to normal.

Is there any way to coordinate the writes to the front end and the
archiver?  If you can archive a file just after it has been written to
the front-end disks, then it will be served from ram, and there will be
no need to read it physically from the disk.

> 
> In terms of adding more front end servers - it is definitely an option
> yes.  Being brand name servers they do come at a premium however so I
> would ideally like to have this as a last resort.  The premium cost,
> together with the limited storage capacity basically made us opt to
> rather try and offload some of the storage requirements to cheaper
> alternatives (more than double the capacity - even at RAID10, for less
> than half the price - realistically, we will be more than happy with
> half the performance as well, so I'm not expecting miracles either).
> 
> RAID rebuilds are already problematic on the front end servers (RAID
> 10 over 8 x 4TB) with a single drive failure whilst the server is
> under load takes approximately 8 odd hours to rebuild if memory serves
> me correctly.  We've had a few failures in the past (even a double
> drive failure at the same time), but nothing recent that I can recall
> accurately.
> 
> I was never aware that bigger block sizes would increase read
> performance though - this is interesting and something I can
> definitely explore.  I am talking under correction, but I believe the
> MegaRAIDs we're using can even go bigger than 1mbyte blocks.  I'll
> have to check on this.  Bigger blocks does mean wasting more space
> though if the files written are smaller and can't necessarily fill up
> an entire block, right?  I suppose when you start talking about 12TB
> and 50TB arrays, the amount of wasted space really becomes
> insignificant, or am I mistaken?
> 
> SANs unfortunately is out of the question as this is hosted
> infrastructure at a provider that does not offer SANs as part of their
> product offerings.
> 
> 
>> But the general idea is to have a set of raid1 mirrors (or possible Linux md
>> raid10,far2 pairs if the traffic is read-heavy), and then tie them all
>> together using a linear concatenation rather than raid0 stripes.  When you
> 
> Can I perhaps ask that you just elaborate a bit on what you mean by
> linear concatenation?  I am presuming you are not referring to RAID 10
> 'per say' here as to your comment to use this rather than RAID 0
> stripes.  XFS by itself, is also a good option - I honestly do not
> know why this wasn't given consideration when we initially set the
> machines up.  By the sound of it, all of them are now going to be
> facing a rebuild.

Let me step back a little, and try to make the jargon clearer.  Terms
can be slightly different in the md raid world than the hardware raid
world, because md raid is often more flexible.

raid1 is a simple mirroring of two or more disks.  I don't know if your
hardware allows three-way mirroring, but it can help speed up read
access (more parallel reads from the same set), gives extra redundancy,
and faster rebuilds, at the cost of marginally more write time (since
your write latency is the worst case of three disks) - and obviously at
the cost of more disks.  For many complex raid systems, raid1 pairs are
your basic building block.

md raid supports a type of raid10 on two disks (actually, on any number
of disks).  You can imagine the "far" layout as splitting each of the two
disks into two halves, 1a+1b and 2a+2b.  1a is mirrored (raid1) with 2b,
and 1b is mirrored with 2a.  Then these two mirrors are striped (raid0).
Write performance is similar to raid1 - data is written in two copies,
one to each disk.  But read performance is fast - it is read as a stripe,
with the faster outer halves of each disk used by preference, giving
faster than raid0 reads.  For read-heavy loads, it is therefore a very
nice setup.

<http://en.wikipedia.org/wiki/Linux_MD_RAID_10#LINUX-MD-RAID-10>
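
If you did want to try that layout with md, a sketch would look
something like this (device names are placeholders):

  mdadm --create /dev/md1 --level=10 --layout=f2 --raid-devices=2 \
      /dev/sdb /dev/sdc

  # check the layout that was created
  mdadm --detail /dev/md1 | grep -i layout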

In your case, however, I expect you will use plain raid1 pairs from the
hardware raid controller.


"linear concatenation" is a md raid type that simply makes one big
logical disk from putting the contained disks (or raid1 pairs) together.
 There is no striping or extra parity.  The overhead of the "raid" is
absolutely minimal - no more than a re-mapping of logical sectors on the
concat to the constituent block devices.

This is quite inefficient for most filesystems - critical structures
would end up on the first raid1 pair, and you would make no use of the
later pairs until the first pairs were full.  But XFS has a concept of
"allocation groups", and likes to divide the whole "disk" into these
AGs.  Every time you make a new directory, it gets put into another AG
with a simple round-robin policy (AFAIK).  All access to a file - data,
metadata, inodes, directory entries, etc. - will be done entirely
within that AG.

So with your 8 disk front-end servers, you would first set up 4 pairs of
hardware raid1 mirrors.  You join these in a linear concat.  Then you
make an XFS filesystem with two AG's per mirror - 8 AG's altogether.
The directories you make will then be spread evenly across these, and
you will get maximal parallelism accessing files in different directories.

XFS over linear concat is typically used for large mail servers (using
maildir directories for each user), or "home directory servers" for
large numbers of users.  It is efficient for small file access, and
stops large file accesses blocking other accesses (but it is not ideal
if you need high speed streaming of a few big files).

(As the XFS fills up, if an AG gets full then new files spill over into
other AGs - so if your data is not evenly spread across directories then
you can still use all your disk space, but you lose a little of the
parallelism.)
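
A rough sketch of that recipe (assuming the four hardware raid1 mirrors
are visible to the OS as /dev/sdb..sde - all names, sizes and the mount
point are placeholders, and the mkfs options are worth checking against
the XFS documentation for your exact setup):

  # join the four raid1 virtual drives end-to-end, no striping
  mdadm --create /dev/md0 --level=linear --raid-devices=4 \
      /dev/sdb /dev/sdc /dev/sdd /dev/sde

  # two allocation groups per mirror, 8 in total
  mkfs.xfs -d agcount=8 /dev/md0

  mount -o noatime /dev/md0 /srv/data

  # verify the AG count and size actually used
  xfs_info /srv/data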

> 
>> I am assuming your files are fairly small - if your reads or writes are
>> often smaller than a full stripe of raid10 or raid5, performance will suffer
>> greatly compared to XFS on a linear concat.
> 
> The files are VERY evenly distributed using md5 hashes.  We have 16
> top level directories, 255 second level directories, and 4094 third
> level directories.  Each third level directory currently holds between
> 4K and 4.5K files per directory (the archiver servers should have
> roughly three or four times that amount once the disks are full).
> Files are generally between 250kb and 750kb, a small percentage are a
> bit larger to the 1.5mb range, and I can almost guarantee that not one
> single file will exceed the 5mb range.  I'm not sure what the stripe
> size is at this stage but it is more than likely what ever the default
> is for the controller (64kb?)
> 
> I think to explore XFS would need to be my first port of call here.
> Take one of the front ends out of production tomorrow when load has
> quieted down, trash it, and rebuild it.  Then we'll more than likely
> need 2 or 3 weeks for the disks to fill up again with files before
> we're really going to see how it compares.
> 
> If I can perhaps just get some clarity in terms of the physical disk
> layouts / configurations that you would recommend, I would appreciate
> it greately.  You're obviously not talking about a simple RAID 10
> array here, even though I think just XFS over EXT4 would already do us
> wonders.
> 
> Many thanks for all the responces!
> 

The xfs.org site should have more information on this (read the FAQ),
and I believe they have a good mailing list too.  There are a number of
options and parameters that are important when creating and mounting an
XFS filesystem, and they can make a huge difference to performance.  You
need to be careful about barriers and caching - if your hardware raid
controller has battery backup then you can disable barriers for faster
performance.  If you get your AGs aligned with the elements of your
linear concat, you will get high speeds - but if you get it wrong,
performance will be crippled.  And while I believe "twice the number of
raid1 pairs" is a common choice for the number of AGs in this sort of
arrangement, it may be better with more (but still a multiple of the
number of pairs).

Another possibility for XFS is to use an external log file rather than
putting it on main disks.  Consider using a small but fast SSD for the
log, in addition to the main disk array.  This would also be a
convenient place to put everything else, such as the OS, leaving your
main disks for the application data.
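
Roughly along these lines (/dev/ssd1 is a placeholder for an SSD
partition, and the log size is just an example):

  mkfs.xfs -l logdev=/dev/ssd1,size=128m -d agcount=8 /dev/md0
  # the external log must also be named at mount time
  mount -o noatime,logdev=/dev/ssd1 /dev/md0 /srv/data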

Also be aware that doing a fsck on XFS can take a long time, and use a
lot of memory.  I assume you've got a good UPS!

Remember, this is expensive, high-performance equipment you are playing
with.  So have fun :-)



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Correct RAID options
  2014-08-20  7:32     ` David Brown
@ 2014-08-20 14:45       ` Chris Knipe
  0 siblings, 0 replies; 11+ messages in thread
From: Chris Knipe @ 2014-08-20 14:45 UTC (permalink / raw)
  To: David Brown; +Cc: linux-raid

Hi All,

Thanks for all the valued feedback and recommendations.  Chris
Schanzle has been a great help in tweaking what we already have, and we
are seeing considerable improvements after making some changes to how
(i|d)nodes are cached, slab caches, and so on.  There are still
fundamental issues, raised in the other recommendations, that go back
to how the servers were set up to begin with.

I'm slowly tweaking and getting everything stable again to a degree
where it actually works, and then we will more than likely deploy a
new front-end configured correctly (1M stripe sizes, XFS, proper mkfs
options, etc.) before starting a rolling process to move load and
reinstall each front-end server correctly.  We do unfortunately need
to take this drastic step, as the raid arrays obviously need to be
trashed and re-created.

Definitely learned a lot, and I'm sure that a lot of really useful
information has made it to the archives as well!

--
Chris.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2014-08-20 14:45 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-19 18:38 Correct RAID options Chris Knipe
2014-08-19 23:28 ` Craig Curtin
2014-08-19 23:42 ` Roger Heflin
2014-08-20  0:22 ` David Brown
2014-08-20  1:24   ` Chris Knipe
2014-08-20  2:38     ` Craig Curtin
2014-08-20  3:05       ` Chris Knipe
2014-08-20  3:37         ` Craig Curtin
2014-08-20  7:32     ` David Brown
2014-08-20 14:45       ` Chris Knipe
2014-08-20  5:58 ` Chris Schanzle
