linux-raid.vger.kernel.org archive mirror
* Best way to add caching to a new raid setup.
       [not found] <16cee7f2-38d9-13c8-4342-4562be68930b.ref@verizon.net>
@ 2020-08-28  2:31 ` R. Ramesh
  2020-08-28  3:05   ` Peter Grandi
                     ` (2 more replies)
  0 siblings, 3 replies; 36+ messages in thread
From: R. Ramesh @ 2020-08-28  2:31 UTC (permalink / raw)
  To: Linux Raid

I have two raid6s running on mythbuntu 14.04. They are built on 6
enterprise drives, so no hard-drive issues as of now. Still, I plan to
upgrade: it has been a while, and hard drives have become significantly
larger in the meantime (an indication that my disks are getting old). I
want to build a new raid using 14/16 TB drives. Since I am building a
new raid, I thought I could explore caching options. I see mentions of
LVM cache and a few others such as bcache.

Is any one of them better than the others, or is no cache the safer
choice? Since I switched over to NVMe boot drives, I have quite a few
SATA SSDs lying around that I could put to good use as cache devices.

I will move to xubuntu 20.04 as part of this upgrade, so hopefully I
will have recent versions of the kernel, mdadm and the filesystem tools.
With these I should be able to make full use of current features, if any
are needed for caching support.

Please let me know your expert opinion.

Thanks
Ramesh


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Best way to add caching to a new raid setup.
  2020-08-28  2:31 ` Best way to add caching to a new raid setup R. Ramesh
@ 2020-08-28  3:05   ` Peter Grandi
  2020-08-28  3:19     ` Ram Ramesh
  2020-08-28 15:26   ` antlists
  2020-08-28 17:46   ` Roman Mamedov
  2 siblings, 1 reply; 36+ messages in thread
From: Peter Grandi @ 2020-08-28  3:05 UTC (permalink / raw)
  To: Linux Raid

> I have two raid6s running on mythbuntu 14.04. They are built on
> 6 enterprise drives. [...] want to build a new raid using
> 14/16 TB drives. [...]

This may be the beginning of an exciting adventure into setting
up a RAID set with stunning rebuild times, minimizing IOPS-per-TB
and setting up filetrees that cannot be realistically 'fsck'ed.
Plenty of people seem to like that kind of exciting adventure :-).

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Best way to add caching to a new raid setup.
  2020-08-28  3:05   ` Peter Grandi
@ 2020-08-28  3:19     ` Ram Ramesh
  0 siblings, 0 replies; 36+ messages in thread
From: Ram Ramesh @ 2020-08-28  3:19 UTC (permalink / raw)
  To: Peter Grandi, Linux Raid

On 8/27/20 10:05 PM, Peter Grandi wrote:
>> I have two raid6s running on mythbuntu 14.04. They are built on
>> 6 enterprise drives. [...] want to build a new raid using
>> 14/16 TB drives. [...]
> This may be the beginning of an exciting adventure into setting
> up a RAID set with stunning rebuild times, minimizing IOPS-per-TB
> and setting up filetrees that cannot be realistically 'fsck'ed.
> Plenty of people seem to like that kind of exciting adventure :-).
Yes, just as exciting as my raid1 on another machine, built from three
1 TB WD Blacks from 15+ years ago (among the first 1 TB Blacks). Still
running strong 24x7 after all these years, and they have TLER :-)

Most likely I am building a raid of the same size (likely raid1 on two
14/16 TB drives). No exabyte filesystem for me (yet!)

Ramesh

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Best way to add caching to a new raid setup.
  2020-08-28  2:31 ` Best way to add caching to a new raid setup R. Ramesh
  2020-08-28  3:05   ` Peter Grandi
@ 2020-08-28 15:26   ` antlists
  2020-08-28 17:25     ` Ram Ramesh
  2020-08-28 17:46   ` Roman Mamedov
  2 siblings, 1 reply; 36+ messages in thread
From: antlists @ 2020-08-28 15:26 UTC (permalink / raw)
  To: R. Ramesh, Linux Raid

On 28/08/2020 03:31, R. Ramesh wrote:
> I want to build a new raid using 14/16 TB drives. Since I am building
> a new raid, I thought I could explore caching options. I see mentions
> of LVM cache and a few others such as bcache.
> 
> Is any one of them better than the others, or is no cache the safer
> choice? Since I switched over to NVMe boot drives, I have quite a few
> SATA SSDs lying around that I could put to good use as cache devices.

Sounds like a fun idea. Just make sure you're getting CMR not SMR 
drives, but I'm not aware of SMR that large ...

Hopefully I'm going to do some work on it soon, but look at dm-integrity 
to make sure you don't get a dodgy mirror. You can add dm-integrity 
retrospectively, so if you leave a bit of unused space on the drive, I 
think you can tell dm-integrity where to put its checksums.
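
For anyone wanting to experiment with it already, a minimal standalone
sketch with integritysetup (from the cryptsetup package) could look like
this; the device names are placeholders and "format" is destructive:

  # put dm-integrity under each raid member, then build the mirror on top
  integritysetup format /dev/sdX1
  integritysetup open /dev/sdX1 int-sdX1
  integritysetup format /dev/sdY1
  integritysetup open /dev/sdY1 int-sdY1
  mdadm --create /dev/md0 --level=1 --raid-devices=2 \
        /dev/mapper/int-sdX1 /dev/mapper/int-sdY1

With that stack a checksum mismatch shows up as a read error on the
affected member, so md can repair it from the other half of the mirror
instead of silently returning bad data.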

Cheers,
Wol

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Best way to add caching to a new raid setup.
  2020-08-28 15:26   ` antlists
@ 2020-08-28 17:25     ` Ram Ramesh
  2020-08-28 22:12       ` antlists
  0 siblings, 1 reply; 36+ messages in thread
From: Ram Ramesh @ 2020-08-28 17:25 UTC (permalink / raw)
  To: antlists, R. Ramesh, Linux Raid

On 8/28/20 10:26 AM, antlists wrote:
> On 28/08/2020 03:31, R. Ramesh wrote:
>> I want to build new raid using the 16/14tb drives. Since I am 
>> building new raid, I thought I could explore caching options. I see a 
>> mention of LVM cache and few other bcache/xyzcache etc.
>>
>> Is anyone of them better than other or no cache is safer. Since I 
>> switched over to NVME boot drives, I have quite a few SATA SSDs lying 
>> around that I can put to good use, if I cache using them.
>
> Sounds like a fun idea. Just make sure you're getting CMR not SMR 
> drives, but I'm not aware of SMR that large ...
>
> Hopefully I'm going to do some work on it soon, but look at 
> dm-integrity to make sure you don't get a dodgy mirror. You can add 
> dm-integrity retrospectively, so if you leave a bit of unused space on 
> the drive, I think you can tell dm-integrity where to put its checksums.
>
> Cheers,
> Wol
Yes, no SMR. I plan to get only enterprise helium drives (Seagate Exos
X14 or X16).

I googled RAID cache performance and did not get many interesting hits.
A couple that I did find seem to indicate that LVM cache shows no
performance improvement. I can't understand why. Maybe it is SATA limits
(an SSD tops out around 500 MB/s, a disk can already reach roughly
200 MB/s, and with raid1 that might go up since there are two disks to
read from, etc.)

I am mainly looking for an IOPS improvement, as I want to use this RAID
in a mythtv environment. Multiple threads will be active, and I expect
the cache to help with random-access IOPS.
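
One way to check whether a cache layer actually helps this workload
would be a before/after fio run of 4k random reads against the volume.
A minimal read-only sketch, with a placeholder LV path:

  fio --name=randread --filename=/dev/vg0/media --readonly --direct=1 \
      --ioengine=libaio --rw=randread --bs=4k --iodepth=32 \
      --runtime=60 --time_based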

Regards
Ramesh


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Best way to add caching to a new raid setup.
  2020-08-28  2:31 ` Best way to add caching to a new raid setup R. Ramesh
  2020-08-28  3:05   ` Peter Grandi
  2020-08-28 15:26   ` antlists
@ 2020-08-28 17:46   ` Roman Mamedov
  2020-08-28 20:39     ` Ram Ramesh
  2 siblings, 1 reply; 36+ messages in thread
From: Roman Mamedov @ 2020-08-28 17:46 UTC (permalink / raw)
  To: R. Ramesh; +Cc: Linux Raid

On Thu, 27 Aug 2020 21:31:07 -0500
"R. Ramesh" <rramesh@verizon.net> wrote:

> I have two raid6s running on mythbuntu 14.04. They are built on 6
> enterprise drives, so no hard-drive issues as of now. Still, I plan to
> upgrade: it has been a while, and hard drives have become significantly
> larger in the meantime (an indication that my disks are getting old). I
> want to build a new raid using 14/16 TB drives. Since I am building a
> new raid, I thought I could explore caching options. I see mentions of
> LVM cache and a few others such as bcache.

Once you set up bcache, it cannot be removed. The volume will always stay
a bcache volume, even if you decide to stop using caching, which feels
weird and potentially troublesome: you keep going through an extra layer
(kernel driver) with its complexity and computational overhead (no matter
how small).

On the other hand, LVM with caching turned off is just normal LVM, which
you likely would have used anyway for the other benefits it provides.
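
For example, a minimal lvmcache sketch with placeholder VG/LV/device
names; the last command is the point here, since it flushes the cache
and leaves a plain LV behind:

  vgextend vg0 /dev/sdX                     # add the SSD to the existing VG
  lvcreate --type cache-pool -L 400G -n cpool vg0 /dev/sdX
  lvconvert --type cache --cachepool vg0/cpool vg0/media

  # later, to stop caching:
  lvconvert --uncache vg0/media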

Also, my impression is that LVM has a more solid and reliable codebase,
but bcache might provide a somewhat better performance boost from its
caching.

-- 
With respect,
Roman

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Best way to add caching to a new raid setup.
  2020-08-28 17:46   ` Roman Mamedov
@ 2020-08-28 20:39     ` Ram Ramesh
  2020-08-29 15:34       ` antlists
  2020-08-30 22:16       ` Michal Soltys
  0 siblings, 2 replies; 36+ messages in thread
From: Ram Ramesh @ 2020-08-28 20:39 UTC (permalink / raw)
  To: Roman Mamedov, R. Ramesh; +Cc: Linux Raid

On 8/28/20 12:46 PM, Roman Mamedov wrote:
> On Thu, 27 Aug 2020 21:31:07 -0500
> "R. Ramesh" <rramesh@verizon.net> wrote:
>
>> I have two raid6s running on mythbuntu 14.04. The are built on 6
>> enterprise drives. So, no hd issues as of now. Still, I plan to upgrade
>> as it has been a while and the size of the hard drives have become
>> significantly larger (a indication that my disks may be older) I want to
>> build new raid using the 16/14tb drives. Since I am building new raid, I
>> thought I could explore caching options. I see a mention of LVM cache
>> and few other bcache/xyzcache etc.
> Once you set up bcache, it cannot be removed. The volume will always stay a
> bcache volume, even if you decide to stop using caching. Which feels weird and
> potentially troublesome, going through an extra layer (kernel driver) with its
> complexity and computational overhead (no matter how small).
>
> On the other hand LVM with caching turned off is just normal LVM, that you'd
> likely would have used anyway, for other benefits that it provides.
>
> Also my impression is that LVM has more solid and reliable codebase, but
> bcache might provide a somewhat better the performance boost due to caching.
>
Thanks for the info on bcache. I do not think it will be my favorite; I
am going to try LVM cache as my first choice. Note that the new disks
will be spares for some time, so I will be able to try out a few things
before deciding to put them into use.

One thing about LVM that I am not clear on: given the choice between
creating a mirror LV on a VG over simple PVs and a simple LV over a
raid1 PV, which is the preferred method? Why?

Ramesh


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Best way to add caching to a new raid setup.
  2020-08-28 17:25     ` Ram Ramesh
@ 2020-08-28 22:12       ` antlists
  2020-08-28 22:40         ` Ram Ramesh
  0 siblings, 1 reply; 36+ messages in thread
From: antlists @ 2020-08-28 22:12 UTC (permalink / raw)
  To: Ram Ramesh, R. Ramesh, Linux Raid

On 28/08/2020 18:25, Ram Ramesh wrote:
> I am mainly looking for IOP improvement as I want to use this RAID in 
> mythtv environment. So multiple threads will be active and I expect 
> cache to help with random access IOPs.

???

Caching will only help in a read-after-write scenario, or a 
read-several-times scenario.

I'm guessing mythtv means it's a film server? Can ALL your films (or at 
least your favourite "watch again and again" ones) fit in the cache? If 
you watch a lot of films, chances are you'll read it from disk (no 
advantage from the cache), and by the time you watch it again it will 
have been evicted so you'll have to read it again.

The other time cache may be useful, is if you're recording one thing and 
watching another. That way, the writes can stall in cache as you 
prioritise reading.

Think about what is actually happening at the i/o level, and will cache 
help?

Cheers,
Wol

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Best way to add caching to a new raid setup.
  2020-08-28 22:12       ` antlists
@ 2020-08-28 22:40         ` Ram Ramesh
  2020-08-28 22:59           ` antlists
                             ` (2 more replies)
  0 siblings, 3 replies; 36+ messages in thread
From: Ram Ramesh @ 2020-08-28 22:40 UTC (permalink / raw)
  To: antlists, R. Ramesh, Linux Raid

On 8/28/20 5:12 PM, antlists wrote:
> On 28/08/2020 18:25, Ram Ramesh wrote:
>> I am mainly looking for IOP improvement as I want to use this RAID in 
>> mythtv environment. So multiple threads will be active and I expect 
>> cache to help with random access IOPs.
>
> ???
>
> Caching will only help in a read-after-write scenario, or a 
> read-several-times scenario.
>
> I'm guessing mythtv means it's a film server? Can ALL your films (or 
> at least your favourite "watch again and again" ones) fit in the 
> cache? If you watch a lot of films, chances are you'll read it from 
> disk (no advantage from the cache), and by the time you watch it again 
> it will have been evicted so you'll have to read it again.
>
> The other time cache may be useful, is if you're recording one thing 
> and watching another. That way, the writes can stall in cache as you 
> prioritise reading.
>
> Think about what is actually happening at the i/o level, and will 
> cache help?
>
> Cheers,
> Wol

Mythtv is a server-client DVR system. I have a client next to each of my
TVs and one backend with the large disk (this will hold the RAID with
cache). At any time several clients may be accessing different programs,
and any scheduled recordings will also be going on in parallel. So you
will see a lot of seeks, but everything still comes from a limited
number of threads (I only have 3 TVs and maybe one other PC acting as a
client). So: lots of IOs, mostly sequential, across a small number of
threads. I think most cache algorithms should be able to benefit from
the random access to blocks that an SSD offers.

Do you see any flaws in my argument?

Regards
Ramesh


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Best way to add caching to a new raid setup.
  2020-08-28 22:40         ` Ram Ramesh
@ 2020-08-28 22:59           ` antlists
  2020-08-29  3:08             ` R. Ramesh
  2020-08-29  0:01           ` Roger Heflin
  2020-08-31 19:20           ` Nix
  2 siblings, 1 reply; 36+ messages in thread
From: antlists @ 2020-08-28 22:59 UTC (permalink / raw)
  To: Ram Ramesh, antlists, R. Ramesh, Linux Raid

On 28/08/2020 23:40, Ram Ramesh wrote:
> On 8/28/20 5:12 PM, antlists wrote:
>> On 28/08/2020 18:25, Ram Ramesh wrote:
>>> I am mainly looking for IOP improvement as I want to use this RAID in 
>>> mythtv environment. So multiple threads will be active and I expect 
>>> cache to help with random access IOPs.
>>
>> ???
>>
>> Caching will only help in a read-after-write scenario, or a 
>> read-several-times scenario.
>>
>> I'm guessing mythtv means it's a film server? Can ALL your films (or 
>> at least your favourite "watch again and again" ones) fit in the 
>> cache? If you watch a lot of films, chances are you'll read it from 
>> disk (no advantage from the cache), and by the time you watch it again 
>> it will have been evicted so you'll have to read it again.
>>
>> The other time cache may be useful, is if you're recording one thing 
>> and watching another. That way, the writes can stall in cache as you 
>> prioritise reading.
>>
>> Think about what is actually happening at the i/o level, and will 
>> cache help?
>>
>> Cheers,
>> Wol
> 
> Mythtv is a sever client DVR system. I have a client next to each of my 
> TVs and one backend with large disk (this will have RAID with cache). At 
> any time many clients will be accessing different programs and any 
> scheduled recording will also be going on in parallel. So you will see a 
> lot of seeks, but still all will be based on limited threads (I only 
> have 3 TVs and may be one other PC acting as a client) So lots of IOs, 
> mostly sequential, across small number of threads. I think most cache 
> algorithms should be able to benefit from random access to blocks in SSD.
> 
> Do you see any flaws in my argument?
> 
I don't think you've understood mine. Doesn't matter what the cache 
algorithm is, the whole point of caching is that - when reading - it is 
only a benefit if the different threads are reading THE SAME bits of 
disk. So if your 3 TVs and the PC are accessing different tv programs, 
caching won't be much use, as all the reads will be cache misses.

As for writing, caching can let you prioritise reading so you don't get 
stutter while watching. And it'll speed things up if you watch while 
recording.

But basically, caching will really only benefit you if (a) your cache is 
large enough to hold all your favourite films so they don't get evicted 
from cache, or (b) you're in the habit of watching while recording, or 
(c) two or more tvs are in the habit of watching the same program.

The question is not "how many simultaneous threads do I have?", but "how 
many of my disk i/os are going to be cache misses?" Your argument 
actively avoids that question. I suspect the answer is "most of them".

Cheers,
Wol

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Best way to add caching to a new raid setup.
  2020-08-28 22:40         ` Ram Ramesh
  2020-08-28 22:59           ` antlists
@ 2020-08-29  0:01           ` Roger Heflin
  2020-08-29  3:12             ` R. Ramesh
  2020-08-31 19:20           ` Nix
  2 siblings, 1 reply; 36+ messages in thread
From: Roger Heflin @ 2020-08-29  0:01 UTC (permalink / raw)
  To: Ram Ramesh; +Cc: antlists, R. Ramesh, Linux Raid

Something I would suggest, which I have found improves my mythtv
experience: get an SSD big enough to hold 12-18 hours of recordings, or
whatever you record daily, and set up the recordings to go to the SSD. I
set it to use the disk with the highest percentage free first, and since
my raid6 is always 90%+ full, the SSD always gets used. Then nightly I
move the files from the SSD recordings directory onto the raid6
recordings directory (a sketch of such a script is below). This also
helps when your disks start going bad and getting bad blocks; the bad
blocks *WILL* cause mythtv to stop recording shows at random because of
some prior choices the developers made (sync often, and if you get more
than a few seconds behind, stop recording in an attempt to save some
recordings).
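
A minimal sketch of the nightly job, with placeholder paths for the two
recording directories:

  #!/bin/sh
  # Move finished recordings from the SSD directory to the raid6 one.
  SRC=/mnt/ssd/recordings
  DST=/mnt/raid6/recordings
  # -mmin +60 skips files touched in the last hour, i.e. recordings
  # that are probably still being written.
  find "$SRC" -maxdepth 1 -type f -mmin +60 -exec mv -n {} "$DST"/ \;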

I also put daily security camera data on the ssd and copy it over to
the raid6 device nightly.

Using the SSD for recording greatly reduces the load on the slower raid6
spinning disks.

You would have to have a large number of people watching at the same
time for that to matter, since watching is a relatively easy load
compared to the writes.

On Fri, Aug 28, 2020 at 5:42 PM Ram Ramesh <rramesh2400@gmail.com> wrote:
>
> On 8/28/20 5:12 PM, antlists wrote:
> > On 28/08/2020 18:25, Ram Ramesh wrote:
> >> I am mainly looking for IOP improvement as I want to use this RAID in
> >> mythtv environment. So multiple threads will be active and I expect
> >> cache to help with random access IOPs.
> >
> > ???
> >
> > Caching will only help in a read-after-write scenario, or a
> > read-several-times scenario.
> >
> > I'm guessing mythtv means it's a film server? Can ALL your films (or
> > at least your favourite "watch again and again" ones) fit in the
> > cache? If you watch a lot of films, chances are you'll read it from
> > disk (no advantage from the cache), and by the time you watch it again
> > it will have been evicted so you'll have to read it again.
> >
> > The other time cache may be useful, is if you're recording one thing
> > and watching another. That way, the writes can stall in cache as you
> > prioritise reading.
> >
> > Think about what is actually happening at the i/o level, and will
> > cache help?
> >
> > Cheers,
> > Wol
>
> Mythtv is a sever client DVR system. I have a client next to each of my
> TVs and one backend with large disk (this will have RAID with cache). At
> any time many clients will be accessing different programs and any
> scheduled recording will also be going on in parallel. So you will see a
> lot of seeks, but still all will be based on limited threads (I only
> have 3 TVs and may be one other PC acting as a client) So lots of IOs,
> mostly sequential, across small number of threads. I think most cache
> algorithms should be able to benefit from random access to blocks in SSD.
>
> Do you see any flaws in my argument?
>
> Regards
> Ramesh
>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Best way to add caching to a new raid setup.
  2020-08-28 22:59           ` antlists
@ 2020-08-29  3:08             ` R. Ramesh
  2020-08-29  5:02               ` Roman Mamedov
  0 siblings, 1 reply; 36+ messages in thread
From: R. Ramesh @ 2020-08-29  3:08 UTC (permalink / raw)
  To: antlists, Ram Ramesh, Linux Raid

On 8/28/20 5:59 PM, antlists wrote:
> On 28/08/2020 23:40, Ram Ramesh wrote:
>> On 8/28/20 5:12 PM, antlists wrote:
>>> On 28/08/2020 18:25, Ram Ramesh wrote:
>>>> I am mainly looking for IOP improvement as I want to use this RAID 
>>>> in mythtv environment. So multiple threads will be active and I 
>>>> expect cache to help with random access IOPs.
>>>
>>> ???
>>>
>>> Caching will only help in a read-after-write scenario, or a 
>>> read-several-times scenario.
>>>
>>> I'm guessing mythtv means it's a film server? Can ALL your films (or 
>>> at least your favourite "watch again and again" ones) fit in the 
>>> cache? If you watch a lot of films, chances are you'll read it from 
>>> disk (no advantage from the cache), and by the time you watch it 
>>> again it will have been evicted so you'll have to read it again.
>>>
>>> The other time cache may be useful, is if you're recording one thing 
>>> and watching another. That way, the writes can stall in cache as you 
>>> prioritise reading.
>>>
>>> Think about what is actually happening at the i/o level, and will 
>>> cache help?
>>>
>>> Cheers,
>>> Wol
>>
>> Mythtv is a sever client DVR system. I have a client next to each of 
>> my TVs and one backend with large disk (this will have RAID with 
>> cache). At any time many clients will be accessing different programs 
>> and any scheduled recording will also be going on in parallel. So you 
>> will see a lot of seeks, but still all will be based on limited 
>> threads (I only have 3 TVs and may be one other PC acting as a 
>> client) So lots of IOs, mostly sequential, across small number of 
>> threads. I think most cache algorithms should be able to benefit from 
>> random access to blocks in SSD.
>>
>> Do you see any flaws in my argument?
>>
> I don't think you've understood mine. Doesn't matter what the cache 
> algorithm is, the whole point of caching is that - when reading - it 
> is only a benefit if the different threads are reading THE SAME bits 
> of disk. So if your 3 TVs and the PC are accessing different tv 
> programs, caching won't be much use, as all the reads will be cache 
> misses.
>
> As for writing, caching can let you prioritise reading so you don't 
> get stutter while watching. And it'll speed things up if you watch 
> while recording.
>
> But basically, caching will really only benefit you if (a) your cache 
> is large enough to hold all your favourite films so they don't get 
> evicted from cache, or (b) you're in the habit of watching while 
> recording, or (c) two or more tvs are in the habit of watching the 
> same program.
>
> The question is not "how many simultaneous threads do I have?", but 
> "how many of my disk i/os are going to be cache misses?" Your argument 
> actively avoids that question. I suspect the answer is "most of them".
>
> Cheers,
> Wol

I do not know how SSD caching is implemented. I assumed it will be 
somewhat similar to memory cache (L2 vs L3 vs L4 etc). I am hoping that 
with SSD caching, reads/writes to disk will be larger in size and 
sequential within a file (similar to cache line fill in memory cache 
which results in memory bursts that are efficient). I thought that is 
what SSD caching will do to disk reads/writes. I assumed, once reads 
(ahead) and writes (assuming writeback cache) buffers data sufficiently 
in the SSD, all reads/writes will be to SSD with periodic well organized 
large transfers to disk. If I am wrong here then I do not see any point 
in SSD as a cache. My aim is not to optimize by cache hits, but optimize 
by preventing disks from thrashing back and forth seeking after every 
block read. I suppose Linux (memory) buffer cache alleviates some of 
that. I was hoping SSD will provide next level. If not, I am off in my 
understanding of SSD as a disk cache.

Regards
Ramesh


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Best way to add caching to a new raid setup.
  2020-08-29  0:01           ` Roger Heflin
@ 2020-08-29  3:12             ` R. Ramesh
  2020-08-29 22:36               ` Drew
  0 siblings, 1 reply; 36+ messages in thread
From: R. Ramesh @ 2020-08-29  3:12 UTC (permalink / raw)
  To: Roger Heflin, Ram Ramesh; +Cc: antlists, Linux Raid

On 8/28/20 7:01 PM, Roger Heflin wrote:
> Something I would suggest, I have found improves my mythtv experience
> is:  Get a big enough SSD to hold 12-18 hours of the recording or
> whatever you do daily, and setup the recordings to go to the SSD.    i
> defined use the disk with the highest percentage free to be used
> first, and since my raid6 is always 90% plus the SSD always gets used.
> Then nightly I move the files from the ssd recordings directory onto
> the raid6 recordings directory.  This also helps when your disks start
> going bad and getting badblocks, the badblocks *WILL* cause mythtv to
> stop recording shows at random because of some prior choices the
> developers made (sync often, and if you get more than a few seconds
> behind stop recording, attempting to save some recordings).
>
> I also put daily security camera data on the ssd and copy it over to
> the raid6 device nightly.
>
> Using the ssd for recording much reduces the load on the slower raid6
> spinning disks.
>
> You would have to have a large number of people watching at the same
> time as the watching is relatively easy load, compared to the writes.
>
> On Fri, Aug 28, 2020 at 5:42 PM Ram Ramesh <rramesh2400@gmail.com> wrote:
>> On 8/28/20 5:12 PM, antlists wrote:
>>> On 28/08/2020 18:25, Ram Ramesh wrote:
>>>> I am mainly looking for IOP improvement as I want to use this RAID in
>>>> mythtv environment. So multiple threads will be active and I expect
>>>> cache to help with random access IOPs.
>>> ???
>>>
>>> Caching will only help in a read-after-write scenario, or a
>>> read-several-times scenario.
>>>
>>> I'm guessing mythtv means it's a film server? Can ALL your films (or
>>> at least your favourite "watch again and again" ones) fit in the
>>> cache? If you watch a lot of films, chances are you'll read it from
>>> disk (no advantage from the cache), and by the time you watch it again
>>> it will have been evicted so you'll have to read it again.
>>>
>>> The other time cache may be useful, is if you're recording one thing
>>> and watching another. That way, the writes can stall in cache as you
>>> prioritise reading.
>>>
>>> Think about what is actually happening at the i/o level, and will
>>> cache help?
>>>
>>> Cheers,
>>> Wol
>> Mythtv is a sever client DVR system. I have a client next to each of my
>> TVs and one backend with large disk (this will have RAID with cache). At
>> any time many clients will be accessing different programs and any
>> scheduled recording will also be going on in parallel. So you will see a
>> lot of seeks, but still all will be based on limited threads (I only
>> have 3 TVs and may be one other PC acting as a client) So lots of IOs,
>> mostly sequential, across small number of threads. I think most cache
>> algorithms should be able to benefit from random access to blocks in SSD.
>>
>> Do you see any flaws in my argument?
>>
>> Regards
>> Ramesh
>>
I was hoping SSD caching would do what you are suggesting without the
daily copying. Based on Wol's comments, it does not. Maybe I
misunderstood how SSD caching works. I will try it anyway and see what
happens. If it does not do what I want, I will remove the caching and go
straight to the disks.

Ramesh

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Best way to add caching to a new raid setup.
  2020-08-29  3:08             ` R. Ramesh
@ 2020-08-29  5:02               ` Roman Mamedov
  2020-08-29 20:48                 ` Ram Ramesh
  0 siblings, 1 reply; 36+ messages in thread
From: Roman Mamedov @ 2020-08-29  5:02 UTC (permalink / raw)
  To: R. Ramesh; +Cc: antlists, Ram Ramesh, Linux Raid

On Fri, 28 Aug 2020 22:08:22 -0500
"R. Ramesh" <rramesh@verizon.net> wrote:

> I do not know how SSD caching is implemented. I assumed it will be 
> somewhat similar to memory cache (L2 vs L3 vs L4 etc). I am hoping that 
> with SSD caching, reads/writes to disk will be larger in size and 
> sequential within a file (similar to cache line fill in memory cache 
> which results in memory bursts that are efficient). I thought that is 
> what SSD caching will do to disk reads/writes. I assumed, once reads 
> (ahead) and writes (assuming writeback cache) buffers data sufficiently 
> in the SSD, all reads/writes will be to SSD with periodic well organized 
> large transfers to disk. If I am wrong here then I do not see any point 
> in SSD as a cache. My aim is not to optimize by cache hits, but optimize 
> by preventing disks from thrashing back and forth seeking after every 
> block read. I suppose Linux (memory) buffer cache alleviates some of 
> that. I was hoping SSD will provide next level. If not, I am off in my 
> understanding of SSD as a disk cache.

Just try it; as I said before, with LVM it is easy to remove if it
doesn't work out. You can always fall back to the manual copying method
or whatnot, but first check whether the automatic caching solution might
be "good enough" for your needs.

Yes, it usually tries to avoid caching long sequential reads or writes,
but there's also quite a bit of other load on the FS, e.g. metadata. I
found that browsing directories, and especially mounting the filesystem,
benefited greatly from caching.

You are correct that it will try to increase performance via writeback
caching; however, with LVM that needs to be enabled explicitly:
https://www.systutorials.com/docs/linux/man/7-lvmcache/#lbAK
And of course a failure of that cache SSD will mean losing some data,
even if the main array is RAID. Perhaps you should consider a RAID of
SSDs for the cache in that case.
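
For example (names are placeholders), the cache mode is chosen when
attaching the cache pool, or changed later:

  lvconvert --type cache --cachepool vg0/cpool --cachemode writeback vg0/media
  # or, on an already-cached LV:
  lvchange --cachemode writeback vg0/media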

-- 
With respect,
Roman

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Best way to add caching to a new raid setup.
  2020-08-28 20:39     ` Ram Ramesh
@ 2020-08-29 15:34       ` antlists
  2020-08-29 15:57         ` Roman Mamedov
  2020-08-30 22:16       ` Michal Soltys
  1 sibling, 1 reply; 36+ messages in thread
From: antlists @ 2020-08-29 15:34 UTC (permalink / raw)
  To: Ram Ramesh, Roman Mamedov, R. Ramesh; +Cc: Linux Raid

On 28/08/2020 21:39, Ram Ramesh wrote:
> One thing about LVM that I am not clear. Given the choice between 
> creating /mirror LV /on a VG over simple PVs and /simple LV/ over raid1 
> PVs, which is preferred method? Why?

Simplicity says have ONE raid, with ONE PV on top of it.

The other way round, you need (at least) TWO SEPARATE PV/VG/LV stacks,
which you then stick a raid on top of.

Basically, it's just KISS.
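
Something like this, with placeholder device names:

  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdX1 /dev/sdY1
  pvcreate /dev/md0
  vgcreate vg0 /dev/md0
  lvcreate -n media -l 100%FREE vg0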

Cheers,
Wol

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Best way to add caching to a new raid setup.
  2020-08-29 15:34       ` antlists
@ 2020-08-29 15:57         ` Roman Mamedov
  2020-08-29 16:26           ` Roger Heflin
  0 siblings, 1 reply; 36+ messages in thread
From: Roman Mamedov @ 2020-08-29 15:57 UTC (permalink / raw)
  To: antlists; +Cc: Ram Ramesh, R. Ramesh, Linux Raid

On Sat, 29 Aug 2020 16:34:56 +0100
antlists <antlists@youngman.org.uk> wrote:

> On 28/08/2020 21:39, Ram Ramesh wrote:
> > One thing about LVM that I am not clear. Given the choice between 
> > creating /mirror LV /on a VG over simple PVs and /simple LV/ over raid1 
> > PVs, which is preferred method? Why?
> 
> Simplicity says have ONE raid, with ONE PV on top of it.
> 
> The other way round is you need TWO SEPARATE (at least) PV/VG/LVs, which 
> you then stick a raid on top.

I believe the question was not about the order of layers, but whether to
create a RAID with mdadm and then LVM on top, vs. abandoning mdadm and using
LVM's built-in RAID support instead:
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/logical_volume_manager_administration/mirror_create

Personally I hugely prefer mdadm, due to the familiar and convenient interface
of the program itself, as well as of /proc/mdstat.
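
For completeness, a sketch of the LVM-native version of the same mirror
(placeholder names again):

  pvcreate /dev/sdX1 /dev/sdY1
  vgcreate vg0 /dev/sdX1 /dev/sdY1
  lvcreate --type raid1 -m 1 -l 100%FREE -n media vg0
  lvs -a -o name,copy_percent,devices vg0   # sync status shows up here,
                                            # not in /proc/mdstat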

-- 
With respect,
Roman

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Best way to add caching to a new raid setup.
  2020-08-29 15:57         ` Roman Mamedov
@ 2020-08-29 16:26           ` Roger Heflin
  2020-08-29 20:45             ` Ram Ramesh
  0 siblings, 1 reply; 36+ messages in thread
From: Roger Heflin @ 2020-08-29 16:26 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: antlists, Ram Ramesh, R. Ramesh, Linux Raid

I use mdadm raid.  From what I can tell, mdadm has been around a lot
longer and is better understood by a larger group of users, so if
something does go wrong there are a significant number of people who
can help.

I have been running mythtv on mdadm since early 2006, using LVM on top
of it. I have migrated from 4x500GB to 4x1.5TB and am currently on
7x3TB.

One trick I did on the 3TB drives is to partition each disk into four
750GB partitions; each set of 7 matching partitions then makes up an
array that becomes a PV. Often, if a disk gets a bad block or a random
I/O failure, it only takes a single raid from +2 down to +1 redundancy,
and rebuilding it is faster. I created mine like below, making sure md13
has all the sdX3 partitions in it, so that when you have to add devices
the numbering lines up. This also means that enlarging the array is 4
separate enlarges, but no single enlarge takes more than a day. So there
might be a good reason to split, say, a 12TB drive into 6x2TB or 4x3TB
just so an enlarge does not take a week to finish. Also make sure to use
a bitmap; when you re-add a previously removed disk, the rebuilds are
much faster, especially if the drive has only been out for a few hours.
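
For one of the slices, creation would look roughly like this (the device
letters are whatever your disks happen to be):

  mdadm --create /dev/md13 --level=6 --raid-devices=7 --bitmap=internal \
        /dev/sd[bcdefgi]3
  pvcreate /dev/md13      # each of md13..md16 then goes into the VG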

Personalities : [raid6] [raid5] [raid4]
md13 : active raid6 sdi3[9] sdg3[6] sdf3[12] sde3[10] sdd3[1] sdc3[5] sdb3[7]
      3612623360 blocks super 1.2 level 6, 512k chunk, algorithm 2 [7/7] [UUUUUUU]
      bitmap: 0/6 pages [0KB], 65536KB chunk

md14 : active raid6 sdi4[11] sdg4[6] sdf4[9] sde4[10] sdb4[7] sdd4[1] sdc4[5]
      3612623360 blocks super 1.2 level 6, 512k chunk, algorithm 2 [7/7] [UUUUUUU]
      bitmap: 1/6 pages [4KB], 65536KB chunk

md15 : active raid6 sdi5[11] sdg5[8] sdf5[9] sde5[10] sdb5[7] sdd5[1] sdc5[5]
      3612623360 blocks super 1.2 level 6, 512k chunk, algorithm 2 [7/7] [UUUUUUU]
      bitmap: 1/6 pages [4KB], 65536KB chunk

md16 : active raid6 sdi6[9] sdg6[7] sdf6[11] sde6[10] sdb6[8] sdd6[1] sdc6[5]
      3615495680 blocks super 1.2 level 6, 512k chunk, algorithm 2 [7/7] [UUUUUUU]
      bitmap: 0/6 pages [0KB], 65536KB chunk



On Sat, Aug 29, 2020 at 11:00 AM Roman Mamedov <rm@romanrm.net> wrote:
>
> On Sat, 29 Aug 2020 16:34:56 +0100
> antlists <antlists@youngman.org.uk> wrote:
>
> > On 28/08/2020 21:39, Ram Ramesh wrote:
> > > One thing about LVM that I am not clear. Given the choice between
> > > creating /mirror LV /on a VG over simple PVs and /simple LV/ over raid1
> > > PVs, which is preferred method? Why?
> >
> > Simplicity says have ONE raid, with ONE PV on top of it.
> >
> > The other way round is you need TWO SEPARATE (at least) PV/VG/LVs, which
> > you then stick a raid on top.
>
> I believe the question was not about the order of layers, but whether to
> create a RAID with mdadm and then LVM on top, vs. abandoning mdadm and using
> LVM's built-in RAID support instead:
> https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/logical_volume_manager_administration/mirror_create
>
> Personally I hugely prefer mdadm, due to the familiar and convenient interface
> of the program itself, as well as of /proc/mdstat.
>
> --
> With respect,
> Roman

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Best way to add caching to a new raid setup.
  2020-08-29 16:26           ` Roger Heflin
@ 2020-08-29 20:45             ` Ram Ramesh
  0 siblings, 0 replies; 36+ messages in thread
From: Ram Ramesh @ 2020-08-29 20:45 UTC (permalink / raw)
  To: Roger Heflin, Roman Mamedov; +Cc: antlists, R. Ramesh, Linux Raid

On 8/29/20 11:26 AM, Roger Heflin wrote:
> I use mdadm raid.  From what I can tell mdadm has been around a lot
> longer and is better understood by a larger group of users.   Hence if
> something does go wrong there are a significant number of people that
> can help.
>
> I have been running mythtv on mdadm since early-2006, using LVM over
> top of it.  I have migrated from 4x500 to 4x1.5tb and am currently on
> 7x3tb.
>
> One trick I did do on the 3tb's is I did partition the disk into 4
> 750gb partitions and then each set of 7 makes up a PV.  Often if a
> disk gets a bad block or a random io failure it only takes a single
> raid from +2 down to +1, and when rebuilding them it rebuilds faster.
> I created mine like below:, making sure md13 has all sdX3 disks on it
> as when you have to add devices the numbers are the same.  This also
> means that when enlarging it that there are 4 separate enlarges, but
> no one enlarge takes more than a day.  So there might be a good reason
> to say separate a 12tb drive into 6x2 or 4x3 just so if you enlarge it
> it does not take a week to finish.   Also make sure to use a bitmap,
> when you re-add a previous disk to it the rebuilds are much faster
> especially if the drive has only been out for a few hours.
>
> Personalities : [raid6] [raid5] [raid4]
> md13 : active raid6 sdi3[9] sdg3[6] sdf3[12] sde3[10] sdd3[1] sdc3[5] sdb3[7]
>        3612623360 blocks super 1.2 level 6, 512k chunk, algorithm 2
> [7/7] [UUUUUUU]
>        bitmap: 0/6 pages [0KB], 65536KB chunk
>
> md14 : active raid6 sdi4[11] sdg4[6] sdf4[9] sde4[10] sdb4[7] sdd4[1] sdc4[5]
>        3612623360 blocks super 1.2 level 6, 512k chunk, algorithm 2
> [7/7] [UUUUUUU]
>        bitmap: 1/6 pages [4KB], 65536KB chunk
>
> md15 : active raid6 sdi5[11] sdg5[8] sdf5[9] sde5[10] sdb5[7] sdd5[1] sdc5[5]
>        3612623360 blocks super 1.2 level 6, 512k chunk, algorithm 2
> [7/7] [UUUUUUU]
>        bitmap: 1/6 pages [4KB], 65536KB chunk
>
> md16 : active raid6 sdi6[9] sdg6[7] sdf6[11] sde6[10] sdb6[8] sdd6[1] sdc6[5]
>        3615495680 blocks super 1.2 level 6, 512k chunk, algorithm 2
> [7/7] [UUUUUUU]
>        bitmap: 0/6 pages [0KB], 65536KB chunk
>
>
>
> On Sat, Aug 29, 2020 at 11:00 AM Roman Mamedov <rm@romanrm.net> wrote:
>> On Sat, 29 Aug 2020 16:34:56 +0100
>> antlists <antlists@youngman.org.uk> wrote:
>>
>>> On 28/08/2020 21:39, Ram Ramesh wrote:
>>>> One thing about LVM that I am not clear. Given the choice between
>>>> creating /mirror LV /on a VG over simple PVs and /simple LV/ over raid1
>>>> PVs, which is preferred method? Why?
>>> Simplicity says have ONE raid, with ONE PV on top of it.
>>>
>>> The other way round is you need TWO SEPARATE (at least) PV/VG/LVs, which
>>> you then stick a raid on top.
>> I believe the question was not about the order of layers, but whether to
>> create a RAID with mdadm and then LVM on top, vs. abandoning mdadm and using
>> LVM's built-in RAID support instead:
>> https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/logical_volume_manager_administration/mirror_create
>>
>> Personally I hugely prefer mdadm, due to the familiar and convenient interface
>> of the program itself, as well as of /proc/mdstat.
>>
>> --
>> With respect,
>> Roman
Roger,

    Good point about breaking up the disks into partitions and building
each same-numbered partition into a raid volume. Do you recommend this
procedure even if I only do raid1? I am afraid to make a raid6 over
4x14TB disks. I want to keep rebuilds simple and not thrash the disks
each time I (have to) replace one. Even if I split into 3TB partitions,
when I replace one disk all of the arrays will rebuild and it will be a
seek festival. I am hoping the simplicity of raid1 will be better suited
now that the expected amount of data read before hitting a URE is
smaller than a single disk's capacity. I like the +2 redundancy of raid6
over +1 for raid1 (I am not doing raid1 over 3 disks, as I feel that is
a huge waste).

Regards
Ramesh

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Best way to add caching to a new raid setup.
  2020-08-29  5:02               ` Roman Mamedov
@ 2020-08-29 20:48                 ` Ram Ramesh
  2020-08-29 21:26                   ` Roger Heflin
  0 siblings, 1 reply; 36+ messages in thread
From: Ram Ramesh @ 2020-08-29 20:48 UTC (permalink / raw)
  To: Roman Mamedov, R. Ramesh; +Cc: antlists, Linux Raid

On 8/29/20 12:02 AM, Roman Mamedov wrote:
> On Fri, 28 Aug 2020 22:08:22 -0500
> "R. Ramesh" <rramesh@verizon.net> wrote:
>
>> I do not know how SSD caching is implemented. I assumed it will be
>> somewhat similar to memory cache (L2 vs L3 vs L4 etc). I am hoping that
>> with SSD caching, reads/writes to disk will be larger in size and
>> sequential within a file (similar to cache line fill in memory cache
>> which results in memory bursts that are efficient). I thought that is
>> what SSD caching will do to disk reads/writes. I assumed, once reads
>> (ahead) and writes (assuming writeback cache) buffers data sufficiently
>> in the SSD, all reads/writes will be to SSD with periodic well organized
>> large transfers to disk. If I am wrong here then I do not see any point
>> in SSD as a cache. My aim is not to optimize by cache hits, but optimize
>> by preventing disks from thrashing back and forth seeking after every
>> block read. I suppose Linux (memory) buffer cache alleviates some of
>> that. I was hoping SSD will provide next level. If not, I am off in my
>> understanding of SSD as a disk cache.
> Just try it, as I said before with LVM it is easy to remove if it doesn't work
> out. You can always go to the manual copying method or whatnot, but first why
> not check if the automatic caching solution might be "good enough" for your
> needs.
>
> Yes it usually tries to avoid caching long sequential reads or writes, but
> there's also quite a bit of other load on the FS, i.e. metadata. I found that
> browsing directories and especially mounting the filesystem had a great
> benefit from caching.
>
> You are correct that it will try to increase performance via writeback
> caching, however with LVM that needs to be enabled explicitly:
> https://www.systutorials.com/docs/linux/man/7-lvmcache/#lbAK
> And of course a failure of that cache SSD will mean losing some data, even if
> the main array is RAID. Perhaps should consider a RAID of SSDs for cache in
> that case then.
>
Yes, I have 2x500GB SSDs for the cache. Maybe I should do raid1 on them
and use that as the cache volume (sketch below). I thought SSDs were
more reliable, and that even when they begin to die they become
read-only before quitting. Of course, this is all theory, and I do not
think standards exist for how they behave when reaching EoL.
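
Something along these lines, I suppose (all names are placeholders):

  mdadm --create /dev/md20 --level=1 --raid-devices=2 /dev/sdX1 /dev/sdY1
  vgextend vg0 /dev/md20
  lvcreate --type cache-pool -L 400G -n cpool vg0 /dev/md20
  # then lvconvert --type cache --cachepool vg0/cpool vg0/<data LV> as usual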

Ramesh


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Best way to add caching to a new raid setup.
  2020-08-29 20:48                 ` Ram Ramesh
@ 2020-08-29 21:26                   ` Roger Heflin
  2020-08-30  0:56                     ` Ram Ramesh
  0 siblings, 1 reply; 36+ messages in thread
From: Roger Heflin @ 2020-08-29 21:26 UTC (permalink / raw)
  To: Ram Ramesh; +Cc: Roman Mamedov, R. Ramesh, antlists, Linux Raid

It is worth noting that if you buy two identical SSDs at the same time
and use them in a mirror, they are very likely to wear at about the same
rate.

I am hesitant to go much bigger on disks, especially since the $/GB
really does not change much as the disks get bigger.

And be careful about adding a cheap SATA controller, as a lot of them
work badly.

Most of my disks have died from bad blocks causing a section of the disk
to have some errors, or bad blocks causing the array to pause for 7
seconds. Make sure to get a disk with SCT ERC settable (the timeout when
bad blocks happen; otherwise the default timeout is 60-120 seconds, but
with it you can set it to no more than 7 seconds, see the example
below). In the cases where the entire disk did not just stop and it is
only getting bad blocks in places, you typically have time, as only a
single section is affected, so having sections does help. Also note that
mdadm with 4 sections like I have will only run a single rebuild at a
time, since mdadm understands that the underlying disks are shared; this
makes replacing a disk with 1 section or 4 sections work pretty much the
same. It does the same thing on the weekly scans: it sets all 4 to scan,
scans one and defers the others because the disks are shared.

It seems that a disk completely dying is a lot less common than
bad-block issues.
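
Checking and setting the timeout is a one-liner with smartctl (values
are in tenths of a second, and on many drives the setting does not
survive a power cycle, so it is usually reapplied from a boot script):

  smartctl -l scterc /dev/sdX          # show current ERC read/write timeouts
  smartctl -l scterc,70,70 /dev/sdX    # set both to 7.0 seconds, if supported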

On Sat, Aug 29, 2020 at 3:50 PM Ram Ramesh <rramesh2400@gmail.com> wrote:
>
> On 8/29/20 12:02 AM, Roman Mamedov wrote:
> > On Fri, 28 Aug 2020 22:08:22 -0500
> > "R. Ramesh" <rramesh@verizon.net> wrote:
> >
> >> I do not know how SSD caching is implemented. I assumed it will be
> >> somewhat similar to memory cache (L2 vs L3 vs L4 etc). I am hoping that
> >> with SSD caching, reads/writes to disk will be larger in size and
> >> sequential within a file (similar to cache line fill in memory cache
> >> which results in memory bursts that are efficient). I thought that is
> >> what SSD caching will do to disk reads/writes. I assumed, once reads
> >> (ahead) and writes (assuming writeback cache) buffers data sufficiently
> >> in the SSD, all reads/writes will be to SSD with periodic well organized
> >> large transfers to disk. If I am wrong here then I do not see any point
> >> in SSD as a cache. My aim is not to optimize by cache hits, but optimize
> >> by preventing disks from thrashing back and forth seeking after every
> >> block read. I suppose Linux (memory) buffer cache alleviates some of
> >> that. I was hoping SSD will provide next level. If not, I am off in my
> >> understanding of SSD as a disk cache.
> > Just try it, as I said before with LVM it is easy to remove if it doesn't work
> > out. You can always go to the manual copying method or whatnot, but first why
> > not check if the automatic caching solution might be "good enough" for your
> > needs.
> >
> > Yes it usually tries to avoid caching long sequential reads or writes, but
> > there's also quite a bit of other load on the FS, i.e. metadata. I found that
> > browsing directories and especially mounting the filesystem had a great
> > benefit from caching.
> >
> > You are correct that it will try to increase performance via writeback
> > caching, however with LVM that needs to be enabled explicitly:
> > https://www.systutorials.com/docs/linux/man/7-lvmcache/#lbAK
> > And of course a failure of that cache SSD will mean losing some data, even if
> > the main array is RAID. Perhaps should consider a RAID of SSDs for cache in
> > that case then.
> >
> Yes, I have 2x500GB ssds for cache. May be, I should do raid1 on them
> and use as cache volume.
> I thought SSDs are more reliable and even when they begin to die, they
> become readonly before quitting.  Of course, this is all theory, and I
> do not think standards exists on how they behave when reaching EoL.
>
> Ramesh
>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Best way to add caching to a new raid setup.
  2020-08-29  3:12             ` R. Ramesh
@ 2020-08-29 22:36               ` Drew
  2020-09-01 16:12                 ` Ram Ramesh
  0 siblings, 1 reply; 36+ messages in thread
From: Drew @ 2020-08-29 22:36 UTC (permalink / raw)
  To: R. Ramesh; +Cc: Ram Ramesh, antlists, Linux Raid

I know what you and Wol are talking about, and I think they are actually
two separate things. Wol is referring to traditional read caching, where
you only benefit if you are reading the same thing over and over again
(cache hits). For streaming it won't help, as you'll never hit the
cache.

What you are talking about is a write cache, something I have seen
implemented before. Basically the idea is for writes to hit the SSDs
first, with the SSD acting as a cache or buffer between the filesystem
and the slower RAID array. To the end process they're just writing to a
disk; they don't see the SSD buffer/cache. QNAP implements this in their
NAS chassis, though I'm not sure what the exact implementation is in
their case.

On Fri, Aug 28, 2020 at 9:14 PM R. Ramesh <rramesh@verizon.net> wrote:
>
> On 8/28/20 7:01 PM, Roger Heflin wrote:
> > Something I would suggest, I have found improves my mythtv experience
> > is:  Get a big enough SSD to hold 12-18 hours of the recording or
> > whatever you do daily, and setup the recordings to go to the SSD.    i
> > defined use the disk with the highest percentage free to be used
> > first, and since my raid6 is always 90% plus the SSD always gets used.
> > Then nightly I move the files from the ssd recordings directory onto
> > the raid6 recordings directory.  This also helps when your disks start
> > going bad and getting badblocks, the badblocks *WILL* cause mythtv to
> > stop recording shows at random because of some prior choices the
> > developers made (sync often, and if you get more than a few seconds
> > behind stop recording, attempting to save some recordings).
> >
> > I also put daily security camera data on the ssd and copy it over to
> > the raid6 device nightly.
> >
> > Using the ssd for recording much reduces the load on the slower raid6
> > spinning disks.
> >
> > You would have to have a large number of people watching at the same
> > time as the watching is relatively easy load, compared to the writes.
> >
> > On Fri, Aug 28, 2020 at 5:42 PM Ram Ramesh <rramesh2400@gmail.com> wrote:
> >> On 8/28/20 5:12 PM, antlists wrote:
> >>> On 28/08/2020 18:25, Ram Ramesh wrote:
> >>>> I am mainly looking for IOP improvement as I want to use this RAID in
> >>>> mythtv environment. So multiple threads will be active and I expect
> >>>> cache to help with random access IOPs.
> >>> ???
> >>>
> >>> Caching will only help in a read-after-write scenario, or a
> >>> read-several-times scenario.
> >>>
> >>> I'm guessing mythtv means it's a film server? Can ALL your films (or
> >>> at least your favourite "watch again and again" ones) fit in the
> >>> cache? If you watch a lot of films, chances are you'll read it from
> >>> disk (no advantage from the cache), and by the time you watch it again
> >>> it will have been evicted so you'll have to read it again.
> >>>
> >>> The other time cache may be useful, is if you're recording one thing
> >>> and watching another. That way, the writes can stall in cache as you
> >>> prioritise reading.
> >>>
> >>> Think about what is actually happening at the i/o level, and will
> >>> cache help?
> >>>
> >>> Cheers,
> >>> Wol
> >> Mythtv is a sever client DVR system. I have a client next to each of my
> >> TVs and one backend with large disk (this will have RAID with cache). At
> >> any time many clients will be accessing different programs and any
> >> scheduled recording will also be going on in parallel. So you will see a
> >> lot of seeks, but still all will be based on limited threads (I only
> >> have 3 TVs and may be one other PC acting as a client) So lots of IOs,
> >> mostly sequential, across small number of threads. I think most cache
> >> algorithms should be able to benefit from random access to blocks in SSD.
> >>
> >> Do you see any flaws in my argument?
> >>
> >> Regards
> >> Ramesh
> >>
> I was hoping SSD caching would do what you are suggesting without daily
> copying. Based on Wol's comments, it does not. May be I misunderstood
> how SSD caching works.  I will try it any way and see what happens. If
> it does not do what I want, I will remove caching and go straight to disks.
>
> Ramesh



-- 
Drew

"Nothing in life is to be feared. It is only to be understood."
--Marie Curie

"This started out as a hobby and spun horribly out of control."
-Unknown

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Best way to add caching to a new raid setup.
  2020-08-29 21:26                   ` Roger Heflin
@ 2020-08-30  0:56                     ` Ram Ramesh
  2020-08-30 15:42                       ` Roger Heflin
  0 siblings, 1 reply; 36+ messages in thread
From: Ram Ramesh @ 2020-08-30  0:56 UTC (permalink / raw)
  To: Roger Heflin; +Cc: Roman Mamedov, R. Ramesh, antlists, Linux Raid

On 8/29/20 4:26 PM, Roger Heflin wrote:
> It should be worth noting that if you buy 2 exactly the same SSD's at
> the same time and use them in a mirror they are very likely to be
> wearing about the same.
>
> I am hesitant to go much bigger on disks, especially since the $$/GB
> really does not change much as the disks get bigger.
>
> And be careful of adding on a cheap sata controller as a lot of them work badly.
>
> Most of my disks have died from bad blocks causing a section of the
> disk to have some errors, or bad blocks on sections causing the array
> to pause for 7 seconds.  Make sure to get a disk with SCTERC settable
> (timeout when bad blocks happen, otherwise the default timeout is a
> 60-120seconds, but with it you can set it to no more than 7 seconds).
>   In the cases where the entire disk did not just stop and is just
> getting bad blocks in places, typically you have time as only a single
> section is getting bad blocks, so in this case having sections does
> help.    Also note that mdadm with 4 sections like I have will only
> run a single rebuild at a time as mdadm understands that the
> underlying disks are shared, this makes replacing a disk with 1
> section or 4 sections basically work pretty much the same.  It does
> the same thing on the weekly scans, it sets all 4 to scan, and it
> scans 1 and defers the other scan as disks are shared.
>
> It seems to be a disk completely dying is a lot less often than badblock issues.
>
> On Sat, Aug 29, 2020 at 3:50 PM Ram Ramesh <rramesh2400@gmail.com> wrote:
>> On 8/29/20 12:02 AM, Roman Mamedov wrote:
>>> On Fri, 28 Aug 2020 22:08:22 -0500
>>> "R. Ramesh" <rramesh@verizon.net> wrote:
>>>
>>>> I do not know how SSD caching is implemented. I assumed it will be
>>>> somewhat similar to memory cache (L2 vs L3 vs L4 etc). I am hoping that
>>>> with SSD caching, reads/writes to disk will be larger in size and
>>>> sequential within a file (similar to cache line fill in memory cache
>>>> which results in memory bursts that are efficient). I thought that is
>>>> what SSD caching will do to disk reads/writes. I assumed, once reads
>>>> (ahead) and writes (assuming writeback cache) buffers data sufficiently
>>>> in the SSD, all reads/writes will be to SSD with periodic well organized
>>>> large transfers to disk. If I am wrong here then I do not see any point
>>>> in SSD as a cache. My aim is not to optimize by cache hits, but optimize
>>>> by preventing disks from thrashing back and forth seeking after every
>>>> block read. I suppose Linux (memory) buffer cache alleviates some of
>>>> that. I was hoping SSD will provide next level. If not, I am off in my
>>>> understanding of SSD as a disk cache.
>>> Just try it, as I said before with LVM it is easy to remove if it doesn't work
>>> out. You can always go to the manual copying method or whatnot, but first why
>>> not check if the automatic caching solution might be "good enough" for your
>>> needs.
>>>
>>> Yes it usually tries to avoid caching long sequential reads or writes, but
>>> there's also quite a bit of other load on the FS, i.e. metadata. I found that
>>> browsing directories and especially mounting the filesystem had a great
>>> benefit from caching.
>>>
>>> You are correct that it will try to increase performance via writeback
>>> caching, however with LVM that needs to be enabled explicitly:
>>> https://www.systutorials.com/docs/linux/man/7-lvmcache/#lbAK
>>> And of course a failure of that cache SSD will mean losing some data, even if
>>> the main array is RAID. Perhaps should consider a RAID of SSDs for cache in
>>> that case then.
>>>
>> Yes, I have 2x500GB ssds for cache. May be, I should do raid1 on them
>> and use as cache volume.
>> I thought SSDs are more reliable and even when they begin to die, they
>> become readonly before quitting.  Of course, this is all theory, and I
>> do not think standards exists on how they behave when reaching EoL.
>>
>> Ramesh
>>
My SSDs are from different companies and bought at different times 
(2019/2016, I think).

I have not had many hard disk failures. However, each time I had one, it 
has been a total death. So, I am a bit biased. May be with sections, I 
can replace one md at a time and letting others run degraded. I am sure 
there other tricks. I am simply saying it is a lot of reads/writes, and 
of course computation, in cold replacement of disks in RAID6 vs. RAID1.

Yes, larger disks are not cheaper per GB, but one large disk uses a 
single SATA port where several smaller disks would use many. They also 
use less power in the long run (mine run 24x7). That is why I have a 
policy of replacing disks once drives twice the size of what I 
currently own become commonplace.

I have an LSI 9211 SAS HBA, which this community touts as reliable.

Regards
Ramesh


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Best way to add caching to a new raid setup.
  2020-08-30  0:56                     ` Ram Ramesh
@ 2020-08-30 15:42                       ` Roger Heflin
  2020-08-30 17:19                         ` Ram Ramesh
  2020-09-11 18:39                         ` R. Ramesh
  0 siblings, 2 replies; 36+ messages in thread
From: Roger Heflin @ 2020-08-30 15:42 UTC (permalink / raw)
  To: Ram Ramesh; +Cc: Roman Mamedov, R. Ramesh, antlists, Linux Raid

The LSI should be a good controller as long as you run the HBA
firmware and not the raid firmware.

I use an LSI HBA plus the 8 AMD chipset SATA ports; currently I have
12 ports cabled to hot-swap bays but only 7 data disks plus the boot
disk in use.

How many recordings do you think you will have, and how many
clients/watchers?  With the SSD handling the recording writes, my
disks actually spin down if no one is watching anything.

The other trick the partitions (sections) let me do: initially I moved
from 1.5TB to 3TB disks (2x750GB sections -> 4x750GB sections), and
once I had three 3TB disks in I added the 2 extra sections as more
raid6 partitions (+1.5TB), since I bought the 3TB drives slowly; then
each subsequent 3TB disk gets added to all 4 partitions (+3TB).

On reads, each disk can do at least 50 IOPS, and for the most part the
disks themselves are very likely to cache the entire track the head
passes over, so a second sequential read likely comes from the disk's
read cache and does not have to be read from the platter at all.  So
several sequential workloads jumping back and forth do not behave as
badly as one would expect.  Writes are a different story and a lot
more expensive.  I isolate those to the SSD and copy them over in the
middle of the night when activity is low.  And since they are copied
as big fast streams, one file at a time, they end up with very few
fragments and write very quickly.  The way I have mine set up, mythtv
will find the file whether it is in the ssd recording directory or the
raid recording directory, so when I mv the files nothing has to be
done except the mv.
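
For anyone who wants to script the same thing, here is a minimal
sketch of that nightly move in Python. The directory paths and the age
threshold are placeholders rather than my actual setup, and it relies
on both directories being listed in the same mythtv storage group so
the move stays transparent:

#!/usr/bin/env python3
# Minimal sketch of a nightly "move recordings from SSD to RAID" job.
# Paths and the age threshold are placeholders.
import os
import shutil
import time

SSD_DIR = "/srv/ssd/recordings"    # fast staging directory (example path)
RAID_DIR = "/srv/raid/recordings"  # bulk array directory (example path)
MIN_AGE = 6 * 3600                 # only move files untouched for 6+ hours

now = time.time()
for name in os.listdir(SSD_DIR):
    src = os.path.join(SSD_DIR, name)
    if not os.path.isfile(src):
        continue
    # Skip anything that may still be recording (recently modified).
    if now - os.path.getmtime(src) < MIN_AGE:
        continue
    # shutil.move copies across filesystems and then removes the source,
    # so the file lands on the array as one big sequential stream.
    shutil.move(src, os.path.join(RAID_DIR, name))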


On Sat, Aug 29, 2020 at 7:56 PM Ram Ramesh <rramesh2400@gmail.com> wrote:
>
> On 8/29/20 4:26 PM, Roger Heflin wrote:
> > It should be worth noting that if you buy 2 exactly the same SSD's at
> > the same time and use them in a mirror they are very likely to be
> > wearing about the same.
> >
> > I am hesitant to go much bigger on disks, especially since the $$/GB
> > really does not change much as the disks get bigger.
> >
> > And be careful of adding on a cheap sata controller as a lot of them work badly.
> >
> > Most of my disks have died from bad blocks causing a section of the
> > disk to have some errors, or bad blocks on sections causing the array
> > to pause for 7 seconds.  Make sure to get a disk with SCTERC settable
> > (timeout when bad blocks happen, otherwise the default timeout is a
> > 60-120seconds, but with it you can set it to no more than 7 seconds).
> >   In the cases where the entire disk did not just stop and is just
> > getting bad blocks in places, typically you have time as only a single
> > section is getting bad blocks, so in this case having sections does
> > help.    Also note that mdadm with 4 sections like I have will only
> > run a single rebuild at a time as mdadm understands that the
> > underlying disks are shared, this makes replacing a disk with 1
> > section or 4 sections basically work pretty much the same.  It does
> > the same thing on the weekly scans, it sets all 4 to scan, and it
> > scans 1 and defers the other scan as disks are shared.
> >
> > It seems to be a disk completely dying is a lot less often than badblock issues.
> >
> > On Sat, Aug 29, 2020 at 3:50 PM Ram Ramesh <rramesh2400@gmail.com> wrote:
> >> On 8/29/20 12:02 AM, Roman Mamedov wrote:
> >>> On Fri, 28 Aug 2020 22:08:22 -0500
> >>> "R. Ramesh" <rramesh@verizon.net> wrote:
> >>>
> >>>> I do not know how SSD caching is implemented. I assumed it will be
> >>>> somewhat similar to memory cache (L2 vs L3 vs L4 etc). I am hoping that
> >>>> with SSD caching, reads/writes to disk will be larger in size and
> >>>> sequential within a file (similar to cache line fill in memory cache
> >>>> which results in memory bursts that are efficient). I thought that is
> >>>> what SSD caching will do to disk reads/writes. I assumed, once reads
> >>>> (ahead) and writes (assuming writeback cache) buffers data sufficiently
> >>>> in the SSD, all reads/writes will be to SSD with periodic well organized
> >>>> large transfers to disk. If I am wrong here then I do not see any point
> >>>> in SSD as a cache. My aim is not to optimize by cache hits, but optimize
> >>>> by preventing disks from thrashing back and forth seeking after every
> >>>> block read. I suppose Linux (memory) buffer cache alleviates some of
> >>>> that. I was hoping SSD will provide next level. If not, I am off in my
> >>>> understanding of SSD as a disk cache.
> >>> Just try it, as I said before with LVM it is easy to remove if it doesn't work
> >>> out. You can always go to the manual copying method or whatnot, but first why
> >>> not check if the automatic caching solution might be "good enough" for your
> >>> needs.
> >>>
> >>> Yes it usually tries to avoid caching long sequential reads or writes, but
> >>> there's also quite a bit of other load on the FS, i.e. metadata. I found that
> >>> browsing directories and especially mounting the filesystem had a great
> >>> benefit from caching.
> >>>
> >>> You are correct that it will try to increase performance via writeback
> >>> caching, however with LVM that needs to be enabled explicitly:
> >>> https://www.systutorials.com/docs/linux/man/7-lvmcache/#lbAK
> >>> And of course a failure of that cache SSD will mean losing some data, even if
> >>> the main array is RAID. Perhaps should consider a RAID of SSDs for cache in
> >>> that case then.
> >>>
> >> Yes, I have 2x500GB ssds for cache. May be, I should do raid1 on them
> >> and use as cache volume.
> >> I thought SSDs are more reliable and even when they begin to die, they
> >> become readonly before quitting.  Of course, this is all theory, and I
> >> do not think standards exists on how they behave when reaching EoL.
> >>
> >> Ramesh
> >>
> My SSDs are from different companies and bought at different times
> (2019/2016, I think).
>
> I have not had many hard disk failures. However, each time I had one, it
> has been a total death. So, I am a bit biased. May be with sections, I
> can replace one md at a time and letting others run degraded. I am sure
> there other tricks. I am simply saying it is a lot of reads/writes, and
> of course computation, in cold replacement of disks in RAID6 vs. RAID1.
>
> Yes, larger disks are not cheaper, but they use one SATA port vs.
> smaller disks. Also, they use less power in the long run (mine run
> 24x7). That is why I have a policy of replacing disks once 2x size disks
> (compared to what I currently own) become commonplace.
>
> I have a LSI 9211 SAS HBA which is touted to be reliable by this community.
>
> Regards
> Ramesh
>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Best way to add caching to a new raid setup.
  2020-08-30 15:42                       ` Roger Heflin
@ 2020-08-30 17:19                         ` Ram Ramesh
  2020-09-11 18:39                         ` R. Ramesh
  1 sibling, 0 replies; 36+ messages in thread
From: Ram Ramesh @ 2020-08-30 17:19 UTC (permalink / raw)
  To: Roger Heflin; +Cc: Linux Raid

On 8/30/20 10:42 AM, Roger Heflin wrote:
> The LSI should be a good controller as long as you the HBA fw and not
> the raid fw.
>
> I use an LSI with hba + the 8 AMD chipset sata ports, currently I have
> 12 ports cabled to hot swap bays but only 7+boot disk used.
>
> How many recording do you think you will have and how many
> clients/watchers?  With the SSD handling the writes for recording my
> disks actually spin down if no one is watching anything.
>
> The other trick the partitions let me do is initially I moved from 1.5
> -> 3tb disks (2x750 -> 4x750) and once I got 3-3tbs in I added the 2
> more partitions raid6(+1.5TB) (I bought the 3tb drives slowly), then
> the next 3tb gets added to all 4 partitions (+3TB).
>
> On reads at least each disk can do at least 50 iops, and for the most
> part the disks themselves are very likely to cache the entire track
> the head goes over, so a 2nd sequential read likely comes from the
> disk's read cache and does not have to actually be read.  So several
> sequential workloads jumping back and forth do not behave as bad as
> one would expect.  Write are a different story and a lot more
> expensive.  I isloate those to ssd and copy them in the middle of the
> night when it is low activity.  And since they are being copied as big
> fast streams one file at a time they end up with very few fragments
> and write very quickly.   The way I have mine setup mythtv will find
> the file whether it is on the ssd recording directory or the raid
> recording directory, so when I mv the files nothing has to be done
> except the mv.
>
>
> On Sat, Aug 29, 2020 at 7:56 PM Ram Ramesh <rramesh2400@gmail.com> wrote:
>> On 8/29/20 4:26 PM, Roger Heflin wrote:
>>> It should be worth noting that if you buy 2 exactly the same SSD's at
>>> the same time and use them in a mirror they are very likely to be
>>> wearing about the same.
>>>
>>> I am hesitant to go much bigger on disks, especially since the $$/GB
>>> really does not change much as the disks get bigger.
>>>
>>> And be careful of adding on a cheap sata controller as a lot of them work badly.
>>>
>>> Most of my disks have died from bad blocks causing a section of the
>>> disk to have some errors, or bad blocks on sections causing the array
>>> to pause for 7 seconds.  Make sure to get a disk with SCTERC settable
>>> (timeout when bad blocks happen, otherwise the default timeout is a
>>> 60-120seconds, but with it you can set it to no more than 7 seconds).
>>>    In the cases where the entire disk did not just stop and is just
>>> getting bad blocks in places, typically you have time as only a single
>>> section is getting bad blocks, so in this case having sections does
>>> help.    Also note that mdadm with 4 sections like I have will only
>>> run a single rebuild at a time as mdadm understands that the
>>> underlying disks are shared, this makes replacing a disk with 1
>>> section or 4 sections basically work pretty much the same.  It does
>>> the same thing on the weekly scans, it sets all 4 to scan, and it
>>> scans 1 and defers the other scan as disks are shared.
>>>
>>> It seems to be a disk completely dying is a lot less often than badblock issues.
>>>
>>> On Sat, Aug 29, 2020 at 3:50 PM Ram Ramesh <rramesh2400@gmail.com> wrote:
>>>> On 8/29/20 12:02 AM, Roman Mamedov wrote:
>>>>> On Fri, 28 Aug 2020 22:08:22 -0500
>>>>> "R. Ramesh" <rramesh@verizon.net> wrote:
>>>>>
>>>>>> I do not know how SSD caching is implemented. I assumed it will be
>>>>>> somewhat similar to memory cache (L2 vs L3 vs L4 etc). I am hoping that
>>>>>> with SSD caching, reads/writes to disk will be larger in size and
>>>>>> sequential within a file (similar to cache line fill in memory cache
>>>>>> which results in memory bursts that are efficient). I thought that is
>>>>>> what SSD caching will do to disk reads/writes. I assumed, once reads
>>>>>> (ahead) and writes (assuming writeback cache) buffers data sufficiently
>>>>>> in the SSD, all reads/writes will be to SSD with periodic well organized
>>>>>> large transfers to disk. If I am wrong here then I do not see any point
>>>>>> in SSD as a cache. My aim is not to optimize by cache hits, but optimize
>>>>>> by preventing disks from thrashing back and forth seeking after every
>>>>>> block read. I suppose Linux (memory) buffer cache alleviates some of
>>>>>> that. I was hoping SSD will provide next level. If not, I am off in my
>>>>>> understanding of SSD as a disk cache.
>>>>> Just try it, as I said before with LVM it is easy to remove if it doesn't work
>>>>> out. You can always go to the manual copying method or whatnot, but first why
>>>>> not check if the automatic caching solution might be "good enough" for your
>>>>> needs.
>>>>>
>>>>> Yes it usually tries to avoid caching long sequential reads or writes, but
>>>>> there's also quite a bit of other load on the FS, i.e. metadata. I found that
>>>>> browsing directories and especially mounting the filesystem had a great
>>>>> benefit from caching.
>>>>>
>>>>> You are correct that it will try to increase performance via writeback
>>>>> caching, however with LVM that needs to be enabled explicitly:
>>>>> https://www.systutorials.com/docs/linux/man/7-lvmcache/#lbAK
>>>>> And of course a failure of that cache SSD will mean losing some data, even if
>>>>> the main array is RAID. Perhaps should consider a RAID of SSDs for cache in
>>>>> that case then.
>>>>>
>>>> Yes, I have 2x500GB ssds for cache. May be, I should do raid1 on them
>>>> and use as cache volume.
>>>> I thought SSDs are more reliable and even when they begin to die, they
>>>> become readonly before quitting.  Of course, this is all theory, and I
>>>> do not think standards exists on how they behave when reaching EoL.
>>>>
>>>> Ramesh
>>>>
>> My SSDs are from different companies and bought at different times
>> (2019/2016, I think).
>>
>> I have not had many hard disk failures. However, each time I had one, it
>> has been a total death. So, I am a bit biased. May be with sections, I
>> can replace one md at a time and letting others run degraded. I am sure
>> there other tricks. I am simply saying it is a lot of reads/writes, and
>> of course computation, in cold replacement of disks in RAID6 vs. RAID1.
>>
>> Yes, larger disks are not cheaper, but they use one SATA port vs.
>> smaller disks. Also, they use less power in the long run (mine run
>> 24x7). That is why I have a policy of replacing disks once 2x size disks
>> (compared to what I currently own) become commonplace.
>>
>> I have a LSI 9211 SAS HBA which is touted to be reliable by this community.
>>
>> Regards
>> Ramesh
>>
Roger,

   Thanks for the details on your SSD setup. Yes, mythtv is supposed to 
find the file from its storage group entries regardless of the actual 
location, so a mv is all that is required. However, I have never tried 
this feature, so it will be a new thing for me.  Like I said before, I 
will try the LVM cache and see whether my disk activity improves. If 
that is not to my satisfaction, I will remove the cache and set it up 
differently, like you have. I only have a 500GB SSD for this, but I do 
not think a day's recordings will come anywhere close to that size.
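
As a rough sanity check on that (back-of-the-envelope only; the
GB-per-hour figure and the daily hours are my assumptions, not
numbers measured here):

ssd_gb = 500
gb_per_hour = 7        # assumed rate for an HD broadcast recording
hours_per_day = 8      # assumed worst-case daily recording load
daily_gb = gb_per_hour * hours_per_day
print(f"{daily_gb} GB/day, about {ssd_gb / daily_gb:.0f} days of "
      f"headroom on a {ssd_gb} GB SSD")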

Regards
Ramesh


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Best way to add caching to a new raid setup.
  2020-08-28 20:39     ` Ram Ramesh
  2020-08-29 15:34       ` antlists
@ 2020-08-30 22:16       ` Michal Soltys
  1 sibling, 0 replies; 36+ messages in thread
From: Michal Soltys @ 2020-08-30 22:16 UTC (permalink / raw)
  To: Ram Ramesh, Roman Mamedov, R. Ramesh; +Cc: Linux Raid

On 20/08/28 22:39, Ram Ramesh wrote:
> On 8/28/20 12:46 PM, Roman Mamedov wrote:
>> On Thu, 27 Aug 2020 21:31:07 -0500
>> Also my impression is that LVM has more solid and reliable codebase, but
>> bcache might provide a somewhat better the performance boost due to 
>> caching.
>>
> Thanks for the info on bcache. I do not think it will be my favorite. I 
> am going to try LVM cache as my first choice. Note that the new disks 
> will be spare disks for some time and I will be able to try out a few 
> things before deciding to put it into use.

I had some _very nasty_ adventures with LVM's cache that ended in 
rather massive corruption at the end of last year. I described it in:

https://github.com/lvmteam/lvm2/issues/26

though not much of that was answered or commented on, except 
confirmation that the flushing issue was fixed.

At the same time, I have yet to have bcache fail on me (so far it has 
flawlessly survived kernel panics (buggy nic drivers) and dying 
disks).

YMMV of course; just _make sure_ to have backups. And make sure to 
test it thoroughly in your setup (including things like hard resets).


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Best way to add caching to a new raid setup.
  2020-08-28 22:40         ` Ram Ramesh
  2020-08-28 22:59           ` antlists
  2020-08-29  0:01           ` Roger Heflin
@ 2020-08-31 19:20           ` Nix
  2 siblings, 0 replies; 36+ messages in thread
From: Nix @ 2020-08-31 19:20 UTC (permalink / raw)
  To: Ram Ramesh; +Cc: antlists, R. Ramesh, Linux Raid

On 28 Aug 2020, Ram Ramesh verbalised:
> Mythtv is a sever client DVR system. I have a client next to each of
> my TVs and one backend with large disk (this will have RAID with
> cache). At any time many clients will be accessing different programs
> and any scheduled recording will also be going on in parallel. So you
> will see a lot of seeks, but still all will be based on limited
> threads (I only have 3 TVs and may be one other PC acting as a client)
> So lots of IOs, mostly sequential, across small number of threads. I
> think most cache algorithms should be able to benefit from random
> access to blocks in SSD.

FYI: bcache documents how its caching works. Assuming you ignore the
write cache (which I recommend, since nearly all the data corruption and
starvation bugs in bcache have been in the write caching code, and it
doesn't look like write caching would benefit your use case anyway:
if you want an ssd write cache, just use RAID journalling), bcache is
very hard to break: if by some mischance the cache does become corrupted
you can decouple it from the backing RAID array and just keep using
the array uncached
until you recreate the cache device and reattach it.

bcache tracks the "sequentiality" of recent reads and avoids caching big
sequential I/O on the grounds that it's a likely waste of SSD lifetime
to do so: HDDs can do contiguous reads quite fast: what you want to
cache is seeky reads. This means that your mythtv reads will only be
cached when there are multiple contending reads going on. This doesn't
seem terribly useful, since for a media player any given contending read
is probably not going to be of metadata and is probably not going to be
repeated for a very long time (unless you particularly like repeatedly
rewatching the same things). So you won't get much of a speedup or
reduction in contention.
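
(That cutoff is tunable at runtime through sysfs if you ever want to
experiment; a small sketch follows, where "bcache0" is just an example
device name and the paths are as I remember them from the kernel's
bcache documentation, so check your own tree:)

from pathlib import Path

# "bcache0" is an example device name; adjust for the actual device.
dev = Path("/sys/block/bcache0/bcache")
cutoff = (dev / "sequential_cutoff").read_text().strip()
hits = (dev / "stats_total" / "cache_hit_ratio").read_text().strip()
print(f"sequential_cutoff={cutoff}, lifetime cache hit ratio={hits}%")

# Writing 0 disables the cutoff so even big sequential I/O gets cached,
# which is probably a waste of SSD lifetime for a media server:
# (dev / "sequential_cutoff").write_text("0\n")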

Where caches like bcache and the LVM cache help is when small seeky
reads are likely to be repeated, which is very common with filesystem
metadata and a lot of other workloads, but not common at all for media
files in my experience.

(FYI: my setup is spinning rust <- md-raid6 <- bcache <- LVM PV, with
one LVM PV omitting the bcache layer and both combined into one VG. My
bulk media storage is on the non-bcached PV. The filesystems are almost
all xfs, some of them with cryptsetups in the way too. One warning:
bcache works by stuffing a header onto the data, and does *not* pass
through RAID stripe size info etc: you'll need to pass in a suitable
--data-offset to make-bcache to ensure that I/O is RAID-aligned, and
pass in the stripe size etc to the underlying operations. I did this by
mkfsing everything and then doing a blktrace of the underlying RAID
devices while I did some simple I/Os to make sure the RAID layer was
doing nice stripe-aligned I/O. This is probably total overkill for a
media server, but this was my do-everything server, so I cared very much
about small random I/O performance. This was particularly fun given that
one LVM PV had a bcache header and the other one didn't, and I wanted
the filesystems to have suitable alignment for *both* of them at once...
it was distinctly fiddly to get right.)
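
(If it helps, the offset arithmetic itself is simple: pick a
--data-offset, in 512-byte sectors, that is a multiple of the array's
full stripe width. A sketch with example numbers; the chunk size, the
data-disk count and the 16-sector default offset are assumptions to
substitute with your own values:)

SECTOR = 512
chunk_kib = 512      # md chunk size; mdadm's default is 512K (example value)
data_disks = 4       # e.g. a 6-disk RAID6 has 4 data disks (example value)
default_offset = 16  # assumed make-bcache default data offset, in sectors

stripe_sectors = chunk_kib * 1024 * data_disks // SECTOR
# Round the default offset up to the next full-stripe boundary.
offset = -(-default_offset // stripe_sectors) * stripe_sectors
print(f"make-bcache -B --data-offset {offset} /dev/md0")
print(f"({offset} sectors = {offset * SECTOR // 1024} KiB, "
      f"a multiple of the {stripe_sectors}-sector stripe width)")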

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Best way to add caching to a new raid setup.
  2020-08-29 22:36               ` Drew
@ 2020-09-01 16:12                 ` Ram Ramesh
  2020-09-01 17:01                   ` Kai Stian Olstad
  2020-09-14 11:40                   ` Nix
  0 siblings, 2 replies; 36+ messages in thread
From: Ram Ramesh @ 2020-09-01 16:12 UTC (permalink / raw)
  To: Drew; +Cc: antlists, Linux Raid

On 8/29/20 5:36 PM, Drew wrote:
> I know what you and Wols are talking about and I think it's actually
> two separate things. Wol's is referring to traditional read caching
> where it only benefits if you are reading the same thing over and over
> again, cache hits. For streaming it won't help as you'll never hit the
> cache.
>
> What you are talking about is a write cache, something I have seen
> implemented before. Basically the idea is for writes to hit the SSD's
> first, the SSD acting as a cache or buffer between the filesystem and
> the slower RAID array. To the end process they're just writing to a
> disk, they don't see the SSD buffer/cache. QNAP implements this in
> their NAS chassis, just not sure what the exact implementation is in
> their case.
>
> On Fri, Aug 28, 2020 at 9:14 PM R. Ramesh <rramesh@verizon.net> wrote:
>> On 8/28/20 7:01 PM, Roger Heflin wrote:
>>> Something I would suggest, I have found improves my mythtv experience
>>> is:  Get a big enough SSD to hold 12-18 hours of the recording or
>>> whatever you do daily, and setup the recordings to go to the SSD.    i
>>> defined use the disk with the highest percentage free to be used
>>> first, and since my raid6 is always 90% plus the SSD always gets used.
>>> Then nightly I move the files from the ssd recordings directory onto
>>> the raid6 recordings directory.  This also helps when your disks start
>>> going bad and getting badblocks, the badblocks *WILL* cause mythtv to
>>> stop recording shows at random because of some prior choices the
>>> developers made (sync often, and if you get more than a few seconds
>>> behind stop recording, attempting to save some recordings).
>>>
>>> I also put daily security camera data on the ssd and copy it over to
>>> the raid6 device nightly.
>>>
>>> Using the ssd for recording much reduces the load on the slower raid6
>>> spinning disks.
>>>
>>> You would have to have a large number of people watching at the same
>>> time as the watching is relatively easy load, compared to the writes.
>>>
>>> On Fri, Aug 28, 2020 at 5:42 PM Ram Ramesh <rramesh2400@gmail.com> wrote:
>>>> On 8/28/20 5:12 PM, antlists wrote:
>>>>> On 28/08/2020 18:25, Ram Ramesh wrote:
>>>>>> I am mainly looking for IOP improvement as I want to use this RAID in
>>>>>> mythtv environment. So multiple threads will be active and I expect
>>>>>> cache to help with random access IOPs.
>>>>> ???
>>>>>
>>>>> Caching will only help in a read-after-write scenario, or a
>>>>> read-several-times scenario.
>>>>>
>>>>> I'm guessing mythtv means it's a film server? Can ALL your films (or
>>>>> at least your favourite "watch again and again" ones) fit in the
>>>>> cache? If you watch a lot of films, chances are you'll read it from
>>>>> disk (no advantage from the cache), and by the time you watch it again
>>>>> it will have been evicted so you'll have to read it again.
>>>>>
>>>>> The other time cache may be useful, is if you're recording one thing
>>>>> and watching another. That way, the writes can stall in cache as you
>>>>> prioritise reading.
>>>>>
>>>>> Think about what is actually happening at the i/o level, and will
>>>>> cache help?
>>>>>
>>>>> Cheers,
>>>>> Wol
>>>> Mythtv is a sever client DVR system. I have a client next to each of my
>>>> TVs and one backend with large disk (this will have RAID with cache). At
>>>> any time many clients will be accessing different programs and any
>>>> scheduled recording will also be going on in parallel. So you will see a
>>>> lot of seeks, but still all will be based on limited threads (I only
>>>> have 3 TVs and may be one other PC acting as a client) So lots of IOs,
>>>> mostly sequential, across small number of threads. I think most cache
>>>> algorithms should be able to benefit from random access to blocks in SSD.
>>>>
>>>> Do you see any flaws in my argument?
>>>>
>>>> Regards
>>>> Ramesh
>>>>
>> I was hoping SSD caching would do what you are suggesting without daily
>> copying. Based on Wol's comments, it does not. May be I misunderstood
>> how SSD caching works.  I will try it any way and see what happens. If
>> it does not do what I want, I will remove caching and go straight to disks.
>>
>> Ramesh
>
>
After thinking through this, I really like the idea of simply 
recording programs to SSD and moving one file at a time based on some 
aging algorithm of my own. I will move files back and forth as needed 
during overnight hours, creating my own caching effect. As long as I 
keep the original (renamed) and cache the ones needed under the 
correct name, mythtv will find the cached copy. When mythtv complains 
about something missing, I can manually look at the renamed backup 
copy and make the corrections. Unless my thinking is badly broken, 
this should work.

I really wish overlayfs had a nice merge/clean feature that would let 
us move overlay items into the underlying filesystem and start the 
overlay over. All I need is file-level caching, not block-level caching.

Ramesh


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Best way to add caching to a new raid setup.
  2020-09-01 16:12                 ` Ram Ramesh
@ 2020-09-01 17:01                   ` Kai Stian Olstad
  2020-09-02 18:17                     ` Ram Ramesh
  2020-09-14 11:40                   ` Nix
  1 sibling, 1 reply; 36+ messages in thread
From: Kai Stian Olstad @ 2020-09-01 17:01 UTC (permalink / raw)
  To: Ram Ramesh; +Cc: Drew, antlists, Linux Raid

On Tue, Sep 01, 2020 at 11:12:40AM -0500, Ram Ramesh wrote:
> I really wished overlay fs had a nice merge/clean feature that will allow us
> to move overlay items to underlying file system and start over the overlay.

You should check out mergerfs[1]; it can merge multiple directories
on different disks together, and you can transparently move files
between them.  Mergerfs has a lot of other features too that you might
find useful.

[1] https://github.com/trapexit/mergerfs/

-- 
Kai Stian Olstad

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Best way to add caching to a new raid setup.
  2020-09-01 17:01                   ` Kai Stian Olstad
@ 2020-09-02 18:17                     ` Ram Ramesh
  0 siblings, 0 replies; 36+ messages in thread
From: Ram Ramesh @ 2020-09-02 18:17 UTC (permalink / raw)
  To: Kai Stian Olstad; +Cc: Drew, antlists, Linux Raid

On 9/1/20 12:01 PM, Kai Stian Olstad wrote:
> On Tue, Sep 01, 2020 at 11:12:40AM -0500, Ram Ramesh wrote:
>> I really wished overlay fs had a nice merge/clean feature that will allow us
>> to move overlay items to underlying file system and start over the overlay.
> You should check out mergerfs[1], it can merge multiple directories together
> on different disks and you can transparently move files between them.
> Mergerfs have a lot of other features too that you might find useful.
>
> [1] https://github.com/trapexit/mergerfs/
>
Kai,

   Thanks. It is interesting. However, my starting point for this 
discussion was improving performance, and mergerfs seems a bit of a 
step backward there since it goes through FUSE. I still think it is a 
good direction, so I will learn a bit more to see if I can use it.

Regards
Ramesh


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Best way to add caching to a new raid setup.
  2020-08-30 15:42                       ` Roger Heflin
  2020-08-30 17:19                         ` Ram Ramesh
@ 2020-09-11 18:39                         ` R. Ramesh
  2020-09-11 20:37                           ` Roger Heflin
  1 sibling, 1 reply; 36+ messages in thread
From: R. Ramesh @ 2020-09-11 18:39 UTC (permalink / raw)
  To: Roger Heflin; +Cc: Linux Raid

On 8/30/20 10:42 AM, Roger Heflin wrote:
> The LSI should be a good controller as long as you the HBA fw and not
> the raid fw.
>
> I use an LSI with hba + the 8 AMD chipset sata ports, currently I have
> 12 ports cabled to hot swap bays but only 7+boot disk used.
>
> How many recording do you think you will have and how many
> clients/watchers?  With the SSD handling the writes for recording my
> disks actually spin down if no one is watching anything.
>
> The other trick the partitions let me do is initially I moved from 1.5
> -> 3tb disks (2x750 -> 4x750) and once I got 3-3tbs in I added the 2
> more partitions raid6(+1.5TB) (I bought the 3tb drives slowly), then
> the next 3tb gets added to all 4 partitions (+3TB).
>
> On reads at least each disk can do at least 50 iops, and for the most
> part the disks themselves are very likely to cache the entire track
> the head goes over, so a 2nd sequential read likely comes from the
> disk's read cache and does not have to actually be read.  So several
> sequential workloads jumping back and forth do not behave as bad as
> one would expect.  Write are a different story and a lot more
> expensive.  I isloate those to ssd and copy them in the middle of the
> night when it is low activity.  And since they are being copied as big
> fast streams one file at a time they end up with very few fragments
> and write very quickly.   The way I have mine setup mythtv will find
> the file whether it is on the ssd recording directory or the raid
> recording directory, so when I mv the files nothing has to be done
> except the mv.
>
>
> On Sat, Aug 29, 2020 at 7:56 PM Ram Ramesh <rramesh2400@gmail.com> wrote:
>> On 8/29/20 4:26 PM, Roger Heflin wrote:
>>> It should be worth noting that if you buy 2 exactly the same SSD's at
>>> the same time and use them in a mirror they are very likely to be
>>> wearing about the same.
>>>
>>> I am hesitant to go much bigger on disks, especially since the $$/GB
>>> really does not change much as the disks get bigger.
>>>
>>> And be careful of adding on a cheap sata controller as a lot of them work badly.
>>>
>>> Most of my disks have died from bad blocks causing a section of the
>>> disk to have some errors, or bad blocks on sections causing the array
>>> to pause for 7 seconds.  Make sure to get a disk with SCTERC settable
>>> (timeout when bad blocks happen, otherwise the default timeout is a
>>> 60-120seconds, but with it you can set it to no more than 7 seconds).
>>>    In the cases where the entire disk did not just stop and is just
>>> getting bad blocks in places, typically you have time as only a single
>>> section is getting bad blocks, so in this case having sections does
>>> help.    Also note that mdadm with 4 sections like I have will only
>>> run a single rebuild at a time as mdadm understands that the
>>> underlying disks are shared, this makes replacing a disk with 1
>>> section or 4 sections basically work pretty much the same.  It does
>>> the same thing on the weekly scans, it sets all 4 to scan, and it
>>> scans 1 and defers the other scan as disks are shared.
>>>
>>> It seems to be a disk completely dying is a lot less often than badblock issues.
>>>
>>> On Sat, Aug 29, 2020 at 3:50 PM Ram Ramesh <rramesh2400@gmail.com> wrote:
>>>> On 8/29/20 12:02 AM, Roman Mamedov wrote:
>>>>> On Fri, 28 Aug 2020 22:08:22 -0500
>>>>> "R. Ramesh" <rramesh@verizon.net> wrote:
>>>>>
>>>>>> I do not know how SSD caching is implemented. I assumed it will be
>>>>>> somewhat similar to memory cache (L2 vs L3 vs L4 etc). I am hoping that
>>>>>> with SSD caching, reads/writes to disk will be larger in size and
>>>>>> sequential within a file (similar to cache line fill in memory cache
>>>>>> which results in memory bursts that are efficient). I thought that is
>>>>>> what SSD caching will do to disk reads/writes. I assumed, once reads
>>>>>> (ahead) and writes (assuming writeback cache) buffers data sufficiently
>>>>>> in the SSD, all reads/writes will be to SSD with periodic well organized
>>>>>> large transfers to disk. If I am wrong here then I do not see any point
>>>>>> in SSD as a cache. My aim is not to optimize by cache hits, but optimize
>>>>>> by preventing disks from thrashing back and forth seeking after every
>>>>>> block read. I suppose Linux (memory) buffer cache alleviates some of
>>>>>> that. I was hoping SSD will provide next level. If not, I am off in my
>>>>>> understanding of SSD as a disk cache.
>>>>> Just try it, as I said before with LVM it is easy to remove if it doesn't work
>>>>> out. You can always go to the manual copying method or whatnot, but first why
>>>>> not check if the automatic caching solution might be "good enough" for your
>>>>> needs.
>>>>>
>>>>> Yes it usually tries to avoid caching long sequential reads or writes, but
>>>>> there's also quite a bit of other load on the FS, i.e. metadata. I found that
>>>>> browsing directories and especially mounting the filesystem had a great
>>>>> benefit from caching.
>>>>>
>>>>> You are correct that it will try to increase performance via writeback
>>>>> caching, however with LVM that needs to be enabled explicitly:
>>>>> https://www.systutorials.com/docs/linux/man/7-lvmcache/#lbAK
>>>>> And of course a failure of that cache SSD will mean losing some data, even if
>>>>> the main array is RAID. Perhaps should consider a RAID of SSDs for cache in
>>>>> that case then.
>>>>>
>>>> Yes, I have 2x500GB ssds for cache. May be, I should do raid1 on them
>>>> and use as cache volume.
>>>> I thought SSDs are more reliable and even when they begin to die, they
>>>> become readonly before quitting.  Of course, this is all theory, and I
>>>> do not think standards exists on how they behave when reaching EoL.
>>>>
>>>> Ramesh
>>>>
>> My SSDs are from different companies and bought at different times
>> (2019/2016, I think).
>>
>> I have not had many hard disk failures. However, each time I had one, it
>> has been a total death. So, I am a bit biased. May be with sections, I
>> can replace one md at a time and letting others run degraded. I am sure
>> there other tricks. I am simply saying it is a lot of reads/writes, and
>> of course computation, in cold replacement of disks in RAID6 vs. RAID1.
>>
>> Yes, larger disks are not cheaper, but they use one SATA port vs.
>> smaller disks. Also, they use less power in the long run (mine run
>> 24x7). That is why I have a policy of replacing disks once 2x size disks
>> (compared to what I currently own) become commonplace.
>>
>> I have a LSI 9211 SAS HBA which is touted to be reliable by this community.
>>
>> Regards
>> Ramesh
>>

Roger,

   Just curious: in your search for an SSD solution for mythtv 
recording, did you consider overlayfs, unionfs or mergerfs? If you 
did, why did you decide that a simple copy is better?

Ramesh


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Best way to add caching to a new raid setup.
  2020-09-11 18:39                         ` R. Ramesh
@ 2020-09-11 20:37                           ` Roger Heflin
  2020-09-11 22:41                             ` Ram Ramesh
  0 siblings, 1 reply; 36+ messages in thread
From: Roger Heflin @ 2020-09-11 20:37 UTC (permalink / raw)
  To: R. Ramesh; +Cc: Linux Raid

It is simpler, and has very simple moving parts to maintain.  I have
been a Linux admin for 20+ years, and a professional Unix admin for
longer, and too often a complicated setup seems nice but has burned me
with bugs and other unexpected results, so simple is best.  The daily
move uses nothing complicated, can be expected to work on any Unix
system that has ever existed, and relies on heavily used operations
that have a high probability of working and of being caught quickly if
they break.  Any of the other options are a bit more complicated, more
likely to have bugs, and less likely to get caught as quickly as the
moving parts I rely on.  I also wanted to be able to spin down my
array for the hours when no one is watching the DVR (usually 18+ hours
per day; across 7 drives that is roughly 1.25 kWh/day, or about 37
kWh/month, or $4-$10 depending on power costs).  I also have motion
software collecting security camera footage that goes to the SSD and
is copied onto the array nightly; the security cams would otherwise
have kept the array spinning whenever anything moved outside, so
pretty much 100% of the time.
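
(For reference, the arithmetic behind those numbers, assuming roughly
10 W per spinning drive, which is an assumed figure rather than a
measurement:)

drives = 7
watts_per_drive = 10        # assumed power per spinning drive
hours_per_day = 18

kwh_per_day = drives * watts_per_drive * hours_per_day / 1000   # ~1.26
kwh_per_month = kwh_per_day * 30                                # ~38
for price in (0.10, 0.25):  # example $/kWh tariffs
    print(f"${kwh_per_month * price:.2f}/month at ${price:.2f}/kWh")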

On Fri, Sep 11, 2020 at 1:39 PM R. Ramesh <rramesh@verizon.net> wrote:
>
> On 8/30/20 10:42 AM, Roger Heflin wrote:
> > The LSI should be a good controller as long as you the HBA fw and not
> > the raid fw.
> >
> > I use an LSI with hba + the 8 AMD chipset sata ports, currently I have
> > 12 ports cabled to hot swap bays but only 7+boot disk used.
> >
> > How many recording do you think you will have and how many
> > clients/watchers?  With the SSD handling the writes for recording my
> > disks actually spin down if no one is watching anything.
> >
> > The other trick the partitions let me do is initially I moved from 1.5
> > -> 3tb disks (2x750 -> 4x750) and once I got 3-3tbs in I added the 2
> > more partitions raid6(+1.5TB) (I bought the 3tb drives slowly), then
> > the next 3tb gets added to all 4 partitions (+3TB).
> >
> > On reads at least each disk can do at least 50 iops, and for the most
> > part the disks themselves are very likely to cache the entire track
> > the head goes over, so a 2nd sequential read likely comes from the
> > disk's read cache and does not have to actually be read.  So several
> > sequential workloads jumping back and forth do not behave as bad as
> > one would expect.  Write are a different story and a lot more
> > expensive.  I isloate those to ssd and copy them in the middle of the
> > night when it is low activity.  And since they are being copied as big
> > fast streams one file at a time they end up with very few fragments
> > and write very quickly.   The way I have mine setup mythtv will find
> > the file whether it is on the ssd recording directory or the raid
> > recording directory, so when I mv the files nothing has to be done
> > except the mv.
> >
> >
> > On Sat, Aug 29, 2020 at 7:56 PM Ram Ramesh <rramesh2400@gmail.com> wrote:
> >> On 8/29/20 4:26 PM, Roger Heflin wrote:
> >>> It should be worth noting that if you buy 2 exactly the same SSD's at
> >>> the same time and use them in a mirror they are very likely to be
> >>> wearing about the same.
> >>>
> >>> I am hesitant to go much bigger on disks, especially since the $$/GB
> >>> really does not change much as the disks get bigger.
> >>>
> >>> And be careful of adding on a cheap sata controller as a lot of them work badly.
> >>>
> >>> Most of my disks have died from bad blocks causing a section of the
> >>> disk to have some errors, or bad blocks on sections causing the array
> >>> to pause for 7 seconds.  Make sure to get a disk with SCTERC settable
> >>> (timeout when bad blocks happen, otherwise the default timeout is a
> >>> 60-120seconds, but with it you can set it to no more than 7 seconds).
> >>>    In the cases where the entire disk did not just stop and is just
> >>> getting bad blocks in places, typically you have time as only a single
> >>> section is getting bad blocks, so in this case having sections does
> >>> help.    Also note that mdadm with 4 sections like I have will only
> >>> run a single rebuild at a time as mdadm understands that the
> >>> underlying disks are shared, this makes replacing a disk with 1
> >>> section or 4 sections basically work pretty much the same.  It does
> >>> the same thing on the weekly scans, it sets all 4 to scan, and it
> >>> scans 1 and defers the other scan as disks are shared.
> >>>
> >>> It seems to be a disk completely dying is a lot less often than badblock issues.
> >>>
> >>> On Sat, Aug 29, 2020 at 3:50 PM Ram Ramesh <rramesh2400@gmail.com> wrote:
> >>>> On 8/29/20 12:02 AM, Roman Mamedov wrote:
> >>>>> On Fri, 28 Aug 2020 22:08:22 -0500
> >>>>> "R. Ramesh" <rramesh@verizon.net> wrote:
> >>>>>
> >>>>>> I do not know how SSD caching is implemented. I assumed it will be
> >>>>>> somewhat similar to memory cache (L2 vs L3 vs L4 etc). I am hoping that
> >>>>>> with SSD caching, reads/writes to disk will be larger in size and
> >>>>>> sequential within a file (similar to cache line fill in memory cache
> >>>>>> which results in memory bursts that are efficient). I thought that is
> >>>>>> what SSD caching will do to disk reads/writes. I assumed, once reads
> >>>>>> (ahead) and writes (assuming writeback cache) buffers data sufficiently
> >>>>>> in the SSD, all reads/writes will be to SSD with periodic well organized
> >>>>>> large transfers to disk. If I am wrong here then I do not see any point
> >>>>>> in SSD as a cache. My aim is not to optimize by cache hits, but optimize
> >>>>>> by preventing disks from thrashing back and forth seeking after every
> >>>>>> block read. I suppose Linux (memory) buffer cache alleviates some of
> >>>>>> that. I was hoping SSD will provide next level. If not, I am off in my
> >>>>>> understanding of SSD as a disk cache.
> >>>>> Just try it, as I said before with LVM it is easy to remove if it doesn't work
> >>>>> out. You can always go to the manual copying method or whatnot, but first why
> >>>>> not check if the automatic caching solution might be "good enough" for your
> >>>>> needs.
> >>>>>
> >>>>> Yes it usually tries to avoid caching long sequential reads or writes, but
> >>>>> there's also quite a bit of other load on the FS, i.e. metadata. I found that
> >>>>> browsing directories and especially mounting the filesystem had a great
> >>>>> benefit from caching.
> >>>>>
> >>>>> You are correct that it will try to increase performance via writeback
> >>>>> caching, however with LVM that needs to be enabled explicitly:
> >>>>> https://www.systutorials.com/docs/linux/man/7-lvmcache/#lbAK
> >>>>> And of course a failure of that cache SSD will mean losing some data, even if
> >>>>> the main array is RAID. Perhaps should consider a RAID of SSDs for cache in
> >>>>> that case then.
> >>>>>
> >>>> Yes, I have 2x500GB ssds for cache. May be, I should do raid1 on them
> >>>> and use as cache volume.
> >>>> I thought SSDs are more reliable and even when they begin to die, they
> >>>> become readonly before quitting.  Of course, this is all theory, and I
> >>>> do not think standards exists on how they behave when reaching EoL.
> >>>>
> >>>> Ramesh
> >>>>
> >> My SSDs are from different companies and bought at different times
> >> (2019/2016, I think).
> >>
> >> I have not had many hard disk failures. However, each time I had one, it
> >> has been a total death. So, I am a bit biased. May be with sections, I
> >> can replace one md at a time and letting others run degraded. I am sure
> >> there other tricks. I am simply saying it is a lot of reads/writes, and
> >> of course computation, in cold replacement of disks in RAID6 vs. RAID1.
> >>
> >> Yes, larger disks are not cheaper, but they use one SATA port vs.
> >> smaller disks. Also, they use less power in the long run (mine run
> >> 24x7). That is why I have a policy of replacing disks once 2x size disks
> >> (compared to what I currently own) become commonplace.
> >>
> >> I have a LSI 9211 SAS HBA which is touted to be reliable by this community.
> >>
> >> Regards
> >> Ramesh
> >>
>
> Roger,
>
>    Just curious, in your search for a SSD solution to mythtv recording,
> did you consider overlayfs, unionfs or mergerfs? If you did, why did you
> decide that a simple copy is better?
>
> Ramesh
>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Best way to add caching to a new raid setup.
  2020-09-11 20:37                           ` Roger Heflin
@ 2020-09-11 22:41                             ` Ram Ramesh
  0 siblings, 0 replies; 36+ messages in thread
From: Ram Ramesh @ 2020-09-11 22:41 UTC (permalink / raw)
  To: Roger Heflin, R. Ramesh, Linux Raid

I appreciate the details. I agree that spinning down the disks is a 
good idea to consider; I will look more into it.

Ramesh


On 9/11/20 3:37 PM, Roger Heflin wrote:
> It is simpler, and has very simple to maintain moving parts.  I have
> been a linux admin for 20+ years, and a professional unix admin for
> longer, and too often complicated seems nice but has burned me with
> bugs and other unexpected results, so simple is best.  The daily move
> uses nothing complicated and can be expected to work on any unix
> system that has ever existed and relies on heavily used operations
> that have a high probability of working and of being caught quickly as
> broken if they did not work.  Any of the others are a bit more
> complicated, and more likely to have bugs and less likely to get
> caught as quick as the moving parts I rely on.  I also wanted to be
> able to spin down my array for any hours when no one is watching the
> dvr (usually this is 18+ hours per day, x 7 drives ==   1.25kw/day, or
> 37kw/month, or $4-$10 depending on power costs), and I also have
> motion software collecting security cams that go to the SSD and are
> also copied onto the array nighty.   The security cams would have kept
> the array spinning when anything moved anywhere outside so pretty much
> 100% of the time.
>
> On Fri, Sep 11, 2020 at 1:39 PM R. Ramesh <rramesh@verizon.net> wrote:
>> On 8/30/20 10:42 AM, Roger Heflin wrote:
>>> The LSI should be a good controller as long as you the HBA fw and not
>>> the raid fw.
>>>
>>> I use an LSI with hba + the 8 AMD chipset sata ports, currently I have
>>> 12 ports cabled to hot swap bays but only 7+boot disk used.
>>>
>>> How many recording do you think you will have and how many
>>> clients/watchers?  With the SSD handling the writes for recording my
>>> disks actually spin down if no one is watching anything.
>>>
>>> The other trick the partitions let me do is initially I moved from 1.5
>>> -> 3tb disks (2x750 -> 4x750) and once I got 3-3tbs in I added the 2
>>> more partitions raid6(+1.5TB) (I bought the 3tb drives slowly), then
>>> the next 3tb gets added to all 4 partitions (+3TB).
>>>
>>> On reads at least each disk can do at least 50 iops, and for the most
>>> part the disks themselves are very likely to cache the entire track
>>> the head goes over, so a 2nd sequential read likely comes from the
>>> disk's read cache and does not have to actually be read.  So several
>>> sequential workloads jumping back and forth do not behave as bad as
>>> one would expect.  Write are a different story and a lot more
>>> expensive.  I isloate those to ssd and copy them in the middle of the
>>> night when it is low activity.  And since they are being copied as big
>>> fast streams one file at a time they end up with very few fragments
>>> and write very quickly.   The way I have mine setup mythtv will find
>>> the file whether it is on the ssd recording directory or the raid
>>> recording directory, so when I mv the files nothing has to be done
>>> except the mv.
>>>
>>>
>>> On Sat, Aug 29, 2020 at 7:56 PM Ram Ramesh <rramesh2400@gmail.com> wrote:
>>>> On 8/29/20 4:26 PM, Roger Heflin wrote:
>>>>> It should be worth noting that if you buy 2 exactly the same SSD's at
>>>>> the same time and use them in a mirror they are very likely to be
>>>>> wearing about the same.
>>>>>
>>>>> I am hesitant to go much bigger on disks, especially since the $$/GB
>>>>> really does not change much as the disks get bigger.
>>>>>
>>>>> And be careful of adding on a cheap sata controller as a lot of them work badly.
>>>>>
>>>>> Most of my disks have died from bad blocks causing a section of the
>>>>> disk to have some errors, or bad blocks on sections causing the array
>>>>> to pause for 7 seconds.  Make sure to get a disk with SCTERC settable
>>>>> (timeout when bad blocks happen, otherwise the default timeout is a
>>>>> 60-120seconds, but with it you can set it to no more than 7 seconds).
>>>>>     In the cases where the entire disk did not just stop and is just
>>>>> getting bad blocks in places, typically you have time as only a single
>>>>> section is getting bad blocks, so in this case having sections does
>>>>> help.    Also note that mdadm with 4 sections like I have will only
>>>>> run a single rebuild at a time as mdadm understands that the
>>>>> underlying disks are shared, this makes replacing a disk with 1
>>>>> section or 4 sections basically work pretty much the same.  It does
>>>>> the same thing on the weekly scans, it sets all 4 to scan, and it
>>>>> scans 1 and defers the other scan as disks are shared.
>>>>>
>>>>> It seems to be a disk completely dying is a lot less often than badblock issues.
>>>>>
>>>>> On Sat, Aug 29, 2020 at 3:50 PM Ram Ramesh <rramesh2400@gmail.com> wrote:
>>>>>> On 8/29/20 12:02 AM, Roman Mamedov wrote:
>>>>>>> On Fri, 28 Aug 2020 22:08:22 -0500
>>>>>>> "R. Ramesh" <rramesh@verizon.net> wrote:
>>>>>>>
>>>>>>>> I do not know how SSD caching is implemented. I assumed it will be
>>>>>>>> somewhat similar to memory cache (L2 vs L3 vs L4 etc). I am hoping that
>>>>>>>> with SSD caching, reads/writes to disk will be larger in size and
>>>>>>>> sequential within a file (similar to cache line fill in memory cache
>>>>>>>> which results in memory bursts that are efficient). I thought that is
>>>>>>>> what SSD caching will do to disk reads/writes. I assumed, once reads
>>>>>>>> (ahead) and writes (assuming writeback cache) buffers data sufficiently
>>>>>>>> in the SSD, all reads/writes will be to SSD with periodic well organized
>>>>>>>> large transfers to disk. If I am wrong here then I do not see any point
>>>>>>>> in SSD as a cache. My aim is not to optimize by cache hits, but optimize
>>>>>>>> by preventing disks from thrashing back and forth seeking after every
>>>>>>>> block read. I suppose Linux (memory) buffer cache alleviates some of
>>>>>>>> that. I was hoping SSD will provide next level. If not, I am off in my
>>>>>>>> understanding of SSD as a disk cache.
>>>>>>> Just try it, as I said before with LVM it is easy to remove if it doesn't work
>>>>>>> out. You can always go to the manual copying method or whatnot, but first why
>>>>>>> not check if the automatic caching solution might be "good enough" for your
>>>>>>> needs.
>>>>>>>
>>>>>>> Yes it usually tries to avoid caching long sequential reads or writes, but
>>>>>>> there's also quite a bit of other load on the FS, i.e. metadata. I found that
>>>>>>> browsing directories and especially mounting the filesystem had a great
>>>>>>> benefit from caching.
>>>>>>>
>>>>>>> You are correct that it will try to increase performance via writeback
>>>>>>> caching, however with LVM that needs to be enabled explicitly:
>>>>>>> https://www.systutorials.com/docs/linux/man/7-lvmcache/#lbAK
>>>>>>> And of course a failure of that cache SSD will mean losing some data, even if
>>>>>>> the main array is RAID. Perhaps should consider a RAID of SSDs for cache in
>>>>>>> that case then.
>>>>>>>
>>>>>> Yes, I have 2x500GB ssds for cache. May be, I should do raid1 on them
>>>>>> and use as cache volume.
>>>>>> I thought SSDs are more reliable and even when they begin to die, they
>>>>>> become readonly before quitting.  Of course, this is all theory, and I
>>>>>> do not think standards exists on how they behave when reaching EoL.
>>>>>>
>>>>>> Ramesh
>>>>>>
>>>> My SSDs are from different companies and bought at different times
>>>> (2019/2016, I think).
>>>>
>>>> I have not had many hard disk failures. However, each time I had one, it
>>>> has been a total death. So, I am a bit biased. May be with sections, I
>>>> can replace one md at a time and letting others run degraded. I am sure
>>>> there other tricks. I am simply saying it is a lot of reads/writes, and
>>>> of course computation, in cold replacement of disks in RAID6 vs. RAID1.
>>>>
>>>> Yes, larger disks are not cheaper, but they use one SATA port vs.
>>>> smaller disks. Also, they use less power in the long run (mine run
>>>> 24x7). That is why I have a policy of replacing disks once 2x size disks
>>>> (compared to what I currently own) become commonplace.
>>>>
>>>> I have a LSI 9211 SAS HBA which is touted to be reliable by this community.
>>>>
>>>> Regards
>>>> Ramesh
>>>>
>> Roger,
>>
>>     Just curious, in your search for a SSD solution to mythtv recording,
>> did you consider overlayfs, unionfs or mergerfs? If you did, why did you
>> decide that a simple copy is better?
>>
>> Ramesh
>>


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Best way to add caching to a new raid setup.
  2020-09-01 16:12                 ` Ram Ramesh
  2020-09-01 17:01                   ` Kai Stian Olstad
@ 2020-09-14 11:40                   ` Nix
  2020-09-14 14:32                     ` Ram Ramesh
  1 sibling, 1 reply; 36+ messages in thread
From: Nix @ 2020-09-14 11:40 UTC (permalink / raw)
  To: Ram Ramesh; +Cc: Drew, antlists, Linux Raid

On 1 Sep 2020, Ram Ramesh uttered the following:

> After thinking through this, I really like the idea of simply
> recording programs to SSD and move one file at a time based on some
> aging algorithms of my own. I will move files back and forth as needed
> during overnight hours creating my own caching effect.

I don't really see the benefit here for a mythtv installation in
particular. I/O patterns for large media are extremely non-seeky: even
with multiple live recordings at once, an HDD would easily be able to
keep up since it'd only have to seek a few times per 30s period given
the size of most plausible write caches.

In general, doing the hierarchical storage thing is useful if you have
stuff you will almost never access that you can keep on slower media
(or, in this case, stuff whose access patterns are non-seeky that you
can keep on media with a high seek time). But in this case, that would
be 'all of it'. Even if it weren't, by-hand copying won't deal with the
thing you really need to keep on fast-seek media: metadata. You can't
build your own filesystem with metadata on SSD and data on non-SSD this
way! But both LVM caching and bcache do exactly that.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Best way to add caching to a new raid setup.
  2020-09-14 11:40                   ` Nix
@ 2020-09-14 14:32                     ` Ram Ramesh
  2020-09-14 14:48                       ` Roger Heflin
  0 siblings, 1 reply; 36+ messages in thread
From: Ram Ramesh @ 2020-09-14 14:32 UTC (permalink / raw)
  To: Nix; +Cc: Drew, antlists, Linux Raid

On 9/14/20 6:40 AM, Nix wrote:
> On 1 Sep 2020, Ram Ramesh uttered the following:
>
>> After thinking through this, I really like the idea of simply
>> recording programs to SSD and move one file at a time based on some
>> aging algorithms of my own. I will move files back and forth as needed
>> during overnight hours creating my own caching effect.
> I don't really see the benefit here for a mythtv installation in
> particular. I/O patterns for large media are extremely non-seeky: even
> with multiple live recordings at once, an HDD would easily be able to
> keep up since it'd only have to seek a few times per 30s period given
> the size of most plausible write caches.
>
> In general, doing the hierarchical storage thing is useful if you have
> stuff you will almost never access that you can keep on slower media
> (or, in this case, stuff whose access patterns are non-seeky that you
> can keep on media with a high seek time). But in this case, that would
> be 'all of it'. Even if it weren't, by-hand copying won't deal with the
> thing you really need to keep on fast-seek media: metadata. You can't
> build your own filesystem with metadata on SSD and data on non-SSD this
> way! But both LVM caching and bcache do exactly that.
Agreed, all I need is a file-level LRU caching effect: recently 
accessed/created files on the SSD, and the ones untouched for a while 
on the spinning disks. I was trying to get this done with block-level 
caching methods, which are too complicated for the purpose.

My aim is not so much to improve performance as to improve on power. I 
want my raid disks to sit mostly idle holding files, and to spin up 
and serve only when called for. Most of the time I am 
watching/recording recent shows/programs or popular movies, and 
typically that is about 200-400GB of storage. With UltraViolet, Prime, 
Netflix and Disney, movies are more often sourced from online content, 
and TV shows get deleted after watching with new ones added in that 
space. So typical usage seems ideal for a popular SSD size (with a 
large backing store on spinning disk), I think. This means my spinning 
disks would wake up once a day or two at most; more likely once a 
week, or with periods of high activity that die down to nothing for a 
while.  Instead, they are currently running 24x7, which does not make 
sense.
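
Something like the sketch below is what I have in mind: a nightly job
that keeps the most recently used files on the SSD up to a size budget
and demotes the rest to the array. The paths and the budget are
placeholders, and it leans on mythtv finding the file in either
storage-group directory:

import os
import shutil

SSD_DIR = "/srv/ssd/recordings"     # example path
RAID_DIR = "/srv/raid/recordings"   # example path
BUDGET = 400 * 1024**3              # keep at most ~400 GB on the SSD

# Collect (last access time, size, path, name) for every file on the SSD.
files = []
for name in os.listdir(SSD_DIR):
    path = os.path.join(SSD_DIR, name)
    if os.path.isfile(path):
        st = os.stat(path)
        files.append((st.st_atime, st.st_size, path, name))

# Most recently accessed first; demote from the cold end once over budget.
files.sort(reverse=True)
used = 0
for atime, size, path, name in files:
    used += size
    if used > BUDGET:
        shutil.move(path, os.path.join(RAID_DIR, name))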

Regards
Ramesh


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Best way to add caching to a new raid setup.
  2020-09-14 14:32                     ` Ram Ramesh
@ 2020-09-14 14:48                       ` Roger Heflin
  2020-09-14 15:08                         ` Wols Lists
  0 siblings, 1 reply; 36+ messages in thread
From: Roger Heflin @ 2020-09-14 14:48 UTC (permalink / raw)
  To: Ram Ramesh; +Cc: Nix, Drew, antlists, Linux Raid

It should be noted that mythtv is a badly behaved I/O application: it
does a lot of sync calls that, for the most part, effectively make
Linux's write cache small.

That code was apparently written on the idea that, when too many
recordings were being done, the syncs would take too long and a few
would time out, and it would then kill a few of them so that at least
some recordings would work.  A number of people (me included) believe
its usage of sync is badly designed, but last I saw the devs were
arguing for it.  It is a decent assumption if recordings are being done
on a single spinning disk, but it is not so good when there are multiple
spindles underneath and the disks would be able to catch up.  I had
recordings being killed when a single spinning disk in the array, with
SCTERC set to 7 seconds, hit an error, and that is part of the reason
why I went to SSD.  With old disks, block replacements are going to
happen often enough that, with the sync/kill crap, recordings aren't
reliable.
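
(For reference, SCTERC is the drive's own error-recovery time limit; it
can be capped with smartctl, though on most drives the setting does not
survive a power cycle and has to be reapplied at boot -- device name here
is a placeholder:)

    # cap read/write error recovery at 7.0 seconds (units are 100 ms)
    smartctl -l scterc,70,70 /dev/sdX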

On Mon, Sep 14, 2020 at 9:37 AM Ram Ramesh <rramesh2400@gmail.com> wrote:
>
> On 9/14/20 6:40 AM, Nix wrote:
> > On 1 Sep 2020, Ram Ramesh uttered the following:
> >
> >> After thinking through this, I really like the idea of simply
> >> recording programs to SSD and move one file at a time based on some
> >> aging algorithms of my own. I will move files back and forth as needed
> >> during overnight hours creating my own caching effect.
> > I don't really see the benefit here for a mythtv installation in
> > particular. I/O patterns for large media are extremely non-seeky: even
> > with multiple live recordings at once, an HDD would easily be able to
> > keep up since it'd only have to seek a few times per 30s period given
> > the size of most plausible write caches.
> >
> > In general, doing the hierarchical storage thing is useful if you have
> > stuff you will almost never access that you can keep on slower media
> > (or, in this case, stuff whose access patterns are non-seeky that you
> > can keep on media with a high seek time). But in this case, that would
> > be 'all of it'. Even if it weren't, by-hand copying won't deal with the
> > thing you really need to keep on fast-seek media: metadata. You can't
> > build your own filesystem with metadata on SSD and data on non-SSD this
> > way! But both LVM caching and bcache do exactly that.
> Agreed, all I need is a file-level LRU caching effect: recently
> accessed/created files on SSD, and the ones untouched for a while on
> spinning disks. I was trying to get this done with block-level caching
> methods, which are too complicated for the purpose.
>
> My aim is not to improve performance but to reduce power. I want my
> RAID disks to sit mostly idle holding files, spinning up to serve only
> when called for. Most of the time I am watching/recording recent
> shows/programs or popular movies, and typically that is about 200-400 GB
> of storage. With UltraViolet, Prime, Netflix and Disney, movies are more
> often sourced from online content, and TV shows get deleted after
> watching with new ones added in that space. So typical usage seems
> ideal for a popular SSD size (with a large backup store on spinning
> disk), I think. This means my spinning disks would wake up at most once
> every day or two; more likely once a week, or with periods of high
> activity that die down to nothing for a while. Instead, they currently
> run 24x7, which does not make sense.
>
> Regards
> Ramesh
>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Best way to add caching to a new raid setup.
  2020-09-14 14:48                       ` Roger Heflin
@ 2020-09-14 15:08                         ` Wols Lists
  0 siblings, 0 replies; 36+ messages in thread
From: Wols Lists @ 2020-09-14 15:08 UTC (permalink / raw)
  To: Roger Heflin, Ram Ramesh; +Cc: Nix, Drew, Linux Raid

On 14/09/20 15:48, Roger Heflin wrote:
> It should be noted that mythtv is a badly behaved IO application, it
> does a lot of sync calls that effectively for the most part makes
> linux's write cache small.

Sounds like the devs need reminding of the early days of ext4. IIUC, if
that had run on an early ext4 you wouldn't have got ANY successful
recordings AT ALL.

At an absolute minimum, that sort of behaviour needs to be configurable,
because if the user is running mythtv on anything other than a dedicated
machine then it's going to kill performance for anything else.

Cheers,
Wol

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2020-09-14 15:08 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <16cee7f2-38d9-13c8-4342-4562be68930b.ref@verizon.net>
2020-08-28  2:31 ` Best way to add caching to a new raid setup R. Ramesh
2020-08-28  3:05   ` Peter Grandi
2020-08-28  3:19     ` Ram Ramesh
2020-08-28 15:26   ` antlists
2020-08-28 17:25     ` Ram Ramesh
2020-08-28 22:12       ` antlists
2020-08-28 22:40         ` Ram Ramesh
2020-08-28 22:59           ` antlists
2020-08-29  3:08             ` R. Ramesh
2020-08-29  5:02               ` Roman Mamedov
2020-08-29 20:48                 ` Ram Ramesh
2020-08-29 21:26                   ` Roger Heflin
2020-08-30  0:56                     ` Ram Ramesh
2020-08-30 15:42                       ` Roger Heflin
2020-08-30 17:19                         ` Ram Ramesh
2020-09-11 18:39                         ` R. Ramesh
2020-09-11 20:37                           ` Roger Heflin
2020-09-11 22:41                             ` Ram Ramesh
2020-08-29  0:01           ` Roger Heflin
2020-08-29  3:12             ` R. Ramesh
2020-08-29 22:36               ` Drew
2020-09-01 16:12                 ` Ram Ramesh
2020-09-01 17:01                   ` Kai Stian Olstad
2020-09-02 18:17                     ` Ram Ramesh
2020-09-14 11:40                   ` Nix
2020-09-14 14:32                     ` Ram Ramesh
2020-09-14 14:48                       ` Roger Heflin
2020-09-14 15:08                         ` Wols Lists
2020-08-31 19:20           ` Nix
2020-08-28 17:46   ` Roman Mamedov
2020-08-28 20:39     ` Ram Ramesh
2020-08-29 15:34       ` antlists
2020-08-29 15:57         ` Roman Mamedov
2020-08-29 16:26           ` Roger Heflin
2020-08-29 20:45             ` Ram Ramesh
2020-08-30 22:16       ` Michal Soltys
