* Best way to add caching to a new raid setup. [not found] <16cee7f2-38d9-13c8-4342-4562be68930b.ref@verizon.net> @ 2020-08-28 2:31 ` R. Ramesh 2020-08-28 3:05 ` Peter Grandi ` (2 more replies) 0 siblings, 3 replies; 36+ messages in thread From: R. Ramesh @ 2020-08-28 2:31 UTC (permalink / raw) To: Linux Raid I have two raid6s running on mythbuntu 14.04. They are built on 6 enterprise drives, so no HD issues as of now. Still, I plan to upgrade, as it has been a while and the size of hard drives has become significantly larger (an indication that my disks may be old). I want to build a new raid using 16/14tb drives. Since I am building a new raid, I thought I could explore caching options. I see mentions of LVM cache and a few others: bcache/xyzcache etc. Is any one of them better than the others, or is no cache safer? Since I switched over to NVMe boot drives, I have quite a few SATA SSDs lying around that I can put to good use if I cache using them. I will move to xubuntu 20.04 as part of this upgrade, so hopefully I will have recent versions of the kernel, mdadm and fstools. With these I should be able to make full use of current features, if any are needed for caching support. Please let me know your expert opinion. Thanks Ramesh ^ permalink raw reply [flat|nested] 36+ messages in thread
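[Editor's note: for readers weighing the options above, a minimal sketch of the lvmcache route follows. All device and volume names (/dev/md0, /dev/sdg, /dev/sdh, vg_media) are placeholders, not anything from this thread.]

```shell
# Sketch: LVM cache on top of an md array. Device names are hypothetical.

# Put the hard-disk array and the two spare SATA SSDs into one volume group.
pvcreate /dev/md0 /dev/sdg /dev/sdh
vgcreate vg_media /dev/md0 /dev/sdg /dev/sdh

# Data LV on the hard-disk array only.
lvcreate -l 100%PVS -n media vg_media /dev/md0

# Cache pool on the SSDs only, then attach it to the data LV.
lvcreate --type cache-pool -L 400G -n media_cache vg_media /dev/sdg /dev/sdh
lvconvert --type cache --cachepool vg_media/media_cache vg_media/media
```

One point in lvmcache's favour: it can be detached later without data loss (`lvconvert --splitcache vg_media/media`), so trying it is low-risk.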
* Re: Best way to add caching to a new raid setup. 2020-08-28 2:31 ` Best way to add caching to a new raid setup R. Ramesh @ 2020-08-28 3:05 ` Peter Grandi 2020-08-28 3:19 ` Ram Ramesh 2020-08-28 15:26 ` antlists 2020-08-28 17:46 ` Roman Mamedov 2 siblings, 1 reply; 36+ messages in thread From: Peter Grandi @ 2020-08-28 3:05 UTC (permalink / raw) To: Linux Raid > I have two raid6s running on mythbuntu 14.04. The are built on > 6 enterprise drives. [...] want to build new raid using the > 16/14tb drives. [...] This may be the beginning of an exciting adventure into setting up a RAID set with stunning rebuild times, minimizing IOPS-per-TB and setting up filetrees that cannot be realistically 'fsck'ed. Plenty of people seem to like that kind of exciting adventure :-). ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Best way to add caching to a new raid setup. 2020-08-28 3:05 ` Peter Grandi @ 2020-08-28 3:19 ` Ram Ramesh 0 siblings, 0 replies; 36+ messages in thread From: Ram Ramesh @ 2020-08-28 3:19 UTC (permalink / raw) To: Peter Grandi, Linux Raid On 8/27/20 10:05 PM, Peter Grandi wrote: >> I have two raid6s running on mythbuntu 14.04. The are built on >> 6 enterprise drives. [...] want to build new raid using the >> 16/14tb drives. [...] > This may be the beginning of an exciting adventure into setting > up a RAID set with stunning rebuild times, minimizing IOPS-per-TB > and setting up filetrees that cannot be realistically 'fsck'ed. > Plenty of people seem to like that kind of exciting adventure :-). Yes, just as exciting as my raid1 on another machine with 3 one-TB WD Blacks from 15+ years ago (one of the first 1 TB Blacks). Still running strong 24x7 after all these years, and it has TLER :-) Most likely I am building a same-size raid (likely raid1 on two 14/16tb). No EB filesystem for me (yet!) Ramesh ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Best way to add caching to a new raid setup. 2020-08-28 2:31 ` Best way to add caching to a new raid setup R. Ramesh 2020-08-28 3:05 ` Peter Grandi @ 2020-08-28 15:26 ` antlists 2020-08-28 17:25 ` Ram Ramesh 2020-08-28 17:46 ` Roman Mamedov 2 siblings, 1 reply; 36+ messages in thread From: antlists @ 2020-08-28 15:26 UTC (permalink / raw) To: R. Ramesh, Linux Raid On 28/08/2020 03:31, R. Ramesh wrote: > I want to build new raid using the 16/14tb drives. Since I am building > new raid, I thought I could explore caching options. I see a mention of > LVM cache and few other bcache/xyzcache etc. > > Is anyone of them better than other or no cache is safer. Since I > switched over to NVME boot drives, I have quite a few SATA SSDs lying > around that I can put to good use, if I cache using them. Sounds like a fun idea. Just make sure you're getting CMR not SMR drives, but I'm not aware of SMR that large ... Hopefully I'm going to do some work on it soon, but look at dm-integrity to make sure you don't get a dodgy mirror. You can add dm-integrity retrospectively, so if you leave a bit of unused space on the drive, I think you can tell dm-integrity where to put its checksums. Cheers, Wol ^ permalink raw reply [flat|nested] 36+ messages in thread
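[Editor's note: Wol's dm-integrity suggestion can be sketched concretely. This is illustrative only, using integritysetup from the cryptsetup package; /dev/sdX1 and /dev/sdY1 are placeholder partitions, and by default dm-integrity interleaves its checksums with the data rather than using a reserved area at the end.]

```shell
# Illustrative: standalone dm-integrity under an md raid1, so silent
# corruption on one leg becomes a read error that md can repair from
# the other leg instead of returning a dodgy mirror copy.

integritysetup format /dev/sdX1
integritysetup open   /dev/sdX1 int-a
integritysetup format /dev/sdY1
integritysetup open   /dev/sdY1 int-b

# Build the mirror on the integrity-protected mappings.
mdadm --create /dev/md1 --level=1 --raid-devices=2 \
      /dev/mapper/int-a /dev/mapper/int-b
```

Note that `integritysetup format` writes metadata over the device, so retrofitting an existing mirror still means rebuilding one leg at a time.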
* Re: Best way to add caching to a new raid setup. 2020-08-28 15:26 ` antlists @ 2020-08-28 17:25 ` Ram Ramesh 2020-08-28 22:12 ` antlists 0 siblings, 1 reply; 36+ messages in thread From: Ram Ramesh @ 2020-08-28 17:25 UTC (permalink / raw) To: antlists, R. Ramesh, Linux Raid On 8/28/20 10:26 AM, antlists wrote: > On 28/08/2020 03:31, R. Ramesh wrote: >> I want to build new raid using the 16/14tb drives. Since I am >> building new raid, I thought I could explore caching options. I see a >> mention of LVM cache and few other bcache/xyzcache etc. >> >> Is anyone of them better than other or no cache is safer. Since I >> switched over to NVME boot drives, I have quite a few SATA SSDs lying >> around that I can put to good use, if I cache using them. > > Sounds like a fun idea. Just make sure you're getting CMR not SMR > drives, but I'm not aware of SMR that large ... > > Hopefully I'm going to do some work on it soon, but look at > dm-integrity to make sure you don't get a dodgy mirror. You can add > dm-integrity retrospectively, so if you leave a bit of unused space on > the drive, I think you can tell dm-integrity where to put its checksums. > > Cheers, > Wol Yes, no SMR. I plan to get only enterprise helium drives (Seagate Exos X14 or X16). I googled RAID cache performance and did not get too many interesting hits. A couple that I did find seem to indicate that LVM cache shows no performance improvement. I can't understand why. Maybe SATA limits (an SSD is about 500 MB/s, and a disk could be as high as 200 MB/s; with raid1 that might go up, as we have two disks to read from, etc.). I am mainly looking for IOPS improvement, as I want to use this RAID in a mythtv environment. So multiple threads will be active, and I expect the cache to help with random-access IOPS. Regards Ramesh ^ permalink raw reply [flat|nested] 36+ messages in thread
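[Editor's note: rather than relying on the inconclusive benchmarks mentioned above, the cache's effect can be measured directly. A hedged sketch with fio; the device path is a placeholder, and the 64k block size is an assumption roughly matching DVR-style streaming reads.]

```shell
# Measure random-read throughput/IOPS on the array before and after
# attaching the cache; --readonly guards the raw device against writes.
fio --name=randread --filename=/dev/vg_media/media --readonly \
    --rw=randread --bs=64k --ioengine=libaio --iodepth=16 \
    --direct=1 --runtime=30 --time_based --group_reporting
```

Running the same job twice in a row also shows whether the cache is promoting the touched blocks (the second run should be noticeably faster if it is).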
* Re: Best way to add caching to a new raid setup. 2020-08-28 17:25 ` Ram Ramesh @ 2020-08-28 22:12 ` antlists 2020-08-28 22:40 ` Ram Ramesh 0 siblings, 1 reply; 36+ messages in thread From: antlists @ 2020-08-28 22:12 UTC (permalink / raw) To: Ram Ramesh, R. Ramesh, Linux Raid On 28/08/2020 18:25, Ram Ramesh wrote: > I am mainly looking for IOP improvement as I want to use this RAID in > mythtv environment. So multiple threads will be active and I expect > cache to help with random access IOPs. ??? Caching will only help in a read-after-write scenario, or a read-several-times scenario. I'm guessing mythtv means it's a film server? Can ALL your films (or at least your favourite "watch again and again" ones) fit in the cache? If you watch a lot of films, chances are you'll read it from disk (no advantage from the cache), and by the time you watch it again it will have been evicted so you'll have to read it again. The other time cache may be useful, is if you're recording one thing and watching another. That way, the writes can stall in cache as you prioritise reading. Think about what is actually happening at the i/o level, and will cache help? Cheers, Wol ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Best way to add caching to a new raid setup. 2020-08-28 22:12 ` antlists @ 2020-08-28 22:40 ` Ram Ramesh 2020-08-28 22:59 ` antlists ` (2 more replies) 0 siblings, 3 replies; 36+ messages in thread From: Ram Ramesh @ 2020-08-28 22:40 UTC (permalink / raw) To: antlists, R. Ramesh, Linux Raid On 8/28/20 5:12 PM, antlists wrote: > On 28/08/2020 18:25, Ram Ramesh wrote: >> I am mainly looking for IOP improvement as I want to use this RAID in >> mythtv environment. So multiple threads will be active and I expect >> cache to help with random access IOPs. > > ??? > > Caching will only help in a read-after-write scenario, or a > read-several-times scenario. > > I'm guessing mythtv means it's a film server? Can ALL your films (or > at least your favourite "watch again and again" ones) fit in the > cache? If you watch a lot of films, chances are you'll read it from > disk (no advantage from the cache), and by the time you watch it again > it will have been evicted so you'll have to read it again. > > The other time cache may be useful, is if you're recording one thing > and watching another. That way, the writes can stall in cache as you > prioritise reading. > > Think about what is actually happening at the i/o level, and will > cache help? > > Cheers, > Wol Mythtv is a server-client DVR system. I have a client next to each of my TVs and one backend with a large disk (this will have the RAID with cache). At any time several clients will be accessing different programs, and any scheduled recordings will also be going on in parallel. So you will see a lot of seeks, but still all will be based on a limited number of threads (I only have 3 TVs and maybe one other PC acting as a client). So lots of IOs, mostly sequential, across a small number of threads. I think most cache algorithms should be able to benefit from random access to blocks in SSD. Do you see any flaws in my argument? Regards Ramesh ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Best way to add caching to a new raid setup. 2020-08-28 22:40 ` Ram Ramesh @ 2020-08-28 22:59 ` antlists 2020-08-29 3:08 ` R. Ramesh 2020-08-29 0:01 ` Roger Heflin 2020-08-31 19:20 ` Nix 2 siblings, 1 reply; 36+ messages in thread From: antlists @ 2020-08-28 22:59 UTC (permalink / raw) To: Ram Ramesh, antlists, R. Ramesh, Linux Raid On 28/08/2020 23:40, Ram Ramesh wrote: > On 8/28/20 5:12 PM, antlists wrote: >> On 28/08/2020 18:25, Ram Ramesh wrote: >>> I am mainly looking for IOP improvement as I want to use this RAID in >>> mythtv environment. So multiple threads will be active and I expect >>> cache to help with random access IOPs. >> >> ??? >> >> Caching will only help in a read-after-write scenario, or a >> read-several-times scenario. >> >> I'm guessing mythtv means it's a film server? Can ALL your films (or >> at least your favourite "watch again and again" ones) fit in the >> cache? If you watch a lot of films, chances are you'll read it from >> disk (no advantage from the cache), and by the time you watch it again >> it will have been evicted so you'll have to read it again. >> >> The other time cache may be useful, is if you're recording one thing >> and watching another. That way, the writes can stall in cache as you >> prioritise reading. >> >> Think about what is actually happening at the i/o level, and will >> cache help? >> >> Cheers, >> Wol > > Mythtv is a sever client DVR system. I have a client next to each of my > TVs and one backend with large disk (this will have RAID with cache). At > any time many clients will be accessing different programs and any > scheduled recording will also be going on in parallel. So you will see a > lot of seeks, but still all will be based on limited threads (I only > have 3 TVs and may be one other PC acting as a client) So lots of IOs, > mostly sequential, across small number of threads. I think most cache > algorithms should be able to benefit from random access to blocks in SSD. 
> > Do you see any flaws in my argument? > I don't think you've understood mine. Doesn't matter what the cache algorithm is, the whole point of caching is that - when reading - it is only a benefit if the different threads are reading THE SAME bits of disk. So if your 3 TVs and the PC are accessing different tv programs, caching won't be much use, as all the reads will be cache misses. As for writing, caching can let you prioritise reading so you don't get stutter while watching. And it'll speed things up if you watch while recording. But basically, caching will really only benefit you if (a) your cache is large enough to hold all your favourite films so they don't get evicted from cache, or (b) you're in the habit of watching while recording, or (c) two or more tvs are in the habit of watching the same program. The question is not "how many simultaneous threads do I have?", but "how many of my disk i/os are going to be cache misses?" Your argument actively avoids that question. I suspect the answer is "most of them". Cheers, Wol ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Best way to add caching to a new raid setup. 2020-08-28 22:59 ` antlists @ 2020-08-29 3:08 ` R. Ramesh 2020-08-29 5:02 ` Roman Mamedov 0 siblings, 1 reply; 36+ messages in thread From: R. Ramesh @ 2020-08-29 3:08 UTC (permalink / raw) To: antlists, Ram Ramesh, Linux Raid On 8/28/20 5:59 PM, antlists wrote: > On 28/08/2020 23:40, Ram Ramesh wrote: >> On 8/28/20 5:12 PM, antlists wrote: >>> On 28/08/2020 18:25, Ram Ramesh wrote: >>>> I am mainly looking for IOP improvement as I want to use this RAID >>>> in mythtv environment. So multiple threads will be active and I >>>> expect cache to help with random access IOPs. >>> >>> ??? >>> >>> Caching will only help in a read-after-write scenario, or a >>> read-several-times scenario. >>> >>> I'm guessing mythtv means it's a film server? Can ALL your films (or >>> at least your favourite "watch again and again" ones) fit in the >>> cache? If you watch a lot of films, chances are you'll read it from >>> disk (no advantage from the cache), and by the time you watch it >>> again it will have been evicted so you'll have to read it again. >>> >>> The other time cache may be useful, is if you're recording one thing >>> and watching another. That way, the writes can stall in cache as you >>> prioritise reading. >>> >>> Think about what is actually happening at the i/o level, and will >>> cache help? >>> >>> Cheers, >>> Wol >> >> Mythtv is a sever client DVR system. I have a client next to each of >> my TVs and one backend with large disk (this will have RAID with >> cache). At any time many clients will be accessing different programs >> and any scheduled recording will also be going on in parallel. So you >> will see a lot of seeks, but still all will be based on limited >> threads (I only have 3 TVs and may be one other PC acting as a >> client) So lots of IOs, mostly sequential, across small number of >> threads. I think most cache algorithms should be able to benefit from >> random access to blocks in SSD. 
>> >> Do you see any flaws in my argument? >> > I don't think you've understood mine. Doesn't matter what the cache > algorithm is, the whole point of caching is that - when reading - it > is only a benefit if the different threads are reading THE SAME bits > of disk. So if your 3 TVs and the PC are accessing different tv > programs, caching won't be much use, as all the reads will be cache > misses. > > As for writing, caching can let you prioritise reading so you don't > get stutter while watching. And it'll speed things up if you watch > while recording. > > But basically, caching will really only benefit you if (a) your cache > is large enough to hold all your favourite films so they don't get > evicted from cache, or (b) you're in the habit of watching while > recording, or (c) two or more tvs are in the habit of watching the > same program. > > The question is not "how many simultaneous threads do I have?", but > "how many of my disk i/os are going to be cache misses?" Your argument > actively avoids that question. I suspect the answer is "most of them". > > Cheers, > Wol I do not know how SSD caching is implemented. I assumed it will be somewhat similar to memory cache (L2 vs L3 vs L4 etc). I am hoping that with SSD caching, reads/writes to disk will be larger in size and sequential within a file (similar to cache line fill in memory cache which results in memory bursts that are efficient). I thought that is what SSD caching will do to disk reads/writes. I assumed, once reads (ahead) and writes (assuming writeback cache) buffers data sufficiently in the SSD, all reads/writes will be to SSD with periodic well organized large transfers to disk. If I am wrong here then I do not see any point in SSD as a cache. My aim is not to optimize by cache hits, but optimize by preventing disks from thrashing back and forth seeking after every block read. I suppose Linux (memory) buffer cache alleviates some of that. I was hoping SSD will provide next level. 
If not, I am off in my understanding of SSD as a disk cache. Regards Ramesh ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Best way to add caching to a new raid setup. 2020-08-29 3:08 ` R. Ramesh @ 2020-08-29 5:02 ` Roman Mamedov 2020-08-29 20:48 ` Ram Ramesh 0 siblings, 1 reply; 36+ messages in thread From: Roman Mamedov @ 2020-08-29 5:02 UTC (permalink / raw) To: R. Ramesh; +Cc: antlists, Ram Ramesh, Linux Raid On Fri, 28 Aug 2020 22:08:22 -0500 "R. Ramesh" <rramesh@verizon.net> wrote: > I do not know how SSD caching is implemented. I assumed it will be > somewhat similar to memory cache (L2 vs L3 vs L4 etc). I am hoping that > with SSD caching, reads/writes to disk will be larger in size and > sequential within a file (similar to cache line fill in memory cache > which results in memory bursts that are efficient). I thought that is > what SSD caching will do to disk reads/writes. I assumed, once reads > (ahead) and writes (assuming writeback cache) buffers data sufficiently > in the SSD, all reads/writes will be to SSD with periodic well organized > large transfers to disk. If I am wrong here then I do not see any point > in SSD as a cache. My aim is not to optimize by cache hits, but optimize > by preventing disks from thrashing back and forth seeking after every > block read. I suppose Linux (memory) buffer cache alleviates some of > that. I was hoping SSD will provide next level. If not, I am off in my > understanding of SSD as a disk cache. Just try it, as I said before with LVM it is easy to remove if it doesn't work out. You can always go to the manual copying method or whatnot, but first why not check if the automatic caching solution might be "good enough" for your needs. Yes it usually tries to avoid caching long sequential reads or writes, but there's also quite a bit of other load on the FS, i.e. metadata. I found that browsing directories and especially mounting the filesystem had a great benefit from caching. 
You are correct that it will try to increase performance via writeback caching, however with LVM that needs to be enabled explicitly: https://www.systutorials.com/docs/linux/man/7-lvmcache/#lbAK And of course a failure of that cache SSD will mean losing some data, even if the main array is RAID. Perhaps you should consider a RAID of SSDs for cache in that case then. -- With respect, Roman ^ permalink raw reply [flat|nested] 36+ messages in thread
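[Editor's note: the explicit enabling Roman refers to looks like the following; lvmcache defaults to writethrough, and volume names here are placeholders.]

```shell
# Enable writeback at attach time:
lvconvert --type cache --cachepool vg_media/media_cache \
          --cachemode writeback vg_media/media

# Or flip the mode on an already-cached LV:
lvchange --cachemode writeback vg_media/media
```

As Roman warns, writeback means dirty data lives only on the cache SSD until flushed, which is exactly why a mirrored cache device is worth considering.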
* Re: Best way to add caching to a new raid setup. 2020-08-29 5:02 ` Roman Mamedov @ 2020-08-29 20:48 ` Ram Ramesh 2020-08-29 21:26 ` Roger Heflin 0 siblings, 1 reply; 36+ messages in thread From: Ram Ramesh @ 2020-08-29 20:48 UTC (permalink / raw) To: Roman Mamedov, R. Ramesh; +Cc: antlists, Linux Raid On 8/29/20 12:02 AM, Roman Mamedov wrote: > On Fri, 28 Aug 2020 22:08:22 -0500 > "R. Ramesh" <rramesh@verizon.net> wrote: > >> I do not know how SSD caching is implemented. I assumed it will be >> somewhat similar to memory cache (L2 vs L3 vs L4 etc). I am hoping that >> with SSD caching, reads/writes to disk will be larger in size and >> sequential within a file (similar to cache line fill in memory cache >> which results in memory bursts that are efficient). I thought that is >> what SSD caching will do to disk reads/writes. I assumed, once reads >> (ahead) and writes (assuming writeback cache) buffers data sufficiently >> in the SSD, all reads/writes will be to SSD with periodic well organized >> large transfers to disk. If I am wrong here then I do not see any point >> in SSD as a cache. My aim is not to optimize by cache hits, but optimize >> by preventing disks from thrashing back and forth seeking after every >> block read. I suppose Linux (memory) buffer cache alleviates some of >> that. I was hoping SSD will provide next level. If not, I am off in my >> understanding of SSD as a disk cache. > Just try it, as I said before with LVM it is easy to remove if it doesn't work > out. You can always go to the manual copying method or whatnot, but first why > not check if the automatic caching solution might be "good enough" for your > needs. > > Yes it usually tries to avoid caching long sequential reads or writes, but > there's also quite a bit of other load on the FS, i.e. metadata. I found that > browsing directories and especially mounting the filesystem had a great > benefit from caching. 
> > You are correct that it will try to increase performance via writeback > caching, however with LVM that needs to be enabled explicitly: > https://www.systutorials.com/docs/linux/man/7-lvmcache/#lbAK > And of course a failure of that cache SSD will mean losing some data, even if > the main array is RAID. Perhaps should consider a RAID of SSDs for cache in > that case then. > Yes, I have 2x500GB SSDs for cache. Maybe I should do raid1 on them and use that as the cache volume. I thought SSDs are more reliable, and even when they begin to die, they become read-only before quitting. Of course, this is all theory, and I do not think standards exist on how they behave when reaching EoL. Ramesh ^ permalink raw reply [flat|nested] 36+ messages in thread
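[Editor's note: the "raid1 on the two SSDs, then use that as the cache volume" idea sketched above would look roughly like this; device and volume names are placeholders, and vg_media is assumed to already hold the cached data LV.]

```shell
# Mirror the two SSDs with md first, then hand the mirror to LVM
# as a single cache PV so a lone SSD failure cannot eat dirty
# writeback data.
mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdg /dev/sdh
pvcreate /dev/md2
vgextend vg_media /dev/md2
lvcreate --type cache-pool -L 400G -n media_cache vg_media /dev/md2
lvconvert --type cache --cachepool vg_media/media_cache vg_media/media
```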
* Re: Best way to add caching to a new raid setup. 2020-08-29 20:48 ` Ram Ramesh @ 2020-08-29 21:26 ` Roger Heflin 2020-08-30 0:56 ` Ram Ramesh 0 siblings, 1 reply; 36+ messages in thread From: Roger Heflin @ 2020-08-29 21:26 UTC (permalink / raw) To: Ram Ramesh; +Cc: Roman Mamedov, R. Ramesh, antlists, Linux Raid It is worth noting that if you buy 2 exactly-the-same SSDs at the same time and use them in a mirror, they are very likely to wear at about the same rate. I am hesitant to go much bigger on disks, especially since the $$/GB really does not change much as the disks get bigger. And be careful of adding a cheap sata controller, as a lot of them work badly. Most of my disks have died from bad blocks causing a section of the disk to have some errors, or bad blocks on sections causing the array to pause for 7 seconds. Make sure to get a disk with SCTERC settable (the timeout when bad blocks happen; otherwise the default timeout is 60-120 seconds, but with it you can set it to no more than 7 seconds). In the cases where the entire disk did not just stop and is just getting bad blocks in places, typically you have time, as only a single section is getting bad blocks, so in this case having sections does help. Also note that mdadm with 4 sections like I have will only run a single rebuild at a time, as mdadm understands that the underlying disks are shared; this makes replacing a disk with 1 section or 4 sections work pretty much the same. It does the same thing on the weekly scans: it sets all 4 to scan, and it scans 1 and defers the other scans as the disks are shared. It seems that a disk completely dying is a lot less common than bad-block issues. On Sat, Aug 29, 2020 at 3:50 PM Ram Ramesh <rramesh2400@gmail.com> wrote: > > On 8/29/20 12:02 AM, Roman Mamedov wrote: > > On Fri, 28 Aug 2020 22:08:22 -0500 > > "R. Ramesh" <rramesh@verizon.net> wrote: > > > >> I do not know how SSD caching is implemented. 
I assumed it will be > >> somewhat similar to memory cache (L2 vs L3 vs L4 etc). I am hoping that > >> with SSD caching, reads/writes to disk will be larger in size and > >> sequential within a file (similar to cache line fill in memory cache > >> which results in memory bursts that are efficient). I thought that is > >> what SSD caching will do to disk reads/writes. I assumed, once reads > >> (ahead) and writes (assuming writeback cache) buffers data sufficiently > >> in the SSD, all reads/writes will be to SSD with periodic well organized > >> large transfers to disk. If I am wrong here then I do not see any point > >> in SSD as a cache. My aim is not to optimize by cache hits, but optimize > >> by preventing disks from thrashing back and forth seeking after every > >> block read. I suppose Linux (memory) buffer cache alleviates some of > >> that. I was hoping SSD will provide next level. If not, I am off in my > >> understanding of SSD as a disk cache. > > Just try it, as I said before with LVM it is easy to remove if it doesn't work > > out. You can always go to the manual copying method or whatnot, but first why > > not check if the automatic caching solution might be "good enough" for your > > needs. > > > > Yes it usually tries to avoid caching long sequential reads or writes, but > > there's also quite a bit of other load on the FS, i.e. metadata. I found that > > browsing directories and especially mounting the filesystem had a great > > benefit from caching. > > > > You are correct that it will try to increase performance via writeback > > caching, however with LVM that needs to be enabled explicitly: > > https://www.systutorials.com/docs/linux/man/7-lvmcache/#lbAK > > And of course a failure of that cache SSD will mean losing some data, even if > > the main array is RAID. Perhaps should consider a RAID of SSDs for cache in > > that case then. > > > Yes, I have 2x500GB ssds for cache. May be, I should do raid1 on them > and use as cache volume. 
> I thought SSDs are more reliable and even when they begin to die, they > become readonly before quitting. Of course, this is all theory, and I > do not think standards exists on how they behave when reaching EoL. > > Ramesh > ^ permalink raw reply [flat|nested] 36+ messages in thread
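[Editor's note: the SCTERC setting Roger recommends is driven with smartctl; the value is in tenths of a second, so 70 means 7 seconds. /dev/sdX is a placeholder.]

```shell
# Query the drive's current SCT error-recovery-control setting:
smartctl -l scterc /dev/sdX

# Set read and write timeouts to 7 seconds each:
smartctl -l scterc,70,70 /dev/sdX
```

On most drives the setting does not survive a power cycle, so it needs to be reapplied at boot (e.g. from a udev rule or an init script).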
* Re: Best way to add caching to a new raid setup. 2020-08-29 21:26 ` Roger Heflin @ 2020-08-30 0:56 ` Ram Ramesh 2020-08-30 15:42 ` Roger Heflin 0 siblings, 1 reply; 36+ messages in thread From: Ram Ramesh @ 2020-08-30 0:56 UTC (permalink / raw) To: Roger Heflin; +Cc: Roman Mamedov, R. Ramesh, antlists, Linux Raid On 8/29/20 4:26 PM, Roger Heflin wrote: > It should be worth noting that if you buy 2 exactly the same SSD's at > the same time and use them in a mirror they are very likely to be > wearing about the same. > > I am hesitant to go much bigger on disks, especially since the $$/GB > really does not change much as the disks get bigger. > > And be careful of adding on a cheap sata controller as a lot of them work badly. > > Most of my disks have died from bad blocks causing a section of the > disk to have some errors, or bad blocks on sections causing the array > to pause for 7 seconds. Make sure to get a disk with SCTERC settable > (timeout when bad blocks happen, otherwise the default timeout is a > 60-120seconds, but with it you can set it to no more than 7 seconds). > In the cases where the entire disk did not just stop and is just > getting bad blocks in places, typically you have time as only a single > section is getting bad blocks, so in this case having sections does > help. Also note that mdadm with 4 sections like I have will only > run a single rebuild at a time as mdadm understands that the > underlying disks are shared, this makes replacing a disk with 1 > section or 4 sections basically work pretty much the same. It does > the same thing on the weekly scans, it sets all 4 to scan, and it > scans 1 and defers the other scan as disks are shared. > > It seems to be a disk completely dying is a lot less often than badblock issues. > > On Sat, Aug 29, 2020 at 3:50 PM Ram Ramesh <rramesh2400@gmail.com> wrote: >> On 8/29/20 12:02 AM, Roman Mamedov wrote: >>> On Fri, 28 Aug 2020 22:08:22 -0500 >>> "R. 
Ramesh" <rramesh@verizon.net> wrote: >>> >>>> I do not know how SSD caching is implemented. I assumed it will be >>>> somewhat similar to memory cache (L2 vs L3 vs L4 etc). I am hoping that >>>> with SSD caching, reads/writes to disk will be larger in size and >>>> sequential within a file (similar to cache line fill in memory cache >>>> which results in memory bursts that are efficient). I thought that is >>>> what SSD caching will do to disk reads/writes. I assumed, once reads >>>> (ahead) and writes (assuming writeback cache) buffers data sufficiently >>>> in the SSD, all reads/writes will be to SSD with periodic well organized >>>> large transfers to disk. If I am wrong here then I do not see any point >>>> in SSD as a cache. My aim is not to optimize by cache hits, but optimize >>>> by preventing disks from thrashing back and forth seeking after every >>>> block read. I suppose Linux (memory) buffer cache alleviates some of >>>> that. I was hoping SSD will provide next level. If not, I am off in my >>>> understanding of SSD as a disk cache. >>> Just try it, as I said before with LVM it is easy to remove if it doesn't work >>> out. You can always go to the manual copying method or whatnot, but first why >>> not check if the automatic caching solution might be "good enough" for your >>> needs. >>> >>> Yes it usually tries to avoid caching long sequential reads or writes, but >>> there's also quite a bit of other load on the FS, i.e. metadata. I found that >>> browsing directories and especially mounting the filesystem had a great >>> benefit from caching. >>> >>> You are correct that it will try to increase performance via writeback >>> caching, however with LVM that needs to be enabled explicitly: >>> https://www.systutorials.com/docs/linux/man/7-lvmcache/#lbAK >>> And of course a failure of that cache SSD will mean losing some data, even if >>> the main array is RAID. Perhaps should consider a RAID of SSDs for cache in >>> that case then. 
>>> >> Yes, I have 2x500GB ssds for cache. May be, I should do raid1 on them >> and use as cache volume. >> I thought SSDs are more reliable and even when they begin to die, they >> become readonly before quitting. Of course, this is all theory, and I >> do not think standards exists on how they behave when reaching EoL. >> >> Ramesh >> My SSDs are from different companies and were bought at different times (2019/2016, I think). I have not had many hard disk failures; however, each time I had one, it was a total death. So, I am a bit biased. Maybe with sections I can replace one md at a time while letting the others run degraded. I am sure there are other tricks. I am simply saying it is a lot of reads/writes, and of course computation, in cold replacement of disks in RAID6 vs. RAID1. Yes, larger disks are not cheaper, but they use one SATA port vs. smaller disks. Also, they use less power in the long run (mine run 24x7). That is why I have a policy of replacing disks once 2x-size disks (compared to what I currently own) become commonplace. I have an LSI 9211 SAS HBA, which is touted to be reliable by this community. Regards Ramesh ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Best way to add caching to a new raid setup. 2020-08-30 0:56 ` Ram Ramesh @ 2020-08-30 15:42 ` Roger Heflin 2020-08-30 17:19 ` Ram Ramesh 2020-09-11 18:39 ` R. Ramesh 0 siblings, 2 replies; 36+ messages in thread From: Roger Heflin @ 2020-08-30 15:42 UTC (permalink / raw) To: Ram Ramesh; +Cc: Roman Mamedov, R. Ramesh, antlists, Linux Raid The LSI should be a good controller as long as you use the HBA fw and not the raid fw. I use an LSI with hba + the 8 AMD chipset sata ports; currently I have 12 ports cabled to hot swap bays but only 7+boot disk used. How many recordings do you think you will have, and how many clients/watchers? With the SSD handling the writes for recording, my disks actually spin down if no one is watching anything. The other trick the partitions let me do: initially I moved from 1.5 -> 3tb disks (2x750 -> 4x750), and once I got 3 3tbs in I added the 2 more partitions raid6 (+1.5TB) (I bought the 3tb drives slowly), then the next 3tb gets added to all 4 partitions (+3TB). On reads at least each disk can do at least 50 iops, and for the most part the disks themselves are very likely to cache the entire track the head goes over, so a 2nd sequential read likely comes from the disk's read cache and does not have to actually be read. So several sequential workloads jumping back and forth do not behave as badly as one would expect. Writes are a different story and a lot more expensive. I isolate those to ssd and copy them in the middle of the night when there is low activity. And since they are being copied as big fast streams one file at a time, they end up with very few fragments and write very quickly. The way I have mine set up, mythtv will find the file whether it is on the ssd recording directory or the raid recording directory, so when I mv the files nothing has to be done except the mv. 
On Sat, Aug 29, 2020 at 7:56 PM Ram Ramesh <rramesh2400@gmail.com> wrote: > > On 8/29/20 4:26 PM, Roger Heflin wrote: > > It should be worth noting that if you buy 2 exactly the same SSD's at > > the same time and use them in a mirror they are very likely to be > > wearing about the same. > > > > I am hesitant to go much bigger on disks, especially since the $$/GB > > really does not change much as the disks get bigger. > > > > And be careful of adding on a cheap sata controller as a lot of them work badly. > > > > Most of my disks have died from bad blocks causing a section of the > > disk to have some errors, or bad blocks on sections causing the array > > to pause for 7 seconds. Make sure to get a disk with SCTERC settable > > (timeout when bad blocks happen, otherwise the default timeout is a > > 60-120seconds, but with it you can set it to no more than 7 seconds). > > In the cases where the entire disk did not just stop and is just > > getting bad blocks in places, typically you have time as only a single > > section is getting bad blocks, so in this case having sections does > > help. Also note that mdadm with 4 sections like I have will only > > run a single rebuild at a time as mdadm understands that the > > underlying disks are shared, this makes replacing a disk with 1 > > section or 4 sections basically work pretty much the same. It does > > the same thing on the weekly scans, it sets all 4 to scan, and it > > scans 1 and defers the other scan as disks are shared. > > > > It seems to be a disk completely dying is a lot less often than badblock issues. > > > > On Sat, Aug 29, 2020 at 3:50 PM Ram Ramesh <rramesh2400@gmail.com> wrote: > >> On 8/29/20 12:02 AM, Roman Mamedov wrote: > >>> On Fri, 28 Aug 2020 22:08:22 -0500 > >>> "R. Ramesh" <rramesh@verizon.net> wrote: > >>> > >>>> I do not know how SSD caching is implemented. I assumed it will be > >>>> somewhat similar to memory cache (L2 vs L3 vs L4 etc). 
I am hoping that > >>>> with SSD caching, reads/writes to disk will be larger in size and > >>>> sequential within a file (similar to cache line fill in memory cache > >>>> which results in memory bursts that are efficient). I thought that is > >>>> what SSD caching will do to disk reads/writes. I assumed, once reads > >>>> (ahead) and writes (assuming writeback cache) buffers data sufficiently > >>>> in the SSD, all reads/writes will be to SSD with periodic well organized > >>>> large transfers to disk. If I am wrong here then I do not see any point > >>>> in SSD as a cache. My aim is not to optimize by cache hits, but optimize > >>>> by preventing disks from thrashing back and forth seeking after every > >>>> block read. I suppose Linux (memory) buffer cache alleviates some of > >>>> that. I was hoping SSD will provide next level. If not, I am off in my > >>>> understanding of SSD as a disk cache. > >>> Just try it, as I said before with LVM it is easy to remove if it doesn't work > >>> out. You can always go to the manual copying method or whatnot, but first why > >>> not check if the automatic caching solution might be "good enough" for your > >>> needs. > >>> > >>> Yes it usually tries to avoid caching long sequential reads or writes, but > >>> there's also quite a bit of other load on the FS, i.e. metadata. I found that > >>> browsing directories and especially mounting the filesystem had a great > >>> benefit from caching. > >>> > >>> You are correct that it will try to increase performance via writeback > >>> caching, however with LVM that needs to be enabled explicitly: > >>> https://www.systutorials.com/docs/linux/man/7-lvmcache/#lbAK > >>> And of course a failure of that cache SSD will mean losing some data, even if > >>> the main array is RAID. Perhaps should consider a RAID of SSDs for cache in > >>> that case then. > >>> > >> Yes, I have 2x500GB ssds for cache. May be, I should do raid1 on them > >> and use as cache volume. 
> >> I thought SSDs are more reliable and even when they begin to die, they > >> become readonly before quitting. Of course, this is all theory, and I > >> do not think standards exists on how they behave when reaching EoL. > >> > >> Ramesh > >> > My SSDs are from different companies and bought at different times > (2019/2016, I think). > > I have not had many hard disk failures. However, each time I had one, it > has been a total death. So, I am a bit biased. May be with sections, I > can replace one md at a time and letting others run degraded. I am sure > there other tricks. I am simply saying it is a lot of reads/writes, and > of course computation, in cold replacement of disks in RAID6 vs. RAID1. > > Yes, larger disks are not cheaper, but they use one SATA port vs. > smaller disks. Also, they use less power in the long run (mine run > 24x7). That is why I have a policy of replacing disks once 2x size disks > (compared to what I currently own) become commonplace. > > I have a LSI 9211 SAS HBA which is touted to be reliable by this community. > > Regards > Ramesh > ^ permalink raw reply [flat|nested] 36+ messages in thread
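The "sections" scheme Roger describes (several md arrays built over matching partitions on each disk, so an array can be grown one section at a time) might be sketched like this; the disk names, partition split, and member count are hypothetical:

```shell
# Hypothetical: six disks, each carved into four equal GPT
# partitions, with one RAID6 per partition "section".
for d in /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf; do
    parted -s "$d" mklabel gpt \
        mkpart md0 1MiB 25%  mkpart md1 25% 50% \
        mkpart md2 50% 75%   mkpart md3 75% 100%
done

# One array per section. mdadm notices that the arrays share the
# same underlying disks, so rebuilds and weekly checks across the
# four arrays are serialized rather than run concurrently.
mdadm --create /dev/md0 --level=6 --raid-devices=6 /dev/sd[a-f]1
mdadm --create /dev/md1 --level=6 --raid-devices=6 /dev/sd[a-f]2
mdadm --create /dev/md2 --level=6 --raid-devices=6 /dev/sd[a-f]3
mdadm --create /dev/md3 --level=6 --raid-devices=6 /dev/sd[a-f]4
```

The four md devices can then be joined into one volume with LVM, and a larger replacement disk just gets a bigger fourth partition (or an extra one) that is added section by section.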
* Re: Best way to add caching to a new raid setup. 2020-08-30 15:42 ` Roger Heflin @ 2020-08-30 17:19 ` Ram Ramesh 2020-09-11 18:39 ` R. Ramesh 1 sibling, 0 replies; 36+ messages in thread From: Ram Ramesh @ 2020-08-30 17:19 UTC (permalink / raw) To: Roger Heflin; +Cc: Linux Raid On 8/30/20 10:42 AM, Roger Heflin wrote: > The LSI should be a good controller as long as you the HBA fw and not > the raid fw. > > I use an LSI with hba + the 8 AMD chipset sata ports, currently I have > 12 ports cabled to hot swap bays but only 7+boot disk used. > > How many recording do you think you will have and how many > clients/watchers? With the SSD handling the writes for recording my > disks actually spin down if no one is watching anything. > > The other trick the partitions let me do is initially I moved from 1.5 > -> 3tb disks (2x750 -> 4x750) and once I got 3-3tbs in I added the 2 > more partitions raid6(+1.5TB) (I bought the 3tb drives slowly), then > the next 3tb gets added to all 4 partitions (+3TB). > > On reads at least each disk can do at least 50 iops, and for the most > part the disks themselves are very likely to cache the entire track > the head goes over, so a 2nd sequential read likely comes from the > disk's read cache and does not have to actually be read. So several > sequential workloads jumping back and forth do not behave as bad as > one would expect. Write are a different story and a lot more > expensive. I isloate those to ssd and copy them in the middle of the > night when it is low activity. And since they are being copied as big > fast streams one file at a time they end up with very few fragments > and write very quickly. The way I have mine setup mythtv will find > the file whether it is on the ssd recording directory or the raid > recording directory, so when I mv the files nothing has to be done > except the mv. 
> > > On Sat, Aug 29, 2020 at 7:56 PM Ram Ramesh <rramesh2400@gmail.com> wrote: >> On 8/29/20 4:26 PM, Roger Heflin wrote: >>> It should be worth noting that if you buy 2 exactly the same SSD's at >>> the same time and use them in a mirror they are very likely to be >>> wearing about the same. >>> >>> I am hesitant to go much bigger on disks, especially since the $$/GB >>> really does not change much as the disks get bigger. >>> >>> And be careful of adding on a cheap sata controller as a lot of them work badly. >>> >>> Most of my disks have died from bad blocks causing a section of the >>> disk to have some errors, or bad blocks on sections causing the array >>> to pause for 7 seconds. Make sure to get a disk with SCTERC settable >>> (timeout when bad blocks happen, otherwise the default timeout is a >>> 60-120seconds, but with it you can set it to no more than 7 seconds). >>> In the cases where the entire disk did not just stop and is just >>> getting bad blocks in places, typically you have time as only a single >>> section is getting bad blocks, so in this case having sections does >>> help. Also note that mdadm with 4 sections like I have will only >>> run a single rebuild at a time as mdadm understands that the >>> underlying disks are shared, this makes replacing a disk with 1 >>> section or 4 sections basically work pretty much the same. It does >>> the same thing on the weekly scans, it sets all 4 to scan, and it >>> scans 1 and defers the other scan as disks are shared. >>> >>> It seems to be a disk completely dying is a lot less often than badblock issues. >>> >>> On Sat, Aug 29, 2020 at 3:50 PM Ram Ramesh <rramesh2400@gmail.com> wrote: >>>> On 8/29/20 12:02 AM, Roman Mamedov wrote: >>>>> On Fri, 28 Aug 2020 22:08:22 -0500 >>>>> "R. Ramesh" <rramesh@verizon.net> wrote: >>>>> >>>>>> I do not know how SSD caching is implemented. I assumed it will be >>>>>> somewhat similar to memory cache (L2 vs L3 vs L4 etc). 
I am hoping that >>>>>> with SSD caching, reads/writes to disk will be larger in size and >>>>>> sequential within a file (similar to cache line fill in memory cache >>>>>> which results in memory bursts that are efficient). I thought that is >>>>>> what SSD caching will do to disk reads/writes. I assumed, once reads >>>>>> (ahead) and writes (assuming writeback cache) buffers data sufficiently >>>>>> in the SSD, all reads/writes will be to SSD with periodic well organized >>>>>> large transfers to disk. If I am wrong here then I do not see any point >>>>>> in SSD as a cache. My aim is not to optimize by cache hits, but optimize >>>>>> by preventing disks from thrashing back and forth seeking after every >>>>>> block read. I suppose Linux (memory) buffer cache alleviates some of >>>>>> that. I was hoping SSD will provide next level. If not, I am off in my >>>>>> understanding of SSD as a disk cache. >>>>> Just try it, as I said before with LVM it is easy to remove if it doesn't work >>>>> out. You can always go to the manual copying method or whatnot, but first why >>>>> not check if the automatic caching solution might be "good enough" for your >>>>> needs. >>>>> >>>>> Yes it usually tries to avoid caching long sequential reads or writes, but >>>>> there's also quite a bit of other load on the FS, i.e. metadata. I found that >>>>> browsing directories and especially mounting the filesystem had a great >>>>> benefit from caching. >>>>> >>>>> You are correct that it will try to increase performance via writeback >>>>> caching, however with LVM that needs to be enabled explicitly: >>>>> https://www.systutorials.com/docs/linux/man/7-lvmcache/#lbAK >>>>> And of course a failure of that cache SSD will mean losing some data, even if >>>>> the main array is RAID. Perhaps should consider a RAID of SSDs for cache in >>>>> that case then. >>>>> >>>> Yes, I have 2x500GB ssds for cache. May be, I should do raid1 on them >>>> and use as cache volume. 
>>>> I thought SSDs are more reliable and even when they begin to die, they
>>>> become readonly before quitting. Of course, this is all theory, and I
>>>> do not think standards exists on how they behave when reaching EoL.
>>>>
>>>> Ramesh
>>>>
>> My SSDs are from different companies and bought at different times
>> (2019/2016, I think).
>>
>> I have not had many hard disk failures. However, each time I had one, it
>> has been a total death. So, I am a bit biased. May be with sections, I
>> can replace one md at a time and letting others run degraded. I am sure
>> there other tricks. I am simply saying it is a lot of reads/writes, and
>> of course computation, in cold replacement of disks in RAID6 vs. RAID1.
>>
>> Yes, larger disks are not cheaper, but they use one SATA port vs.
>> smaller disks. Also, they use less power in the long run (mine run
>> 24x7). That is why I have a policy of replacing disks once 2x size disks
>> (compared to what I currently own) become commonplace.
>>
>> I have a LSI 9211 SAS HBA which is touted to be reliable by this community.
>>
>> Regards
>> Ramesh
>>

Roger,

Thanks for the details on your SSD setup. Yes, mythtv is supposed to
find the file from storage group entries regardless of its actual
location, so a mv is all that is required. However, I have never tried
that feature, so it will be new to me.

Like I said before, I will try the LVM cache and see whether my disk
activity improves. If that is not to my satisfaction, I will remove the
cache and add it differently, like you have. I only have a 500GB SSD,
but I do not think a day's recordings will come anywhere close to that
size.

Regards
Ramesh

^ permalink raw reply	[flat|nested] 36+ messages in thread
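The lvmcache setup discussed in this subthread might look roughly like the following; the device names and volume group name are placeholders, and note that writethrough is the safer default, with writeback (as Roman pointed out) having to be enabled explicitly:

```shell
# Hypothetical devices: /dev/md0 is the big RAID6 array,
# /dev/md1 a RAID1 of the two SSDs to serve as the cache.
pvcreate /dev/md0 /dev/md1
vgcreate vg_media /dev/md0 /dev/md1

# Main LV lives on the HDD array.
lvcreate -n media -l 100%PVS vg_media /dev/md0

# Cache pool on the SSD mirror; leave a little headroom on the PV
# for the pool's metadata LV.
lvcreate --type cache-pool -n media_cache -l 95%PVS vg_media /dev/md1

# Attach the pool to the main LV. writethrough loses nothing if the
# cache SSD dies; switch to --cachemode writeback only if that risk
# is acceptable.
lvconvert --type cache --cachepool vg_media/media_cache \
    --cachemode writethrough vg_media/media

# To back the experiment out later (flushes and detaches cleanly):
#   lvconvert --uncache vg_media/media
```

The easy `--uncache` path is what makes this cheap to try, as suggested earlier in the thread.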
* Re: Best way to add caching to a new raid setup. 2020-08-30 15:42 ` Roger Heflin 2020-08-30 17:19 ` Ram Ramesh @ 2020-09-11 18:39 ` R. Ramesh 2020-09-11 20:37 ` Roger Heflin 1 sibling, 1 reply; 36+ messages in thread From: R. Ramesh @ 2020-09-11 18:39 UTC (permalink / raw) To: Roger Heflin; +Cc: Linux Raid On 8/30/20 10:42 AM, Roger Heflin wrote: > The LSI should be a good controller as long as you the HBA fw and not > the raid fw. > > I use an LSI with hba + the 8 AMD chipset sata ports, currently I have > 12 ports cabled to hot swap bays but only 7+boot disk used. > > How many recording do you think you will have and how many > clients/watchers? With the SSD handling the writes for recording my > disks actually spin down if no one is watching anything. > > The other trick the partitions let me do is initially I moved from 1.5 > -> 3tb disks (2x750 -> 4x750) and once I got 3-3tbs in I added the 2 > more partitions raid6(+1.5TB) (I bought the 3tb drives slowly), then > the next 3tb gets added to all 4 partitions (+3TB). > > On reads at least each disk can do at least 50 iops, and for the most > part the disks themselves are very likely to cache the entire track > the head goes over, so a 2nd sequential read likely comes from the > disk's read cache and does not have to actually be read. So several > sequential workloads jumping back and forth do not behave as bad as > one would expect. Write are a different story and a lot more > expensive. I isloate those to ssd and copy them in the middle of the > night when it is low activity. And since they are being copied as big > fast streams one file at a time they end up with very few fragments > and write very quickly. The way I have mine setup mythtv will find > the file whether it is on the ssd recording directory or the raid > recording directory, so when I mv the files nothing has to be done > except the mv. 
> > > On Sat, Aug 29, 2020 at 7:56 PM Ram Ramesh <rramesh2400@gmail.com> wrote: >> On 8/29/20 4:26 PM, Roger Heflin wrote: >>> It should be worth noting that if you buy 2 exactly the same SSD's at >>> the same time and use them in a mirror they are very likely to be >>> wearing about the same. >>> >>> I am hesitant to go much bigger on disks, especially since the $$/GB >>> really does not change much as the disks get bigger. >>> >>> And be careful of adding on a cheap sata controller as a lot of them work badly. >>> >>> Most of my disks have died from bad blocks causing a section of the >>> disk to have some errors, or bad blocks on sections causing the array >>> to pause for 7 seconds. Make sure to get a disk with SCTERC settable >>> (timeout when bad blocks happen, otherwise the default timeout is a >>> 60-120seconds, but with it you can set it to no more than 7 seconds). >>> In the cases where the entire disk did not just stop and is just >>> getting bad blocks in places, typically you have time as only a single >>> section is getting bad blocks, so in this case having sections does >>> help. Also note that mdadm with 4 sections like I have will only >>> run a single rebuild at a time as mdadm understands that the >>> underlying disks are shared, this makes replacing a disk with 1 >>> section or 4 sections basically work pretty much the same. It does >>> the same thing on the weekly scans, it sets all 4 to scan, and it >>> scans 1 and defers the other scan as disks are shared. >>> >>> It seems to be a disk completely dying is a lot less often than badblock issues. >>> >>> On Sat, Aug 29, 2020 at 3:50 PM Ram Ramesh <rramesh2400@gmail.com> wrote: >>>> On 8/29/20 12:02 AM, Roman Mamedov wrote: >>>>> On Fri, 28 Aug 2020 22:08:22 -0500 >>>>> "R. Ramesh" <rramesh@verizon.net> wrote: >>>>> >>>>>> I do not know how SSD caching is implemented. I assumed it will be >>>>>> somewhat similar to memory cache (L2 vs L3 vs L4 etc). 
I am hoping that >>>>>> with SSD caching, reads/writes to disk will be larger in size and >>>>>> sequential within a file (similar to cache line fill in memory cache >>>>>> which results in memory bursts that are efficient). I thought that is >>>>>> what SSD caching will do to disk reads/writes. I assumed, once reads >>>>>> (ahead) and writes (assuming writeback cache) buffers data sufficiently >>>>>> in the SSD, all reads/writes will be to SSD with periodic well organized >>>>>> large transfers to disk. If I am wrong here then I do not see any point >>>>>> in SSD as a cache. My aim is not to optimize by cache hits, but optimize >>>>>> by preventing disks from thrashing back and forth seeking after every >>>>>> block read. I suppose Linux (memory) buffer cache alleviates some of >>>>>> that. I was hoping SSD will provide next level. If not, I am off in my >>>>>> understanding of SSD as a disk cache. >>>>> Just try it, as I said before with LVM it is easy to remove if it doesn't work >>>>> out. You can always go to the manual copying method or whatnot, but first why >>>>> not check if the automatic caching solution might be "good enough" for your >>>>> needs. >>>>> >>>>> Yes it usually tries to avoid caching long sequential reads or writes, but >>>>> there's also quite a bit of other load on the FS, i.e. metadata. I found that >>>>> browsing directories and especially mounting the filesystem had a great >>>>> benefit from caching. >>>>> >>>>> You are correct that it will try to increase performance via writeback >>>>> caching, however with LVM that needs to be enabled explicitly: >>>>> https://www.systutorials.com/docs/linux/man/7-lvmcache/#lbAK >>>>> And of course a failure of that cache SSD will mean losing some data, even if >>>>> the main array is RAID. Perhaps should consider a RAID of SSDs for cache in >>>>> that case then. >>>>> >>>> Yes, I have 2x500GB ssds for cache. May be, I should do raid1 on them >>>> and use as cache volume. 
>>>> I thought SSDs are more reliable and even when they begin to die, they
>>>> become readonly before quitting. Of course, this is all theory, and I
>>>> do not think standards exists on how they behave when reaching EoL.
>>>>
>>>> Ramesh
>>>>
>> My SSDs are from different companies and bought at different times
>> (2019/2016, I think).
>>
>> I have not had many hard disk failures. However, each time I had one, it
>> has been a total death. So, I am a bit biased. May be with sections, I
>> can replace one md at a time and letting others run degraded. I am sure
>> there other tricks. I am simply saying it is a lot of reads/writes, and
>> of course computation, in cold replacement of disks in RAID6 vs. RAID1.
>>
>> Yes, larger disks are not cheaper, but they use one SATA port vs.
>> smaller disks. Also, they use less power in the long run (mine run
>> 24x7). That is why I have a policy of replacing disks once 2x size disks
>> (compared to what I currently own) become commonplace.
>>
>> I have a LSI 9211 SAS HBA which is touted to be reliable by this community.
>>
>> Regards
>> Ramesh
>>

Roger,

Just curious: in your search for an SSD solution for mythtv recording,
did you consider overlayfs, unionfs or mergerfs? If you did, why did you
decide that a simple copy is better?

Ramesh

^ permalink raw reply	[flat|nested] 36+ messages in thread
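For reference, the union-filesystem approach asked about here might look like this with mergerfs (mount points are hypothetical): new files land on the first branch, the SSD, and a nightly mover migrates them to the array, while reads find a file on whichever branch holds it:

```shell
# Hypothetical branches: /mnt/ssd (fast spool) and /mnt/raid (array).
# category.create=ff ("first found") sends all new files to the
# first listed branch, i.e. the SSD; existing files are served from
# wherever they actually live, so mythtv sees one directory tree.
mergerfs -o category.create=ff,cache.files=off \
    /mnt/ssd:/mnt/raid /mnt/recordings
```

The trade-off Roger raises below is that this adds a FUSE layer with its own moving parts, whereas mythtv's storage groups already provide the "find it wherever it is" behaviour with no extra software.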
* Re: Best way to add caching to a new raid setup. 2020-09-11 18:39 ` R. Ramesh @ 2020-09-11 20:37 ` Roger Heflin 2020-09-11 22:41 ` Ram Ramesh 0 siblings, 1 reply; 36+ messages in thread
From: Roger Heflin @ 2020-09-11 20:37 UTC (permalink / raw)
To: R. Ramesh; +Cc: Linux Raid

It is simpler, and has very simple-to-maintain moving parts. I have
been a linux admin for 20+ years, and a professional unix admin for
longer, and too often the complicated option seems nice but has burned
me with bugs and other unexpected results, so simple is best. The daily
move uses nothing complicated, can be expected to work on any unix
system that has ever existed, and relies on heavily used operations
that have a high probability of working and of being caught quickly as
broken if they did not work. Any of the others are a bit more
complicated, more likely to have bugs, and less likely to get caught as
quickly as the moving parts I rely on.

I also wanted to be able to spin down my array for any hours when no
one is watching the dvr (usually this is 18+ hours per day, x 7 drives
== 1.25 kWh/day, or 37 kWh/month, or $4-$10 depending on power costs),
and I also have motion software collecting security cams that go to the
SSD and are also copied onto the array nightly. The security cams would
have kept the array spinning whenever anything moved anywhere outside,
so pretty much 100% of the time.

On Fri, Sep 11, 2020 at 1:39 PM R. Ramesh <rramesh@verizon.net> wrote:
>
> On 8/30/20 10:42 AM, Roger Heflin wrote:
> > The LSI should be a good controller as long as you the HBA fw and not
> > the raid fw.
> >
> > I use an LSI with hba + the 8 AMD chipset sata ports, currently I have
> > 12 ports cabled to hot swap bays but only 7+boot disk used.
> >
> > How many recording do you think you will have and how many
> > clients/watchers? With the SSD handling the writes for recording my
> > disks actually spin down if no one is watching anything.
> > > > The other trick the partitions let me do is initially I moved from 1.5 > > -> 3tb disks (2x750 -> 4x750) and once I got 3-3tbs in I added the 2 > > more partitions raid6(+1.5TB) (I bought the 3tb drives slowly), then > > the next 3tb gets added to all 4 partitions (+3TB). > > > > On reads at least each disk can do at least 50 iops, and for the most > > part the disks themselves are very likely to cache the entire track > > the head goes over, so a 2nd sequential read likely comes from the > > disk's read cache and does not have to actually be read. So several > > sequential workloads jumping back and forth do not behave as bad as > > one would expect. Write are a different story and a lot more > > expensive. I isloate those to ssd and copy them in the middle of the > > night when it is low activity. And since they are being copied as big > > fast streams one file at a time they end up with very few fragments > > and write very quickly. The way I have mine setup mythtv will find > > the file whether it is on the ssd recording directory or the raid > > recording directory, so when I mv the files nothing has to be done > > except the mv. > > > > > > On Sat, Aug 29, 2020 at 7:56 PM Ram Ramesh <rramesh2400@gmail.com> wrote: > >> On 8/29/20 4:26 PM, Roger Heflin wrote: > >>> It should be worth noting that if you buy 2 exactly the same SSD's at > >>> the same time and use them in a mirror they are very likely to be > >>> wearing about the same. > >>> > >>> I am hesitant to go much bigger on disks, especially since the $$/GB > >>> really does not change much as the disks get bigger. > >>> > >>> And be careful of adding on a cheap sata controller as a lot of them work badly. > >>> > >>> Most of my disks have died from bad blocks causing a section of the > >>> disk to have some errors, or bad blocks on sections causing the array > >>> to pause for 7 seconds. 
Make sure to get a disk with SCTERC settable > >>> (timeout when bad blocks happen, otherwise the default timeout is a > >>> 60-120seconds, but with it you can set it to no more than 7 seconds). > >>> In the cases where the entire disk did not just stop and is just > >>> getting bad blocks in places, typically you have time as only a single > >>> section is getting bad blocks, so in this case having sections does > >>> help. Also note that mdadm with 4 sections like I have will only > >>> run a single rebuild at a time as mdadm understands that the > >>> underlying disks are shared, this makes replacing a disk with 1 > >>> section or 4 sections basically work pretty much the same. It does > >>> the same thing on the weekly scans, it sets all 4 to scan, and it > >>> scans 1 and defers the other scan as disks are shared. > >>> > >>> It seems to be a disk completely dying is a lot less often than badblock issues. > >>> > >>> On Sat, Aug 29, 2020 at 3:50 PM Ram Ramesh <rramesh2400@gmail.com> wrote: > >>>> On 8/29/20 12:02 AM, Roman Mamedov wrote: > >>>>> On Fri, 28 Aug 2020 22:08:22 -0500 > >>>>> "R. Ramesh" <rramesh@verizon.net> wrote: > >>>>> > >>>>>> I do not know how SSD caching is implemented. I assumed it will be > >>>>>> somewhat similar to memory cache (L2 vs L3 vs L4 etc). I am hoping that > >>>>>> with SSD caching, reads/writes to disk will be larger in size and > >>>>>> sequential within a file (similar to cache line fill in memory cache > >>>>>> which results in memory bursts that are efficient). I thought that is > >>>>>> what SSD caching will do to disk reads/writes. I assumed, once reads > >>>>>> (ahead) and writes (assuming writeback cache) buffers data sufficiently > >>>>>> in the SSD, all reads/writes will be to SSD with periodic well organized > >>>>>> large transfers to disk. If I am wrong here then I do not see any point > >>>>>> in SSD as a cache. 
My aim is not to optimize by cache hits, but optimize > >>>>>> by preventing disks from thrashing back and forth seeking after every > >>>>>> block read. I suppose Linux (memory) buffer cache alleviates some of > >>>>>> that. I was hoping SSD will provide next level. If not, I am off in my > >>>>>> understanding of SSD as a disk cache. > >>>>> Just try it, as I said before with LVM it is easy to remove if it doesn't work > >>>>> out. You can always go to the manual copying method or whatnot, but first why > >>>>> not check if the automatic caching solution might be "good enough" for your > >>>>> needs. > >>>>> > >>>>> Yes it usually tries to avoid caching long sequential reads or writes, but > >>>>> there's also quite a bit of other load on the FS, i.e. metadata. I found that > >>>>> browsing directories and especially mounting the filesystem had a great > >>>>> benefit from caching. > >>>>> > >>>>> You are correct that it will try to increase performance via writeback > >>>>> caching, however with LVM that needs to be enabled explicitly: > >>>>> https://www.systutorials.com/docs/linux/man/7-lvmcache/#lbAK > >>>>> And of course a failure of that cache SSD will mean losing some data, even if > >>>>> the main array is RAID. Perhaps should consider a RAID of SSDs for cache in > >>>>> that case then. > >>>>> > >>>> Yes, I have 2x500GB ssds for cache. May be, I should do raid1 on them > >>>> and use as cache volume. > >>>> I thought SSDs are more reliable and even when they begin to die, they > >>>> become readonly before quitting. Of course, this is all theory, and I > >>>> do not think standards exists on how they behave when reaching EoL. > >>>> > >>>> Ramesh > >>>> > >> My SSDs are from different companies and bought at different times > >> (2019/2016, I think). > >> > >> I have not had many hard disk failures. However, each time I had one, it > >> has been a total death. So, I am a bit biased. 
May be with sections, I > >> can replace one md at a time and letting others run degraded. I am sure > >> there other tricks. I am simply saying it is a lot of reads/writes, and > >> of course computation, in cold replacement of disks in RAID6 vs. RAID1. > >> > >> Yes, larger disks are not cheaper, but they use one SATA port vs. > >> smaller disks. Also, they use less power in the long run (mine run > >> 24x7). That is why I have a policy of replacing disks once 2x size disks > >> (compared to what I currently own) become commonplace. > >> > >> I have a LSI 9211 SAS HBA which is touted to be reliable by this community. > >> > >> Regards > >> Ramesh > >> > > Roger, > > Just curious, in your search for a SSD solution to mythtv recording, > did you consider overlayfs, unionfs or mergerfs? If you did, why did you > decide that a simple copy is better? > > Ramesh > ^ permalink raw reply [flat|nested] 36+ messages in thread
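The nightly move Roger relies on needs nothing beyond mv and cron. A sketch with hypothetical paths; the 10-minute age check is an assumed guard so a recording still being written is left alone:

```shell
# Move finished recordings from the ssd spool to the array.
# Directories are passed as arguments, so a nightly cron entry
# (hypothetical path) might be:
#   30 3 * * * . /usr/local/lib/move_recordings.sh && move_recordings /mnt/ssd/rec /mnt/raid/rec
move_recordings() {
    src=$1
    dst=$2
    for f in "$src"/*; do
        [ -f "$f" ] || continue
        # Skip anything modified in the last 10 minutes: mythtv may
        # still be writing it.
        if find "$f" -mmin -10 | grep -q .; then
            continue
        fi
        mv "$f" "$dst"/
    done
}
```

Because each file is moved as one large sequential stream during idle hours, the copies land with few fragments, and mythtv's storage groups find the file in either directory, so nothing else has to change.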
* Re: Best way to add caching to a new raid setup. 2020-09-11 20:37 ` Roger Heflin @ 2020-09-11 22:41 ` Ram Ramesh 0 siblings, 0 replies; 36+ messages in thread
From: Ram Ramesh @ 2020-09-11 22:41 UTC (permalink / raw)
To: Roger Heflin, R. Ramesh, Linux Raid

Appreciate the details. I agree spinning down the disks is a good idea
to consider. I will look more into it.

Ramesh

On 9/11/20 3:37 PM, Roger Heflin wrote:
> It is simpler, and has very simple to maintain moving parts. I have
> been a linux admin for 20+ years, and a professional unix admin for
> longer, and too often complicated seems nice but has burned me with
> bugs and other unexpected results, so simple is best. The daily move
> uses nothing complicated and can be expected to work on any unix
> system that has ever existed and relies on heavily used operations
> that have a high probability of working and of being caught quickly as
> broken if they did not work. Any of the others are a bit more
> complicated, and more likely to have bugs and less likely to get
> caught as quick as the moving parts I rely on. I also wanted to be
> able to spin down my array for any hours when no one is watching the
> dvr (usually this is 18+ hours per day, x 7 drives == 1.25kw/day, or
> 37kw/month, or $4-$10 depending on power costs), and I also have
> motion software collecting security cams that go to the SSD and are
> also copied onto the array nighty. The security cams would have kept
> the array spinning when anything moved anywhere outside so pretty much
> 100% of the time.
>
> On Fri, Sep 11, 2020 at 1:39 PM R. Ramesh <rramesh@verizon.net> wrote:
>> On 8/30/20 10:42 AM, Roger Heflin wrote:
>>> The LSI should be a good controller as long as you the HBA fw and not
>>> the raid fw.
>>>
>>> I use an LSI with hba + the 8 AMD chipset sata ports, currently I have
>>> 12 ports cabled to hot swap bays but only 7+boot disk used.
>>>
>>> How many recording do you think you will have and how many
>>> clients/watchers?
With the SSD handling the writes for recording my >>> disks actually spin down if no one is watching anything. >>> >>> The other trick the partitions let me do is initially I moved from 1.5 >>> -> 3tb disks (2x750 -> 4x750) and once I got 3-3tbs in I added the 2 >>> more partitions raid6(+1.5TB) (I bought the 3tb drives slowly), then >>> the next 3tb gets added to all 4 partitions (+3TB). >>> >>> On reads at least each disk can do at least 50 iops, and for the most >>> part the disks themselves are very likely to cache the entire track >>> the head goes over, so a 2nd sequential read likely comes from the >>> disk's read cache and does not have to actually be read. So several >>> sequential workloads jumping back and forth do not behave as bad as >>> one would expect. Write are a different story and a lot more >>> expensive. I isloate those to ssd and copy them in the middle of the >>> night when it is low activity. And since they are being copied as big >>> fast streams one file at a time they end up with very few fragments >>> and write very quickly. The way I have mine setup mythtv will find >>> the file whether it is on the ssd recording directory or the raid >>> recording directory, so when I mv the files nothing has to be done >>> except the mv. >>> >>> >>> On Sat, Aug 29, 2020 at 7:56 PM Ram Ramesh <rramesh2400@gmail.com> wrote: >>>> On 8/29/20 4:26 PM, Roger Heflin wrote: >>>>> It should be worth noting that if you buy 2 exactly the same SSD's at >>>>> the same time and use them in a mirror they are very likely to be >>>>> wearing about the same. >>>>> >>>>> I am hesitant to go much bigger on disks, especially since the $$/GB >>>>> really does not change much as the disks get bigger. >>>>> >>>>> And be careful of adding on a cheap sata controller as a lot of them work badly. >>>>> >>>>> Most of my disks have died from bad blocks causing a section of the >>>>> disk to have some errors, or bad blocks on sections causing the array >>>>> to pause for 7 seconds. 
Make sure to get a disk with SCTERC settable >>>>> (the timeout when bad blocks happen; otherwise the default timeout is >>>>> 60-120 seconds, but with it you can set it to no more than 7 seconds). >>>>> In the cases where the entire disk did not just stop and is just >>>>> getting bad blocks in places, typically you have time, as only a single >>>>> section is getting bad blocks, so in this case having sections does >>>>> help. Also note that mdadm with 4 sections like I have will only >>>>> run a single rebuild at a time, as mdadm understands that the >>>>> underlying disks are shared; this makes replacing a disk with 1 >>>>> section or 4 sections work pretty much the same. It does >>>>> the same thing on the weekly scans: it sets all 4 to scan, then >>>>> scans 1 and defers the others as the disks are shared. >>>>> >>>>> It seems a disk completely dying is a lot less common than bad-block issues. >>>>> >>>>> On Sat, Aug 29, 2020 at 3:50 PM Ram Ramesh <rramesh2400@gmail.com> wrote: >>>>>> On 8/29/20 12:02 AM, Roman Mamedov wrote: >>>>>>> On Fri, 28 Aug 2020 22:08:22 -0500 >>>>>>> "R. Ramesh" <rramesh@verizon.net> wrote: >>>>>>> >>>>>>>> I do not know how SSD caching is implemented. I assumed it will be >>>>>>>> somewhat similar to memory cache (L2 vs L3 vs L4 etc). I am hoping that >>>>>>>> with SSD caching, reads/writes to disk will be larger in size and >>>>>>>> sequential within a file (similar to cache line fill in memory cache >>>>>>>> which results in memory bursts that are efficient). I thought that is >>>>>>>> what SSD caching will do to disk reads/writes. I assumed, once reads >>>>>>>> (ahead) and writes (assuming writeback cache) buffer data sufficiently >>>>>>>> in the SSD, all reads/writes will be to SSD with periodic well organized >>>>>>>> large transfers to disk. If I am wrong here then I do not see any point >>>>>>>> in SSD as a cache. 
My aim is not to optimize by cache hits, but to optimize >>>>>>>> by preventing disks from thrashing back and forth seeking after every >>>>>>>> block read. I suppose Linux (memory) buffer cache alleviates some of >>>>>>>> that. I was hoping SSD will provide the next level. If not, I am off in my >>>>>>>> understanding of SSD as a disk cache. >>>>>>> Just try it; as I said before, with LVM it is easy to remove if it doesn't work >>>>>>> out. You can always go to the manual copying method or whatnot, but first why >>>>>>> not check if the automatic caching solution might be "good enough" for your >>>>>>> needs. >>>>>>> >>>>>>> Yes, it usually tries to avoid caching long sequential reads or writes, but >>>>>>> there's also quite a bit of other load on the FS, i.e. metadata. I found that >>>>>>> browsing directories and especially mounting the filesystem benefited greatly >>>>>>> from caching. >>>>>>> >>>>>>> You are correct that it will try to increase performance via writeback >>>>>>> caching, however with LVM that needs to be enabled explicitly: >>>>>>> https://www.systutorials.com/docs/linux/man/7-lvmcache/#lbAK >>>>>>> And of course a failure of that cache SSD will mean losing some data, even if >>>>>>> the main array is RAID. Perhaps one should consider a RAID of SSDs for cache in >>>>>>> that case then. >>>>>>> >>>>>> Yes, I have 2x500GB ssds for cache. Maybe I should do RAID1 on them >>>>>> and use that as the cache volume. >>>>>> I thought SSDs were more reliable, and that even when they begin to die, they >>>>>> become read-only before quitting. Of course, this is all theory, and I >>>>>> do not think standards exist on how they behave when reaching EoL. >>>>>> >>>>>> Ramesh >>>>>> >>>> My SSDs are from different companies and bought at different times >>>> (2019/2016, I think). >>>> >>>> I have not had many hard disk failures. However, each time I had one, it >>>> has been a total death. So, I am a bit biased. 
Maybe with sections, I >>>> can replace one md at a time, letting the others run degraded. I am sure >>>> there are other tricks. I am simply saying it is a lot of reads/writes, and >>>> of course computation, in cold replacement of disks in RAID6 vs. RAID1. >>>> >>>> Yes, larger disks are not cheaper, but they use one SATA port vs. >>>> smaller disks. Also, they use less power in the long run (mine run >>>> 24x7). That is why I have a policy of replacing disks once 2x size disks >>>> (compared to what I currently own) become commonplace. >>>> >>>> I have an LSI 9211 SAS HBA, which is touted to be reliable by this community. >>>> >>>> Regards >>>> Ramesh >>>> >> Roger, >> >> Just curious, in your search for an SSD solution to mythtv recording, >> did you consider overlayfs, unionfs or mergerfs? If you did, why did you >> decide that a simple copy is better? >> >> Ramesh >> ^ permalink raw reply [flat|nested] 36+ messages in thread
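[Editorial note: the SCTERC setting discussed above can be inspected and changed with smartctl. A minimal sketch follows; the device name is hypothetical, and drives that do not support SCTERC need the opposite treatment (a longer kernel-side timeout) so md does not kick them during a long internal retry.]

```shell
# Check whether the drive supports SCTERC (device name hypothetical):
smartctl -l scterc /dev/sda

# Set read/write error recovery to 7 seconds (units are 100 ms, so 70 = 7 s).
# The setting is typically lost on power cycle, so put this in a boot script:
smartctl -l scterc,70,70 /dev/sda

# If the drive does NOT support SCTERC, raise the kernel command timeout
# instead, so a 1-2 minute internal retry doesn't eject the disk from the array:
echo 180 > /sys/block/sda/device/timeout
```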
* Re: Best way to add caching to a new raid setup. 2020-08-28 22:40 ` Ram Ramesh 2020-08-28 22:59 ` antlists @ 2020-08-29 0:01 ` Roger Heflin 2020-08-29 3:12 ` R. Ramesh 2020-08-31 19:20 ` Nix 2 siblings, 1 reply; 36+ messages in thread From: Roger Heflin @ 2020-08-29 0:01 UTC (permalink / raw) To: Ram Ramesh; +Cc: antlists, R. Ramesh, Linux Raid Something I would suggest, which I have found improves my mythtv experience: get a big enough SSD to hold 12-18 hours of recordings, or whatever you do daily, and set up the recordings to go to the SSD. I configured it to use the disk with the highest percentage free first, and since my raid6 is always 90%+ full, the SSD always gets used. Then nightly I move the files from the ssd recordings directory onto the raid6 recordings directory. This also helps when your disks start going bad and getting badblocks; the badblocks *WILL* cause mythtv to stop recording shows at random because of some prior choices the developers made (sync often, and if a recording gets more than a few seconds behind, stop it, attempting to save some recordings). I also put daily security camera data on the ssd and copy it over to the raid6 device nightly. Using the ssd for recording greatly reduces the load on the slower raid6 spinning disks. You would have to have a large number of people watching at the same time, as watching is a relatively easy load compared to the writes. On Fri, Aug 28, 2020 at 5:42 PM Ram Ramesh <rramesh2400@gmail.com> wrote: > > On 8/28/20 5:12 PM, antlists wrote: > > On 28/08/2020 18:25, Ram Ramesh wrote: > >> I am mainly looking for IOP improvement as I want to use this RAID in > >> mythtv environment. So multiple threads will be active and I expect > >> cache to help with random access IOPs. > > > > ??? > > > > Caching will only help in a read-after-write scenario, or a > > read-several-times scenario. > > > > I'm guessing mythtv means it's a film server? 
Can ALL your films (or > > at least your favourite "watch again and again" ones) fit in the > > cache? If you watch a lot of films, chances are you'll read it from > > disk (no advantage from the cache), and by the time you watch it again > > it will have been evicted so you'll have to read it again. > > > > The other time cache may be useful, is if you're recording one thing > > and watching another. That way, the writes can stall in cache as you > > prioritise reading. > > > > Think about what is actually happening at the i/o level, and will > > cache help? > > > > Cheers, > > Wol > > Mythtv is a server-client DVR system. I have a client next to each of my > TVs and one backend with a large disk (this will have RAID with cache). At > any time many clients will be accessing different programs and any > scheduled recording will also be going on in parallel. So you will see a > lot of seeks, but still all will be based on limited threads (I only > have 3 TVs and maybe one other PC acting as a client). So lots of IOs, > mostly sequential, across a small number of threads. I think most cache > algorithms should be able to benefit from random access to blocks in SSD. > > Do you see any flaws in my argument? > > Regards > Ramesh > ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Best way to add caching to a new raid setup. 2020-08-29 0:01 ` Roger Heflin @ 2020-08-29 3:12 ` R. Ramesh 2020-08-29 22:36 ` Drew 0 siblings, 1 reply; 36+ messages in thread From: R. Ramesh @ 2020-08-29 3:12 UTC (permalink / raw) To: Roger Heflin, Ram Ramesh; +Cc: antlists, Linux Raid On 8/28/20 7:01 PM, Roger Heflin wrote: > Something I would suggest, I have found improves my mythtv experience > is: Get a big enough SSD to hold 12-18 hours of the recording or > whatever you do daily, and setup the recordings to go to the SSD. i > defined use the disk with the highest percentage free to be used > first, and since my raid6 is always 90% plus the SSD always gets used. > Then nightly I move the files from the ssd recordings directory onto > the raid6 recordings directory. This also helps when your disks start > going bad and getting badblocks, the badblocks *WILL* cause mythtv to > stop recording shows at random because of some prior choices the > developers made (sync often, and if you get more than a few seconds > behind stop recording, attempting to save some recordings). > > I also put daily security camera data on the ssd and copy it over to > the raid6 device nightly. > > Using the ssd for recording much reduces the load on the slower raid6 > spinning disks. > > You would have to have a large number of people watching at the same > time as the watching is relatively easy load, compared to the writes. > > On Fri, Aug 28, 2020 at 5:42 PM Ram Ramesh <rramesh2400@gmail.com> wrote: >> On 8/28/20 5:12 PM, antlists wrote: >>> On 28/08/2020 18:25, Ram Ramesh wrote: >>>> I am mainly looking for IOP improvement as I want to use this RAID in >>>> mythtv environment. So multiple threads will be active and I expect >>>> cache to help with random access IOPs. >>> ??? >>> >>> Caching will only help in a read-after-write scenario, or a >>> read-several-times scenario. >>> >>> I'm guessing mythtv means it's a film server? 
Can ALL your films (or >>> at least your favourite "watch again and again" ones) fit in the >>> cache? If you watch a lot of films, chances are you'll read it from >>> disk (no advantage from the cache), and by the time you watch it again >>> it will have been evicted so you'll have to read it again. >>> >>> The other time cache may be useful, is if you're recording one thing >>> and watching another. That way, the writes can stall in cache as you >>> prioritise reading. >>> >>> Think about what is actually happening at the i/o level, and will >>> cache help? >>> >>> Cheers, >>> Wol >> Mythtv is a server-client DVR system. I have a client next to each of my >> TVs and one backend with a large disk (this will have RAID with cache). At >> any time many clients will be accessing different programs and any >> scheduled recording will also be going on in parallel. So you will see a >> lot of seeks, but still all will be based on limited threads (I only >> have 3 TVs and maybe one other PC acting as a client). So lots of IOs, >> mostly sequential, across a small number of threads. I think most cache >> algorithms should be able to benefit from random access to blocks in SSD. >> >> Do you see any flaws in my argument? >> >> Regards >> Ramesh >> I was hoping SSD caching would do what you are suggesting without daily copying. Based on Wol's comments, it does not. Maybe I misunderstood how SSD caching works. I will try it anyway and see what happens. If it does not do what I want, I will remove caching and go straight to disks. Ramesh ^ permalink raw reply [flat|nested] 36+ messages in thread
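[Editorial note: the "easy to remove" point made earlier in the thread is concrete with lvmcache. A sketch under assumed names: a VG vg0 holding an LV "media" on the md array, with an SSD partition /dev/sdg1 already added via vgextend. Newer LVM (2.03+, as shipped in Ubuntu 20.04) supports the simpler --cachevol form shown here; older versions need a separate cache pool.]

```shell
# Create a cache LV on the SSD (names and size hypothetical):
lvcreate -n cache0 -L 400G vg0 /dev/sdg1

# Attach it to the media LV.  The default mode is writethrough, so an
# SSD failure cannot lose data the RAID already holds:
lvconvert --type cache --cachevol vg0/cache0 vg0/media

# If caching doesn't help, detach cleanly -- all data remains on the RAID:
lvconvert --splitcache vg0/media
```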
* Re: Best way to add caching to a new raid setup. 2020-08-29 3:12 ` R. Ramesh @ 2020-08-29 22:36 ` Drew 2020-09-01 16:12 ` Ram Ramesh 0 siblings, 1 reply; 36+ messages in thread From: Drew @ 2020-08-29 22:36 UTC (permalink / raw) To: R. Ramesh; +Cc: Ram Ramesh, antlists, Linux Raid I know what you and Wols are talking about, and I think they're actually two separate things. Wols is referring to traditional read caching, which only benefits you if you are reading the same thing over and over again: cache hits. For streaming it won't help as you'll never hit the cache. What you are talking about is a write cache, something I have seen implemented before. Basically the idea is for writes to hit the SSDs first, the SSD acting as a cache or buffer between the filesystem and the slower RAID array. To the end process they're just writing to a disk; they don't see the SSD buffer/cache. QNAP implements this in their NAS chassis, just not sure what the exact implementation is in their case. On Fri, Aug 28, 2020 at 9:14 PM R. Ramesh <rramesh@verizon.net> wrote: > > On 8/28/20 7:01 PM, Roger Heflin wrote: > > Something I would suggest, I have found improves my mythtv experience > > is: Get a big enough SSD to hold 12-18 hours of the recording or > > whatever you do daily, and setup the recordings to go to the SSD. i > > defined use the disk with the highest percentage free to be used > > first, and since my raid6 is always 90% plus the SSD always gets used. > > Then nightly I move the files from the ssd recordings directory onto > > the raid6 recordings directory. This also helps when your disks start > > going bad and getting badblocks, the badblocks *WILL* cause mythtv to > > stop recording shows at random because of some prior choices the > > developers made (sync often, and if you get more than a few seconds > > behind stop recording, attempting to save some recordings). > > > > I also put daily security camera data on the ssd and copy it over to > > the raid6 device nightly. 
> > > > Using the ssd for recording much reduces the load on the slower raid6 > > spinning disks. > > > > You would have to have a large number of people watching at the same > > time as the watching is relatively easy load, compared to the writes. > > > > On Fri, Aug 28, 2020 at 5:42 PM Ram Ramesh <rramesh2400@gmail.com> wrote: > >> On 8/28/20 5:12 PM, antlists wrote: > >>> On 28/08/2020 18:25, Ram Ramesh wrote: > >>>> I am mainly looking for IOP improvement as I want to use this RAID in > >>>> mythtv environment. So multiple threads will be active and I expect > >>>> cache to help with random access IOPs. > >>> ??? > >>> > >>> Caching will only help in a read-after-write scenario, or a > >>> read-several-times scenario. > >>> > >>> I'm guessing mythtv means it's a film server? Can ALL your films (or > >>> at least your favourite "watch again and again" ones) fit in the > >>> cache? If you watch a lot of films, chances are you'll read it from > >>> disk (no advantage from the cache), and by the time you watch it again > >>> it will have been evicted so you'll have to read it again. > >>> > >>> The other time cache may be useful, is if you're recording one thing > >>> and watching another. That way, the writes can stall in cache as you > >>> prioritise reading. > >>> > >>> Think about what is actually happening at the i/o level, and will > >>> cache help? > >>> > >>> Cheers, > >>> Wol > >> Mythtv is a sever client DVR system. I have a client next to each of my > >> TVs and one backend with large disk (this will have RAID with cache). At > >> any time many clients will be accessing different programs and any > >> scheduled recording will also be going on in parallel. So you will see a > >> lot of seeks, but still all will be based on limited threads (I only > >> have 3 TVs and may be one other PC acting as a client) So lots of IOs, > >> mostly sequential, across small number of threads. 
I think most cache > >> algorithms should be able to benefit from random access to blocks in SSD. > >> > >> Do you see any flaws in my argument? > >> > >> Regards > >> Ramesh > >> > I was hoping SSD caching would do what you are suggesting without daily > copying. Based on Wol's comments, it does not. May be I misunderstood > how SSD caching works. I will try it any way and see what happens. If > it does not do what I want, I will remove caching and go straight to disks. > > Ramesh -- Drew "Nothing in life is to be feared. It is only to be understood." --Marie Curie "This started out as a hobby and spun horribly out of control." -Unknown ^ permalink raw reply [flat|nested] 36+ messages in thread
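[Editorial note: the write-buffer behavior Drew describes maps to lvmcache's writeback mode; as Roman warned earlier in the thread, writeback risks data loss if the cache SSD dies, which is why a mirrored cache LV makes sense. A hedged sketch, with all names hypothetical and assuming your LVM version accepts a raid1 LV as a cache volume:]

```shell
# Mirror the cache LV across two SSDs before enabling writeback
# (vg0, the LV "media", device names, and size are hypothetical):
lvcreate --type raid1 -m 1 -n cache0 -L 400G vg0 /dev/sdg1 /dev/sdh1
lvconvert --type cache --cachevol vg0/cache0 vg0/media

# Switch the attached cache from the default writethrough to writeback:
lvchange --cachemode writeback vg0/media
```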
* Re: Best way to add caching to a new raid setup. 2020-08-29 22:36 ` Drew @ 2020-09-01 16:12 ` Ram Ramesh 2020-09-01 17:01 ` Kai Stian Olstad 2020-09-14 11:40 ` Nix 0 siblings, 2 replies; 36+ messages in thread From: Ram Ramesh @ 2020-09-01 16:12 UTC (permalink / raw) To: Drew; +Cc: antlists, Linux Raid On 8/29/20 5:36 PM, Drew wrote: > I know what you and Wols are talking about and I think it's actually > two separate things. Wol's is referring to traditional read caching > where it only benefits if you are reading the same thing over and over > again, cache hits. For streaming it won't help as you'll never hit the > cache. > > What you are talking about is a write cache, something I have seen > implemented before. Basically the idea is for writes to hit the SSD's > first, the SSD acting as a cache or buffer between the filesystem and > the slower RAID array. To the end process they're just writing to a > disk, they don't see the SSD buffer/cache. QNAP implements this in > their NAS chassis, just not sure what the exact implementation is in > their case. > > On Fri, Aug 28, 2020 at 9:14 PM R. Ramesh <rramesh@verizon.net> wrote: >> On 8/28/20 7:01 PM, Roger Heflin wrote: >>> Something I would suggest, I have found improves my mythtv experience >>> is: Get a big enough SSD to hold 12-18 hours of the recording or >>> whatever you do daily, and setup the recordings to go to the SSD. i >>> defined use the disk with the highest percentage free to be used >>> first, and since my raid6 is always 90% plus the SSD always gets used. >>> Then nightly I move the files from the ssd recordings directory onto >>> the raid6 recordings directory. This also helps when your disks start >>> going bad and getting badblocks, the badblocks *WILL* cause mythtv to >>> stop recording shows at random because of some prior choices the >>> developers made (sync often, and if you get more than a few seconds >>> behind stop recording, attempting to save some recordings). 
>>> >>> I also put daily security camera data on the ssd and copy it over to >>> the raid6 device nightly. >>> >>> Using the ssd for recording much reduces the load on the slower raid6 >>> spinning disks. >>> >>> You would have to have a large number of people watching at the same >>> time as the watching is relatively easy load, compared to the writes. >>> >>> On Fri, Aug 28, 2020 at 5:42 PM Ram Ramesh <rramesh2400@gmail.com> wrote: >>>> On 8/28/20 5:12 PM, antlists wrote: >>>>> On 28/08/2020 18:25, Ram Ramesh wrote: >>>>>> I am mainly looking for IOP improvement as I want to use this RAID in >>>>>> mythtv environment. So multiple threads will be active and I expect >>>>>> cache to help with random access IOPs. >>>>> ??? >>>>> >>>>> Caching will only help in a read-after-write scenario, or a >>>>> read-several-times scenario. >>>>> >>>>> I'm guessing mythtv means it's a film server? Can ALL your films (or >>>>> at least your favourite "watch again and again" ones) fit in the >>>>> cache? If you watch a lot of films, chances are you'll read it from >>>>> disk (no advantage from the cache), and by the time you watch it again >>>>> it will have been evicted so you'll have to read it again. >>>>> >>>>> The other time cache may be useful, is if you're recording one thing >>>>> and watching another. That way, the writes can stall in cache as you >>>>> prioritise reading. >>>>> >>>>> Think about what is actually happening at the i/o level, and will >>>>> cache help? >>>>> >>>>> Cheers, >>>>> Wol >>>> Mythtv is a sever client DVR system. I have a client next to each of my >>>> TVs and one backend with large disk (this will have RAID with cache). At >>>> any time many clients will be accessing different programs and any >>>> scheduled recording will also be going on in parallel. 
So you will see a >>>> lot of seeks, but still all will be based on limited threads (I only >>>> have 3 TVs and maybe one other PC acting as a client). So lots of IOs, >>>> mostly sequential, across a small number of threads. I think most cache >>>> algorithms should be able to benefit from random access to blocks in SSD. >>>> >>>> Do you see any flaws in my argument? >>>> >>>> Regards >>>> Ramesh >>>> >> I was hoping SSD caching would do what you are suggesting without daily >> copying. Based on Wol's comments, it does not. Maybe I misunderstood >> how SSD caching works. I will try it anyway and see what happens. If >> it does not do what I want, I will remove caching and go straight to disks. >> >> Ramesh > > After thinking through this, I really like the idea of simply recording programs to SSD and moving files one at a time based on some aging algorithm of my own. I will move files back and forth as needed during overnight hours, creating my own caching effect. As long as I keep the original (renamed) and cache the ones needed with the correct name, mythtv will find the cached copy. When mythtv complains about something missing, I can manually look at the renamed backup copy and make the corrections. Unless my thinking is badly broken, this should work. I really wish overlayfs had a nice merge/clean feature that would allow us to move overlay items to the underlying filesystem and start the overlay over. All I need is file-level caching, not block-level caching. Ramesh ^ permalink raw reply [flat|nested] 36+ messages in thread
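[Editorial note: the by-hand aging scheme described above amounts to a few lines of cron-driven shell. A minimal sketch, with hypothetical directory names, moving files not touched for a couple of days from the SSD tier to the RAID tier:]

```shell
#!/bin/sh
# Hypothetical paths -- point these at your mythtv storage group directories.
SSD_DIR=/mnt/ssd-recordings
RAID_DIR=/mnt/raid-recordings

# Move files not accessed for more than $3 days from $1 to $2.
age_out() {
    src=$1; dst=$2; days=$3
    # -atime keys off last access; use -mtime instead if the filesystem
    # is mounted noatime.  mv -n avoids clobbering an existing copy.
    find "$src" -maxdepth 1 -type f -atime +"$days" -exec mv -n {} "$dst"/ \;
}

# Run from cron in the small hours, e.g.:  30 3 * * * /usr/local/sbin/age-out.sh
if [ -d "$SSD_DIR" ] && [ -d "$RAID_DIR" ]; then
    age_out "$SSD_DIR" "$RAID_DIR" 2
fi
```

Because mythtv searches all directories in a storage group, a plain mv between the two directories is enough; no symlinks or renames are needed for playback to keep working.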
* Re: Best way to add caching to a new raid setup. 2020-09-01 16:12 ` Ram Ramesh @ 2020-09-01 17:01 ` Kai Stian Olstad 2020-09-02 18:17 ` Ram Ramesh 2020-09-14 11:40 ` Nix 1 sibling, 1 reply; 36+ messages in thread From: Kai Stian Olstad @ 2020-09-01 17:01 UTC (permalink / raw) To: Ram Ramesh; +Cc: Drew, antlists, Linux Raid On Tue, Sep 01, 2020 at 11:12:40AM -0500, Ram Ramesh wrote: > I really wished overlay fs had a nice merge/clean feature that will allow us > to move overlay items to underlying file system and start over the overlay. You should check out mergerfs[1], it can merge multiple directories together on different disks and you can transparently move files between them. Mergerfs have a lot of other features too that you might find useful. [1] https://github.com/trapexit/mergerfs/ -- Kai Stian Olstad ^ permalink raw reply [flat|nested] 36+ messages in thread
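[Editorial note: the "recent files on SSD, old files on RAID, one namespace" shape discussed above is roughly what a mergerfs pool provides. A sketch with hypothetical mount points; option names are from the mergerfs documentation, but check the version you install:]

```shell
# Pool an SSD branch and a RAID branch into one tree at /mnt/pool.
# category.create=ff ("first found") sends newly created files to the
# first branch listed -- the SSD -- while reads see files on either branch.
mergerfs -o use_ino,category.create=ff,moveonenospc=true \
    /mnt/ssd:/mnt/raid /mnt/pool

# A nightly job can then demote old files with a plain mv between the
# underlying branches; paths seen inside /mnt/pool do not change.
mv /mnt/ssd/recordings/old-show.ts /mnt/raid/recordings/
```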
* Re: Best way to add caching to a new raid setup. 2020-09-01 17:01 ` Kai Stian Olstad @ 2020-09-02 18:17 ` Ram Ramesh 0 siblings, 0 replies; 36+ messages in thread From: Ram Ramesh @ 2020-09-02 18:17 UTC (permalink / raw) To: Kai Stian Olstad; +Cc: Drew, antlists, Linux Raid On 9/1/20 12:01 PM, Kai Stian Olstad wrote: > On Tue, Sep 01, 2020 at 11:12:40AM -0500, Ram Ramesh wrote: >> I really wished overlay fs had a nice merge/clean feature that will allow us >> to move overlay items to underlying file system and start over the overlay. > You should check out mergerfs[1], it can merge multiple directories together > on different disks and you can transparently move files between them. > Mergerfs have a lot of other features too that you might find useful. > > [1] https://github.com/trapexit/mergerfs/ > Kai, Thanks. It is interesting. However, my starting point for this discussion was improving performance and this one seems a bit backward as it uses FUSE. I still think it is a good step. So I will learn a bit more to see if I can use it. Regards Ramesh ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Best way to add caching to a new raid setup. 2020-09-01 16:12 ` Ram Ramesh 2020-09-01 17:01 ` Kai Stian Olstad @ 2020-09-14 11:40 ` Nix 2020-09-14 14:32 ` Ram Ramesh 1 sibling, 1 reply; 36+ messages in thread From: Nix @ 2020-09-14 11:40 UTC (permalink / raw) To: Ram Ramesh; +Cc: Drew, antlists, Linux Raid On 1 Sep 2020, Ram Ramesh uttered the following: > After thinking through this, I really like the idea of simply > recording programs to SSD and move one file at a time based on some > aging algorithms of my own. I will move files back and forth as needed > during overnight hours creating my own caching effect. I don't really see the benefit here for a mythtv installation in particular. I/O patterns for large media are extremely non-seeky: even with multiple live recordings at once, an HDD would easily be able to keep up since it'd only have to seek a few times per 30s period given the size of most plausible write caches. In general, doing the hierarchical storage thing is useful if you have stuff you will almost never access that you can keep on slower media (or, in this case, stuff whose access patterns are non-seeky that you can keep on media with a high seek time). But in this case, that would be 'all of it'. Even if it weren't, by-hand copying won't deal with the thing you really need to keep on fast-seek media: metadata. You can't build your own filesystem with metadata on SSD and data on non-SSD this way! But both LVM caching and bcache do exactly that. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Best way to add caching to a new raid setup. 2020-09-14 11:40 ` Nix @ 2020-09-14 14:32 ` Ram Ramesh 2020-09-14 14:48 ` Roger Heflin 0 siblings, 1 reply; 36+ messages in thread From: Ram Ramesh @ 2020-09-14 14:32 UTC (permalink / raw) To: Nix; +Cc: Drew, antlists, Linux Raid On 9/14/20 6:40 AM, Nix wrote: > On 1 Sep 2020, Ram Ramesh uttered the following: > >> After thinking through this, I really like the idea of simply >> recording programs to SSD and move one file at a time based on some >> aging algorithms of my own. I will move files back and forth as needed >> during overnight hours creating my own caching effect. > I don't really see the benefit here for a mythtv installation in > particular. I/O patterns for large media are extremely non-seeky: even > with multiple live recordings at once, an HDD would easily be able to > keep up since it'd only have to seek a few times per 30s period given > the size of most plausible write caches. > > In general, doing the hierarchical storage thing is useful if you have > stuff you will almost never access that you can keep on slower media > (or, in this case, stuff whose access patterns are non-seeky that you > can keep on media with a high seek time). But in this case, that would > be 'all of it'. Even if it weren't, by-hand copying won't deal with the > thing you really need to keep on fast-seek media: metadata. You can't > build your own filesystem with metadata on SSD and data on non-SSD this > way! But both LVM caching and bcache do exactly that. Agreed, all I need is a file level LRU caching effect. All recently accessed/created files in SSD and the ones untouched for a while in spinning disks. I was trying to get this done using a block level caching methods which is too complicated for the purpose. My aim is not to improve the performance, instead improve on power. I want my raid disks to be mostly sitting idle holding files and spin up and serve only when called for. 
Most of the time, I am watching/recording recent shows/programs or popular movies, and typically that is about 200-400GB of storage. With UltraViolet, Prime, Netflix and Disney, movies are more often sourced from online content, and TV shows get deleted after watching with new ones added in that space. So typical usage seems ideal for a popular SSD size (with a large backup store on spinning disk), I think. This means my spinning disks are going to wake up once every day or two at most. More likely I expect it to be once a week, or periods of high activity that die down to nothing for a while. Instead, currently they are running 24x7, which does not make sense. Regards Ramesh ^ permalink raw reply [flat|nested] 36+ messages in thread
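[Editorial note: whether the spin-down goal above works out depends on the drives honoring an idle timer and on nothing (md bitmap updates, smartd polling) touching the array. A sketch with hdparm; device names are hypothetical:]

```shell
# Set a 1-hour idle spin-down on each array member.  For -S, values
# 1-240 mean n * 5 seconds; 241-251 mean (n - 240) * 30 minutes:
for d in /dev/sd[a-f]; do
    hdparm -S 242 "$d"
done

# Check a drive's power state without spinning it up (CHECK POWER MODE):
hdparm -C /dev/sda
```

Note that an external write-intent bitmap or a long bitmap update interval may be needed to keep md itself from waking the disks periodically.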
* Re: Best way to add caching to a new raid setup. 2020-09-14 14:32 ` Ram Ramesh @ 2020-09-14 14:48 ` Roger Heflin 2020-09-14 15:08 ` Wols Lists 0 siblings, 1 reply; 36+ messages in thread From: Roger Heflin @ 2020-09-14 14:48 UTC (permalink / raw) To: Ram Ramesh; +Cc: Nix, Drew, antlists, Linux Raid It should be noted that mythtv is a badly behaved IO application: it does a lot of sync calls that effectively make linux's write cache small. That code was apparently written on the theory that the syncs would take too long when too many recordings were being done and a few would time out; it then kills a few of them so that at least some recordings will work. A number of people believe its usage of sync is badly designed (me included), but last I saw the devs were arguing for it. It is a decent assumption if recordings are being done on a single spinning disk, but it is not so good when there are multiple spindles under it and the disks would be able to catch up. I had recordings being killed when a 7-second SCTERC timeout happened on a single spinning disk in the array, and that is part of the reason why I went to SSD. With old disks, bad-block reallocations are going to happen often enough that with the sync/kill behavior recordings aren't reliable. On Mon, Sep 14, 2020 at 9:37 AM Ram Ramesh <rramesh2400@gmail.com> wrote: > > On 9/14/20 6:40 AM, Nix wrote: > > On 1 Sep 2020, Ram Ramesh uttered the following: > > > >> After thinking through this, I really like the idea of simply > >> recording programs to SSD and move one file at a time based on some > >> aging algorithms of my own. I will move files back and forth as needed > >> during overnight hours creating my own caching effect. > > I don't really see the benefit here for a mythtv installation in > > particular. 
I/O patterns for large media are extremely non-seeky: even > > with multiple live recordings at once, an HDD would easily be able to > > keep up since it'd only have to seek a few times per 30s period given > > the size of most plausible write caches. > > > > In general, doing the hierarchical storage thing is useful if you have > > stuff you will almost never access that you can keep on slower media > > (or, in this case, stuff whose access patterns are non-seeky that you > > can keep on media with a high seek time). But in this case, that would > > be 'all of it'. Even if it weren't, by-hand copying won't deal with the > > thing you really need to keep on fast-seek media: metadata. You can't > > build your own filesystem with metadata on SSD and data on non-SSD this > > way! But both LVM caching and bcache do exactly that. > Agreed, all I need is a file level LRU caching effect. All recently > accessed/created files in SSD and the ones untouched for a while in > spinning disks. I was trying to get this done using a block level > caching methods which is too complicated for the purpose. > > My aim is not to improve the performance, instead improve on power. I > want my raid disks to be mostly sitting idle holding files and spin up > and serve only when called for. Most of the time, I am > watching/recording recent shows/programs or popular movies and typically > that is about 200-400GB of storage. With ultraviolet, prime, netflix and > disney, movies are more often sourced from online content and TV shows > get deleted after watching and new ones gets added in that space. So, > typical usage seem ideal for popular SSD size (with a large backup store > in spinning disk), I think. This means my spinning disks are going to > wake up once a day or two at most. More often I expect it to be once a > week or have periods of high activity and die down to nothing for a > while. Instead, currently they are running 24x7 which does not make sense. 
> > Regards > Ramesh > ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Best way to add caching to a new raid setup. 2020-09-14 14:48 ` Roger Heflin @ 2020-09-14 15:08 ` Wols Lists 0 siblings, 0 replies; 36+ messages in thread From: Wols Lists @ 2020-09-14 15:08 UTC (permalink / raw) To: Roger Heflin, Ram Ramesh; +Cc: Nix, Drew, Linux Raid On 14/09/20 15:48, Roger Heflin wrote: > It should be noted that mythtv is a badly behaved IO application, it > does a lot of sync calls that effectively for the most part makes > linux's write cache small. Sounds like the devs need reminding of the early days of ext4. IIUC, if that ran on an early ext4 you wouldn't have got ANY successful recordings AT ALL. At an absolute minimum, that sort of behaviour needs to be configurable, because if the user is running mythtv on anything other than a dedicated machine then it's going to kill performance for anything else. Cheers, Wol ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Best way to add caching to a new raid setup.
  2020-08-28 22:40       ` Ram Ramesh
  2020-08-28 22:59         ` antlists
  2020-08-29  0:01         ` Roger Heflin
@ 2020-08-31 19:20         ` Nix
  2 siblings, 0 replies; 36+ messages in thread
From: Nix @ 2020-08-31 19:20 UTC (permalink / raw)
To: Ram Ramesh; +Cc: antlists, R. Ramesh, Linux Raid

On 28 Aug 2020, Ram Ramesh verbalised:

> Mythtv is a server-client DVR system. I have a client next to each of
> my TVs and one backend with a large disk (this will have RAID with
> cache). At any time many clients will be accessing different programs
> and any scheduled recording will also be going on in parallel. So you
> will see a lot of seeks, but still all will be based on a limited
> number of threads (I only have 3 TVs and maybe one other PC acting as
> a client). So lots of IOs, mostly sequential, across a small number of
> threads. I think most cache algorithms should be able to benefit from
> random access to blocks in SSD.

FYI: bcache documents how its caching works. Assuming you ignore the
write cache (which I recommend, since nearly all the data corruption and
starvation bugs in bcache have been in the write-caching code, and it
doesn't look like write caching would benefit your use case anyway: if
you want an SSD write cache, just use RAID journalling), bcache is very
hard to break: if by some mischance the cache does become corrupted you
can decouple it from the backing RAID array and just keep using the
array until you recreate the cache device and reattach it.

bcache tracks the "sequentiality" of recent reads and avoids caching big
sequential I/O, on the grounds that caching it is a likely waste of SSD
lifetime: HDDs can do contiguous reads quite fast; what you want to
cache is seeky reads. This means that your mythtv reads will only be
cached when there are multiple contending reads going on.

This doesn't seem terribly useful, since for a media player any given
contending read is probably not going to be of metadata and is probably
not going to be repeated for a very long time (unless you particularly
like repeatedly rewatching the same things). So you won't get much of a
speedup or reduction in contention. Where caches like bcache and the LVM
cache help is when small seeky reads are likely to be repeated, which is
very common with filesystem metadata and a lot of other workloads, but
not common at all for media files in my experience.

(FYI: my setup is spinning rust <- md-raid6 <- bcache <- LVM PV, with
one LVM PV omitting the bcache layer and both combined into one VG. My
bulk media storage is on the non-bcached PV. The filesystems are almost
all xfs, some of them with cryptsetups in the way too. One warning:
bcache works by stuffing a header onto the data, and does *not* pass
through RAID stripe size info etc: you'll need to pass a suitable
--data-offset to make-bcache to ensure that I/O is RAID-aligned, and
pass the stripe size etc to the underlying operations. I did this by
mkfsing everything and then doing a blktrace of the underlying RAID
devices while I did some simple I/Os, to make sure the RAID layer was
doing nice stripe-aligned I/O. This is probably total overkill for a
media server, but this was my do-everything server, so I cared very much
about small random I/O performance. This was particularly fun given that
one LVM PV had a bcache header and the other one didn't, and I wanted
the filesystems to have suitable alignment for *both* of them at
once... it was distinctly fiddly to get right.)

^ permalink raw reply [flat|nested] 36+ messages in thread
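The read-cache-only setup Nix describes, and the stripe-aligned `--data-offset` arithmetic, can be sketched roughly as follows. The device names and the 7-drive/512 KiB-chunk geometry are assumptions for illustration; the device-touching commands are left commented out since they require root and real hardware:

```shell
# Assumed geometry: 7-drive RAID6 => 5 data drives, 512 KiB chunk.
chunk_kib=512
data_drives=5
stripe_kib=$((chunk_kib * data_drives))   # full stripe = 2560 KiB
stripe_sectors=$((stripe_kib * 2))        # in 512-byte sectors = 5120
offset_sectors=$((stripe_sectors * 4))    # 4 whole stripes = 20480
echo "data-offset: $offset_sectors sectors"

# Format backing device and cache set (hypothetical devices):
# make-bcache -B --data-offset "$offset_sectors" /dev/md0
# make-bcache -C /dev/sdx1
#
# Attach the cache set by its UUID, then keep it read-caching only
# (writethrough: reads are cached, writes pass straight through):
# echo "$cset_uuid" > /sys/block/bcache0/bcache/attach
# echo writethrough > /sys/block/bcache0/bcache/cache_mode
#
# The sequential-read bypass threshold Nix mentions is tunable:
# echo 4M > /sys/block/bcache0/bcache/sequential_cutoff
#
# If the cache ever misbehaves, detach it and run uncached:
# echo 1 > /sys/block/bcache0/bcache/detach
```

Choosing the offset as a whole number of full stripes keeps everything above bcache stripe-aligned despite the header it prepends.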
* Re: Best way to add caching to a new raid setup.
  2020-08-28  2:31 ` Best way to add caching to a new raid setup R. Ramesh
  2020-08-28  3:05   ` Peter Grandi
  2020-08-28 15:26   ` antlists
@ 2020-08-28 17:46   ` Roman Mamedov
  2020-08-28 20:39     ` Ram Ramesh
  2 siblings, 1 reply; 36+ messages in thread
From: Roman Mamedov @ 2020-08-28 17:46 UTC (permalink / raw)
To: R. Ramesh; +Cc: Linux Raid

On Thu, 27 Aug 2020 21:31:07 -0500
"R. Ramesh" <rramesh@verizon.net> wrote:

> I have two raid6s running on mythbuntu 14.04. They are built on 6
> enterprise drives. So, no hd issues as of now. Still, I plan to
> upgrade as it has been a while and the size of hard drives has become
> significantly larger (an indication that my disks may be older). I
> want to build a new raid using the 16/14tb drives. Since I am building
> a new raid, I thought I could explore caching options. I see a mention
> of LVM cache and a few others: bcache/xyzcache etc.

Once you set up bcache, it cannot be removed. The volume will always
stay a bcache volume, even if you decide to stop using caching. Which
feels weird and potentially troublesome, going through an extra layer
(kernel driver) with its complexity and computational overhead (no
matter how small).

On the other hand, LVM with caching turned off is just normal LVM,
which you likely would have used anyway for the other benefits it
provides.

Also, my impression is that LVM has the more solid and reliable
codebase, but bcache might provide a somewhat better performance boost
from caching.

-- 
With respect,
Roman

^ permalink raw reply [flat|nested] 36+ messages in thread
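Roman's reversibility point can be made concrete. A hedged sketch of an LVM cache that can later be dropped without touching the origin LV; the VG/LV names, sizes, and devices are all hypothetical:

```shell
# /dev/md0 = the RAID6 array, /dev/sdx1 = a spare SATA SSD (assumed names).
pvcreate /dev/md0 /dev/sdx1
vgcreate vg0 /dev/md0 /dev/sdx1

lvcreate -n media -L 20T vg0 /dev/md0            # origin LV on the array
lvcreate --type cache-pool -n mediacache -L 400G vg0 /dev/sdx1
lvconvert --type cache --cachepool vg0/mediacache vg0/media

# Later: drop caching entirely; vg0/media reverts to a plain LV
# after dirty blocks are flushed back to the origin.
lvconvert --uncache vg0/media
```

This contrasts with bcache, where the backing device keeps its bcache superblock even after the cache set is detached.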
* Re: Best way to add caching to a new raid setup.
  2020-08-28 17:46   ` Roman Mamedov
@ 2020-08-28 20:39     ` Ram Ramesh
  2020-08-29 15:34       ` antlists
  2020-08-30 22:16       ` Michal Soltys
  0 siblings, 2 replies; 36+ messages in thread
From: Ram Ramesh @ 2020-08-28 20:39 UTC (permalink / raw)
To: Roman Mamedov, R. Ramesh; +Cc: Linux Raid

On 8/28/20 12:46 PM, Roman Mamedov wrote:
> On Thu, 27 Aug 2020 21:31:07 -0500
> "R. Ramesh" <rramesh@verizon.net> wrote:
>
>> I have two raid6s running on mythbuntu 14.04. They are built on 6
>> enterprise drives. So, no hd issues as of now. Still, I plan to
>> upgrade as it has been a while and the size of hard drives has become
>> significantly larger (an indication that my disks may be older). I
>> want to build a new raid using the 16/14tb drives. Since I am
>> building a new raid, I thought I could explore caching options. I see
>> a mention of LVM cache and a few others: bcache/xyzcache etc.
>
> Once you set up bcache, it cannot be removed. The volume will always
> stay a bcache volume, even if you decide to stop using caching. Which
> feels weird and potentially troublesome, going through an extra layer
> (kernel driver) with its complexity and computational overhead (no
> matter how small).
>
> On the other hand, LVM with caching turned off is just normal LVM,
> which you likely would have used anyway for the other benefits it
> provides.
>
> Also, my impression is that LVM has the more solid and reliable
> codebase, but bcache might provide a somewhat better performance boost
> from caching.

Thanks for the info on bcache. I do not think it will be my favorite. I
am going to try LVM cache as my first choice. Note that the new disks
will be spare disks for some time, so I will be able to try out a few
things before deciding to put them into use.

One thing about LVM that I am not clear on: given the choice between
creating a mirror LV on a VG over simple PVs, and a simple LV over
raid1 PVs, which is the preferred method? Why?

Ramesh

^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Best way to add caching to a new raid setup.
  2020-08-28 20:39     ` Ram Ramesh
@ 2020-08-29 15:34       ` antlists
  2020-08-29 15:57         ` Roman Mamedov
  1 sibling, 1 reply; 36+ messages in thread
From: antlists @ 2020-08-29 15:34 UTC (permalink / raw)
To: Ram Ramesh, Roman Mamedov, R. Ramesh; +Cc: Linux Raid

On 28/08/2020 21:39, Ram Ramesh wrote:
> One thing about LVM that I am not clear on: given the choice between
> creating a mirror LV on a VG over simple PVs, and a simple LV over
> raid1 PVs, which is the preferred method? Why?

Simplicity says have ONE raid, with ONE PV on top of it.

The other way round, you need TWO SEPARATE (at least) PV/VG/LVs, which
you then stick a raid on top of.

Basically, it's just KISS.

Cheers,
Wol

^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Best way to add caching to a new raid setup.
  2020-08-29 15:34       ` antlists
@ 2020-08-29 15:57         ` Roman Mamedov
  2020-08-29 16:26           ` Roger Heflin
  0 siblings, 1 reply; 36+ messages in thread
From: Roman Mamedov @ 2020-08-29 15:57 UTC (permalink / raw)
To: antlists; +Cc: Ram Ramesh, R. Ramesh, Linux Raid

On Sat, 29 Aug 2020 16:34:56 +0100
antlists <antlists@youngman.org.uk> wrote:

> On 28/08/2020 21:39, Ram Ramesh wrote:
> > One thing about LVM that I am not clear on: given the choice between
> > creating a mirror LV on a VG over simple PVs, and a simple LV over
> > raid1 PVs, which is the preferred method? Why?
>
> Simplicity says have ONE raid, with ONE PV on top of it.
>
> The other way round, you need TWO SEPARATE (at least) PV/VG/LVs, which
> you then stick a raid on top of.

I believe the question was not about the order of layers, but whether to
create a RAID with mdadm and then LVM on top, vs. abandoning mdadm and
using LVM's built-in RAID support instead:
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/logical_volume_manager_administration/mirror_create

Personally I hugely prefer mdadm, due to the familiar and convenient
interface of the program itself, as well as of /proc/mdstat.

-- 
With respect,
Roman

^ permalink raw reply [flat|nested] 36+ messages in thread
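The two alternatives Roman contrasts look roughly like this. A hedged sketch; the device names, VG names, and sizes are hypothetical:

```shell
# (a) mdadm RAID1 with LVM on top; array health lives in /proc/mdstat.
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
pvcreate /dev/md0
vgcreate vg0 /dev/md0
lvcreate -n data -l 100%FREE vg0

# (b) LVM's built-in RAID1; no mdadm, health is inspected via lvs.
pvcreate /dev/sdc1 /dev/sdd1
vgcreate vg1 /dev/sdc1 /dev/sdd1
lvcreate --type raid1 -m 1 -n data -L 1T vg1
lvs -a -o name,sync_percent,devices vg1
```

Both end up with dm-raid doing the mirroring in the kernel; the practical difference is which userspace tooling manages and monitors it.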
* Re: Best way to add caching to a new raid setup.
  2020-08-29 15:57         ` Roman Mamedov
@ 2020-08-29 16:26           ` Roger Heflin
  2020-08-29 20:45             ` Ram Ramesh
  0 siblings, 1 reply; 36+ messages in thread
From: Roger Heflin @ 2020-08-29 16:26 UTC (permalink / raw)
To: Roman Mamedov; +Cc: antlists, Ram Ramesh, R. Ramesh, Linux Raid

I use mdadm raid. From what I can tell mdadm has been around a lot
longer and is better understood by a larger group of users. Hence if
something does go wrong there are a significant number of people that
can help.

I have been running mythtv on mdadm since early 2006, using LVM on top
of it. I have migrated from 4x500 to 4x1.5tb and am currently on 7x3tb.

One trick I did do on the 3tb's is that I partitioned each disk into 4
750gb partitions, and each set of 7 matching partitions makes up a PV.
Often if a disk gets a bad block or a random io failure it only takes a
single raid from +2 down to +1, and when rebuilding, the rebuilds are
faster. I created mine like below, making sure md13 has all sdX3 disks
on it, so that when you have to add devices the numbers are the same.
This also means that when enlarging there are 4 separate enlarges, but
no one enlarge takes more than a day. So there might be a good reason
to separate, say, a 12tb drive into 6x2 or 4x3 just so that enlarging
it does not take a week to finish. Also make sure to use a bitmap: when
you re-add a previously removed disk, the rebuilds are much faster,
especially if the drive has only been out for a few hours.

Personalities : [raid6] [raid5] [raid4]
md13 : active raid6 sdi3[9] sdg3[6] sdf3[12] sde3[10] sdd3[1] sdc3[5] sdb3[7]
      3612623360 blocks super 1.2 level 6, 512k chunk, algorithm 2 [7/7] [UUUUUUU]
      bitmap: 0/6 pages [0KB], 65536KB chunk

md14 : active raid6 sdi4[11] sdg4[6] sdf4[9] sde4[10] sdb4[7] sdd4[1] sdc4[5]
      3612623360 blocks super 1.2 level 6, 512k chunk, algorithm 2 [7/7] [UUUUUUU]
      bitmap: 1/6 pages [4KB], 65536KB chunk

md15 : active raid6 sdi5[11] sdg5[8] sdf5[9] sde5[10] sdb5[7] sdd5[1] sdc5[5]
      3612623360 blocks super 1.2 level 6, 512k chunk, algorithm 2 [7/7] [UUUUUUU]
      bitmap: 1/6 pages [4KB], 65536KB chunk

md16 : active raid6 sdi6[9] sdg6[7] sdf6[11] sde6[10] sdb6[8] sdd6[1] sdc6[5]
      3615495680 blocks super 1.2 level 6, 512k chunk, algorithm 2 [7/7] [UUUUUUU]
      bitmap: 0/6 pages [0KB], 65536KB chunk

On Sat, Aug 29, 2020 at 11:00 AM Roman Mamedov <rm@romanrm.net> wrote:
>
> On Sat, 29 Aug 2020 16:34:56 +0100
> antlists <antlists@youngman.org.uk> wrote:
>
> > On 28/08/2020 21:39, Ram Ramesh wrote:
> > > One thing about LVM that I am not clear on: given the choice
> > > between creating a mirror LV on a VG over simple PVs, and a simple
> > > LV over raid1 PVs, which is the preferred method? Why?
> >
> > Simplicity says have ONE raid, with ONE PV on top of it.
> >
> > The other way round, you need TWO SEPARATE (at least) PV/VG/LVs,
> > which you then stick a raid on top of.
>
> I believe the question was not about the order of layers, but whether
> to create a RAID with mdadm and then LVM on top, vs. abandoning mdadm
> and using LVM's built-in RAID support instead:
> https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/logical_volume_manager_administration/mirror_create
>
> Personally I hugely prefer mdadm, due to the familiar and convenient
> interface of the program itself, as well as of /proc/mdstat.
>
> --
> With respect,
> Roman

^ permalink raw reply [flat|nested] 36+ messages in thread
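Roger's layout can be sketched roughly as follows. The drive names and array numbers are assumptions mirroring his example, and the partition numbering is simplified (1-4 rather than his 3-6):

```shell
# Split each drive into 4 equal GPT partitions.
for d in /dev/sd{b,c,d,e,f,g,h}; do
    parted -s "$d" mklabel gpt \
        mkpart md13 0% 25%  mkpart md14 25% 50% \
        mkpart md15 50% 75% mkpart md16 75% 100%
done

# One RAID6 per partition "row", each with an internal write-intent
# bitmap so a briefly-absent disk only resyncs the dirty regions.
i=13
for p in 1 2 3 4; do
    mdadm --create /dev/md$i --level=6 --raid-devices=7 \
          --bitmap=internal /dev/sd[b-h]$p
    i=$((i + 1))
done
```

Each md device then becomes a PV, so a failure or grow operation is confined to one quarter of the capacity at a time.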
* Re: Best way to add caching to a new raid setup.
  2020-08-29 16:26           ` Roger Heflin
@ 2020-08-29 20:45             ` Ram Ramesh
  0 siblings, 0 replies; 36+ messages in thread
From: Ram Ramesh @ 2020-08-29 20:45 UTC (permalink / raw)
To: Roger Heflin, Roman Mamedov; +Cc: antlists, R. Ramesh, Linux Raid

On 8/29/20 11:26 AM, Roger Heflin wrote:
> I use mdadm raid. From what I can tell mdadm has been around a lot
> longer and is better understood by a larger group of users. Hence if
> something does go wrong there are a significant number of people that
> can help.
>
> I have been running mythtv on mdadm since early 2006, using LVM on top
> of it. I have migrated from 4x500 to 4x1.5tb and am currently on
> 7x3tb.
>
> One trick I did do on the 3tb's is that I partitioned each disk into 4
> 750gb partitions, and each set of 7 matching partitions makes up a PV.
> Often if a disk gets a bad block or a random io failure it only takes
> a single raid from +2 down to +1, and when rebuilding, the rebuilds
> are faster. I created mine like below, making sure md13 has all sdX3
> disks on it, so that when you have to add devices the numbers are the
> same. This also means that when enlarging there are 4 separate
> enlarges, but no one enlarge takes more than a day. So there might be
> a good reason to separate, say, a 12tb drive into 6x2 or 4x3 just so
> that enlarging it does not take a week to finish. Also make sure to
> use a bitmap: when you re-add a previously removed disk, the rebuilds
> are much faster, especially if the drive has only been out for a few
> hours.
>
> Personalities : [raid6] [raid5] [raid4]
> md13 : active raid6 sdi3[9] sdg3[6] sdf3[12] sde3[10] sdd3[1] sdc3[5] sdb3[7]
>       3612623360 blocks super 1.2 level 6, 512k chunk, algorithm 2 [7/7] [UUUUUUU]
>       bitmap: 0/6 pages [0KB], 65536KB chunk
>
> md14 : active raid6 sdi4[11] sdg4[6] sdf4[9] sde4[10] sdb4[7] sdd4[1] sdc4[5]
>       3612623360 blocks super 1.2 level 6, 512k chunk, algorithm 2 [7/7] [UUUUUUU]
>       bitmap: 1/6 pages [4KB], 65536KB chunk
>
> md15 : active raid6 sdi5[11] sdg5[8] sdf5[9] sde5[10] sdb5[7] sdd5[1] sdc5[5]
>       3612623360 blocks super 1.2 level 6, 512k chunk, algorithm 2 [7/7] [UUUUUUU]
>       bitmap: 1/6 pages [4KB], 65536KB chunk
>
> md16 : active raid6 sdi6[9] sdg6[7] sdf6[11] sde6[10] sdb6[8] sdd6[1] sdc6[5]
>       3615495680 blocks super 1.2 level 6, 512k chunk, algorithm 2 [7/7] [UUUUUUU]
>       bitmap: 0/6 pages [0KB], 65536KB chunk
>
> On Sat, Aug 29, 2020 at 11:00 AM Roman Mamedov <rm@romanrm.net> wrote:
>> On Sat, 29 Aug 2020 16:34:56 +0100
>> antlists <antlists@youngman.org.uk> wrote:
>>
>>> On 28/08/2020 21:39, Ram Ramesh wrote:
>>>> One thing about LVM that I am not clear on: given the choice
>>>> between creating a mirror LV on a VG over simple PVs, and a simple
>>>> LV over raid1 PVs, which is the preferred method? Why?
>>> Simplicity says have ONE raid, with ONE PV on top of it.
>>>
>>> The other way round, you need TWO SEPARATE (at least) PV/VG/LVs,
>>> which you then stick a raid on top of.
>> I believe the question was not about the order of layers, but whether
>> to create a RAID with mdadm and then LVM on top, vs. abandoning mdadm
>> and using LVM's built-in RAID support instead:
>> https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/logical_volume_manager_administration/mirror_create
>>
>> Personally I hugely prefer mdadm, due to the familiar and convenient
>> interface of the program itself, as well as of /proc/mdstat.
>>
>> --
>> With respect,
>> Roman

Roger,

Good point about breaking up the disk into partitions and building the
same-numbered partitions into a raid volume. Do you recommend this
procedure even if I do only raid1?

I am afraid to make raid6 over 4x14TB disks. I want to keep rebuilds
simple and not thrash the disks each time I (have to) replace one. Even
if I split into 3tb partitions, when I replace one disk all of them
will rebuild and it will be a seek festival. I am hoping the simplicity
of raid1 will be more suited when the expected data read before a URE
is smaller than a single disk's capacity. I like the +2 redundancy of
raid6 over +1 raid1 (but am not doing raid1 over 3 disks, as I feel
that is a huge waste).

Regards
Ramesh

^ permalink raw reply [flat|nested] 36+ messages in thread
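On the rebuild-thrash worry: mdadm can copy a failing-but-still-readable member onto a spare in place, instead of kicking it out and doing a full degraded rebuild from parity. A hedged sketch; the device names are hypothetical, and hot-replace needs mdadm >= 3.3 with a reasonably recent kernel:

```shell
# Add the new drive as a spare, then hot-replace the suspect member.
# The array stays fully redundant during the copy.
mdadm /dev/md0 --add /dev/sdi1
mdadm /dev/md0 --replace /dev/sdb1 --with /dev/sdi1

# Once the copy completes the old device is marked faulty; remove it.
mdadm /dev/md0 --remove /dev/sdb1
```

Since the data is mostly read off the outgoing drive rather than reconstructed, the other members see far less seek traffic than in a degraded raid6 rebuild.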
* Re: Best way to add caching to a new raid setup.
  2020-08-28 20:39     ` Ram Ramesh
  2020-08-29 15:34       ` antlists
@ 2020-08-30 22:16       ` Michal Soltys
  1 sibling, 0 replies; 36+ messages in thread
From: Michal Soltys @ 2020-08-30 22:16 UTC (permalink / raw)
To: Ram Ramesh, Roman Mamedov, R. Ramesh; +Cc: Linux Raid

On 20/08/28 22:39, Ram Ramesh wrote:
> On 8/28/20 12:46 PM, Roman Mamedov wrote:
>> On Thu, 27 Aug 2020 21:31:07 -0500
>> Also, my impression is that LVM has the more solid and reliable
>> codebase, but bcache might provide a somewhat better performance
>> boost from caching.
>
> Thanks for the info on bcache. I do not think it will be my favorite.
> I am going to try LVM cache as my first choice. Note that the new
> disks will be spare disks for some time, so I will be able to try out
> a few things before deciding to put them into use.

I had some _very nasty_ adventures with LVM's cache, which ended with
rather massive corruption at the end of last year. I described it in:

https://github.com/lvmteam/lvm2/issues/26

though not much of that was answered or commented on, except
confirmation that the flushing issue was fixed.

At the same time, I have yet to have bcache fail on me (and it - so far
flawlessly - has survived kernel panics (buggy nic drivers) and disks
dying).

YMMV of course; just _make sure_ to have backups. And make sure to test
it thoroughly in your setup (including things like hard reset).

^ permalink raw reply [flat|nested] 36+ messages in thread
end of thread, other threads:[~2020-09-14 15:08 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <16cee7f2-38d9-13c8-4342-4562be68930b.ref@verizon.net>
2020-08-28  2:31 ` Best way to add caching to a new raid setup R. Ramesh
2020-08-28  3:05 ` Peter Grandi
2020-08-28  3:19 ` Ram Ramesh
2020-08-28 15:26 ` antlists
2020-08-28 17:25 ` Ram Ramesh
2020-08-28 22:12 ` antlists
2020-08-28 22:40 ` Ram Ramesh
2020-08-28 22:59 ` antlists
2020-08-29  3:08 ` R. Ramesh
2020-08-29  5:02 ` Roman Mamedov
2020-08-29 20:48 ` Ram Ramesh
2020-08-29 21:26 ` Roger Heflin
2020-08-30  0:56 ` Ram Ramesh
2020-08-30 15:42 ` Roger Heflin
2020-08-30 17:19 ` Ram Ramesh
2020-09-11 18:39 ` R. Ramesh
2020-09-11 20:37 ` Roger Heflin
2020-09-11 22:41 ` Ram Ramesh
2020-08-29  0:01 ` Roger Heflin
2020-08-29  3:12 ` R. Ramesh
2020-08-29 22:36 ` Drew
2020-09-01 16:12 ` Ram Ramesh
2020-09-01 17:01 ` Kai Stian Olstad
2020-09-02 18:17 ` Ram Ramesh
2020-09-14 11:40 ` Nix
2020-09-14 14:32 ` Ram Ramesh
2020-09-14 14:48 ` Roger Heflin
2020-09-14 15:08 ` Wols Lists
2020-08-31 19:20 ` Nix
2020-08-28 17:46 ` Roman Mamedov
2020-08-28 20:39 ` Ram Ramesh
2020-08-29 15:34 ` antlists
2020-08-29 15:57 ` Roman Mamedov
2020-08-29 16:26 ` Roger Heflin
2020-08-29 20:45 ` Ram Ramesh
2020-08-30 22:16 ` Michal Soltys
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).