* Hot data tracking / hybrid storage @ 2016-05-15 12:12 Ferry Toth 2016-05-15 21:11 ` Duncan 2016-05-16 11:25 ` Austin S. Hemmelgarn 0 siblings, 2 replies; 26+ messages in thread From: Ferry Toth @ 2016-05-15 12:12 UTC (permalink / raw) To: linux-btrfs Is there anything going on in this area? We have btrfs in RAID10 using 4 HDD's for many years now with a rotating scheme of snapshots for easy backup. <10% files (bytes) change between oldest snapshot and the current state. However, the filesystem seems to become very slow, probably due to the RAID10 and the snapshots. It would be fantastic if we could just add 4 SSD's to the pool and btrfs would just magically prefer to put often accessed files there and move older or less popular files to the HDD's. In my simple mind this can not be done easily using bcache as that would require completely rebuilding the file system on top of bcache (can not just add a few SSD's to the pool), while implementing a cache inside btrfs is probably a complex thing with lots of overhead. Simply telling the allocator to prefer new files to go to the ssd and move away unpopular stuff to hdd during balance should do the trick, or am I wrong? Are none of the big users looking into this? Ferry ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Hot data tracking / hybrid storage 2016-05-15 12:12 Hot data tracking / hybrid storage Ferry Toth @ 2016-05-15 21:11 ` Duncan 2016-05-15 23:05 ` Kai Krakow 2016-05-16 11:25 ` Austin S. Hemmelgarn 1 sibling, 1 reply; 26+ messages in thread From: Duncan @ 2016-05-15 21:11 UTC (permalink / raw) To: linux-btrfs Ferry Toth posted on Sun, 15 May 2016 12:12:09 +0000 as excerpted: > Is there anything going on in this area? > > We have btrfs in RAID10 using 4 HDD's for many years now with a rotating > scheme of snapshots for easy backup. <10% files (bytes) change between > oldest snapshot and the current state. > > However, the filesystem seems to become very slow, probably due to the > RAID10 and the snapshots. > > It would be fantastic if we could just add 4 SSD's to the pool and btrfs > would just magically prefer to put often accessed files there and move > older or less popular files to the HDD's. > > In my simple mind this can not be done easily using bcache as that would > require completely rebuilding the file system on top of bcache (can not > just add a few SSD's to the pool), while implementing a cache inside > btrfs is probably a complex thing with lots of overhead. > > Simply telling the allocator to prefer new files to go to the ssd and > move away unpopular stuff to hdd during balance should do the trick, or > am I wrong? > > Are none of the big users looking into this? Hot data tracking remains on the list of requested features, but at this point there's far more features on that list than developers working on them, so unless it's a developer's (or their employer/sponsor's) high priority, it's unlikely to see the light of day for some time, years, yet. And given the availability of two hybrid solutions in the form of bcache and a device-mapper solution (the name of which I can't recall ATM), priority for a btrfs-builtin solution isn't going to be as high as it might be otherwise, so... 
The good news for the dmapper solution is that AFAIK, it doesn't require reformatting like bcache does. The bad news for it is that while we have list regulars using btrfs on bcache so it's a (relatively) well known and tested solution, we're lacking any regulars known to be using btrfs on the dmapper solution. Additionally, some posters looking at the dmapper choice have reported that it's not as mature as bcache and not really ready for use with btrfs, which is itself still stabilizing and maturing, and they weren't ready to deal with the complexities and reliability issues of two still stabilizing and maturing subsystems one on top of the other. Of course, that does give you the opportunity of being that list regular using btrfs on top of that dmapper solution, should you be willing to undertake that task. =:^) Meanwhile, you did mention backups, and of course as btrfs /is/ still maturing, use without backups (and snapshots aren't backups) ready if needed is highly discouraged in any case, so you do have the option of simply blowing away the existing filesystem and either redoing it as-is, which will likely speed it up dramatically, for a few more years, or throwing in those ssds and redoing it with bcache. It's also worth noting that if you can add 4 ssds to the existing set, you obviously have the hookups available for four more devices, and with hdds cheap as they are compared to ssds, if necessary you should be able to throw four more hdds in there, formatting them with bcache or not first as desired, and creating a new btrfs on them, then copying everything over. After that you could yank the old ones for use as spares or whatever, and replace them with ssds, which could be setup with bcache as well and then activated. 
Given the cost of a single ssd, the total cost of four of them plus four hdds should still be below the cost of five ssds, and you're still not using more than the 8 total hookups you had already mentioned, so it should be quite reasonable to do it that way. -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman
* Re: Hot data tracking / hybrid storage 2016-05-15 21:11 ` Duncan @ 2016-05-15 23:05 ` Kai Krakow 2016-05-17 6:27 ` Ferry Toth 0 siblings, 1 reply; 26+ messages in thread From: Kai Krakow @ 2016-05-15 23:05 UTC (permalink / raw) To: linux-btrfs Am Sun, 15 May 2016 21:11:11 +0000 (UTC) schrieb Duncan <1i5t5.duncan@cox.net>: > Ferry Toth posted on Sun, 15 May 2016 12:12:09 +0000 as excerpted: > > > Is there anything going on in this area? > > > > We have btrfs in RAID10 using 4 HDD's for many years now with a > > rotating scheme of snapshots for easy backup. <10% files (bytes) > > change between oldest snapshot and the current state. > > > > However, the filesystem seems to become very slow, probably due to > > the RAID10 and the snapshots. > > > > It would be fantastic if we could just add 4 SSD's to the pool and > > btrfs would just magically prefer to put often accessed files there > > and move older or less popular files to the HDD's. > > > > In my simple mind this can not be done easily using bcache as that > > would require completely rebuilding the file system on top of > > bcache (can not just add a few SSD's to the pool), while > > implementing a cache inside btrfs is probably a complex thing with > > lots of overhead. > > > > Simply telling the allocator to prefer new files to go to the ssd > > and move away unpopular stuff to hdd during balance should do the > > trick, or am I wrong? > > > > Are none of the big users looking into this? > > Hot data tracking remains on the list of requested features, but at > this point there's far more features on that list than developers > working on them, so unless it's a developer's (or their > employer/sponsor's) high priority, it's unlikely to see the light of > day for some time, years, yet. 
> > And given the availability of two hybrid solutions in the form of > bcache and a device-mapper solution (the name of which I can't recall > ATM), priority for a btrfs-builtin solution isn't going to be as high > as it might be otherwise, so... > > > The good news for the dmapper solution is that AFAIK, it doesn't > require reformatting like bcache does. > > The bad news for it is that while we have list regulars using btrfs > on bcache so it's a (relatively) well known and tested solution, > we're lacking any regulars known to be using btrfs on the dmapper > solution. Additionally, some posters looking at the dmapper choice > have reported that it's not as mature as bcache and not really ready > for use with btrfs, which is itself still stabilizing and maturing, > and they weren't ready to deal with the complexities and reliability > issues of two still stabilizing and maturing subsystems one on top of > the other. > > Of course, that does give you the opportunity of being that list > regular using btrfs on top of that dmapper solution, should you be > willing to undertake that task. =:^) > > > Meanwhile, you did mention backups, and of course as btrfs /is/ still > maturing, use without backups (and snapshots aren't backups) ready if > needed is highly discouraged in any case, so you do have the option > of simply blowing away the existing filesystem and either redoing it > as-is, which will likely speed it up dramatically, for a few more > years, or throwing in those ssds and redoing it with bcache. > > It's also worth noting that if you can add 4 ssds to the existing > set, you obviously have the hookups available for four more devices, > and with hdds cheap as they are compared to ssds, if necessary you > should be able to throw four more hdds in there, formatting them with > bcache or not first as desired, and creating a new btrfs on them, > then copying everything over. 
After that you could yank the old ones > for use as spares or whatever, and replace them with ssds, which > could be setup with bcache as well and then activated. Given the > cost of a single ssd, the total cost of four of them plus four hdds > should still be below the cost of five ssds, and you're still not > using more than the 8 total hookups you had already mentioned, so it > should be quite reasonable to do it that way. You can go there with only one additional HDD as temporary storage. Just connect it, format as bcache, then do a "btrfs dev replace". Now wipe that "free" HDD (use wipefs), format as bcache, then... well, you get the point. At the last step, remove the remaining HDD. Now add your SSDs, format them as caching devices, and attach each individual bcache-backed HDD to an SSD caching device. Devices don't need to be formatted and created at the same time. I'd also recommend adding all SSDs only in the last step, so as not to wear them early with writes during device replacement. If you want, you can add one additional step to get the temporary hard disk back. But why not simply replace the oldest hard disk with the newest? Take a look at smartctl to see which is the best candidate. I went a similar route but without one extra HDD. I had three HDDs in mraid1/draid0 and enough spare space. I just removed one HDD, prepared it for bcache, then added it back and removed the next. -- Regards, Kai Replies to list-only preferred.
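Kai's rolling conversion can be sketched roughly as below. This is a hedged sketch, not a tested procedure: the device names (/dev/sdb, /dev/sde, /dev/nvme0n1), the mount point /mnt, and the <cset-uuid> placeholder are all assumptions to adapt; each replace must finish before the next disk is wiped.

```shell
# Rolling conversion to bcache with one spare HDD -- adapt device names first.

# 1. Give the spare disk a bcache backing superblock; it shows up as /dev/bcache0.
make-bcache -B /dev/sde

# 2. Migrate one member of the btrfs array onto it, online.
btrfs replace start /dev/sdb /dev/bcache0 /mnt
btrfs replace status /mnt        # repeat until the replace has finished

# 3. The old disk is now free: wipe it, format it as bcache, use it as the
#    next replace target, and so on around the array.
wipefs -a /dev/sdb
make-bcache -B /dev/sdb          # becomes /dev/bcache1

# 4. Only at the very end, format the SSD as a caching device and attach
#    each backing device to its cache set by UUID.
make-bcache -C /dev/nvme0n1
echo <cset-uuid> > /sys/block/bcache0/bcache/attach
```

The point of step 4 matching Kai's advice: the SSD sees none of the bulk replace traffic, only normal workload writes once everything is in place.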
* Re: Hot data tracking / hybrid storage 2016-05-15 23:05 ` Kai Krakow @ 2016-05-17 6:27 ` Ferry Toth 2016-05-17 11:32 ` Austin S. Hemmelgarn 0 siblings, 1 reply; 26+ messages in thread From: Ferry Toth @ 2016-05-17 6:27 UTC (permalink / raw) To: linux-btrfs Op Mon, 16 May 2016 01:05:24 +0200, schreef Kai Krakow: > Am Sun, 15 May 2016 21:11:11 +0000 (UTC) > schrieb Duncan <1i5t5.duncan@cox.net>: > >> Ferry Toth posted on Sun, 15 May 2016 12:12:09 +0000 as excerpted: >> <snip> > > You can go there with only one additional HDD as temporary storage. Just > connect it, format as bcache, then do a "btrfs dev replace". Now wipe > that "free" HDD (use wipefs), format as bcache, then... well, you get > the point. At the last step, remove the remaining HDD. Now add your > SSDs, format as caching device, and attach each individual HDD backing > bcache to each SSD caching bcache. > > Devices don't need to be formatted and created at the same time. I'd > also recommend to add all SSDs only in the last step to not wear them > early with writes during device replacement. > > If you want, you can add one additional step to get the temporary hard > disk back. But why not simply replace the oldest hard disk with the > newest. Take a look at smartctl to see which is the best candidate. > > I went a similar route but without one extra HDD. I had three HDDs in > mraid1/draid0 and enough spare space. I just removed one HDD, prepared > it for bcache, then added it back and removed the next. > That's what I mean, a lot of work. And it's still a cache, with unnecessary copying from the ssd to the hdd. And what happens when either a hdd or ssd starts failing? > -- > Regards, > Kai > > Replies to list-only preferred.
* Re: Hot data tracking / hybrid storage 2016-05-17 6:27 ` Ferry Toth @ 2016-05-17 11:32 ` Austin S. Hemmelgarn 2016-05-17 18:33 ` Kai Krakow 0 siblings, 1 reply; 26+ messages in thread From: Austin S. Hemmelgarn @ 2016-05-17 11:32 UTC (permalink / raw) To: Ferry Toth, linux-btrfs On 2016-05-17 02:27, Ferry Toth wrote: > Op Mon, 16 May 2016 01:05:24 +0200, schreef Kai Krakow: > >> Am Sun, 15 May 2016 21:11:11 +0000 (UTC) >> schrieb Duncan <1i5t5.duncan@cox.net>: >> >>> Ferry Toth posted on Sun, 15 May 2016 12:12:09 +0000 as excerpted: >>> > <snip> >> >> You can go there with only one additional HDD as temporary storage. Just >> connect it, format as bcache, then do a "btrfs dev replace". Now wipe >> that "free" HDD (use wipefs), format as bcache, then... well, you get >> the point. At the last step, remove the remaining HDD. Now add your >> SSDs, format as caching device, and attach each individual HDD backing >> bcache to each SSD caching bcache. >> >> Devices don't need to be formatted and created at the same time. I'd >> also recommend to add all SSDs only in the last step to not wear them >> early with writes during device replacement. >> >> If you want, you can add one additional step to get the temporary hard >> disk back. But why not simply replace the oldest hard disk with the >> newest. Take a look at smartctl to see which is the best candidate. >> >> I went a similar route but without one extra HDD. I had three HDDs in >> mraid1/draid0 and enough spare space. I just removed one HDD, prepared >> it for bcache, then added it back and removed the next. >> > That's what I mean, a lot of work. And it's still a cache, with > unnecessary copying from the ssd to the hdd. On the other hand, it's actually possible to do this all online with BTRFS because of the reshaping and device replacement tools. 
In fact, I've done even more complex reprovisioning online before (for example, my home server system has 2 SSD's and 4 HDD's, running BTRFS on top of LVM, I've at least twice completely recreated the LVM layer online without any data loss and with minimal performance degradation). > > And what happens when either a hdd or ssd starts failing? I have absolutely no idea how bcache handles this, but I doubt it's any better than BTRFS.
* Re: Hot data tracking / hybrid storage 2016-05-17 11:32 ` Austin S. Hemmelgarn @ 2016-05-17 18:33 ` Kai Krakow 2016-05-18 22:44 ` Ferry Toth 0 siblings, 1 reply; 26+ messages in thread From: Kai Krakow @ 2016-05-17 18:33 UTC (permalink / raw) To: linux-btrfs Am Tue, 17 May 2016 07:32:11 -0400 schrieb "Austin S. Hemmelgarn" <ahferroin7@gmail.com>: > On 2016-05-17 02:27, Ferry Toth wrote: > > Op Mon, 16 May 2016 01:05:24 +0200, schreef Kai Krakow: > > > >> Am Sun, 15 May 2016 21:11:11 +0000 (UTC) > >> schrieb Duncan <1i5t5.duncan@cox.net>: > >> > [...] > > <snip> > >> > >> You can go there with only one additional HDD as temporary > >> storage. Just connect it, format as bcache, then do a "btrfs dev > >> replace". Now wipe that "free" HDD (use wipefs), format as bcache, > >> then... well, you get the point. At the last step, remove the > >> remaining HDD. Now add your SSDs, format as caching device, and > >> attach each individual HDD backing bcache to each SSD caching > >> bcache. > >> > >> Devices don't need to be formatted and created at the same time. > >> I'd also recommend to add all SSDs only in the last step to not > >> wear them early with writes during device replacement. > >> > >> If you want, you can add one additional step to get the temporary > >> hard disk back. But why not simply replace the oldest hard disk > >> with the newest. Take a look at smartctl to see which is the best > >> candidate. > >> > >> I went a similar route but without one extra HDD. I had three HDDs > >> in mraid1/draid0 and enough spare space. I just removed one HDD, > >> prepared it for bcache, then added it back and removed the next. > >> > > That's what I mean, a lot of work. And it's still a cache, with > > unnecessary copying from the ssd to the hdd. > On the other hand, it's actually possible to do this all online with > BTRFS because of the reshaping and device replacement tools. 
> > In fact, I've done even more complex reprovisioning online before > (for example, my home server system has 2 SSD's and 4 HDD's, running > BTRFS on top of LVM, I've at least twice completely recreated the LVM > layer online without any data loss and minimal performance > degradation). > > > > And what happens when either a hdd or ssd starts failing? > I have absolutely no idea how bcache handles this, but I doubt it's > any better than BTRFS. Bcache should in theory fall back to write-through as soon as an error counter exceeds a threshold. This is adjustable with sysfs io_error_halftime and io_error_limit. Tho I never tried what actually happens when either the HDD (in bcache writeback-mode) or the SSD fails. Actually, btrfs should be able to handle this (tho, according to list reports, it doesn't handle errors very well at this point). BTW: Unnecessary copying from SSD to HDD doesn't take place in bcache default mode: it only copies from SSD to HDD in writeback mode (data is written to the cache first, then persisted to HDD in the background). You can also use "write through" (data is written to SSD and persisted to HDD at the same time, reporting persistence to the application only when both copies were written) and "write around" mode (data is written to HDD only, and only reads are written to the SSD cache device). If you want bcache to behave as a huge IO scheduler for writes, use writeback mode. If you have write-intensive applications, you may want to choose write-around to not wear out the SSDs early. If you want writes to be cached for later reads, you can choose write-through mode. The latter two modes will ensure written data is always persisted to HDD with the same guarantees you had without bcache. The last mode is the default and should not change the behavior of btrfs if the HDD fails, and if the SSD fails bcache would simply turn off and fall back to HDD. -- Regards, Kai Replies to list-only preferred. 
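The modes and error thresholds described above are runtime-switchable through sysfs. A rough sketch, assuming the bcache sysfs layout documented in the kernel admin guide; `bcache0` and `<cset-uuid>` are placeholders, and the threshold values shown are arbitrary examples, not recommendations:

```shell
# Show the current cache mode; the active one is bracketed.
cat /sys/block/bcache0/bcache/cache_mode
# e.g.: [writethrough] writeback writearound none

# Switch modes at runtime (no reformat, no remount needed):
echo writeback   > /sys/block/bcache0/bcache/cache_mode  # fastest, least safe
echo writearound > /sys/block/bcache0/bcache/cache_mode  # spares SSD on writes

# Error handling knobs live on the cache set: once the decaying error
# counter (halved every io_error_halftime) passes io_error_limit, bcache
# drops the cache device and falls back to pass-through.
echo 300 > /sys/fs/bcache/<cset-uuid>/io_error_halftime
echo 8   > /sys/fs/bcache/<cset-uuid>/io_error_limit
```

Since the mode is per backing device, one machine can mix modes, e.g. writeback for a database volume and writearound for a backup target sharing the same SSD.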
* Re: Hot data tracking / hybrid storage 2016-05-17 18:33 ` Kai Krakow @ 2016-05-18 22:44 ` Ferry Toth 2016-05-19 18:09 ` Kai Krakow 0 siblings, 1 reply; 26+ messages in thread From: Ferry Toth @ 2016-05-18 22:44 UTC (permalink / raw) To: linux-btrfs Op Tue, 17 May 2016 20:33:35 +0200, schreef Kai Krakow: > Am Tue, 17 May 2016 07:32:11 -0400 schrieb "Austin S. Hemmelgarn" > <ahferroin7@gmail.com>: > >> On 2016-05-17 02:27, Ferry Toth wrote: >> > Op Mon, 16 May 2016 01:05:24 +0200, schreef Kai Krakow: >> > >> >> Am Sun, 15 May 2016 21:11:11 +0000 (UTC) >> >> schrieb Duncan <1i5t5.duncan@cox.net>: >> >> >> [...] >> > <snip> >> >> >> >> You can go there with only one additional HDD as temporary storage. >> >> Just connect it, format as bcache, then do a "btrfs dev replace". >> >> Now wipe that "free" HDD (use wipefs), format as bcache, >> >> then... well, you get the point. At the last step, remove the >> >> remaining HDD. Now add your SSDs, format as caching device, and >> >> attach each individual HDD backing bcache to each SSD caching >> >> bcache. >> >> >> >> Devices don't need to be formatted and created at the same time. I'd >> >> also recommend to add all SSDs only in the last step to not wear >> >> them early with writes during device replacement. >> >> >> >> If you want, you can add one additional step to get the temporary >> >> hard disk back. But why not simply replace the oldest hard disk with >> >> the newest. Take a look at smartctl to see which is the best >> >> candidate. >> >> >> >> I went a similar route but without one extra HDD. I had three HDDs >> >> in mraid1/draid0 and enough spare space. I just removed one HDD, >> >> prepared it for bcache, then added it back and removed the next. >> >> >> > That's what I mean, a lot of work. And it's still a cache, with >> > unnecessary copying from the ssd to the hdd. >> On the other hand, it's actually possible to do this all online with >> BTRFS because of the reshaping and device replacement tools. 
>> >> In fact, I've done even more complex reprovisioning online before (for >> example, my home server system has 2 SSD's and 4 HDD's, running BTRFS >> on top of LVM, I've at least twice completely recreated the LVM layer >> online without any data loss and minimal performance degradation). >> > >> > And what happens when either a hdd or ssd starts failing? >> I have absolutely no idea how bcache handles this, but I doubt it's any >> better than BTRFS. > > Bcache should in theory fall back to write-through as soon as an error > counter exceeds a threshold. This is adjustable with sysfs > io_error_halftime and io_error_limit. Tho I never tried what actually > happens when either the HDD (in bcache writeback-mode) or the SSD fails. > Actually, btrfs should be able to handle this (tho, according to list > reports, it doesn't handle errors very well at this point). > > BTW: Unnecessary copying from SSD to HDD doesn't take place in bcache > default mode: It only copies from HDD to SSD in writeback mode (data is > written to the cache first, then persisted to HDD in the background). > You can also use "write through" (data is written to SSD and persisted > to HDD at the same time, reporting persistence to the application only > when both copies were written) and "write around" mode (data is written > to HDD only, and only reads are written to the SSD cache device). > > If you want bcache behave as a huge IO scheduler for writes, use > writeback mode. If you have write-intensive applications, you may want > to choose write-around to not wear out the SSDs early. If you want > writes to be cached for later reads, you can choose write-through mode. > The latter two modes will ensure written data is always persisted to HDD > with the same guaranties you had without bcache. The last mode is > default and should not change behavior of btrfs if the HDD fails, and if > the SSD fails bcache would simply turn off and fall back to HDD. > Hello Kai, Yeah, lots of modes. 
So that means, none works well for all cases? Our server has lots of old files, on smb (various sizes), imap (10000's small, 1000's large), postgresql server, virtualbox images (large), 50 or so snapshots, and running synaptic for system upgrades is painfully slow. We are expecting the slowness to be caused by fsyncs, which appear to be much worse on a raid10 with snapshots. Presumably the whole thing would be fast enough with ssd's but that would not be very cost-efficient. All the overhead of the cache layer could be avoided if btrfs would just prefer to write small, hot files to the ssd in the first place and clean up while balancing. A combination of 2 ssd's and 4 hdd's would be very nice (the mobo has 6 x sata, which is pretty common). Moreover, increasing the ssd size in the future would then be just as simple as replacing a disk by a larger one. I think many would sign up for such a low-maintenance, efficient setup that doesn't require a PhD in IT to think out and configure. Even at home, I would just throw in a low cost ssd next to the hdd if it was as simple as device add. But I wouldn't want to store my photo/video collection on just ssd, too expensive. > Regards, > Kai > > Replies to list-only preferred.
* Re: Hot data tracking / hybrid storage 2016-05-18 22:44 ` Ferry Toth @ 2016-05-19 18:09 ` Kai Krakow 2016-05-19 18:51 ` Austin S. Hemmelgarn 0 siblings, 1 reply; 26+ messages in thread From: Kai Krakow @ 2016-05-19 18:09 UTC (permalink / raw) To: linux-btrfs Am Wed, 18 May 2016 22:44:55 +0000 (UTC) schrieb Ferry Toth <ftoth@exalondelft.nl>: > Op Tue, 17 May 2016 20:33:35 +0200, schreef Kai Krakow: > > > Am Tue, 17 May 2016 07:32:11 -0400 schrieb "Austin S. Hemmelgarn" > > <ahferroin7@gmail.com>: > > > >> On 2016-05-17 02:27, Ferry Toth wrote: > [...] > [...] > >> [...] > [...] > [...] > [...] > >> On the other hand, it's actually possible to do this all online > >> with BTRFS because of the reshaping and device replacement tools. > >> > >> In fact, I've done even more complex reprovisioning online before > >> (for example, my home server system has 2 SSD's and 4 HDD's, > >> running BTRFS on top of LVM, I've at least twice completely > >> recreated the LVM layer online without any data loss and minimal > >> performance degradation). > [...] > >> I have absolutely no idea how bcache handles this, but I doubt > >> it's any better than BTRFS. > > > > Bcache should in theory fall back to write-through as soon as an > > error counter exceeds a threshold. This is adjustable with sysfs > > io_error_halftime and io_error_limit. Tho I never tried what > > actually happens when either the HDD (in bcache writeback-mode) or > > the SSD fails. Actually, btrfs should be able to handle this (tho, > > according to list reports, it doesn't handle errors very well at > > this point). > > > > BTW: Unnecessary copying from SSD to HDD doesn't take place in > > bcache default mode: It only copies from HDD to SSD in writeback > > mode (data is written to the cache first, then persisted to HDD in > > the background). 
You can also use "write through" (data is written > > to SSD and persisted to HDD at the same time, reporting persistence > > to the application only when both copies were written) and "write > > around" mode (data is written to HDD only, and only reads are > > written to the SSD cache device). > > > > If you want bcache behave as a huge IO scheduler for writes, use > > writeback mode. If you have write-intensive applications, you may > > want to choose write-around to not wear out the SSDs early. If you > > want writes to be cached for later reads, you can choose > > write-through mode. The latter two modes will ensure written data > > is always persisted to HDD with the same guaranties you had without > > bcache. The last mode is default and should not change behavior of > > btrfs if the HDD fails, and if the SSD fails bcache would simply > > turn off and fall back to HDD. > > Hello Kai, > > Yeah, lots of modes. So that means, none works well for all cases? Just three, and they all work well. It's just a decision wearing vs. performance/safety. Depending on your workload you might benefit more or less from write-behind caching - that's when you want to turn the knob. Everything else works out of the box. In case of an SSD failure, write-back is just less safe while the other two modes should keep your FS intact in that case. > Our server has lots of old files, on smb (various size), imap > (10000's small, 1000's large), postgresql server, virtualbox images > (large), 50 or so snapshots and running synaptics for system upgrades > is painfully slow. I don't think that bcache even cares to cache imap accesses to mail bodies - it won't help performance. Network is usually much slower than SSD access. But it will cache fs meta data which will improve imap performance a lot. > We are expecting slowness to be caused by fsyncs which appear to be > much worse on a raid10 with snapshots. 
Presumably the whole thing > would be fast enough with ssd's but that would be not very cost > efficient. > > All the overhead of the cache layer could be avoided if btrfs would > just prefer to write small, hot, files to the ssd in the first place > and clean up while balancing. A combination of 2 ssd's and 4 hdd's > would be very nice (the mobo has 6 x sata, which is pretty common) Well, I don't want to advertise bcache. But there's nothing you couldn't do with it in your particular case: just attach two HDDs to one SSD. Bcache doesn't use a 1:1 relation here, you can use 1:n where n is the number of backing devices. There's no need to clean up using balancing because bcache will track hot data by default. You just have to decide what balance between SSD wear and performance you prefer. If slow fsyncs are your primary concern, I'd go with write-back caching. The small file contents are probably not your performance problem anyways, but rather the metadata management btrfs has to do in the background. Bcache will help a lot here, especially in write-back mode. I'd recommend against using balance too often or too intensively (don't use too big usage% filters): it will invalidate your block cache and probably also invalidate bcache if bcache is too small. It will hurt performance more than you gain. You may want to increase nr_requests in the IO scheduler for your situation. > Moreover increasing the ssd's size in the future would then be just > as simple as replacing a disk by a larger one. It's as simple as detaching the HDDs from the caching SSD, replacing the SSD, and reattaching the HDDs. It can be done online without reboot. SATA is usually hotpluggable nowadays. > I think many would sign up for such a low maintenance, efficient > setup that doesn't require a PhD in IT to think out and configure. Bcache is actually low maintenance, no knobs to turn. Converting to bcache protective superblocks is a one-time procedure which can be done online. 
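The 1:n attachment Kai describes (several backing HDDs sharing one SSD cache set) can be sketched as follows. Device names are placeholders; `make-bcache` accepting several `-B` devices in one invocation and the `attach`/`detach` sysfs files are as documented for bcache-tools and the kernel's bcache admin guide:

```shell
# Two backing HDDs and one SSD cache set -- device names are examples only.
make-bcache -B /dev/sda /dev/sdb     # backing devices -> /dev/bcache0, /dev/bcache1
make-bcache -C /dev/sdc              # the SSD becomes a cache set

# Attach both backing devices to the same cache set via its UUID
# (the UUID directory is whatever appears under /sys/fs/bcache besides
# the register files):
cset=$(ls /sys/fs/bcache | grep -v register)
echo "$cset" > /sys/block/bcache0/bcache/attach
echo "$cset" > /sys/block/bcache1/bcache/attach

# Replacing the SSD later, online: detach, swap the hardware,
# make-bcache -C the new SSD, and attach again with the new UUID.
echo 1 > /sys/block/bcache0/bcache/detach
echo 1 > /sys/block/bcache1/bcache/detach
```

Detaching flushes any dirty writeback data to the HDDs first, which is why the swap can happen without unmounting the filesystem.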
The bcache devices act as normal HDDs if not attached to a caching SSD. It's really less pain than you may think. And it's a solution available now. Converting back later is easy: just detach the HDDs from the SSDs and use them for some other purpose if you feel so later. Having the bcache protective superblock still in place doesn't hurt then. Bcache is a no-op without a caching device attached. > Even at home, I would just throw in a low cost ssd next to the hdd if > it was as simple as device add. But I wouldn't want to store my > photo/video collection on just ssd, too expensive. Bcache won't store your photos if you copied them: large copy operations (like backups) and sequential access are detected and bypassed by bcache. It won't invalidate your valuable "hot data" in the cache. It works really well. I'd even recommend formatting filesystems with a bcache protective superblock (aka format as backing device) even if you're not gonna use caching and not gonna insert an SSD now, just to have the option for the future easily and without much hassle. I don't think native hot data tracking will land in btrfs anytime soon (read: in the next 5 years). Bcache is a general purpose solution for all filesystems that works now (and works properly). You may want to clone your current system and try to integrate bcache to see the benefits. There's actually a really big impact on performance from my testing (home machine, 3x 1TB HDD btrfs mraid1 draid0, 1x 500GB SSD as cache, hit rate >90%, cache utilization ~70%, boot time improvement ~400%, application startup times almost instant, workload: MariaDB development server, git usage, 3 nspawn containers, VirtualBox Windows 7 + XP VMs, Steam gaming, daily rsync backups, btrfs 60% filled). I'd recommend not using too small an SSD because it wears out very fast when used as cache (I think that generally applies and is not bcache specific). 
My old 120GB SSD was specified for 85TB write endurance, and it was worn out after 12 months of bcache usage, which included 2 complete backup restores, multiple scrubs (which relocate and rewrite every data block), and weekly balances with relatime enabled. I've since used noatime+nossd, completely stopped using balance and never used scrub yet, with the result of vastly reduced write accesses to the caching SSD. This setup is able to write bursts of 800MB/s to the disk and read up to 800MB/s from disk (if btrfs can properly distribute reads to all disks). Bootchart shows up to 600 MB/s during cold booting (with warmed SSD cache). My nspawn containers boot in 1-2 seconds and do not add to the normal boot time at all (they are autostarted during boot, 1x MySQL, 1x ElasticSearch, 1x idle/spare/testing container). This is really impressive for a home machine, and c'mon: 3x 1TB HDD + 1x 500GB SSD is not that expensive nowadays. If you still prefer a low-end SSD, from my own experience I'd recommend using write-around only. The cache usage of the 120GB SSD was 100% with a 70-80% hit rate, which means it was constantly rewriting stuff. 500GB (which I use now) is a little underutilized but almost no writes happen after warming up, so it's mostly a hot-data read cache (although I configured it as write-back). Plus, bigger SSDs are usually faster, especially for write ops. Conclusion: Btrfs + bcache make a very good pair. Btrfs is not really optimized for good latency and that's where bcache comes in. Operating noise from the HDDs reduces a lot as soon as bcache is warmed up. BTW: If deployed, keep an eye on your SSD wear (using smartctl). But given you are using btrfs, you keep backups anyways. ;-) -- Regards, Kai Replies to list-only preferred.
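The hit rates, cache utilization, and SSD wear figures quoted above can all be read from sysfs and SMART. A sketch, assuming the stats files documented for bcache; `/dev/bcache0` and `/dev/sdc` are placeholders, and SMART attribute names vary by SSD vendor:

```shell
# Cache effectiveness, per backing device, since the cache was attached:
cat /sys/block/bcache0/bcache/stats_total/cache_hit_ratio
cat /sys/block/bcache0/bcache/stats_total/cache_hits
cat /sys/block/bcache0/bcache/stats_total/cache_misses
cat /sys/block/bcache0/bcache/stats_total/cache_bypass_hits   # sequential IO bypassed

# SSD wear via SMART; which attributes exist depends on the vendor
# (e.g. Wear_Leveling_Count, Media_Wearout_Indicator, Total_LBAs_Written):
smartctl -A /dev/sdc | grep -iE 'wear|lba|percent'
```

Watching cache_hit_ratio after a week of normal use is a cheap way to decide whether the SSD is big enough before committing to write-back mode.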
* Re: Hot data tracking / hybrid storage 2016-05-19 18:09 ` Kai Krakow @ 2016-05-19 18:51 ` Austin S. Hemmelgarn 2016-05-19 21:01 ` Kai Krakow 2016-05-19 23:23 ` Henk Slager 0 siblings, 2 replies; 26+ messages in thread From: Austin S. Hemmelgarn @ 2016-05-19 18:51 UTC (permalink / raw) To: linux-btrfs On 2016-05-19 14:09, Kai Krakow wrote: > Am Wed, 18 May 2016 22:44:55 +0000 (UTC) > schrieb Ferry Toth <ftoth@exalondelft.nl>: > >> Op Tue, 17 May 2016 20:33:35 +0200, schreef Kai Krakow: >> >>> Am Tue, 17 May 2016 07:32:11 -0400 schrieb "Austin S. Hemmelgarn" >>> <ahferroin7@gmail.com>: >>> >>>> On 2016-05-17 02:27, Ferry Toth wrote: >> [...] >> [...] >>>> [...] >> [...] >> [...] >> [...] >>>> On the other hand, it's actually possible to do this all online >>>> with BTRFS because of the reshaping and device replacement tools. >>>> >>>> In fact, I've done even more complex reprovisioning online before >>>> (for example, my home server system has 2 SSD's and 4 HDD's, >>>> running BTRFS on top of LVM, I've at least twice completely >>>> recreated the LVM layer online without any data loss and minimal >>>> performance degradation). >> [...] >>>> I have absolutely no idea how bcache handles this, but I doubt >>>> it's any better than BTRFS. >>> >>> Bcache should in theory fall back to write-through as soon as an >>> error counter exceeds a threshold. This is adjustable with sysfs >>> io_error_halftime and io_error_limit. Tho I never tried what >>> actually happens when either the HDD (in bcache writeback-mode) or >>> the SSD fails. Actually, btrfs should be able to handle this (tho, >>> according to list reports, it doesn't handle errors very well at >>> this point). >>> >>> BTW: Unnecessary copying from SSD to HDD doesn't take place in >>> bcache default mode: It only copies from HDD to SSD in writeback >>> mode (data is written to the cache first, then persisted to HDD in >>> the background). 
You can also use "write through" (data is written >>> to SSD and persisted to HDD at the same time, reporting persistence >>> to the application only when both copies were written) and "write >>> around" mode (data is written to HDD only, and only reads are >>> written to the SSD cache device). >>> >>> If you want bcache behave as a huge IO scheduler for writes, use >>> writeback mode. If you have write-intensive applications, you may >>> want to choose write-around to not wear out the SSDs early. If you >>> want writes to be cached for later reads, you can choose >>> write-through mode. The latter two modes will ensure written data >>> is always persisted to HDD with the same guaranties you had without >>> bcache. The last mode is default and should not change behavior of >>> btrfs if the HDD fails, and if the SSD fails bcache would simply >>> turn off and fall back to HDD. >> >> Hello Kai, >> >> Yeah, lots of modes. So that means, none works well for all cases? > > Just three, and they all work well. It's just a decision wearing vs. > performance/safety. Depending on your workload you might benefit more or > less from write-behind caching - that's when you want to turn the knob. > Everything else works out of the box. In case of an SSD failure, > write-back is just less safe while the other two modes should keep your > FS intact in that case. > >> Our server has lots of old files, on smb (various size), imap >> (10000's small, 1000's large), postgresql server, virtualbox images >> (large), 50 or so snapshots and running synaptics for system upgrades >> is painfully slow. > > I don't think that bcache even cares to cache imap accesses to mail > bodies - it won't help performance. Network is usually much slower than > SSD access. But it will cache fs meta data which will improve imap > performance a lot. Bcache caches anything that falls within it's heuristics as candidates for caching. 
It pays no attention to what type of data you're accessing, just the access patterns. This is also the case for dm-cache, and for Windows ReadyBoost (or whatever the hell they're calling it these days). Unless you're shifting very big e-mails, it's pretty likely that ones that get accessed more than once in a short period of time will end up being cached. > >> We are expecting slowness to be caused by fsyncs which appear to be >> much worse on a raid10 with snapshots. Presumably the whole thing >> would be fast enough with ssd's but that would be not very cost >> efficient. >> >> All the overhead of the cache layer could be avoided if btrfs would >> just prefer to write small, hot, files to the ssd in the first place >> and clean up while balancing. A combination of 2 ssd's and 4 hdd's >> would be very nice (the mobo has 6 x sata, which is pretty common) > > Well, I don't want to advertise bcache. But there's nothing you > couldn't do with it in your particular case: > > Just attach two HDDs to one SSD. Bcache doesn't use a 1:1 relation > here, you can use 1:n where n is the backing devices. There's no need > to clean up using balancing because bcache will track hot data by > default. You just have to decide which balance between wearing the SSD > vs. performance you prefer. If slow fsyncs are you primary concern, I'd > go with write-back caching. The small file contents are propably not > your performance problem anyways but the meta data management btrfs has > to do in the background. Bcache will help a lot here, especially in > write-back mode. I'd recommend against using balance too often and too > intensive (don't use too big usage% filters), it will invalidate your > block cache and probably also invalidate bcache if bcache is too small. > It will hurt performance more than you gain. You may want to increase > nr_requests in the IO scheduler for your situation. This may not perform as well as you would think, depending on your configuration. 
If things are in raid1 (or raid10) mode on the BTRFS side, then you can end up caching duplicate data (and on some workloads, you're almost guaranteed to cache duplicate data), which is a bigger issue when you're sharing a cache between devices, because it means they are competing for cache space. > >> Moreover increasing the ssd's size in the future would then be just >> as simple as replacing a disk by a larger one. > > It's as simple as detaching the HDDs from the caching SSD, replace it, > reattach it. It can be done online without reboot. SATA is usually > hotpluggable nowadays. > >> I think many would sign up for such a low maintenance, efficient >> setup that doesn't require a PhD in IT to think out and configure. > > Bcache is actually low maintenance, no knobs to turn. Converting to > bcache protective superblocks is a one-time procedure which can be done > online. The bcache devices act as normal HDD if not attached to a > caching SSD. It's really less pain than you may think. And it's a > solution available now. Converting back later is easy: Just detach the > HDDs from the SSDs and use them for some other purpose if you feel so > later. Having the bcache protective superblock still in place doesn't > hurt then. Bcache is a no-op without caching device attached. No, bcache is _almost_ a no-op without a caching device. From a userspace perspective, it does nothing, but it is still another layer of indirection in the kernel, which does have a small impact on performance. The same is true of using LVM with a single volume taking up the entire partition, it looks almost no different from just using the partition, but it will perform worse than using the partition directly. I've actually done profiling of both to figure out base values for the overhead, and while bcache with no cache device is not as bad as the LVM example, it can still be a roughly 0.5-2% slowdown (it gets more noticeable the faster your backing storage is). 
You also lose the ability to mount that filesystem directly on a kernel without bcache support (this may or may not be an issue for you). > >> Even at home, I would just throw in a low cost ssd next to the hdd if >> it was as simple as device add. But I wouldn't want to store my >> photo/video collection on just ssd, too expensive. > > Bcache won't store your photos if you copied them: Large copy > operations (like backups) and sequential access is detected and bypassed > by bcache. It won't invalidate your valuable "hot data" in the cache. > It works really well. > > I'd even recommend to format filesystems with bcache protective > superblock (aka format backing device) even if you not gonna use > caching and not gonna insert an SSD now, just to have the option for > the future easily and without much hassle. > > I don't think native hot data tracking will land in btrfs anytime soon > (read: in the next 5 years). Bcache is a general purpose solution for > all filesystems that works now (and properly works). > > You maybe want to clone your current system and try to integrate bcache > to see the benefits. There's actually a really big impact on > performance from my testing (home machine, 3x 1TB HDD btrfs mraid1 > draid0, 1x 500GB SSD as cache, hit rate >90%, cache utilization ~70%, > boot time improvement ~400%, application startup times almost instant, > workload: MariaDB development server, git usage, 3 nspawn containers, > VirtualBox Windows 7 + XP VMs, Steam gaming, daily rsync backups, btrfs > 60% filled). > > I'd recommend to not use a too small SSD because it wears out very fast > when used as cache (I think that generally applies and is not bcache > specific). My old 120GB SSD was specified for 85TB write performance, > and it was worn out after 12 months of bcache usage, which included 2 > complete backup restores, multiple scrubs (which relocates and rewrites > every data block), and weekly balances with relatime enabled. 
I've > since used noatime+nossd, completely stopped using balance and never > used scrub yet, with the result of vastly reduced write accesses to the > caching SSD. This setup is able to write bursts of 800MB/s to the disk > and read up to 800MB/s from disk (if btrfs can properly distribute > reads to all disks). Bootchart shows up to 600 MB/s during cold booting > (with warmed SSD cache). My nspawn containers boot in 1-2 seconds and > do not add to the normal boot time at all (they are autostarted during > boot, 1x MySQL, 1x ElasticSearch, 1x idle/spare/testing container). > This is really impressive for a home machine, and c'mon: 3x 1TB HDD + > 1x 500GB SSD is not that expensive nowadays. If you still prefer a > low-end SSD I'd recommend to use write-around only from my own > experience. > > The cache usage of the 120GB of 100% with 70-80% hit rate, which means > it was constantly rewriting stuff. 500GB (which I use now) is a little > underutilized now but almost no writes happen after warming up, so it's > mostly a hot-data read cache (although I configured it as write-back). > Plus, bigger SSDs are usually faster - especially for write ops. > > Conclusion: Btrfs + bcache make a very good pair. Btrfs is not really > optimized for good latency and that's where bcache comes in. Operating > noise from HDD reduces a lot as soon as bcache is warmed up. > > BTW: If deployed, keep an eye on your SSD wearing (using smartctl). But > given you are using btrfs, you keep backups anyways. ;-) Any decent SSD (read as 'any SSD of a major brand other than OCZ that you bought from a reputable source') will still take years to wear out unless you're constantly re-writing things and not using discard/trim support (and bcache does use discard). Even if you're not using discard/trim, the typical wear-out point is well over 100x the size of the SSD for the good consumer devices. 
For a point of reference, I've got a pair of 250GB Crucial MX100's (they cost less than 0.50 USD per GB when I got them and provide essentially the same power-loss protections that the high end Intel SSD's do) which have seen more than 2.5TB of data writes over their lifetime, combined from at least three different filesystem formats (BTRFS, FAT32, and ext4), swap space, and LVM management, and the wear-leveling indicator on each still says they have 100% life remaining, and the similar 500GB one I just recently upgraded in my laptop had seen over 50TB of writes and was still saying 95% life remaining (and had been for months). ^ permalink raw reply [flat|nested] 26+ messages in thread
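[Editorial note: as a back-of-the-envelope illustration of the endurance figures traded in this subthread, rated endurance divided by observed write rate gives the expected service life. The 85 TBW figure is the rating Kai mentions; the 25GB/day write rate is an assumption for illustration, not a measurement.]

```shell
# Rough SSD lifetime estimate: rated endurance (TBW) / observed write rate.
rated_tbw=85     # rated terabytes written (the 85TBW drive from this thread)
gb_per_day=25    # assumed host write rate in GB/day
days=$(( rated_tbw * 1024 / gb_per_day ))
echo "~$(( days / 365 )) years until rated endurance is reached"
# prints "~9 years until rated endurance is reached"
```

Heavy cache-churn workloads like Kai's (roughly 85TB in a year, i.e. ~240GB/day) shrink that same drive's life to about 12 months, which matches his report.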
* Re: Hot data tracking / hybrid storage 2016-05-19 18:51 ` Austin S. Hemmelgarn @ 2016-05-19 21:01 ` Kai Krakow 2016-05-20 11:46 ` Austin S. Hemmelgarn 2016-05-19 23:23 ` Henk Slager 1 sibling, 1 reply; 26+ messages in thread From: Kai Krakow @ 2016-05-19 21:01 UTC (permalink / raw) To: linux-btrfs Am Thu, 19 May 2016 14:51:01 -0400 schrieb "Austin S. Hemmelgarn" <ahferroin7@gmail.com>: > For a point of reference, I've > got a pair of 250GB Crucial MX100's (they cost less than 0.50 USD per > GB when I got them and provide essentially the same power-loss > protections that the high end Intel SSD's do) which have seen more > than 2.5TB of data writes over their lifetime, combined from at least > three different filesystem formats (BTRFS, FAT32, and ext4), swap > space, and LVM management, and the wear-leveling indicator on each > still says they have 100% life remaining, and the similar 500GB one I > just recently upgraded in my laptop had seen over 50TB of writes and > was still saying 95% life remaining (and had been for months). The smaller Crucials are much worse at that: The MX100 128GB version I had was specified for 85TB of writes, which I hit after about 12 months (97% lifetime used according to smartctl) due to excessive write patterns. I'm not sure how long it would have lasted but I decided to swap it for a Samsung 500GB drive, and reconfigure my system for much lighter write patterns. What can I say: I liked the Crucial more. First, it has an easy lifetime counter in smartctl, which the Samsung doesn't. And it had powerloss protection which Samsung doesn't explicitly mention (tho I think it has it). At least, according to endurance tests, my Samsung SSD should take about 1 PB of writes. I've already written 7 TB if I can trust the smartctl raw value. But I think you cannot compare specification values to a real endurance test... I think it says 150TBW for the 500GB 850 EVO. -- Regards, Kai Replies to list-only preferred. 
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Hot data tracking / hybrid storage 2016-05-19 21:01 ` Kai Krakow @ 2016-05-20 11:46 ` Austin S. Hemmelgarn 0 siblings, 0 replies; 26+ messages in thread From: Austin S. Hemmelgarn @ 2016-05-20 11:46 UTC (permalink / raw) To: linux-btrfs On 2016-05-19 17:01, Kai Krakow wrote: > Am Thu, 19 May 2016 14:51:01 -0400 > schrieb "Austin S. Hemmelgarn" <ahferroin7@gmail.com>: > >> For a point of reference, I've >> got a pair of 250GB Crucial MX100's (they cost less than 0.50 USD per >> GB when I got them and provide essentially the same power-loss >> protections that the high end Intel SSD's do) which have seen more >> than 2.5TB of data writes over their lifetime, combined from at least >> three different filesystem formats (BTRFS, FAT32, and ext4), swap >> space, and LVM management, and the wear-leveling indicator on each >> still says they have 100% life remaining, and the similar 500GB one I >> just recently upgraded in my laptop had seen over 50TB of writes and >> was still saying 95% life remaining (and had been for months). Correction: I hadn't checked in several months; the 250G ones have actually seen about 6.336TB of writes, and report 90% remaining life, with about 240 days of power-on time. This equates to about 775MB of writes per hour, and assuming similar write rates for the remaining life of the SSD, I can still expect roughly 9 years of service from these, which means about 10 years of life given my usage. That is well beyond what I typically get from a traditional hard disk for the same price, and far exceeds the typical usable life of most desktops, laptops, and even some workstation computers. 
And you have to also keep in mind, this 775MB/hour of writes is coming from a system that is running: * BOINC distributed computing applications (regularly downloading big files, and almost constantly writing data) * Dropbox * Software builds for almost a dozen different systems (I use Gentoo, so _everything_ is built locally) * Regression testing for BTRFS * Basic network services (DHCP, DNS, and similar things) * A tor entry node * A local mail server (store and forward only, I just use it for monitoring messages) And all of that (except the BTRFS regression testing) is running 24/7, and that's just the local VM's, and doesn't include the file sharing or SAN services. Root filesystems for all of these VM's are all on the SSD's, as is the host's root filesystem and swap partition, and many of the data partitions. And I haven't really done any write optimization, and it's still less than 1GB/hour of writes to the SSD. The typical user (including many types of server systems) will be writing much less than that most of the time. > > The smaller Crucials are much worse at that: The MX100 128GB version I > had was specified for 85TB writes which I hit after about 12 months (97% > lifetime used according to smartctl) due to excessive write patterns. > I'm not sure how long it would have lasted but I decided to swap it for > a Samsung 500GB drive, and reconfigure my system for much less write > patterns. > > What should I say: I liked the Crucial more, first: It has an easy > lifetime counter in smartctl, Samsung doesn't. And it had powerloss > protection which Samsung doesn't explicitly mention (tho I think it has > it). > > At least, according to endurance tests, my Samsung SSD should take > about 1 PB of writes. I've already written 7 TB if I can trust the > smartctl raw value. > > But I think you cannot compare specification values to a real endurance > test... I think it says 150TBW for 500GB 850 EVO. 
>
The point was more that wear-out is less of an issue for a lot of people than many individuals make it out to be, not me trying to make Crucial sound like an amazing brand. Yes, one of the Crucial MX100's may not last as long as a Samsung EVO in a busy mail server or something similar, but for a majority of people, they will probably outlast the usefulness of the computer. ^ permalink raw reply [flat|nested] 26+ messages in thread
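[Editorial note: the smartctl wear checks both posters refer to look roughly like this. SMART attribute names vary by vendor and firmware, so treat the grep pattern below as an assumption to adapt; the device name is a placeholder.]

```shell
# Inspect SSD wear via SMART attributes (/dev/sda is a placeholder).
smartctl -A /dev/sda | grep -Ei 'wear|lifetime|lbas_written'
# Typical attributes:
#   Crucial/Micron: Percent_Lifetime_Used (the easy "lifetime counter" above)
#   Samsung:        Wear_Leveling_Count, Total_LBAs_Written
# Total_LBAs_Written * 512 bytes gives the cumulative host writes.
```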
* Re: Hot data tracking / hybrid storage 2016-05-19 18:51 ` Austin S. Hemmelgarn 2016-05-19 21:01 ` Kai Krakow @ 2016-05-19 23:23 ` Henk Slager 2016-05-20 12:03 ` Austin S. Hemmelgarn 1 sibling, 1 reply; 26+ messages in thread From: Henk Slager @ 2016-05-19 23:23 UTC (permalink / raw) To: linux-btrfs On Thu, May 19, 2016 at 8:51 PM, Austin S. Hemmelgarn <ahferroin7@gmail.com> wrote: > On 2016-05-19 14:09, Kai Krakow wrote: >> >> Am Wed, 18 May 2016 22:44:55 +0000 (UTC) >> schrieb Ferry Toth <ftoth@exalondelft.nl>: >> >>> Op Tue, 17 May 2016 20:33:35 +0200, schreef Kai Krakow: >>> >>>> Am Tue, 17 May 2016 07:32:11 -0400 schrieb "Austin S. Hemmelgarn" >>>> <ahferroin7@gmail.com>: >>>> >>>>> On 2016-05-17 02:27, Ferry Toth wrote: >>> >>> [...] >>> [...] >>>>> >>>>> [...] >>> >>> [...] >>> [...] >>> [...] >>>>> >>>>> On the other hand, it's actually possible to do this all online >>>>> with BTRFS because of the reshaping and device replacement tools. >>>>> >>>>> In fact, I've done even more complex reprovisioning online before >>>>> (for example, my home server system has 2 SSD's and 4 HDD's, >>>>> running BTRFS on top of LVM, I've at least twice completely >>>>> recreated the LVM layer online without any data loss and minimal >>>>> performance degradation). >>> >>> [...] >>>>> >>>>> I have absolutely no idea how bcache handles this, but I doubt >>>>> it's any better than BTRFS. >>>> >>>> >>>> Bcache should in theory fall back to write-through as soon as an >>>> error counter exceeds a threshold. This is adjustable with sysfs >>>> io_error_halftime and io_error_limit. Tho I never tried what >>>> actually happens when either the HDD (in bcache writeback-mode) or >>>> the SSD fails. Actually, btrfs should be able to handle this (tho, >>>> according to list reports, it doesn't handle errors very well at >>>> this point). 
>>>> >>>> BTW: Unnecessary copying from SSD to HDD doesn't take place in >>>> bcache default mode: It only copies from HDD to SSD in writeback >>>> mode (data is written to the cache first, then persisted to HDD in >>>> the background). You can also use "write through" (data is written >>>> to SSD and persisted to HDD at the same time, reporting persistence >>>> to the application only when both copies were written) and "write >>>> around" mode (data is written to HDD only, and only reads are >>>> written to the SSD cache device). >>>> >>>> If you want bcache behave as a huge IO scheduler for writes, use >>>> writeback mode. If you have write-intensive applications, you may >>>> want to choose write-around to not wear out the SSDs early. If you >>>> want writes to be cached for later reads, you can choose >>>> write-through mode. The latter two modes will ensure written data >>>> is always persisted to HDD with the same guaranties you had without >>>> bcache. The last mode is default and should not change behavior of >>>> btrfs if the HDD fails, and if the SSD fails bcache would simply >>>> turn off and fall back to HDD. >>> >>> >>> Hello Kai, >>> >>> Yeah, lots of modes. So that means, none works well for all cases? >> >> >> Just three, and they all work well. It's just a decision wearing vs. >> performance/safety. Depending on your workload you might benefit more or >> less from write-behind caching - that's when you want to turn the knob. >> Everything else works out of the box. In case of an SSD failure, >> write-back is just less safe while the other two modes should keep your >> FS intact in that case. >> >>> Our server has lots of old files, on smb (various size), imap >>> (10000's small, 1000's large), postgresql server, virtualbox images >>> (large), 50 or so snapshots and running synaptics for system upgrades >>> is painfully slow. >> >> >> I don't think that bcache even cares to cache imap accesses to mail >> bodies - it won't help performance. 
Network is usually much slower than >> SSD access. But it will cache fs meta data which will improve imap >> performance a lot. > > Bcache caches anything that falls within it's heuristics as candidates for > caching. It pays no attention to what type of data you're accessing, just > the access patterns. This is also the case for dm-cache, and for Windows > ReadyBoost (or whatever the hell they're calling it these days). Unless > you're shifting very big e-mails, it's pretty likely that ones that get > accessed more than once in a short period of time will end up being cached. >> >> >>> We are expecting slowness to be caused by fsyncs which appear to be >>> much worse on a raid10 with snapshots. Presumably the whole thing >>> would be fast enough with ssd's but that would be not very cost >>> efficient. >>> >>> All the overhead of the cache layer could be avoided if btrfs would >>> just prefer to write small, hot, files to the ssd in the first place >>> and clean up while balancing. A combination of 2 ssd's and 4 hdd's >>> would be very nice (the mobo has 6 x sata, which is pretty common) >> >> >> Well, I don't want to advertise bcache. But there's nothing you >> couldn't do with it in your particular case: >> >> Just attach two HDDs to one SSD. Bcache doesn't use a 1:1 relation >> here, you can use 1:n where n is the backing devices. There's no need >> to clean up using balancing because bcache will track hot data by >> default. You just have to decide which balance between wearing the SSD >> vs. performance you prefer. If slow fsyncs are you primary concern, I'd >> go with write-back caching. The small file contents are propably not >> your performance problem anyways but the meta data management btrfs has >> to do in the background. Bcache will help a lot here, especially in >> write-back mode. 
I'd recommend against using balance too often and too >> intensive (don't use too big usage% filters), it will invalidate your >> block cache and probably also invalidate bcache if bcache is too small. >> It will hurt performance more than you gain. You may want to increase >> nr_requests in the IO scheduler for your situation. > > This may not perform as well as you would think, depending on your > configuration. If things are in raid1 (or raid10) mode on the BTRFS side, > then you can end up caching duplicate data (and on some workloads, you're > almost guaranteed to cache duplicate data), which is a bigger issue when > you're sharing a cache between devices, because it means they are competing > for cache space. >> >> >>> Moreover increasing the ssd's size in the future would then be just >>> as simple as replacing a disk by a larger one. >> >> >> It's as simple as detaching the HDDs from the caching SSD, replace it, >> reattach it. It can be done online without reboot. SATA is usually >> hotpluggable nowadays. >> >>> I think many would sign up for such a low maintenance, efficient >>> setup that doesn't require a PhD in IT to think out and configure. >> >> >> Bcache is actually low maintenance, no knobs to turn. Converting to >> bcache protective superblocks is a one-time procedure which can be done >> online. The bcache devices act as normal HDD if not attached to a >> caching SSD. It's really less pain than you may think. And it's a >> solution available now. Converting back later is easy: Just detach the >> HDDs from the SSDs and use them for some other purpose if you feel so >> later. Having the bcache protective superblock still in place doesn't >> hurt then. Bcache is a no-op without caching device attached. > > No, bcache is _almost_ a no-op without a caching device. From a userspace > perspective, it does nothing, but it is still another layer of indirection > in the kernel, which does have a small impact on performance. 
The same is > true of using LVM with a single volume taking up the entire partition, it > looks almost no different from just using the partition, but it will perform > worse than using the partition directly. I've actually done profiling of > both to figure out base values for the overhead, and while bcache with no > cache device is not as bad as the LVM example, it can still be a roughly > 0.5-2% slowdown (it gets more noticeable the faster your backing storage > is). > > You also lose the ability to mount that filesystem directly on a kernel > without bcache support (this may or may not be an issue for you). The bcache (protective) superblock is in an 8KiB block in front of the file system device. In case the current, non-bcached HDD's use modern partitioning, you can do a 5-minute remove or add of bcache, without moving/copying filesystem data. So in case you have a bcache-formatted HDD that had just 1 primary partition (512 byte logical sectors), the partition start is at sector 2048 and the filesystem start is at 2064. Hard removing bcache (so making sure the module is not needed/loaded/used the next boot) can be done by changing the start-sector of the partition from 2048 to 2064. In gdisk one has to change the alignment to 16 first, otherwise it refuses. And of course, also first flush+stop+de-register bcache for the HDD. The other way around is also possible, i.e. changing the start-sector from 2048 to 2032. So that makes adding bcache to an existing filesystem a 5 minute action and not a GBs- or TBs-sized copy action. It is not online of course, but just one reboot is needed (or just umount, gdisk, partprobe, add bcache etc). For RAID setups, one could just do 1 HDD first. 
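[Editorial note: a sketch of the offset juggling just described, for the removal direction. This rewrites the partition table, so verify your own layout with `gdisk -l` first; the mount point and device names are placeholders, and the "single partition starting at sector 2048" assumption comes from the description above.]

```shell
# Removing bcache in place: shift the partition start past the 8KiB
# bcache superblock (8KiB = 16 sectors of 512 bytes, hence 2048 -> 2064).
umount /mnt/data                          # placeholder mount point
echo 1 > /sys/block/bcache0/bcache/stop   # flush and stop the bcache device
gdisk /dev/sdb                            # in gdisk: x -> l -> 16 to set
                                          # sector alignment, then delete and
                                          # re-create the partition starting
                                          # at 2064, and write the table
partprobe /dev/sdb                        # re-read the table; the filesystem
                                          # is now directly on /dev/sdb1
```

Going the other way (2048 -> 2032) frees 16 sectors in front of the filesystem for a new bcache superblock, which is the in-place conversion case.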
There is also a tool doing the conversion in-place (I haven't used it myself, my python(s) had trouble; I could do the partition table edit much faster/easier): https://github.com/g2p/blocks#bcache-conversion >>> Even at home, I would just throw in a low cost ssd next to the hdd if >>> it was as simple as device add. But I wouldn't want to store my >>> photo/video collection on just ssd, too expensive. >> >> >> Bcache won't store your photos if you copied them: Large copy >> operations (like backups) and sequential access is detected and bypassed >> by bcache. It won't invalidate your valuable "hot data" in the cache. >> It works really well. >> >> I'd even recommend to format filesystems with bcache protective >> superblock (aka format backing device) even if you not gonna use >> caching and not gonna insert an SSD now, just to have the option for >> the future easily and without much hassle. >> >> I don't think native hot data tracking will land in btrfs anytime soon >> (read: in the next 5 years). Bcache is a general purpose solution for >> all filesystems that works now (and properly works). >> >> You maybe want to clone your current system and try to integrate bcache >> to see the benefits. There's actually a really big impact on >> performance from my testing (home machine, 3x 1TB HDD btrfs mraid1 >> draid0, 1x 500GB SSD as cache, hit rate >90%, cache utilization ~70%, >> boot time improvement ~400%, application startup times almost instant, >> workload: MariaDB development server, git usage, 3 nspawn containers, >> VirtualBox Windows 7 + XP VMs, Steam gaming, daily rsync backups, btrfs >> 60% filled). >> >> I'd recommend to not use a too small SSD because it wears out very fast >> when used as cache (I think that generally applies and is not bcache >> specific). 
My old 120GB SSD was specified for 85TB write performance, >> and it was worn out after 12 months of bcache usage, which included 2 >> complete backup restores, multiple scrubs (which relocates and rewrites >> every data block), and weekly balances with relatime enabled. I've >> since used noatime+nossd, completely stopped using balance and never >> used scrub yet, with the result of vastly reduced write accesses to the >> caching SSD. This setup is able to write bursts of 800MB/s to the disk >> and read up to 800MB/s from disk (if btrfs can properly distribute >> reads to all disks). Bootchart shows up to 600 MB/s during cold booting >> (with warmed SSD cache). My nspawn containers boot in 1-2 seconds and >> do not add to the normal boot time at all (they are autostarted during >> boot, 1x MySQL, 1x ElasticSearch, 1x idle/spare/testing container). >> This is really impressive for a home machine, and c'mon: 3x 1TB HDD + >> 1x 500GB SSD is not that expensive nowadays. If you still prefer a >> low-end SSD I'd recommend to use write-around only from my own >> experience. >> >> The cache usage of the 120GB of 100% with 70-80% hit rate, which means >> it was constantly rewriting stuff. 500GB (which I use now) is a little >> underutilized now but almost no writes happen after warming up, so it's >> mostly a hot-data read cache (although I configured it as write-back). >> Plus, bigger SSDs are usually faster - especially for write ops. >> >> Conclusion: Btrfs + bcache make a very good pair. Btrfs is not really >> optimized for good latency and that's where bcache comes in. Operating >> noise from HDD reduces a lot as soon as bcache is warmed up. >> >> BTW: If deployed, keep an eye on your SSD wearing (using smartctl). But >> given you are using btrfs, you keep backups anyways. 
;-) > > Any decent SSD (read as 'any SSD of a major brand other than OCZ that you > bought from a reputable source') will still take years to wear out unless > you're constantly re-writing things and not using discard/trim support (and > bcache does use discard). Even if you're not using discard/trim, the > typical wear-out point is well over 100x the size of the SSD for the good > consumer devices. For a point of reference, I've got a pair of 250GB > Crucial MX100's (they cost less than 0.50 USD per GB when I got them and > provide essentially the same power-loss protections that the high end Intel > SSD's do) which have seen more than 2.5TB of data writes over their > lifetime, combined from at least three different filesystem formats (BTRFS, > FAT32, and ext4), swap space, and LVM management, and the wear-leveling > indicator on each still says they have 100% life remaining, and the similar > 500GB one I just recently upgraded in my laptop had seen over 50TB of writes > and was still saying 95% life remaining (and had been for months). ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Hot data tracking / hybrid storage 2016-05-19 23:23 ` Henk Slager @ 2016-05-20 12:03 ` Austin S. Hemmelgarn 2016-05-20 17:02 ` Ferry Toth 2016-05-20 22:26 ` Henk Slager 0 siblings, 2 replies; 26+ messages in thread From: Austin S. Hemmelgarn @ 2016-05-20 12:03 UTC (permalink / raw) To: Henk Slager, linux-btrfs On 2016-05-19 19:23, Henk Slager wrote: > On Thu, May 19, 2016 at 8:51 PM, Austin S. Hemmelgarn > <ahferroin7@gmail.com> wrote: >> On 2016-05-19 14:09, Kai Krakow wrote: >>> >>> Am Wed, 18 May 2016 22:44:55 +0000 (UTC) >>> schrieb Ferry Toth <ftoth@exalondelft.nl>: >>> >>>> Op Tue, 17 May 2016 20:33:35 +0200, schreef Kai Krakow: >>> Bcache is actually low maintenance, no knobs to turn. Converting to >>> bcache protective superblocks is a one-time procedure which can be done >>> online. The bcache devices act as normal HDD if not attached to a >>> caching SSD. It's really less pain than you may think. And it's a >>> solution available now. Converting back later is easy: Just detach the >>> HDDs from the SSDs and use them for some other purpose if you feel so >>> later. Having the bcache protective superblock still in place doesn't >>> hurt then. Bcache is a no-op without caching device attached. >> >> No, bcache is _almost_ a no-op without a caching device. From a userspace >> perspective, it does nothing, but it is still another layer of indirection >> in the kernel, which does have a small impact on performance. The same is >> true of using LVM with a single volume taking up the entire partition, it >> looks almost no different from just using the partition, but it will perform >> worse than using the partition directly. I've actually done profiling of >> both to figure out base values for the overhead, and while bcache with no >> cache device is not as bad as the LVM example, it can still be a roughly >> 0.5-2% slowdown (it gets more noticeable the faster your backing storage >> is). 
>> >> You also lose the ability to mount that filesystem directly on a kernel >> without bcache support (this may or may not be an issue for you). > > The bcache (protective) superblock is in an 8KiB block in front of the > file system device. In case the current, non-bcached HDD's use modern > partitioning, you can do a 5-minute remove or add of bcache, without > moving/copying filesystem data. So in case you have a bcache-formatted > HDD that had just 1 primary partition (512 byte logical sectors), the > partition start is at sector 2048 and the filesystem start is at 2064. > Hard removing bcache (so making sure the module is not > needed/loaded/used the next boot) can be done by changing the > start-sector of the partition from 2048 to 2064. In gdisk one has to > change the alignment to 16 first, otherwise it refuses. And of > course, also first flush+stop+de-register bcache for the HDD. > > The other way around is also possible, i.e. changing the start-sector > from 2048 to 2032. So that makes adding bcache to an existing > filesystem a 5 minute action and not a GBs- or TBs copy action. It is > not online of course, but just one reboot is needed (or just umount, > gdisk, partprobe, add bcache etc). > For RAID setups, one could just do 1 HDD first. My argument about the overhead was not about the superblock, it was about the bcache layer itself. It isn't practical to just access the data directly if you plan on adding a cache device, because then you couldn't do so online unless you're going through bcache. This extra layer of indirection in the kernel does add overhead, regardless of the on-disk format. Secondly, having a HDD with just one partition is not a typical use case, and that argument about the slack space resulting from the 1M alignment only holds true if you're using an MBR instead of a GPT layout (or for that matter, almost any other partition table format), and you're not booting from that disk (because GRUB embeds itself there). 
It's also fully possible to have an MBR-formatted disk which doesn't have any spare space there either (which is how most flash drives get formatted). This also doesn't change the fact that without careful initial formatting (it is possible on some filesystems to embed the bcache SB at the beginning of the FS itself, as many of them have some reserved space at the beginning of the partition for bootloaders, and this space doesn't have to exist when mounting the FS) or manual alteration of the partition, it's not possible to mount the FS on a system without bcache support. > > There is also a tool doing the conversion in-place (I haven't used it > myself, my python(s) had trouble; I could do the partition table edit > much faster/easier): > https://github.com/g2p/blocks#bcache-conversion > I actually hadn't known about this tool, thanks for mentioning it. ^ permalink raw reply [flat|nested] 26+ messages in thread
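[Editor's note: the sector arithmetic in Henk's description works out as follows. This is a sketch only, assuming 512-byte logical sectors and the standard 1 MiB first-partition alignment he describes.]

```shell
#!/bin/sh
# bcache's protective superblock occupies 8 KiB in front of the filesystem.
SECTOR_BYTES=512
SB_SECTORS=$((8 * 1024 / SECTOR_BYTES))   # 16 sectors
PART_START=2048                           # standard 1 MiB alignment

# Adding bcache in place: move the partition start back 16 sectors, so the
# superblock lands in the (previously unused) post-MBR gap and the existing
# filesystem data is untouched.
echo "partition start with bcache added:   $((PART_START - SB_SECTORS))"

# Removing bcache: move the partition start forward past the superblock,
# so the partition begins exactly where the filesystem data begins.
echo "partition start with bcache removed: $((PART_START + SB_SECTORS))"
```

This is why gdisk must be switched to 16-sector alignment first: 2032 and 2064 are not 2048-sector (1 MiB) aligned.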
* Re: Hot data tracking / hybrid storage 2016-05-20 12:03 ` Austin S. Hemmelgarn @ 2016-05-20 17:02 ` Ferry Toth 2016-05-20 17:59 ` Austin S. Hemmelgarn 2016-05-20 22:26 ` Henk Slager 1 sibling, 1 reply; 26+ messages in thread From: Ferry Toth @ 2016-05-20 17:02 UTC (permalink / raw) To: linux-btrfs Op Fri, 20 May 2016 08:03:12 -0400, schreef Austin S. Hemmelgarn: > On 2016-05-19 19:23, Henk Slager wrote: >> On Thu, May 19, 2016 at 8:51 PM, Austin S. Hemmelgarn >> <ahferroin7@gmail.com> wrote: >>> On 2016-05-19 14:09, Kai Krakow wrote: >>>> >>>> Am Wed, 18 May 2016 22:44:55 +0000 (UTC) >>>> schrieb Ferry Toth <ftoth@exalondelft.nl>: >>>> >>>>> Op Tue, 17 May 2016 20:33:35 +0200, schreef Kai Krakow: >>>> Bcache is actually low maintenance, no knobs to turn. Converting to >>>> bcache protective superblocks is a one-time procedure which can be >>>> done online. The bcache devices act as normal HDD if not attached to >>>> a caching SSD. It's really less pain than you may think. And it's a >>>> solution available now. Converting back later is easy: Just detach >>>> the HDDs from the SSDs and use them for some other purpose if you >>>> feel so later. Having the bcache protective superblock still in place >>>> doesn't hurt then. Bcache is a no-op without caching device attached. >>> >>> No, bcache is _almost_ a no-op without a caching device. From a >>> userspace perspective, it does nothing, but it is still another layer >>> of indirection in the kernel, which does have a small impact on >>> performance. The same is true of using LVM with a single volume >>> taking up the entire partition, it looks almost no different from just >>> using the partition, but it will perform worse than using the >>> partition directly. 
I've actually done profiling of both to figure >>> out base values for the overhead, and while bcache with no cache >>> device is not as bad as the LVM example, it can still be a roughly >>> 0.5-2% slowdown (it gets more noticeable the faster your backing >>> storage is). >>> >>> You also lose the ability to mount that filesystem directly on a >>> kernel without bcache support (this may or may not be an issue for >>> you). >> >> The bcache (protective) superblock is in an 8KiB block in front of the >> file system device. In case the current, non-bcached HDD's use modern >> partitioning, you can do a 5-minute remove or add of bcache, without >> moving/copying filesystem data. So in case you have a bcache-formatted >> HDD that had just 1 primary partition (512 byte logical sectors), the >> partition start is at sector 2048 and the filesystem start is at 2064. >> Hard removing bcache (so making sure the module is not >> needed/loaded/used the next boot) can be done by changing the >> start-sector of the partition from 2048 to 2064. In gdisk one has to >> change the alignment to 16 first, otherwise it refuses. And of >> course, also first flush+stop+de-register bcache for the HDD. >> >> The other way around is also possible, i.e. changing the start-sector >> from 2048 to 2032. So that makes adding bcache to an existing >> filesystem a 5 minute action and not a GBs- or TBs copy action. It is >> not online of course, but just one reboot is needed (or just umount, >> gdisk, partprobe, add bcache etc). >> For RAID setups, one could just do 1 HDD first. > My argument about the overhead was not about the superblock, it was > about the bcache layer itself. It isn't practical to just access the > data directly if you plan on adding a cache device, because then you > couldn't do so online unless you're going through bcache. This extra > layer of indirection in the kernel does add overhead, regardless of the > on-disk format. 
> > Secondarily, having a HDD with just one partition is not a typical use > case, and that argument about the slack space resulting from the 1M > alignment only holds true if you're using an MBR instead of a GPT layout > (or for that matter, almost any other partition table format), and > you're not booting from that disk (because GRUB embeds itself there). > It's also fully possible to have an MBR formatted disk which doesn't > have any spare space there too (which is how most flash drives get > formatted). We have 4 1TB drives in MBR, 1MB free at the beginning, grub on all 4, then 8GB swap, then all the rest btrfs (no LVM used). The 4 btrfs partitions are in the same pool, which is in btrfs RAID10 format. /boot is in subvolume @boot. In this configuration nothing would beat btrfs if I could just add 2 SSD's to the pool that would be clever enough to be paired in RAID1 and would be preferred for small (<1GB) file writes. Then balance should be able to move not often used files to the HDD. None of the methods mentioned here sound easy or quick to do, or even well tested. > This also doesn't change the fact that without careful initial > formatting (it is possible on some filesystems to embed the bcache SB at > the beginning of the FS itself, many of them have some reserved space at > the beginning of the partition for bootloaders, and this space doesn't > have to exist when mounting the FS) or manual alteration of the > partition, it's not possible to mount the FS on a system without bcache > support. >> >> There is also a tool doing the conversion in-place (I haven't used it >> myself, my python(s) had trouble; I could do the partition table edit >> much faster/easier): >> https://github.com/g2p/blocks#bcache-conversion >> > I actually hadn't known about this tool, thanks for mentioning it. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Hot data tracking / hybrid storage 2016-05-20 17:02 ` Ferry Toth @ 2016-05-20 17:59 ` Austin S. Hemmelgarn 2016-05-20 21:31 ` Henk Slager 2016-05-29 6:23 ` Andrei Borzenkov 0 siblings, 2 replies; 26+ messages in thread From: Austin S. Hemmelgarn @ 2016-05-20 17:59 UTC (permalink / raw) To: Ferry Toth, linux-btrfs On 2016-05-20 13:02, Ferry Toth wrote: > We have 4 1TB drives in MBR, 1MB free at the beginning, grub on all 4, > then 8GB swap, then all the rest btrfs (no LVM used). The 4 btrfs > partitions are in the same pool, which is in btrfs RAID10 format. /boot > is in subvolume @boot. If you have GRUB installed on all 4, then you don't actually have the full 2047 sectors between the MBR and the partition free, as GRUB is embedded in that space. I forget exactly how much space it takes up, but I know it's not the whole 1023.5K. I would not suggest risking usage of the final 8k there, though. You could however convert to raid1 temporarily, and then for each device, delete it, reformat for bcache, then re-add it to the FS. This may take a while, but should be safe (of course, it's only an option if you're already using a kernel with bcache support). > In this configuration nothing would beat btrfs if I could just add 2 > SSD's to the pool that would be clever enough to be paired in RAID1 and > would be preferred for small (<1GB) file writes. Then balance should be > able to move not often used files to the HDD. > > None of the methods mentioned here sound easy or quick to do, or even > well tested. It really depends on what you're used to. I would consider most of the options easy, but one of the areas I'm strongest with is storage management, and I've repaired damaged filesystems and partition tables by hand with a hex editor before, so I'm not necessarily a typical user. 
If I was going to suggest something specifically, it would be dm-cache, because it requires no modification to the backing store at all, but that would require running on LVM if you want it to be easy to set up (it's possible to do it without LVM, but you need something to call dmsetup before mounting the filesystem, which is not easy to configure correctly), and if you're on an enterprise distro, it may not be supported. If you wanted to, it's possible, and not all that difficult, to convert a BTRFS system to BTRFS on top of LVM online, but you would probably have to split out the boot subvolume to a separate partition (depends on which distro you're on, some have working LVM support in GRUB, some don't). If you're on a distro which does have LVM support in GRUB, the procedure would be: 1. Convert the BTRFS array to raid1. This lets you run with only 3 disks instead of 4. 2. Delete one of the disks from the array. 3. Convert the disk you deleted from the array to a LVM PV and add it to a VG. 4. Create a new logical volume occupying almost all of the PV you just added (having a little slack space is usually a good thing). 5. Use btrfs replace to add the LV to the BTRFS array while deleting one of the others. 6. Repeat steps 3-5 for each disk, but stop at step 4 when you have exactly one disk that isn't on LVM (so for four disks, stop at step four when you have 2 with BTRFS+LVM, one with just the LVM logical volume, and one with just BTRFS). 7. Reinstall GRUB (it should pull in LVM support now). 8. Use BTRFS replace to move the final BTRFS disk to the empty LVM volume. 9. Convert the now empty final disk to LVM using steps 3-4. 10. Add the LV to the BTRFS array and rebalance to raid10. 11. Reinstall GRUB again (just to be certain). 
I've done essentially the same thing on numerous occasions when reprovisioning for various reasons, and it's actually one of the things outside of the xfstests that I check with my regression testing (including simulating a couple of the common failure modes). It takes a while (especially for big arrays with lots of data), but it works, and is relatively safe (you are guaranteed to be able to rebuild a raid1 array of 3 disks from just 2, so losing the disk in the process of copying it will not result in data loss unless you hit a kernel bug). ^ permalink raw reply [flat|nested] 26+ messages in thread
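[Editor's note: the eleven-step procedure above can be sketched as a command sequence. This is strictly a dry run: every command is echoed rather than executed, and the device names (/dev/sd[a-d]3), volume group name (vg0), LV name, and mount point (/mnt) are illustrative placeholders, not values taken from the thread.]

```shell
#!/bin/sh
# Dry-run sketch of the raid10 -> LVM-backed raid10 migration.
# run() only prints each command; remove the echo to execute for real.
run() { echo "+ $*"; }

run btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt    # step 1
run btrfs device delete /dev/sdd3 /mnt                          # step 2
run pvcreate /dev/sdd3                                          # step 3
run vgcreate vg0 /dev/sdd3
run lvcreate -l 95%FREE -n lv_d vg0                             # step 4
run btrfs replace start /dev/sdc3 /dev/vg0/lv_d /mnt            # step 5
# ...repeat steps 3-5 for each remaining disk, then:
run grub-install /dev/sda                                       # steps 7/11
run btrfs balance start -dconvert=raid10 -mconvert=raid10 /mnt  # step 10
```

Note the conversion back to raid10 only happens once all four devices are LVs; until then the array stays raid1 so it can survive with one disk out of rotation.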
* Re: Hot data tracking / hybrid storage 2016-05-20 17:59 ` Austin S. Hemmelgarn @ 2016-05-20 21:31 ` Henk Slager 2016-05-29 6:23 ` Andrei Borzenkov 1 sibling, 0 replies; 26+ messages in thread From: Henk Slager @ 2016-05-20 21:31 UTC (permalink / raw) To: linux-btrfs On Fri, May 20, 2016 at 7:59 PM, Austin S. Hemmelgarn <ahferroin7@gmail.com> wrote: > On 2016-05-20 13:02, Ferry Toth wrote: >> >> We have 4 1TB drives in MBR, 1MB free at the beginning, grub on all 4, >> then 8GB swap, then all the rest btrfs (no LVM used). The 4 btrfs >> partitions are in the same pool, which is in btrfs RAID10 format. /boot >> is in subvolume @boot. > > If you have GRUB installed on all 4, then you don't actually have the full > 2047 sectors between the MBR and the partition free, as GRUB is embedded in > that space. I forget exactly how much space it takes up, but I know it's > not the whole 1023.5K. I would not suggest risking usage of the final 8k > there though. You could however convert to raid1 temporarily, and then for > each device, delete it, reformat for bcache, then re-add it to the FS. This > may take a while, but should be safe (of course, it's only an option if > you're already using a kernel with bcache support). There is more than enough space in that 2047-sector area for inserting a bcache SB, but initially I also found it risky and was not so sure. I anyhow don't want GRUB in the MBR, but in the filesystem/OS partition that it should boot, otherwise multi-OS on the same SSD or HDD gets into trouble. For the described system, assuming a few minutes offline or 'maintenance' mode is acceptable, I personally would just shrink the swap by 8KiB, lower its end-sector by 16 and also lower the start-sector of the btrfs partition by 16 and then add bcache. The location of GRUB should not matter actually. 
>> In this configuration nothing would beat btrfs if I could just add 2 >> SSD's to the pool that would be clever enough to be paired in RAID1 and >> would be preferred for small (<1GB) file writes. Then balance should be >> able to move not often used files to the HDD. >> >> None of the methods mentioned here sound easy or quick to do, or even >> well tested. I agree that all the methods are actually quite complicated, especially if compared to ZFS and its tools. Adding an L2ARC is as simple and easy as you want and describe. The statement I wanted to make is that adding bcache for a (btrfs) file-system can be done without touching the FS itself, provided that one can allow some offline time for the FS. > It really depends on what you're used to. I would consider most of the > options easy, but one of the areas I'm strongest with is storage management, > and I've repaired damaged filesystems and partition tables by hand with a > hex editor before, so I'm not necessarily a typical user. If I was going to > suggest something specifically, it would be dm-cache, because it requires no > modification to the backing store at all, but that would require running on > LVM if you want it to be easy to set up (it's possible to do it without LVM, > but you need something to call dmsetup before mounting the filesystem, which > is not easy to configure correctly), and if you're on an enterprise distro, > it may not be supported. > > If you wanted to, it's possible, and not all that difficult, to convert a > BTRFS system to BTRFS on top of LVM online, but you would probably have to > split out the boot subvolume to a separate partition (depends on which > distro you're on, some have working LVM support in GRUB, some don't). If > you're on a distro which does have LVM support in GRUB, the procedure would > be: > 1. Convert the BTRFS array to raid1. This lets you run with only 3 disks > instead of 4. > 2. Delete one of the disks from the array. > 3. 
Convert the disk you deleted from the array to a LVM PV and add it to a > VG. > 4. Create a new logical volume occupying almost all of the PV you just added > (having a little slack space is usually a good thing). > 5. Use btrfs replace to add the LV to the BTRFS array while deleting one > of the others. > 6. Repeat steps 3-5 for each disk, but stop at step 4 when you have > exactly one disk that isn't on LVM (so for four disks, stop at step four > when you have 2 with BTRFS+LVM, one with just the LVM logical volume, and > one with just BTRFS). > 7. Reinstall GRUB (it should pull in LVM support now). > 8. Use BTRFS replace to move the final BTRFS disk to the empty LVM volume. > 9. Convert the now empty final disk to LVM using steps 3-4 > 10. Add the LV to the BTRFS array and rebalance to raid10. > 11. Reinstall GRUB again (just to be certain). > > I've done essentially the same thing on numerous occasions when > reprovisioning for various reasons, and it's actually one of the things > outside of the xfstests that I check with my regression testing (including > simulating a couple of the common failure modes). It takes a while > (especially for big arrays with lots of data), but it works, and is > relatively safe (you are guaranteed to be able to rebuild a raid1 array of 3 > disks from just 2, so losing the disk in the process of copying it will not > result in data loss unless you hit a kernel bug). ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Hot data tracking / hybrid storage 2016-05-20 17:59 ` Austin S. Hemmelgarn 2016-05-20 21:31 ` Henk Slager @ 2016-05-29 6:23 ` Andrei Borzenkov 2016-05-29 17:53 ` Chris Murphy 1 sibling, 1 reply; 26+ messages in thread From: Andrei Borzenkov @ 2016-05-29 6:23 UTC (permalink / raw) To: Austin S. Hemmelgarn, Ferry Toth, linux-btrfs 20.05.2016 20:59, Austin S. Hemmelgarn пишет: > On 2016-05-20 13:02, Ferry Toth wrote: >> We have 4 1TB drives in MBR, 1MB free at the beginning, grub on all 4, >> then 8GB swap, then all the rest btrfs (no LVM used). The 4 btrfs >> partitions are in the same pool, which is in btrfs RAID10 format. /boot >> is in subvolume @boot. > If you have GRUB installed on all 4, then you don't actually have the > full 2047 sectors between the MBR and the partition free, as GRUB is > embedded in that space. I forget exactly how much space it takes up, > but I know it's not the whole 1023.5K I would not suggest risking usage > of the final 8k there though. If you mean grub2, required space is variable and depends on where /boot/grub is located (i.e. which drivers it needs to access it). Assuming plain btrfs on legacy BIOS MBR, required space is around 40-50KB. Note that grub2 detects some post-MBR gap software signatures and skips over them (space need not be contiguous). It is entirely possible to add bcache detection if enough demand exists. ^ permalink raw reply [flat|nested] 26+ messages in thread
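[Editor's note: Andrei's figures can be sanity-checked with a little arithmetic. A sketch, assuming the standard first-partition start at sector 2048 and the upper end (50 KB) of his 40-50 KB estimate for the grub2 core image.]

```shell
#!/bin/sh
# Post-MBR gap: sectors 1..2047 (sector 0 is the MBR itself).
GAP_SECTORS=2047
GAP_BYTES=$((GAP_SECTORS * 512))
GRUB_CORE=$((50 * 1024))     # rough upper bound from the message above
BCACHE_SB=$((8 * 1024))      # bcache protective superblock

echo "post-MBR gap:                    $GAP_BYTES bytes"
echo "left after grub core + bcache SB: $((GAP_BYTES - GRUB_CORE - BCACHE_SB)) bytes"
```

So even with GRUB embedded, the gap has ample room for an 8 KiB bcache superblock, which is why signature-aware skipping in grub2 (as Andrei suggests) is plausible.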
* Re: Hot data tracking / hybrid storage 2016-05-29 6:23 ` Andrei Borzenkov @ 2016-05-29 17:53 ` Chris Murphy 2016-05-29 18:03 ` Holger Hoffstätte 0 siblings, 1 reply; 26+ messages in thread From: Chris Murphy @ 2016-05-29 17:53 UTC (permalink / raw) To: Andrei Borzenkov; +Cc: Austin S. Hemmelgarn, Ferry Toth, Btrfs BTRFS On Sun, May 29, 2016 at 12:23 AM, Andrei Borzenkov <arvidjaar@gmail.com> wrote: > 20.05.2016 20:59, Austin S. Hemmelgarn пишет: >> On 2016-05-20 13:02, Ferry Toth wrote: >>> We have 4 1TB drives in MBR, 1MB free at the beginning, grub on all 4, >>> then 8GB swap, then all the rest btrfs (no LVM used). The 4 btrfs >>> partitions are in the same pool, which is in btrfs RAID10 format. /boot >>> is in subvolume @boot. >> If you have GRUB installed on all 4, then you don't actually have the >> full 2047 sectors between the MBR and the partition free, as GRUB is >> embedded in that space. I forget exactly how much space it takes up, >> but I know it's not the whole 1023.5K I would not suggest risking usage >> of the final 8k there though. > > If you mean grub2, required space is variable and depends on where > /boot/grub is located (i.e. which drivers it needs to access it). > Assuming plain btrfs on legacy BIOS MBR, required space is around 40-50KB. > > Note that grub2 detects some post-MBR gap software signatures and skips > over them (space need not be contiguous). It is entirely possible to add > bcache detection if enough demand exists. Might not be a bad idea, just to avoid it getting stepped on and causing later confusion. If it is stepped on I don't think there's data loss except possibly in the case where there's an unclean shutdown where the SSD has bcache data that hasn't been committed to the HDD? But I'm skeptical of bcache using a hidden area historically for the bootloader, to put its device metadata. I didn't realize that was the case. Imagine if LVM were to stuff metadata into the MBR gap, or mdadm. Egads. 
-- Chris Murphy ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Hot data tracking / hybrid storage 2016-05-29 17:53 ` Chris Murphy @ 2016-05-29 18:03 ` Holger Hoffstätte 2016-05-29 18:33 ` Chris Murphy 0 siblings, 1 reply; 26+ messages in thread From: Holger Hoffstätte @ 2016-05-29 18:03 UTC (permalink / raw) To: linux-btrfs On 05/29/16 19:53, Chris Murphy wrote: > But I'm skeptical of bcache using a hidden area historically for the > bootloader, to put its device metadata. I didn't realize that was the > case. Imagine if LVM were to stuff metadata into the MBR gap, or > mdadm. Egads. On the matter of bcache in general this seems noteworthy: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4d1034eb7c2f5e32d48ddc4dfce0f1a723d28667 bummer.. Holger ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Hot data tracking / hybrid storage 2016-05-29 18:03 ` Holger Hoffstätte @ 2016-05-29 18:33 ` Chris Murphy 2016-05-29 20:45 ` Ferry Toth 0 siblings, 1 reply; 26+ messages in thread From: Chris Murphy @ 2016-05-29 18:33 UTC (permalink / raw) To: Holger Hoffstätte; +Cc: linux-btrfs On Sun, May 29, 2016 at 12:03 PM, Holger Hoffstätte <holger@applied-asynchrony.com> wrote: > On 05/29/16 19:53, Chris Murphy wrote: >> But I'm skeptical of bcache using a hidden area historically for the >> bootloader, to put its device metadata. I didn't realize that was the >> case. Imagine if LVM were to stuff metadata into the MBR gap, or >> mdadm. Egads. > > On the matter of bcache in general this seems noteworthy: > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4d1034eb7c2f5e32d48ddc4dfce0f1a723d28667 > > bummer.. Well it doesn't mean no one will take it, just that no one has taken it yet. But the future of SSD caching may only be with LVM. -- Chris Murphy ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Hot data tracking / hybrid storage 2016-05-29 18:33 ` Chris Murphy @ 2016-05-29 20:45 ` Ferry Toth 2016-05-31 12:21 ` Austin S. Hemmelgarn 2016-06-01 10:45 ` Dmitry Katsubo 0 siblings, 2 replies; 26+ messages in thread From: Ferry Toth @ 2016-05-29 20:45 UTC (permalink / raw) To: linux-btrfs Op Sun, 29 May 2016 12:33:06 -0600, schreef Chris Murphy: > On Sun, May 29, 2016 at 12:03 PM, Holger Hoffstätte > <holger@applied-asynchrony.com> wrote: >> On 05/29/16 19:53, Chris Murphy wrote: >>> But I'm skeptical of bcache using a hidden area historically for the >>> bootloader, to put its device metadata. I didn't realize that was the >>> case. Imagine if LVM were to stuff metadata into the MBR gap, or >>> mdadm. Egads. >> >> On the matter of bcache in general this seems noteworthy: >> >> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4d1034eb7c2f5e32d48ddc4dfce0f1a723d28667 >> >> bummer.. > > Well it doesn't mean no one will take it, just that no one has taken it > yet. But the future of SSD caching may only be with LVM. > > -- > Chris Murphy I think all the above posts underline exactly my point: Instead of using a ssd cache (be it bcache or dm-cache) it would be much better to have the btrfs allocator be aware of ssd's in the pool and prioritize allocations to the ssd to maximize performance. This would allow easily adding more ssd's or replacing worn-out ones, without the mentioned headaches. After all, adding/replacing drives in a pool is one of btrfs's biggest advantages. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Hot data tracking / hybrid storage 2016-05-29 20:45 ` Ferry Toth @ 2016-05-31 12:21 ` Austin S. Hemmelgarn 0 siblings, 0 replies; 26+ messages in thread From: Austin S. Hemmelgarn @ 2016-05-31 12:21 UTC (permalink / raw) To: Ferry Toth, linux-btrfs On 2016-05-29 16:45, Ferry Toth wrote: > Op Sun, 29 May 2016 12:33:06 -0600, schreef Chris Murphy: > >> On Sun, May 29, 2016 at 12:03 PM, Holger Hoffstätte >> <holger@applied-asynchrony.com> wrote: >>> On 05/29/16 19:53, Chris Murphy wrote: >>>> But I'm skeptical of bcache using a hidden area historically for the >>>> bootloader, to put its device metadata. I didn't realize that was the >>>> case. Imagine if LVM were to stuff metadata into the MBR gap, or >>>> mdadm. Egads. >>> >>> On the matter of bcache in general this seems noteworthy: >>> >>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4d1034eb7c2f5e32d48ddc4dfce0f1a723d28667 >>> >>> bummer.. >> >> Well it doesn't mean no one will take it, just that no one has taken it >> yet. But the future of SSD caching may only be with LVM. >> >> -- >> Chris Murphy > > I think all the above posts underline exactly my point: > > Instead of using a ssd cache (be it bcache or dm-cache) it would be much > better to have the btrfs allocator be aware of ssd's in the pool and > prioritize allocations to the ssd to maximize performance. > > This will allow to easily add more ssd's or replace worn out ones, > without the mentioned headaches. After all adding/replacing drives to a > pool is one of btrfs's biggest advantages. It would still need to be pretty configurable, and even then would still be a niche use case. It would also need automatic migration to be practical beyond a certain point: most people using regular computers outside of corporate environments don't have that same 'access frequency decreases over time' pattern that the manual migration scheme you suggested would be good for. 
I think overall the most useful way of doing it would be something like the L2ARC on ZFS, which is essentially swap space for the page-cache, put on an SSD. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Hot data tracking / hybrid storage 2016-05-29 20:45 ` Ferry Toth 2016-05-31 12:21 ` Austin S. Hemmelgarn @ 2016-06-01 10:45 ` Dmitry Katsubo 1 sibling, 0 replies; 26+ messages in thread From: Dmitry Katsubo @ 2016-06-01 10:45 UTC (permalink / raw) To: linux-btrfs, linux-btrfs-owner On 2016-05-29 22:45, Ferry Toth wrote: > Op Sun, 29 May 2016 12:33:06 -0600, schreef Chris Murphy: > >> On Sun, May 29, 2016 at 12:03 PM, Holger Hoffstätte >> <holger@applied-asynchrony.com> wrote: >>> On 05/29/16 19:53, Chris Murphy wrote: >>>> But I'm skeptical of bcache using a hidden area historically for the >>>> bootloader, to put its device metadata. I didn't realize that was >>>> the >>>> case. Imagine if LVM were to stuff metadata into the MBR gap, or >>>> mdadm. Egads. >>> >>> On the matter of bcache in general this seems noteworthy: >>> >>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4d1034eb7c2f5e32d48ddc4dfce0f1a723d28667 >>> >>> bummer.. >> >> Well it doesn't mean no one will take it, just that no one has taken >> it >> yet. But the future of SSD caching may only be with LVM. >> > > I think all the above posts underline exactly my point: > > Instead of using a ssd cache (be it bcache or dm-cache) it would be > much > better to have the btrfs allocator be aware of ssd's in the pool and > prioritize allocations to the ssd to maximize performance. > > This will allow to easily add more ssd's or replace worn out ones, > without the mentioned headaches. After all adding/replacing drives to a > pool is one of btrfs's biggest advantages. I would certainly vote for this feature. If I understand correctly, the mirror is selected based on the PID of the btrfs worker thread [1], which is simple but not the most effective. I would suggest implementing a queue of read operations per physical device (perhaps reads/writes should be put into the same queue). 
If a device is fast (and for an SSD that is the case), its queue empties more quickly, which means it can be loaded more heavily. The allocation logic should simply put the next request into the shortest queue. I think this would guarantee that most operations are served by the SSD (or any other, even faster technology that appears in the future). [1] https://btrfs.wiki.kernel.org/index.php/Project_ideas#Better_data_balancing_over_multiple_devices_for_raid1.2F10_.28read.29 -- With best regards, Dmitry ^ permalink raw reply [flat|nested] 26+ messages in thread
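[Editor's note: the two read policies discussed above can be contrasted in a few lines. A toy model only: the device names and queue depths are invented for illustration, and "pid % 2" is a rough rendering of the pid-based mirror choice described in the linked wiki page, not the actual kernel code.]

```shell
#!/bin/sh
# Current-style policy (roughly): mirror index = PID modulo mirror count,
# so about half of all reader processes land on the slow mirror
# regardless of how busy each device is.
pid=12345
echo "pid policy picks mirror $((pid % 2))"

# Proposed policy: send the read to whichever device has the shortest
# pending-request queue. A fast SSD drains its queue quickly, so it
# naturally ends up absorbing most reads.
depths="ssd:1 hdd:9"
best=$(printf '%s\n' $depths | sort -t: -k2 -n | head -n1 | cut -d: -f1)
echo "shortest-queue policy picks $best"
```

The same idea generalizes to any number of mirrors: sort by queue depth, take the head.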
* Re: Hot data tracking / hybrid storage 2016-05-20 12:03 ` Austin S. Hemmelgarn 2016-05-20 17:02 ` Ferry Toth @ 2016-05-20 22:26 ` Henk Slager 2016-05-23 11:32 ` Austin S. Hemmelgarn 1 sibling, 1 reply; 26+ messages in thread From: Henk Slager @ 2016-05-20 22:26 UTC (permalink / raw) To: linux-btrfs >>>> bcache protective superblocks is a one-time procedure which can be done >>>> online. The bcache devices act as normal HDD if not attached to a >>>> caching SSD. It's really less pain than you may think. And it's a >>>> solution available now. Converting back later is easy: Just detach the >>>> HDDs from the SSDs and use them for some other purpose if you feel so >>>> later. Having the bcache protective superblock still in place doesn't >>>> hurt then. Bcache is a no-op without caching device attached. >>> >>> >>> No, bcache is _almost_ a no-op without a caching device. From a >>> userspace >>> perspective, it does nothing, but it is still another layer of >>> indirection >>> in the kernel, which does have a small impact on performance. The same >>> is >>> true of using LVM with a single volume taking up the entire partition, it >>> looks almost no different from just using the partition, but it will >>> perform >>> worse than using the partition directly. I've actually done profiling of >>> both to figure out base values for the overhead, and while bcache with no >>> cache device is not as bad as the LVM example, it can still be a roughly >>> 0.5-2% slowdown (it gets more noticeable the faster your backing storage >>> is). >>> >>> You also lose the ability to mount that filesystem directly on a kernel >>> without bcache support (this may or may not be an issue for you). >> >> >> The bcache (protective) superblock is in an 8KiB block in front of the >> file system device. In case the current, non-bcached HDD's use modern >> partitioning, you can do a 5-minute remove or add of bcache, without >> moving/copying filesystem data. 
So in case you have a bcache-formatted >> HDD that had just 1 primary partition (512 byte logical sectors), the >> partition start is at sector 2048 and the filesystem start is at 2064. >> Hard removing bcache (so making sure the module is not >> needed/loaded/used the next boot) can be done by changing the >> start-sector of the partition from 2048 to 2064. In gdisk one has to >> change the alignment to 16 first, otherwise it refuses. And of >> course, also first flush+stop+de-register bcache for the HDD. >> >> The other way around is also possible, i.e. changing the start-sector >> from 2048 to 2032. So that makes adding bcache to an existing >> filesystem a 5-minute action and not a GBs- or TBs copy action. It is >> not online of course, but just one reboot is needed (or just umount, >> gdisk, partprobe, add bcache, etc.). >> For RAID setups, one could just do 1 HDD first. > > My argument about the overhead was not about the superblock, it was about > the bcache layer itself. It isn't practical to just access the data > directly if you plan on adding a cache device, because then you couldn't do > so online unless you're going through bcache. This extra layer of > indirection in the kernel does add overhead, regardless of the on-disk > format. Yes, sorry, I took a shortcut in the discussion and jumped to a method for avoiding this 0.5-2% slowdown that you mention. (Or a kernel crashing in bcache code due to a corrupt SB on a backing device or corrupted caching device contents.) I am actually a bit surprised that there is a measurable slowdown, considering that it is basically just one 8KiB offset on a certain layer in the kernel stack, but I haven't looked at that code.
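The arithmetic behind that partition edit is simple; here is a sketch in Python using the numbers from the example above (512-byte logical sectors, an 8 KiB bcache superblock, and a partition starting at sector 2048). The function names are made up for illustration; the actual edit is done with gdisk as described:

```python
SECTOR = 512                      # logical sector size in bytes
BCACHE_SB = 8 * 1024              # bcache superblock size in bytes
SB_SECTORS = BCACHE_SB // SECTOR  # 16 sectors

def fs_start_with_bcache(part_start):
    """With bcache, the filesystem begins one superblock past the
    partition start (partition at 2048 -> filesystem at 2064)."""
    return part_start + SB_SECTORS

def new_start_to_remove_bcache(part_start):
    """Hard-removing bcache: move the partition start forward onto
    the filesystem itself, so the kernel addresses it directly."""
    return part_start + SB_SECTORS

def new_start_to_add_bcache(part_start):
    """Adding bcache to an existing filesystem: move the partition
    start back 16 sectors to make room for the superblock
    (2048 -> 2032). Needs 8 KiB of slack before the old start."""
    return part_start - SB_SECTORS
```

Either direction moves the partition boundary by the same 16 sectors, which is why the conversion needs no data copy at all, only a partition-table edit (and gdisk's alignment set to 16 so it accepts the non-1M-aligned start).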
> Secondarily, having a HDD with just one partition is not a typical use case, > and that argument about the slack space resulting from the 1M alignment only > holds true if you're using an MBR instead of a GPT layout (or for that > matter, almost any other partition table format), and you're not booting > from that disk (because GRUB embeds itself there). It's also fully possible > to have an MBR formatted disk which doesn't have any spare space there too > (which is how most flash drives get formatted). I don't know tables other than MBR and GPT, but this bcache SB 'insertion' works with both. Indeed, if GRUB is involved, it can get complicated; I have avoided that. If there is less than 8KiB slack space on a HDD, I would worry about alignment/performance first; then there is likely a reason to fully rewrite the HDD with a standard 1M alignment. If there are more partitions and there is a partition in front of the one you would like to be bcached, I would personally shrink that one by 8KiB (like NTFS or swap or ext4) if that saves me terabytes of data transfers. > This also doesn't change the fact that without careful initial formatting > (it is possible on some filesystems to embed the bcache SB at the beginning > of the FS itself, many of them have some reserved space at the beginning of > the partition for bootloaders, and this space doesn't have to exist when > mounting the FS) or manual alteration of the partition, it's not possible to > mount the FS on a system without bcache support. If we consider a non-bootable single HDD btrfs FS, are you then suggesting that the bcache SB could be placed in the first 64KiB where GRUB also stores its code if the FS would need booting? That would be interesting; it would mean that for btrfs on a raw device (and also multi-device) there is no extra exclusive 8KiB space needed in front. Is there someone who has this working? I think it would lead to issues at the block layer, but I currently have no clue about that.
>> There is also a tool doing the conversion in-place (I haven't used it >> myself, my python(s) had trouble; I could do the partition table edit >> much faster/easier): >> https://github.com/g2p/blocks#bcache-conversion >> > I actually hadn't known about this tool, thanks for mentioning it. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Hot data tracking / hybrid storage 2016-05-20 22:26 ` Henk Slager @ 2016-05-23 11:32 ` Austin S. Hemmelgarn 0 siblings, 0 replies; 26+ messages in thread From: Austin S. Hemmelgarn @ 2016-05-23 11:32 UTC (permalink / raw) To: Henk Slager, linux-btrfs On 2016-05-20 18:26, Henk Slager wrote: > Yes, sorry, I took a shortcut in the discussion and jumped to a > method for avoiding this 0.5-2% slowdown that you mention. (Or a > kernel crashing in bcache code due to a corrupt SB on a backing device > or corrupted caching device contents.) > I am actually a bit surprised that there is a measurable slowdown, > considering that it is basically just one 8KiB offset on a certain > layer in the kernel stack, but I haven't looked at that code. There's still a layer of indirection in the kernel code, even in the pass-through mode with no cache, and that's probably where the slowdown comes from. My testing was also in a VM with its backing device on an SSD though, so you may get different results on other hardware. > I don't know tables other than MBR and GPT, but this bcache SB > 'insertion' works with both. Indeed, if GRUB is involved, it can get > complicated, I have avoided that. If there is less than 8KiB slack > space on a HDD, I would worry about alignment/performance first, then > there is likely a reason to fully rewrite the HDD with a standard 1M > alignment. The 'alignment' thing is mostly bogus these days. It originated when 1M was a full track on the disk, and you wanted your filesystem to start on the beginning of a track for performance reasons. On most modern disks though, this is not a full track, but it got kept because a number of bootloaders (GRUB included) used to use the slack space this caused to embed themselves before the filesystem. The only case where 1M alignment actually makes sense is on SSD's with a 1M erase block size (which are rare, most consumer devices have a 4M erase block). 
As far as partition tables go, you're not likely to see any other formats these days (the only ones I've dealt with other than MBR and GPT are APM (the old pre-OSX Apple format), RDB (the Amiga format, which is kind of neat because it can embed drivers), and the old Sun disk labels (from before SunOS became Solaris)), and I had actually forgotten that a GPT is only 32k, hence my comment about it potentially being an issue. > If there are more partitions and there is a partition in front of the one you > would like to be bcached, I would personally shrink that one by 8KiB (like > NTFS or swap or ext4) if that saves me terabytes of data transfers. Definitely, although depending on how the system is set up, this will almost certainly need downtime. > >> This also doesn't change the fact that without careful initial formatting >> (it is possible on some filesystems to embed the bcache SB at the beginning >> of the FS itself, many of them have some reserved space at the beginning of >> the partition for bootloaders, and this space doesn't have to exist when >> mounting the FS) or manual alteration of the partition, it's not possible to >> mount the FS on a system without bcache support. > > If we consider a non-bootable single HDD btrfs FS, are you then > suggesting that the bcache SB could be placed in the first 64KiB where > GRUB also stores its code if the FS would need booting? > That would be interesting; it would mean that for btrfs on a raw > device (and also multi-device) there is no extra exclusive 8KiB space > needed in front. > Is there someone who has this working? I think it would lead to issues > at the block layer, but I currently have no clue about that. I don't think it would work on BTRFS; we expect the SB at a fixed location on the device, and it wouldn't be there on the bcache device. It might work on ext4 though, but I'm not certain about that. I do know of at least one person who got it working with a FAT32 filesystem as a proof of concept though. 
Trying to do that even if it would work on BTRFS would be _really_ risky though, because the kernel would potentially see both devices, and you would probably have the same issues that you do with block level copies. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Hot data tracking / hybrid storage 2016-05-15 12:12 Hot data tracking / hybrid storage Ferry Toth 2016-05-15 21:11 ` Duncan @ 2016-05-16 11:25 ` Austin S. Hemmelgarn 1 sibling, 0 replies; 26+ messages in thread From: Austin S. Hemmelgarn @ 2016-05-16 11:25 UTC (permalink / raw) To: Ferry Toth, linux-btrfs On 2016-05-15 08:12, Ferry Toth wrote: > Is there anything going on in this area? > > We have btrfs in RAID10 using 4 HDD's for many years now with a rotating > scheme of snapshots for easy backup. <10% files (bytes) change between > oldest snapshot and the current state. > > However, the filesystem seems to become very slow, probably due to the > RAID10 and the snapshots. While it's not exactly what you're thinking of, have you tried running BTRFS in raid1 mode on top of two DM/MD RAID0 volumes? This provides the same degree of data integrity that BTRFS raid10 does, but gets measurably better performance. > > It would be fantastic if we could just add 4 SSD's to the pool and btrfs > would just magically prefer to put often accessed files there and move > older or less popular files to the HDD's. > > In my simple mind this can not be done easily using bcache as that would > require completely rebuilding the file system on top of bcache (can not > just add a few SSD's to the pool), while implementing a cache inside btrfs > is probably a complex thing with lots of overhead. You may want to look into dm-cache, as that doesn't require reformatting the source device. It doesn't quite get the same performance as bcache, but for me at least, the lower performance is a reasonable trade-off for being able to easily convert a device to use it, and being able to easily convert away from it if need be. > > Simply telling the allocator to prefer new files to go to the ssd and > move away unpopular stuff to hdd during balance should do the trick, or am > I wrong? 
In theory this would work as a first implementation, but it would need to have automatic data migration as an option to be considered practical, and that's not as easy to do correctly. ^ permalink raw reply [flat|nested] 26+ messages in thread
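The kind of first-cut implementation discussed above could be sketched roughly like this; a hypothetical Python illustration of hot-data tracking by access count (none of the names or structures here correspond to real btrfs code or on-disk state):

```python
from collections import Counter

class HotDataTracker:
    """Track per-file access counts and decide, at balance time, which
    files belong on the SSD tier and which should migrate to the HDDs."""

    def __init__(self, ssd_slots):
        self.accesses = Counter()
        self.ssd_slots = ssd_slots  # how many files fit on the SSD tier

    def record_access(self, path):
        self.accesses[path] += 1

    def placement(self):
        """Return (ssd_files, hdd_files): the hottest files stay on the
        SSDs; everything else is a candidate to migrate during balance."""
        hot = [p for p, _ in self.accesses.most_common(self.ssd_slots)]
        cold = [p for p in self.accesses if p not in hot]
        return hot, cold

tracker = HotDataTracker(ssd_slots=2)
for path, n in [("/var/db", 50), ("/home/a", 20), ("/archive/x", 1)]:
    for _ in range(n):
        tracker.record_access(path)
ssd, hdd = tracker.placement()
```

Even this toy version shows where the hard parts appear: the counts have to be persisted and aged over time, and the migration itself has to happen without disturbing foreground I/O, which is exactly the "not as easy to do correctly" portion.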
end of thread, other threads:[~2016-06-01 10:45 UTC | newest] Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-05-15 12:12 Hot data tracking / hybrid storage Ferry Toth 2016-05-15 21:11 ` Duncan 2016-05-15 23:05 ` Kai Krakow 2016-05-17 6:27 ` Ferry Toth 2016-05-17 11:32 ` Austin S. Hemmelgarn 2016-05-17 18:33 ` Kai Krakow 2016-05-18 22:44 ` Ferry Toth 2016-05-19 18:09 ` Kai Krakow 2016-05-19 18:51 ` Austin S. Hemmelgarn 2016-05-19 21:01 ` Kai Krakow 2016-05-20 11:46 ` Austin S. Hemmelgarn 2016-05-19 23:23 ` Henk Slager 2016-05-20 12:03 ` Austin S. Hemmelgarn 2016-05-20 17:02 ` Ferry Toth 2016-05-20 17:59 ` Austin S. Hemmelgarn 2016-05-20 21:31 ` Henk Slager 2016-05-29 6:23 ` Andrei Borzenkov 2016-05-29 17:53 ` Chris Murphy 2016-05-29 18:03 ` Holger Hoffstätte 2016-05-29 18:33 ` Chris Murphy 2016-05-29 20:45 ` Ferry Toth 2016-05-31 12:21 ` Austin S. Hemmelgarn 2016-06-01 10:45 ` Dmitry Katsubo 2016-05-20 22:26 ` Henk Slager 2016-05-23 11:32 ` Austin S. Hemmelgarn 2016-05-16 11:25 ` Austin S. Hemmelgarn