* Q: what exactly does SSD mode still do?
From: Holger Hoffstätte @ 2020-03-26 18:16 UTC
To: linux-btrfs

Hi,

could someone explain what SSD mode *actually* still does? Not ssd_spread,
that's clear and unrelated. A recent commit removed the thread-offloaded
bio submission (avoiding context switches etc.) - which I thought was the
reason for SSD mode? - and looking through the code I couldn't find any
bits that helped clarify the difference.

Thanks!
Holger
* Re: Q: what exactly does SSD mode still do?
From: Hans van Kranenburg @ 2020-03-26 22:21 UTC
To: Holger Hoffstätte, linux-btrfs

Hi!

On 3/26/20 7:16 PM, Holger Hoffstätte wrote:
>
> could someone explain what SSD mode *actually* still does? Not ssd_spread,
> that's clear and unrelated. A recent commit removed the thread-offloaded
> bio submission (avoiding context switches etc.)

Can you share the commit id?

> - which I thought was the
> reason for SSD mode? - and looking through the code I couldn't find any
> bits that helped clarify the difference.

After the change in 2017 to change the extent allocator in ssd mode for
data to behave like nossd already did before, there are two differences
between ssd and nossd left:

1) This if statement in tree-log.c:

cd354ad613a39 (Chris Mason 2011-10-20 15:45:37 -0400 3042)
/* when we're on an ssd, just kick the log commit out */
0b246afa62b0c (Jeff Mahoney 2016-06-22 18:54:23 -0400 3043)
if (!btrfs_test_opt(fs_info, SSD) &&

2) Metadata "cluster allocator" write behavior:

*empty_cluster = SZ_64K # nossd
*empty_cluster = SZ_2M # ssd

This happens in extent-tree.c.

For 1) I guess this is ok if you can do "seek free writes"?

For 2) I initially wanted to start more research on the behavioral
difference, but when upgrading from Linux 4.9 to 4.19, the majority of
the problems with exploding extent tree metadata writes were already
gone (in ssd mode), so that never happened.

So, there's still those two hard coded values without any proper recent
explanation why they should be at that value.

Hans
* Re: Q: what exactly does SSD mode still do?
From: Holger Hoffstätte @ 2020-03-27 10:29 UTC
To: Hans van Kranenburg, linux-btrfs

On 3/26/20 11:21 PM, Hans van Kranenburg wrote:
> Hi!
>
> On 3/26/20 7:16 PM, Holger Hoffstätte wrote:
>>
>> could someone explain what SSD mode *actually* still does? Not ssd_spread,
>> that's clear and unrelated. A recent commit removed the thread-offloaded
>> bio submission (avoiding context switches etc.)
>
> Can you share the commit id?

[1] followed by [2].

>> - which I thought was the
>> reason for SSD mode? - and looking through the code I couldn't find any
>> bits that helped clarify the difference.
>
> After the change in 2017 to change the extent allocator in ssd mode for
> data to behave like nossd already did before, there are two differences
> between ssd and nossd left:
>
> 1) This if statement in tree-log.c:
>
> cd354ad613a39 (Chris Mason 2011-10-20 15:45:37 -0400 3042)
> /* when we're on an ssd, just kick the log commit out */
> 0b246afa62b0c (Jeff Mahoney 2016-06-22 18:54:23 -0400 3043)
> if (!btrfs_test_opt(fs_info, SSD) &&

Ah yes, multi-writer batching - a common DB optimization technique.
I wonder how much of a difference that actually still makes, but it
sounds like a good idea.

> 2) Metadata "cluster allocator" write behavior:
>
> *empty_cluster = SZ_64K # nossd
> *empty_cluster = SZ_2M # ssd
>
> This happens in extent-tree.c.

2M used to be a common erase block size on SSDs. Or maybe it's just
a nice round number.. ¯\(ツ)/¯

cheers,
Holger

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=08635bae0b4ceb08fe4c156a11c83baec397d36d

[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ba8a9d07954397f0645cf62bcc1ef536e8e7ba24
* Re: Q: what exactly does SSD mode still do?
From: Zygo Blaxell @ 2020-03-28 19:35 UTC
To: Holger Hoffstätte; +Cc: Hans van Kranenburg, linux-btrfs

On Fri, Mar 27, 2020 at 11:29:52AM +0100, Holger Hoffstätte wrote:
> On 3/26/20 11:21 PM, Hans van Kranenburg wrote:
> > 2) Metadata "cluster allocator" write behavior:
> >
> > *empty_cluster = SZ_64K # nossd
> > *empty_cluster = SZ_2M # ssd
> >
> > This happens in extent-tree.c.
>
> 2M used to be a common erase block size on SSDs. Or maybe it's just
> a nice round number.. ¯\(ツ)/¯

As a side-effect, 2M write clusters close the write hole on raid5/6 if you
have an array that is a power of 2 data disks wide. This capability is
wasted when it's only available through the 'ssd' mount option.

The behavior could be quite useful if it was properly integrated with
the raid5/6 stuff: set *empty_cluster = block group data width, make
sure it's aligned to raid5/6 stripe boundaries, and use it for both data
and metadata.

It works by effectively making partially-filled clusters read-only.
If we can guarantee that clusters are aligned to raid5/6 data/parity block
boundaries, then btrfs can't allocate new data in partially filled raid5/6
stripes, so it won't break the parity relation and won't have write hole.

> cheers,
> Holger
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=08635bae0b4ceb08fe4c156a11c83baec397d36d
>
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ba8a9d07954397f0645cf62bcc1ef536e8e7ba24
>
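[Editor's sketch] The alignment condition Zygo describes comes down to simple arithmetic, shown below as a standalone C check. It rests on an assumption not stated in the thread: that each data disk contributes a 64K stripe element, so one full stripe holds `data_disks * 64K` bytes of data. The helper names are hypothetical and `start` is taken to be an offset in the block group's data address space. Nothing here is btrfs code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_64K (64ULL * 1024)
#define SZ_2M  (2ULL * 1024 * 1024)

/* Assumption: each data disk contributes a 64K element per full stripe. */
#define STRIPE_ELEMENT SZ_64K

/* Data bytes in one full stripe, parity excluded (hypothetical helper). */
static uint64_t full_stripe_data_bytes(unsigned int data_disks)
{
	assert(data_disks > 0);
	return (uint64_t)data_disks * STRIPE_ELEMENT;
}

/*
 * True if a cluster of cluster_bytes beginning at start begins and ends
 * on full-stripe boundaries, so it never shares a stripe with other data.
 */
static bool cluster_covers_whole_stripes(uint64_t start, uint64_t cluster_bytes,
					 unsigned int data_disks)
{
	uint64_t width = full_stripe_data_bytes(data_disks);

	return start % width == 0 && cluster_bytes % width == 0;
}
```

Under that assumption, a 2M cluster at offset 0 covers whole stripes for 4 or 8 data disks (256K and 512K stripe widths) but not for 3 data disks, where 2M is not a multiple of 192K. The same cluster starting at offset 64K fails the check even with 4 data disks, which is why the message asks for guaranteed alignment as well as a suitable cluster size.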
* Re: Q: what exactly does SSD mode still do?
From: Hans van Kranenburg @ 2020-03-28 21:31 UTC
To: Zygo Blaxell, Holger Hoffstätte; +Cc: linux-btrfs

On 3/28/20 8:35 PM, Zygo Blaxell wrote:
> On Fri, Mar 27, 2020 at 11:29:52AM +0100, Holger Hoffstätte wrote:
>> On 3/26/20 11:21 PM, Hans van Kranenburg wrote:
>>> 2) Metadata "cluster allocator" write behavior:
>>>
>>> *empty_cluster = SZ_64K # nossd
>>> *empty_cluster = SZ_2M # ssd
>>>
>>> This happens in extent-tree.c.
>>
>> 2M used to be a common erase block size on SSDs. Or maybe it's just
>> a nice round number.. ¯\(ツ)/¯
>
> As a side-effect, 2M write clusters close the write hole on raid5/6 if you
> have an array that is a power of 2 data disks wide. This capability is
> wasted when it's only available through the 'ssd' mount option.

Search for SSD_SPREAD in free-space-cache.c. There's this cont1_bytes
which is a fallback, so you'll have to run full SSD_SPREAD mode for this
to happen IINM.

https://www.spinics.net/lists/linux-btrfs/msg70624.html for a huge braindump

While running Linux 4.9 back then, I had to actually use 'ssd_spread'
metadata (not for data, possible thanks to that 'bug') to prevent
metadata writes from running around in circles while writing the extent
tree. With 4.19, I can just use 'ssd' and TBH I have no idea what change
in between got rid of that insane amount of write overhead.

So, I never continued with researching behavior of different options
(empty_cluster, cont1_bytes combinations).

> The behavior could be quite useful if it was properly integrated with
> the raid5/6 stuff: set *empty_cluster = block group data width, make
> sure it's aligned to raid5/6 stripe boundaries, and use it for both data
> and metadata.
>
> It works by effectively making partially-filled clusters read-only.
> If we can guarantee that clusters are aligned to raid5/6 data/parity block
> boundaries, then btrfs can't allocate new data in partially filled raid5/6
> stripes, so it won't break the parity relation and won't have write hole.
>
>> cheers,
>> Holger
>>
>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=08635bae0b4ceb08fe4c156a11c83baec397d36d
>>
>> [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ba8a9d07954397f0645cf62bcc1ef536e8e7ba24
>>

K