* Using async discard by default with SSDs? @ 2022-07-21 18:43 Neal Gompa 2022-07-21 19:21 ` Roman Mamedov 2022-07-25 19:08 ` David Sterba 0 siblings, 2 replies; 13+ messages in thread From: Neal Gompa @ 2022-07-21 18:43 UTC (permalink / raw) To: Btrfs BTRFS; +Cc: Chris Mason, Josef Bacik, Chris Murphy Hey all, I was talking with Chris Mason today at a Fedora Hatch event about async discards (as we were thinking about doing this in Fedora some time ago[1]), and he seemed to consider it reasonable to make it so Btrfs uses async discards by default when being formatted on SSDs. He and I couldn't think of a reason why not to, other than the potential lack of "discard=none" option to turn off discards if the user wanted it to. Do we already have this option? Are there any other reasons not to do this? Or is this something we should have changed in Btrfs so everyone gets async discards by default going forward? Thanks in advance and best regards, Neal [1]: https://pagure.io/fedora-btrfs/project/issue/6 -- 真実はいつも一つ!/ Always, there's only one truth! ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Using async discard by default with SSDs? 2022-07-21 18:43 Using async discard by default with SSDs? Neal Gompa @ 2022-07-21 19:21 ` Roman Mamedov 2022-07-25 19:08 ` David Sterba 1 sibling, 0 replies; 13+ messages in thread From: Roman Mamedov @ 2022-07-21 19:21 UTC (permalink / raw) To: Neal Gompa; +Cc: Btrfs BTRFS, Chris Mason, Josef Bacik, Chris Murphy On Thu, 21 Jul 2022 14:43:16 -0400 Neal Gompa <ngompa13@gmail.com> wrote: > potential lack of "discard=none" option to turn off discards if the > user wanted it to. Do we already have this option? Like on Ext4, XFS and others, the "nodiscard" mount option is available: https://btrfs.readthedocs.io/en/latest/btrfs-man5.html -- With respect, Roman ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Using async discard by default with SSDs? 2022-07-21 18:43 Using async discard by default with SSDs? Neal Gompa 2022-07-21 19:21 ` Roman Mamedov @ 2022-07-25 19:08 ` David Sterba 2022-07-26 20:00 ` Chris Murphy 1 sibling, 1 reply; 13+ messages in thread From: David Sterba @ 2022-07-25 19:08 UTC (permalink / raw) To: Neal Gompa; +Cc: Btrfs BTRFS, Chris Mason, Josef Bacik, Chris Murphy On Thu, Jul 21, 2022 at 02:43:16PM -0400, Neal Gompa wrote: > Hey all, > > I was talking with Chris Mason today at a Fedora Hatch event about > async discards (as we were thinking about doing this in Fedora some > time ago[1]), and he seemed to consider it reasonable to make it so > Btrfs uses async discards by default when being formatted on SSDs. > > He and I couldn't think of a reason why not to, other than the > potential lack of "discard=none" option to turn off discards if the > user wanted it to. Do we already have this option? Are there any other > reasons not to do this? Or is this something we should have changed in > Btrfs so everyone gets async discards by default going forward? I think it's safe to use by default, with the usual question who could be affected by that negatively. The async triggers, timeouts and thresholds are preset conservatively and so far there have been no complaints. The tunables have been hidden under debug, but there are also some stats (like how many bytes could be discarded in the next round), so it would be good to also publish that when it's on by default. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Using async discard by default with SSDs? 2022-07-25 19:08 ` David Sterba @ 2022-07-26 20:00 ` Chris Murphy 2022-07-26 21:36 ` David Sterba 0 siblings, 1 reply; 13+ messages in thread From: Chris Murphy @ 2022-07-26 20:00 UTC (permalink / raw) To: David Sterba, Neal Gompa; +Cc: Btrfs BTRFS, Chris Mason, Josef Bacik On Mon, Jul 25, 2022, at 3:08 PM, David Sterba wrote: > I think it's safe to use by default, with the usual question who could > be affected by that negatively. The async triggers, timeouts and > thresholds are preset conservatively and so far there have been no > complaints. > > The tunables have been hidden under debug, but there are also some stats > (like how many bytes could be discarded in the next round), so it would > be good to also publish that when it's on by default. In my testing, discard=async rather quickly submits recently freed metadata blocks for gc, blocks referred by the backup roots in the super. Is this a concern for recovery? Is there a way to exclude the most recent ~5 generations of metadata from gc, and could it improve the usefulness of backup roots for recovery? -- Chris Murphy ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Using async discard by default with SSDs? 2022-07-26 20:00 ` Chris Murphy @ 2022-07-26 21:36 ` David Sterba 2022-07-26 22:10 ` Chris Murphy 0 siblings, 1 reply; 13+ messages in thread From: David Sterba @ 2022-07-26 21:36 UTC (permalink / raw) To: Chris Murphy Cc: David Sterba, Neal Gompa, Btrfs BTRFS, Chris Mason, Josef Bacik On Tue, Jul 26, 2022 at 04:00:42PM -0400, Chris Murphy wrote: > > > On Mon, Jul 25, 2022, at 3:08 PM, David Sterba wrote: > > > I think it's safe to use by default, with the usual question who could > > be affected by that negatively. The async triggers, timeouts and > > thresholds are preset conservatively and so far there have been no > > complaints. > > > > The tunables have been hidden under debug, but there are also some stats > > (like how many bytes could be discarded in the next round), so it would > > be good to also publish that when it's on by default. > > In my testing, discard=async rather quickly submits recently freed > metadata blocks for gc, blocks referred by the backup roots in the > super. Is this a concern for recovery? Is there a way to exclude the > most recent ~5 generations of metadata from gc, and could it improve > the usefulness of backup roots for recovery? What is quickly here? The default timeout is set to 2 minutes and that's what I've seen when testing that on a fresh filesystem. There's some additional logic that's adaptive and derived from the latency increase caused by the discard, but this could be observed on some specific delete heavy workloads. There's no logic that would take into account generation so it could interfere with the backup root feature, but it's been unreliable already and there are no guarantees to keep the recently freed blocks unallocated. I've heared some feedback that the feature should be removed or improved, I'm personally in the "removed" camp. The block groups eligible for discard are evaluated based on time and size criteria so I think adding a generation is straightforward, rough guess is to btrfs_discard_check_filter(). This affects overall performance though, FB guys wanted faster discard for their workload so adding delay based on generation could make it worse. They now use the default settings, though there are tunables, and I'd like to keep defaults working for most users. If somebody evaluates the effects of additional generational we can change it. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Using async discard by default with SSDs? 2022-07-26 21:36 ` David Sterba @ 2022-07-26 22:10 ` Chris Murphy 2022-07-27 14:50 ` Chris Murphy 0 siblings, 1 reply; 13+ messages in thread From: Chris Murphy @ 2022-07-26 22:10 UTC (permalink / raw) To: David Sterba; +Cc: Neal Gompa, Btrfs BTRFS, Chris Mason, Josef Bacik On Tue, Jul 26, 2022, at 5:36 PM, David Sterba wrote: > On Tue, Jul 26, 2022 at 04:00:42PM -0400, Chris Murphy wrote: >> >> >> On Mon, Jul 25, 2022, at 3:08 PM, David Sterba wrote: >> >> > I think it's safe to use by default, with the usual question who could >> > be affected by that negatively. The async triggers, timeouts and >> > thresholds are preset conservatively and so far there have been no >> > complaints. >> > >> > The tunables have been hidden under debug, but there are also some stats >> > (like how many bytes could be discarded in the next round), so it would >> > be good to also publish that when it's on by default. >> >> In my testing, discard=async rather quickly submits recently freed >> metadata blocks for gc, blocks referred by the backup roots in the >> super. Is this a concern for recovery? Is there a way to exclude the >> most recent ~5 generations of metadata from gc, and could it improve >> the usefulness of backup roots for recovery? > > What is quickly here? Seconds, less than a minute, all the blocks pointed to by backup roots are empty (zeros). The default timeout is set to 2 minutes and that's > what I've seen when testing that on a fresh filesystem. Ok I'll retest with 5.19 series. The last time I tested time to drive gc backup rootswas circa 5.8. There's some > additional logic that's adaptive and derived from the latency increase > caused by the discard, but this could be observed on some specific > delete heavy workloads. > > There's no logic that would take into account generation so it could > interfere with the backup root feature, but it's been unreliable already > and there are no guarantees to keep the recently freed blocks unallocated. > I've heared some feedback that the feature should be removed or > improved, I'm personally in the "removed" camp. I have no strong opinion. If it very rarely works, I'm less concerned. But if it can sometimes work a significant minority of the time, some folks occasionally benefit in a pretty significant way. -- Chris Murphy ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Using async discard by default with SSDs? 2022-07-26 22:10 ` Chris Murphy @ 2022-07-27 14:50 ` Chris Murphy 2022-07-27 14:56 ` David Sterba 0 siblings, 1 reply; 13+ messages in thread From: Chris Murphy @ 2022-07-27 14:50 UTC (permalink / raw) To: David Sterba; +Cc: Neal Gompa, Btrfs BTRFS, Chris Mason, Josef Bacik On Tue, Jul 26, 2022, at 6:10 PM, Chris Murphy wrote: > On Tue, Jul 26, 2022, at 5:36 PM, David Sterba wrote: >> What is quickly here? > > Seconds, less than a minute, all the blocks pointed to by backup roots > are empty (zeros). > > The default timeout is set to 2 minutes and that's >> what I've seen when testing that on a fresh filesystem. > > Ok I'll retest with 5.19 series. The last time I tested time to drive > gc backup rootswas circa 5.8. With 5.19, backup roots remain available for a surprisingly long time, definitely more than 2 minutes. It's not an exhaustive test, just a dozen samples over a half hour, but I never once saw zeros returned. Two out of those dozen, I saw the block already overwritten with a leaf of a newer generation and a completely different owner (e.g. csum tree written at the block address for the backup tree root) - which would make that backup root useless. What is a likely target kernel version to make discard=async the default? The merge window for 5.20 closes August 14. Is 5.21 a practical target? -- Chris Murphy ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Using async discard by default with SSDs? 2022-07-27 14:50 ` Chris Murphy @ 2022-07-27 14:56 ` David Sterba 2022-07-27 15:14 ` Chris Murphy ` (2 more replies) 0 siblings, 3 replies; 13+ messages in thread From: David Sterba @ 2022-07-27 14:56 UTC (permalink / raw) To: Chris Murphy; +Cc: Neal Gompa, Btrfs BTRFS, Chris Mason, Josef Bacik On Wed, Jul 27, 2022 at 10:50:01AM -0400, Chris Murphy wrote: > On Tue, Jul 26, 2022, at 6:10 PM, Chris Murphy wrote: > > On Tue, Jul 26, 2022, at 5:36 PM, David Sterba wrote: > >> What is quickly here? > > > > Seconds, less than a minute, all the blocks pointed to by backup roots > > are empty (zeros). > > > > The default timeout is set to 2 minutes and that's > >> what I've seen when testing that on a fresh filesystem. > > > > Ok I'll retest with 5.19 series. The last time I tested time to drive > > gc backup rootswas circa 5.8. > > With 5.19, backup roots remain available for a surprisingly long time, > definitely more than 2 minutes. It's not an exhaustive test, just a > dozen samples over a half hour, but I never once saw zeros returned. > Two out of those dozen, I saw the block already overwritten with a > leaf of a newer generation and a completely different owner (e.g. csum > tree written at the block address for the backup tree root) - which > would make that backup root useless. > > What is a likely target kernel version to make discard=async the > default? The merge window for 5.20 closes August 14. Is 5.21 a > practical target? The changes for the next merge window are supposed to be done a week or two before it opens, but as this is a simple change I think I can squeeze it in. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Using async discard by default with SSDs? 2022-07-27 14:56 ` David Sterba @ 2022-07-27 15:14 ` Chris Murphy 2022-07-27 17:47 ` David Sterba 2022-07-27 15:26 ` Neal Gompa 2022-08-05 15:14 ` David Sterba 2 siblings, 1 reply; 13+ messages in thread From: Chris Murphy @ 2022-07-27 15:14 UTC (permalink / raw) To: David Sterba; +Cc: Neal Gompa, Btrfs BTRFS, Chris Mason, Josef Bacik On Wed, Jul 27, 2022, at 10:56 AM, David Sterba wrote: > On Wed, Jul 27, 2022 at 10:50:01AM -0400, Chris Murphy wrote: >> What is a likely target kernel version to make discard=async the >> default? The merge window for 5.20 closes August 14. Is 5.21 a >> practical target? > > The changes for the next merge window are supposed to be done a week or > two before it opens, but as this is a simple change I think I can > squeeze it in. For 5.20? I'm not aware of any conflict with fstrim. But I wonder if there's a preference to coordinate the change with util-linux folks? Currently, util-linux provides fstrim.timer which runs fstrim.service every Monday at 00:00 local time. The command is: ExecStart=/usr/sbin/fstrim --listed-in /etc/fstab:/proc/self/mountinfo --verbose --quiet-unsupported I'm not sure how they'd go about implementing an exception for Btrfs, either entirely or only if a discard mount option is detected. But I can ask? -- Chris Murphy ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Using async discard by default with SSDs? 2022-07-27 15:14 ` Chris Murphy @ 2022-07-27 17:47 ` David Sterba 2022-07-28 12:27 ` Chris Murphy 0 siblings, 1 reply; 13+ messages in thread From: David Sterba @ 2022-07-27 17:47 UTC (permalink / raw) To: Chris Murphy; +Cc: Neal Gompa, Btrfs BTRFS, Chris Mason, Josef Bacik On Wed, Jul 27, 2022 at 11:14:01AM -0400, Chris Murphy wrote: > > > On Wed, Jul 27, 2022, at 10:56 AM, David Sterba wrote: > > On Wed, Jul 27, 2022 at 10:50:01AM -0400, Chris Murphy wrote: > >> What is a likely target kernel version to make discard=async the > >> default? The merge window for 5.20 closes August 14. Is 5.21 a > >> practical target? > > > > The changes for the next merge window are supposed to be done a week or > > two before it opens, but as this is a simple change I think I can > > squeeze it in. > > For 5.20? Yes, 5.20. > I'm not aware of any conflict with fstrim. But I wonder if there's a > preference to coordinate the change with util-linux folks? The -o discard and fstrim are independent and can be used at the same time. > Currently, util-linux provides fstrim.timer which runs fstrim.service every Monday at 00:00 local time. The command is: > > ExecStart=/usr/sbin/fstrim --listed-in /etc/fstab:/proc/self/mountinfo --verbose --quiet-unsupported > > I'm not sure how they'd go about implementing an exception for Btrfs, > either entirely or only if a discard mount option is detected. But I > can ask? No need for an exception, during mount btrfs track ranges that have been trimmed and skips them when discard is requested by fstrim, so it's a no-op. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Using async discard by default with SSDs? 2022-07-27 17:47 ` David Sterba @ 2022-07-28 12:27 ` Chris Murphy 0 siblings, 0 replies; 13+ messages in thread From: Chris Murphy @ 2022-07-28 12:27 UTC (permalink / raw) To: David Sterba; +Cc: Neal Gompa, Btrfs BTRFS, Chris Mason, Josef Bacik On Wed, Jul 27, 2022, at 1:47 PM, David Sterba wrote: > On Wed, Jul 27, 2022 at 11:14:01AM -0400, Chris Murphy wrote: >> >> >> On Wed, Jul 27, 2022, at 10:56 AM, David Sterba wrote: >> > On Wed, Jul 27, 2022 at 10:50:01AM -0400, Chris Murphy wrote: > >> > The changes for the next merge window are supposed to be done a week or >> > two before it opens, but as this is a simple change I think I can >> > squeeze it in. >> >> For 5.20? > > Yes, 5.20. Well I'll let Josef and Chris arm wrestle over who sends you a patch when they can get ahold of some round tuits. Once it's merged in 5.20, sounds easily backported for Fedora to carry a patch for 5.19. -- Chris Murphy ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Using async discard by default with SSDs? 2022-07-27 14:56 ` David Sterba 2022-07-27 15:14 ` Chris Murphy @ 2022-07-27 15:26 ` Neal Gompa 2022-08-05 15:14 ` David Sterba 2 siblings, 0 replies; 13+ messages in thread From: Neal Gompa @ 2022-07-27 15:26 UTC (permalink / raw) To: dsterba, Chris Murphy, Neal Gompa, Btrfs BTRFS, Chris Mason, Josef Bacik On Wed, Jul 27, 2022 at 8:01 AM David Sterba <dsterba@suse.cz> wrote: > > On Wed, Jul 27, 2022 at 10:50:01AM -0400, Chris Murphy wrote: > > On Tue, Jul 26, 2022, at 6:10 PM, Chris Murphy wrote: > > > On Tue, Jul 26, 2022, at 5:36 PM, David Sterba wrote: > > >> What is quickly here? > > > > > > Seconds, less than a minute, all the blocks pointed to by backup roots > > > are empty (zeros). > > > > > > The default timeout is set to 2 minutes and that's > > >> what I've seen when testing that on a fresh filesystem. > > > > > > Ok I'll retest with 5.19 series. The last time I tested time to drive > > > gc backup rootswas circa 5.8. > > > > With 5.19, backup roots remain available for a surprisingly long time, > > definitely more than 2 minutes. It's not an exhaustive test, just a > > dozen samples over a half hour, but I never once saw zeros returned. > > Two out of those dozen, I saw the block already overwritten with a > > leaf of a newer generation and a completely different owner (e.g. csum > > tree written at the block address for the backup tree root) - which > > would make that backup root useless. > > > > What is a likely target kernel version to make discard=async the > > default? The merge window for 5.20 closes August 14. Is 5.21 a > > practical target? > > The changes for the next merge window are supposed to be done a week or > two before it opens, but as this is a simple change I think I can > squeeze it in. It would definitely be nice to see this for 5.20. :) -- 真実はいつも一つ!/ Always, there's only one truth! ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Using async discard by default with SSDs? 2022-07-27 14:56 ` David Sterba 2022-07-27 15:14 ` Chris Murphy 2022-07-27 15:26 ` Neal Gompa @ 2022-08-05 15:14 ` David Sterba 2 siblings, 0 replies; 13+ messages in thread From: David Sterba @ 2022-08-05 15:14 UTC (permalink / raw) To: dsterba, Chris Murphy, Neal Gompa, Btrfs BTRFS, Chris Mason, Josef Bacik On Wed, Jul 27, 2022 at 04:56:40PM +0200, David Sterba wrote: > On Wed, Jul 27, 2022 at 10:50:01AM -0400, Chris Murphy wrote: > > On Tue, Jul 26, 2022, at 6:10 PM, Chris Murphy wrote: > > > On Tue, Jul 26, 2022, at 5:36 PM, David Sterba wrote: > > >> What is quickly here? > > > > > > Seconds, less than a minute, all the blocks pointed to by backup roots > > > are empty (zeros). > > > > > > The default timeout is set to 2 minutes and that's > > >> what I've seen when testing that on a fresh filesystem. > > > > > > Ok I'll retest with 5.19 series. The last time I tested time to drive > > > gc backup rootswas circa 5.8. > > > > With 5.19, backup roots remain available for a surprisingly long time, > > definitely more than 2 minutes. It's not an exhaustive test, just a > > dozen samples over a half hour, but I never once saw zeros returned. > > Two out of those dozen, I saw the block already overwritten with a > > leaf of a newer generation and a completely different owner (e.g. csum > > tree written at the block address for the backup tree root) - which > > would make that backup root useless. > > > > What is a likely target kernel version to make discard=async the > > default? The merge window for 5.20 closes August 14. Is 5.21 a > > practical target? > > The changes for the next merge window are supposed to be done a week or > two before it opens, but as this is a simple change I think I can > squeeze it in. This will have to be postponed, Filipe found some problems, transaction abort can happen due to ENOSPC when the system under load. This does not happen in most cases but for a feature supposed to be on by default such thing needs to be fixed first. ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2022-08-05 15:19 UTC | newest] Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-07-21 18:43 Using async discard by default with SSDs? Neal Gompa 2022-07-21 19:21 ` Roman Mamedov 2022-07-25 19:08 ` David Sterba 2022-07-26 20:00 ` Chris Murphy 2022-07-26 21:36 ` David Sterba 2022-07-26 22:10 ` Chris Murphy 2022-07-27 14:50 ` Chris Murphy 2022-07-27 14:56 ` David Sterba 2022-07-27 15:14 ` Chris Murphy 2022-07-27 17:47 ` David Sterba 2022-07-28 12:27 ` Chris Murphy 2022-07-27 15:26 ` Neal Gompa 2022-08-05 15:14 ` David Sterba
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.