* Metadata chunks on ssd?
@ 2019-12-23 13:44 Stéphane Lesimple
  2019-12-23 13:59 ` Hugo Mills
  2019-12-23 23:30 ` Hans van Kranenburg
  0 siblings, 2 replies; 7+ messages in thread
From: Stéphane Lesimple @ 2019-12-23 13:44 UTC (permalink / raw)
  To: Btrfs BTRFS

Hello btrfs gurus,

Has anyone ever considered implementing a feature so that metadata 
chunks would always be allocated on a given subset of the disks in a btrfs 
filesystem?

As metadata use can be intensive and some operations are known to be slow 
(such as backref walking), I'm under the (maybe wrong) impression that 
having a set of small ssd's just for the metadata would give quite a boost 
to a filesystem. Maybe even make qgroups more usable with volumes having 10 
snapshots?

This could just be a preference set on the allocator, so that a 6-disk 
raid1 FS with 4 spinning disks and 2 SSDs prefers to allocate metadata on 
the SSDs rather than on the slow drives (falling back to the spinning disks 
if the SSDs are full, with the possibility to rebalance later).

Would such a feature make sense?

-- 
Stephane.




* Re: Metadata chunks on ssd?
  2019-12-23 13:44 Metadata chunks on ssd? Stéphane Lesimple
@ 2019-12-23 13:59 ` Hugo Mills
  2019-12-23 23:30 ` Hans van Kranenburg
  1 sibling, 0 replies; 7+ messages in thread
From: Hugo Mills @ 2019-12-23 13:59 UTC (permalink / raw)
  To: Stéphane Lesimple; +Cc: Btrfs BTRFS

On Mon, Dec 23, 2019 at 02:44:07PM +0100, Stéphane Lesimple wrote:
> Hello btrfs gurus,
> 
> Has this ever been considered to implement a feature so that metadata chunks
> would always be allocated on a given set of disks part of the btrfs
> filesystem?

   Yes, many times. There was even an RFC implementation, many years
ago, but it wasn't merged.

   Hugo.

> As metadata use can be intensive and some operations are known to be slow
> (such as backref walking), I'm under the (maybe wrong) impression that
> having a set of small ssd's just for the metadata would give quite a boost
> to a filesystem. Maybe even make qgroups more usable with volumes having 10
> snapshots?
> 
> This could just be a preference set on the allocator, so that a 6 disks
> raid1 FS with 4 spinning disks and 2 ssds prefer to allocate metadata on the
> ssd than on the slow drives (and falling back to spinning disks if ssds are
> full, with the possibility to rebalance later).
> 
> Would such a feature make sense?
> 

-- 
Hugo Mills             | Quantum Mechanics: the dreams stuff is made of.
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: E2AB1DE4          |


* Re: Metadata chunks on ssd?
  2019-12-23 13:44 Metadata chunks on ssd? Stéphane Lesimple
  2019-12-23 13:59 ` Hugo Mills
@ 2019-12-23 23:30 ` Hans van Kranenburg
  2019-12-24  2:04   ` Wang Shilong
  1 sibling, 1 reply; 7+ messages in thread
From: Hans van Kranenburg @ 2019-12-23 23:30 UTC (permalink / raw)
  To: Stéphane Lesimple, Btrfs BTRFS

Hi Stéphane,

On 12/23/19 2:44 PM, Stéphane Lesimple wrote:
> 
> Has this ever been considered to implement a feature so that metadata 
> chunks would always be allocated on a given set of disks part of the btrfs 
> filesystem?

Yes, many times.

> As metadata use can be intensive and some operations are known to be slow 
> (such as backref walking), I'm under the (maybe wrong) impression that 
> having a set of small ssd's just for the metadata would give quite a boost 
> to a filesystem. Maybe even make qgroups more usable with volumes having 10 
> snapshots?

No, it's not wrong. For bigger filesystems this would certainly help.

> This could just be a preference set on the allocator,

Yes. Now, the big question is, how do we 'just' set this preference?

Be sure to take into account that the filesystem has no way to find out
itself which disks are those ssds. There's no easy way to discover this
in a running system.

> so that a 6 disks 
> raid1 FS with 4 spinning disks and 2 ssds prefer to allocate metadata on 
> the ssd than on the slow drives (and falling back to spinning disks if ssds 
> are full, with the possibility to rebalance later).
> 
> Would such a feature make sense?

Absolutely.

Hans



* Re: Metadata chunks on ssd?
  2019-12-23 23:30 ` Hans van Kranenburg
@ 2019-12-24  2:04   ` Wang Shilong
  2019-12-24  8:20     ` Stéphane Lesimple
  2019-12-24 12:40     ` Austin S. Hemmelgarn
  0 siblings, 2 replies; 7+ messages in thread
From: Wang Shilong @ 2019-12-24  2:04 UTC (permalink / raw)
  To: Hans van Kranenburg; +Cc: Stéphane Lesimple, Btrfs BTRFS

On Tue, Dec 24, 2019 at 7:38 AM Hans van Kranenburg <hans@knorrie.org> wrote:
>
> Hi Stéphane,
>
> On 12/23/19 2:44 PM, Stéphane Lesimple wrote:
> >
> > Has this ever been considered to implement a feature so that metadata
> > chunks would always be allocated on a given set of disks part of the btrfs
> > filesystem?
>
> Yes, many times.
>

I implemented it locally some time ago, for my own testing.

> > As metadata use can be intensive and some operations are known to be slow
> > (such as backref walking), I'm under the (maybe wrong) impression that
> > having a set of small ssd's just for the metadata would give quite a boost
> > to a filesystem. Maybe even make qgroups more usable with volumes having 10
> > snapshots?
>
> No, it's not wrong. For bigger filesystems this would certainly help.
>
> > This could just be a preference set on the allocator,
>
> Yes. Now, the big question is, how do we 'just' set this preference?
>
> Be sure to take into account that the filesystem has no way to find out
> itself which disks are those ssds. There's no easy way to discover this
> in a running system.
>

No, there is an API that lets the filesystem detect whether the underlying
device is an SSD or not. Something like:

        if (!blk_queue_nonrot(q))
                fs_devices->rotating = 1;

Currently, btrfs treats the whole filesystem as rotational if any one of
its disks is rotational. We could record how many non-rotational disks
there are, and make chunk allocation try the SSDs first where possible.

> > so that a 6 disks
> > raid1 FS with 4 spinning disks and 2 ssds prefer to allocate metadata on
> > the ssd than on the slow drives (and falling back to spinning disks if ssds
> > are full, with the possibility to rebalance later).
> >
> > Would such a feature make sense?
>
> Absolutely.
>
> Hans
>


* Re: Metadata chunks on ssd?
  2019-12-24  2:04   ` Wang Shilong
@ 2019-12-24  8:20     ` Stéphane Lesimple
  2019-12-24 12:40     ` Austin S. Hemmelgarn
  1 sibling, 0 replies; 7+ messages in thread
From: Stéphane Lesimple @ 2019-12-24  8:20 UTC (permalink / raw)
  To: Wang Shilong, Hans van Kranenburg; +Cc: Btrfs BTRFS



On 24 December 2019 03:04:34, Wang Shilong <wangshilong1991@gmail.com> wrote:

> On Tue, Dec 24, 2019 at 7:38 AM Hans van Kranenburg <hans@knorrie.org> wrote:
>>
>> Hi Stéphane,
>>
>> On 12/23/19 2:44 PM, Stéphane Lesimple wrote:
>>>
>>> Has this ever been considered to implement a feature so that metadata
>>> chunks would always be allocated on a given set of disks part of the btrfs
>>> filesystem?
>>
>> Yes, many times.
>
> I implement it locally before for my local testing before.

If you happen to still have a couple of patches around (even if they don't 
apply cleanly), I'm interested!

>
>>> As metadata use can be intensive and some operations are known to be slow
>>> (such as backref walking), I'm under the (maybe wrong) impression that
>>> having a set of small ssd's just for the metadata would give quite a boost
>>> to a filesystem. Maybe even make qgroups more usable with volumes having 10
>>> snapshots?
>>
>> No, it's not wrong. For bigger filesystems this would certainly help.
>>
>>> This could just be a preference set on the allocator,
>>
>> Yes. Now, the big question is, how do we 'just' set this preference?
>>
>> Be sure to take into account that the filesystem has no way to find out
>> itself which disks are those ssds. There's no easy way to discover this
>> in a running system.
>
> No, there is API for filesystem to detect whether lower device is SSD or not.
> Something like:
>       if (!blk_queue_nonrot(q))
>                fs_devices->rotating = 1;
>
> Currently, btrfs will treat filesystem as rotational disks if any of
> one disk is rotational,
> We might record how many non-rotational disks, and make chunk allocation
> try SSD firstly if it possible.

My first idea was actually a preference set by the admin: a tristate, 
set per device, separately for data chunks and for metadata chunks. 
Something like:

- "never": never allocate this type of chunk on this device
- "default": keep the current allocator behavior
- "always": consider this device first when trying to allocate

The allocator would then run in 2 passes max, like this:

- try to allocate considering only the devices tagged "always"
- if this fails, try to allocate considering the devices tagged "always" 
and "default"

For my setup, I would use 2 small SSDs set to metadata=always data=never, 
and the other rotational drives set to metadata=default data=default.
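
To make the two-pass idea a bit more concrete, here is a minimal userspace 
sketch in C. It is an illustration only, not btrfs code: the names 
alloc_pref, dev_hint and pick_devices are hypothetical, and a real chunk 
allocator also has to handle stripe sizes, RAID profile constraints and 
sorting devices by free space.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-device preference, one per chunk type (not a btrfs API). */
enum alloc_pref { PREF_NEVER, PREF_DEFAULT, PREF_ALWAYS };

struct dev_hint {
	const char *name;
	enum alloc_pref metadata;      /* preference for metadata chunks */
	enum alloc_pref data;          /* preference for data chunks */
	unsigned long long free_bytes; /* unallocated space on the device */
};

/*
 * Select `needed` devices that can each hold `size` bytes of a chunk.
 * Pass 0 considers only devices tagged "always"; pass 1 considers
 * "always" and "default". Devices tagged "never" are skipped in both.
 * Returns `needed` on success, or 0 if neither pass finds enough devices.
 */
size_t pick_devices(const struct dev_hint *devs, size_t ndevs,
		    int for_metadata, unsigned long long size,
		    size_t needed, const struct dev_hint **out)
{
	assert(devs != NULL && out != NULL);

	for (int pass = 0; pass < 2; pass++) {
		size_t n = 0;

		for (size_t i = 0; i < ndevs && n < needed; i++) {
			enum alloc_pref p = for_metadata ? devs[i].metadata
							 : devs[i].data;
			int eligible = (p == PREF_ALWAYS) ||
				       (pass == 1 && p == PREF_DEFAULT);

			if (eligible && devs[i].free_bytes >= size)
				out[n++] = &devs[i];
		}
		if (n == needed)
			return n;
	}
	return 0;
}
```

With two SSDs set to metadata=always data=never and the other drives left 
at default, a raid1 metadata chunk (needed = 2) lands on the SSDs while 
they have room and falls back to the spinning disks otherwise, whereas 
data chunks never touch the SSDs.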

If we want to autodetect things without being too invasive, then on fs 
creation and on device add/replace we could detect the rotational 
status, and maybe set data=default metadata=always for non-rotational 
drives, and data=default metadata=default for the others. This would 
improve the overall experience without exposing the user to unexpected 
"out of space" issues.
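
A rough userspace sketch of that autodetection, in C, could look like the 
following. The sysfs attribute /sys/block/<disk>/queue/rotational is the 
existing kernel interface; everything else (alloc_pref, dev_prefs, 
default_prefs, read_rotational) is a hypothetical name made up for the 
illustration, not something btrfs-progs provides.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical tristate preference, as described above (not a btrfs API). */
enum alloc_pref { PREF_NEVER, PREF_DEFAULT, PREF_ALWAYS };

struct dev_prefs {
	enum alloc_pref metadata;
	enum alloc_pref data;
};

/*
 * Map the rotational flag to the proposed defaults:
 * non-rotational -> metadata=always data=default,
 * rotational     -> metadata=default data=default.
 */
struct dev_prefs default_prefs(int rotational)
{
	struct dev_prefs p = { PREF_DEFAULT, PREF_DEFAULT };

	if (!rotational)
		p.metadata = PREF_ALWAYS;
	return p;
}

/*
 * Read /sys/block/<disk>/queue/rotational for a whole-disk name such
 * as "sda". Returns 0 or 1 as reported by the kernel, or -1 on error.
 */
int read_rotational(const char *disk)
{
	char path[256];
	int value = -1;
	FILE *f;

	assert(disk != NULL);
	snprintf(path, sizeof(path), "/sys/block/%s/queue/rotational", disk);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%d", &value) != 1)
		value = -1;
	fclose(f);
	return value;
}
```

Since data stays at "default" on every drive, nothing is ever excluded 
from allocation, which is why this variant cannot by itself cause 
surprising "out of space" errors.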

Migrating from older, preference-unaware code would simply see all the 
prefs set to "default". No incompat flag would be needed, as being 
preference-unaware doesn't break the filesystem.

-- 
Stéphane.




* Re: Metadata chunks on ssd?
  2019-12-24  2:04   ` Wang Shilong
  2019-12-24  8:20     ` Stéphane Lesimple
@ 2019-12-24 12:40     ` Austin S. Hemmelgarn
  2019-12-24 12:58       ` Stéphane Lesimple
  1 sibling, 1 reply; 7+ messages in thread
From: Austin S. Hemmelgarn @ 2019-12-24 12:40 UTC (permalink / raw)
  To: Wang Shilong, Hans van Kranenburg; +Cc: Stéphane Lesimple, Btrfs BTRFS

On 2019-12-23 21:04, Wang Shilong wrote:
> On Tue, Dec 24, 2019 at 7:38 AM Hans van Kranenburg <hans@knorrie.org> wrote:
>>
>> Hi Stéphane,
>>
>> On 12/23/19 2:44 PM, Stéphane Lesimple wrote:
>>>
>>> Has this ever been considered to implement a feature so that metadata
>>> chunks would always be allocated on a given set of disks part of the btrfs
>>> filesystem?
>>
>> Yes, many times.
>>
> 
> I implement it locally before for my local testing before.
> 
>>> As metadata use can be intensive and some operations are known to be slow
>>> (such as backref walking), I'm under the (maybe wrong) impression that
>>> having a set of small ssd's just for the metadata would give quite a boost
>>> to a filesystem. Maybe even make qgroups more usable with volumes having 10
>>> snapshots?
>>
>> No, it's not wrong. For bigger filesystems this would certainly help.
>>
>>> This could just be a preference set on the allocator,
>>
>> Yes. Now, the big question is, how do we 'just' set this preference?
>>
>> Be sure to take into account that the filesystem has no way to find out
>> itself which disks are those ssds. There's no easy way to discover this
>> in a running system.
>>
> 
> No, there is API for filesystem to detect whether lower device is SSD or not.
> Something like:
>         if (!blk_queue_nonrot(q))
>                  fs_devices->rotating = 1;
> 
> Currently, btrfs will treat filesystem as rotational disks if any of
> one disk is rotational,
> We might record how many non-rotational disks, and make chunk allocation try SSD
> firstly if it possible.
This doesn't tell you that the device is an SSD though, just that it 
reports to the kernel as non-rotational. For example, NBD devices 
present as non-rotational by default, and in most cases you do _not_ 
want hot data on a network disk.

The important thing here is disk performance, not whether it's an SSD or 
not. An SD card is non-rotational and solid-state, but on most systems 
the performance is going to be sufficiently bad for BTRFS-type workloads 
that it's almost useless for this type of thing.
> 
>>> so that a 6 disks
>>> raid1 FS with 4 spinning disks and 2 ssds prefer to allocate metadata on
>>> the ssd than on the slow drives (and falling back to spinning disks if ssds
>>> are full, with the possibility to rebalance later).
>>>
>>> Would such a feature make sense?
>>
>> Absolutely.
>>
>> Hans
>>



* Re: Metadata chunks on ssd?
  2019-12-24 12:40     ` Austin S. Hemmelgarn
@ 2019-12-24 12:58       ` Stéphane Lesimple
  0 siblings, 0 replies; 7+ messages in thread
From: Stéphane Lesimple @ 2019-12-24 12:58 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, Wang Shilong, Hans van Kranenburg; +Cc: Btrfs BTRFS



On 24 December 2019 13:40:56, "Austin S. Hemmelgarn" <ahferroin7@gmail.com> 
wrote:

> On 2019-12-23 21:04, Wang Shilong wrote:
>> On Tue, Dec 24, 2019 at 7:38 AM Hans van Kranenburg <hans@knorrie.org> wrote:
>>>
>>> Hi Stéphane,
>>>
>>> On 12/23/19 2:44 PM, Stéphane Lesimple wrote:
>>>>
>>>> Has this ever been considered to implement a feature so that metadata
>>>> chunks would always be allocated on a given set of disks part of the btrfs
>>>> filesystem?
>>>
>>> Yes, many times.
>>
>> I implement it locally before for my local testing before.
>>
>>>> As metadata use can be intensive and some operations are known to be slow
>>>> (such as backref walking), I'm under the (maybe wrong) impression that
>>>> having a set of small ssd's just for the metadata would give quite a boost
>>>> to a filesystem. Maybe even make qgroups more usable with volumes having 10
>>>> snapshots?
>>>
>>> No, it's not wrong. For bigger filesystems this would certainly help.
>>>
>>>> This could just be a preference set on the allocator,
>>>
>>> Yes. Now, the big question is, how do we 'just' set this preference?
>>>
>>> Be sure to take into account that the filesystem has no way to find out
>>> itself which disks are those ssds. There's no easy way to discover this
>>> in a running system.
>>
>> No, there is API for filesystem to detect whether lower device is SSD or not.
>> Something like:
>>         if (!blk_queue_nonrot(q))
>>                 fs_devices->rotating = 1;
>>
>> Currently, btrfs will treat filesystem as rotational disks if any of
>> one disk is rotational,
>> We might record how many non-rotational disks, and make chunk allocation
>> try SSD firstly if it possible.
> This doesn't tell you that the device is an SSD though, just that it
> reports to the kernel as non+rotational. For example, NBD devices
> present as non-rotational by default, and in most cases you do _not_
> want hot data on a network disc.
>
> The important thing here is disk performance, not whether it's an SSD or
> not. An SD card is non-rotational and solid-state, but on most systems
> the performance is going to be sufficiently bad for BTRFS-type workloads
> that it's almost useless for this type of thing.

That's a good point, which is why I think this kind of preference should be 
set manually by the user on fs creation, on device add/replace, or anytime 
later with something like "btrfs device set allocator.hint.metadata always 
/tank".

Now, we might still want to add some autodetection routine candy to the 
btrfs user space tool, or for mkfs.btrfs, albeit not enabled by default as 
your counter-examples indicate. But that's entirely optional. A manual mode 
would already be awesome.

>
>>
>>>> so that a 6 disks
>>>> raid1 FS with 4 spinning disks and 2 ssds prefer to allocate metadata on
>>>> the ssd than on the slow drives (and falling back to spinning disks if ssds
>>>> are full, with the possibility to rebalance later).
>>>>
>>>> Would such a feature make sense?
>>>
>>> Absolutely.
>>>
>>> Hans




