From: Goffredo Baroncelli <kreijack@libero.it>
To: Josef Bacik <josef@toxicpanda.com>, David Sterba <dsterba@suse.cz>
Cc: Sinnamohideen Shafeeq <shafeeqs@panasas.com>,
Goffredo Baroncelli <kreijack@inwind.it>,
Paul Jones <paul@pauljones.id.au>,
"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>,
Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Subject: Re: [RFC][V8][PATCH 0/5] btrfs: allocation_hint mode
Date: Mon, 13 Dec 2021 20:54:24 +0100
Message-ID: <1d725df7-b435-4acf-4d17-26c2bd171c1a@libero.it>
In-Reply-To: <SYXPR01MB1918689AF49BE6E6E031C8B69E749@SYXPR01MB1918.ausprd01.prod.outlook.com>
Gentle ping :-)
Is any of the main developers interested in supporting this patch?
I am open to improving it if required.
BR
G.Baroncelli
On 12/13/21 10:39, Paul Jones wrote:
>> -----Original Message-----
>> From: Goffredo Baroncelli <kreijack@tiscali.it>
>> Sent: Monday, 25 October 2021 2:31 AM
>> To: linux-btrfs@vger.kernel.org
>> Cc: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>; Josef Bacik
>> <josef@toxicpanda.com>; David Sterba <dsterba@suse.cz>; Sinnamohideen
>> Shafeeq <shafeeqs@panasas.com>; Goffredo Baroncelli
>> <kreijack@inwind.it>
>> Subject: [RFC][V8][PATCH 0/5] btrfs: allocation_hint mode
>>
>> From: Goffredo Baroncelli <kreijack@inwind.it>
>>
>> Hi all,
>>
>> This patch set was born after some discussion between me, Zygo and
>> Josef.
>> Some details can be found in https://github.com/btrfs/btrfs-todo/issues/19.
>>
>> Some further information about a real use case can be found in
>> https://lore.kernel.org/linux-btrfs/20210116002533.GE31381@hungrycats.org/
>>
>> Recently Shafeeq told me that he is interested too, due to the performance
>> gain.
>>
>> In this revision I switched away from an ioctl API in favor of a sysfs API (see
>> patches #2 and #3).
>>
>> The idea behind this patch set is to dedicate some disks (the fastest ones)
>> to metadata chunks. My initial idea was a "soft" hint. However, Zygo asked for
>> an option for a "strong" hint (== mandatory). The result is that each disk can
>> be "tagged" with one of the following flags:
>> - BTRFS_DEV_ALLOCATION_METADATA_ONLY
>> - BTRFS_DEV_ALLOCATION_PREFERRED_METADATA
>> - BTRFS_DEV_ALLOCATION_PREFERRED_DATA
>> - BTRFS_DEV_ALLOCATION_DATA_ONLY
>>
>> When the chunk allocator searches for disks on which to allocate a chunk, it
>> scans the disks in an order decided by these tags. For metadata, the order is:
>> *_METADATA_ONLY
>> *_PREFERRED_METADATA
>> *_PREFERRED_DATA
>>
>> The *_DATA_ONLY disks are not eligible for metadata chunk allocation.
>>
>> For data chunks, the order is reversed, and the *_METADATA_ONLY disks are
>> excluded.
>>
>> The exact sort logic is to sort first by the "tag", and then by the space
>> available. If there is no space available, the disks with the next "tag" are
>> selected.
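
The ordering described above can be sketched as follows. This is only an illustration of the stated rules, not the code from the patches: the names `Hint`, `Device` and `candidate_order`, the numeric values of the tags, and the "free space greater than zero" test are all assumptions made for the example. The device list reuses the tags and unallocated sizes shown later in this message.

```python
from dataclasses import dataclass
from enum import IntEnum


class Hint(IntEnum):
    # Hypothetical numeric values; only their relative order matters here.
    METADATA_ONLY = 0
    PREFERRED_METADATA = 1
    PREFERRED_DATA = 2
    DATA_ONLY = 3


@dataclass
class Device:
    name: str
    hint: Hint
    free_mib: int  # unallocated space, in MiB


def candidate_order(devices, for_metadata):
    """Return the eligible devices, best candidate first."""
    # The "strong" tag of the opposite kind is never eligible.
    excluded = Hint.DATA_ONLY if for_metadata else Hint.METADATA_ONLY
    # Assumption: a device with no unallocated space is simply skipped,
    # which lets the next tag's devices be picked.
    eligible = [d for d in devices if d.hint != excluded and d.free_mib > 0]
    # Metadata prefers low Hint values; data uses the reversed order.
    sign = 1 if for_metadata else -1
    # Primary key: tag priority. Secondary key: most free space first.
    return sorted(eligible, key=lambda d: (sign * d.hint, -d.free_mib))


devices = [
    Device("loop0", Hint.DATA_ONLY, 736),
    Device("loop1", Hint.DATA_ONLY, 736),
    Device("loop2", Hint.PREFERRED_DATA, 128),
    Device("loop3", Hint.PREFERRED_DATA, 128),
    Device("loop4", Hint.PREFERRED_DATA, 128),
    Device("loop5", Hint.METADATA_ONLY, 1),
    Device("loop6", Hint.METADATA_ONLY, 128),
    Device("loop7", Hint.METADATA_ONLY, 96),
]

print([d.name for d in candidate_order(devices, for_metadata=True)])
print([d.name for d in candidate_order(devices, for_metadata=False)])
```

With these inputs the metadata order starts with the METADATA_ONLY disks (largest free space first) and never includes loop0 or loop1, while the data order starts with loop0 and loop1 and never includes loop5, loop6 or loop7. A real allocator would also need a minimum amount of free space per device rather than "more than zero".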
>>
>> To set these tags, a new property called "allocation_hint" was created.
>> There is a dedicated btrfs-progs patch set [[PATCH V5] btrfs-progs:
>> allocation_hint disk property].
>>
>> $ sudo mount /dev/loop0 /mnt/test-btrfs/
>> $ for i in /dev/loop[0-9]; do sudo ./btrfs prop get $i allocation_hint; done
>> devid=1, path=/dev/loop0: allocation_hint=PREFERRED_METADATA
>> devid=2, path=/dev/loop1: allocation_hint=PREFERRED_METADATA
>> devid=3, path=/dev/loop2: allocation_hint=PREFERRED_DATA
>> devid=4, path=/dev/loop3: allocation_hint=PREFERRED_DATA
>> devid=5, path=/dev/loop4: allocation_hint=PREFERRED_DATA
>> devid=6, path=/dev/loop5: allocation_hint=DATA_ONLY
>> devid=7, path=/dev/loop6: allocation_hint=METADATA_ONLY
>> devid=8, path=/dev/loop7: allocation_hint=METADATA_ONLY
>>
>> $ sudo ./btrfs fi us /mnt/test-btrfs/
>> Overall:
>>     Device size:              2.75GiB
>>     Device allocated:         1.34GiB
>>     Device unallocated:       1.41GiB
>>     Device missing:             0.00B
>>     Used:                   400.89MiB
>>     Free (estimated):         1.04GiB      (min: 1.04GiB)
>>     Data ratio:                  2.00
>>     Metadata ratio:              1.00
>>     Global reserve:           3.25MiB      (used: 0.00B)
>>     Multiple profiles:             no
>>
>> Data,RAID1: Size:542.00MiB, Used:200.25MiB (36.95%)
>>    /dev/loop0   288.00MiB
>>    /dev/loop1   288.00MiB
>>    /dev/loop2   127.00MiB
>>    /dev/loop3   127.00MiB
>>    /dev/loop4   127.00MiB
>>    /dev/loop5   127.00MiB
>>
>> Metadata,single: Size:256.00MiB, Used:384.00KiB (0.15%)
>>    /dev/loop1   256.00MiB
>>
>> System,single: Size:32.00MiB, Used:16.00KiB (0.05%)
>>    /dev/loop0    32.00MiB
>>
>> Unallocated:
>>    /dev/loop0   704.00MiB
>>    /dev/loop1   480.00MiB
>>    /dev/loop2     1.00MiB
>>    /dev/loop3     1.00MiB
>>    /dev/loop4     1.00MiB
>>    /dev/loop5     1.00MiB
>>    /dev/loop6   128.00MiB
>>    /dev/loop7   128.00MiB
>>
>> # change the tag of some disks
>>
>> $ sudo ./btrfs prop set /dev/loop0 allocation_hint DATA_ONLY
>> $ sudo ./btrfs prop set /dev/loop1 allocation_hint DATA_ONLY
>> $ sudo ./btrfs prop set /dev/loop5 allocation_hint METADATA_ONLY
>>
>> $ for i in /dev/loop[0-9]; do sudo ./btrfs prop get $i allocation_hint; done
>> devid=1, path=/dev/loop0: allocation_hint=DATA_ONLY
>> devid=2, path=/dev/loop1: allocation_hint=DATA_ONLY
>> devid=3, path=/dev/loop2: allocation_hint=PREFERRED_DATA
>> devid=4, path=/dev/loop3: allocation_hint=PREFERRED_DATA
>> devid=5, path=/dev/loop4: allocation_hint=PREFERRED_DATA
>> devid=6, path=/dev/loop5: allocation_hint=METADATA_ONLY
>> devid=7, path=/dev/loop6: allocation_hint=METADATA_ONLY
>> devid=8, path=/dev/loop7: allocation_hint=METADATA_ONLY
>>
>> $ sudo btrfs bal start --full-balance /mnt/test-btrfs/
>> $ sudo ./btrfs fi us /mnt/test-btrfs/
>> Overall:
>>     Device size:              2.75GiB
>>     Device allocated:       735.00MiB
>>     Device unallocated:       2.03GiB
>>     Device missing:             0.00B
>>     Used:                   400.72MiB
>>     Free (estimated):         1.10GiB      (min: 1.10GiB)
>>     Data ratio:                  2.00
>>     Metadata ratio:              1.00
>>     Global reserve:           3.25MiB      (used: 0.00B)
>>     Multiple profiles:             no
>>
>> Data,RAID1: Size:288.00MiB, Used:200.19MiB (69.51%)
>>    /dev/loop0   288.00MiB
>>    /dev/loop1   288.00MiB
>>
>> Metadata,single: Size:127.00MiB, Used:336.00KiB (0.26%)
>>    /dev/loop5   127.00MiB
>>
>> System,single: Size:32.00MiB, Used:16.00KiB (0.05%)
>>    /dev/loop7    32.00MiB
>>
>> Unallocated:
>>    /dev/loop0   736.00MiB
>>    /dev/loop1   736.00MiB
>>    /dev/loop2   128.00MiB
>>    /dev/loop3   128.00MiB
>>    /dev/loop4   128.00MiB
>>    /dev/loop5     1.00MiB
>>    /dev/loop6   128.00MiB
>>    /dev/loop7    96.00MiB
>>
>>
>> # As you can see, all the metadata was placed on the disks loop5/loop7, even
>> # though the emptiest ones are loop0 and loop1.
>>
>>
>>
>> TODO:
>> - more tests
>> - the tools which show the space available should consider the tagging (e.g.
>> the disks tagged as _METADATA_ONLY should be excluded from the data
>> availability)
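
One possible reading of the second TODO item, as a rough sketch: when reporting the space available for data, sum the unallocated space only over devices whose tag allows data chunks. The function name `data_unallocated_mib` and the dictionary layout are hypothetical, and the calculation deliberately ignores the RAID profile (the data ratio), so it is not the estimate that `btrfs fi us` prints. The figures are the "Unallocated" values from the second listing above.

```python
# Tags on which data chunks may be allocated, per the cover letter.
DATA_ELIGIBLE = {"PREFERRED_METADATA", "PREFERRED_DATA", "DATA_ONLY"}


def data_unallocated_mib(devices):
    """Sum unallocated MiB over the devices that may hold data chunks."""
    return sum(free for hint, free in devices.values() if hint in DATA_ELIGIBLE)


# device name -> (allocation_hint, unallocated MiB)
devices = {
    "loop0": ("DATA_ONLY", 736),
    "loop1": ("DATA_ONLY", 736),
    "loop2": ("PREFERRED_DATA", 128),
    "loop3": ("PREFERRED_DATA", 128),
    "loop4": ("PREFERRED_DATA", 128),
    "loop5": ("METADATA_ONLY", 1),
    "loop6": ("METADATA_ONLY", 128),
    "loop7": ("METADATA_ONLY", 96),
}

raw_total = sum(free for _, free in devices.values())
print(raw_total)                      # 2081 MiB across all devices
print(data_unallocated_mib(devices))  # 1856 MiB once METADATA_ONLY is excluded
```

The difference, 225 MiB, is the unallocated space on loop5, loop6 and loop7, which can never receive data chunks under the "strong" hint.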
>>
>>
>> Comments are welcome
>> BR
>> G.Baroncelli
>
>
> I've been running this patch series since about V4, works really well. Would be nice to have it merged eventually.
>
> Tested-by: Paul Jones <paul@pauljones.id.au>
>
--
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5