From: Goffredo Baroncelli <kreijack@libero.it>
To: Josef Bacik <josef@toxicpanda.com>, David Sterba <dsterba@suse.cz>
Cc: Sinnamohideen Shafeeq <shafeeqs@panasas.com>,
Goffredo Baroncelli <kreijack@inwind.it>,
Paul Jones <paul@pauljones.id.au>,
"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>,
Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Subject: Re: [RFC][V8][PATCH 0/5] btrfs: allocation_hint mode
Date: Mon, 13 Dec 2021 20:54:24 +0100
Message-ID: <1d725df7-b435-4acf-4d17-26c2bd171c1a@libero.it>
In-Reply-To: <SYXPR01MB1918689AF49BE6E6E031C8B69E749@SYXPR01MB1918.ausprd01.prod.outlook.com>
Gentle ping :-)
Is any of the main developers interested in supporting this patch set?
I am open to improving it if required.
BR
G.Baroncelli
On 12/13/21 10:39, Paul Jones wrote:
>> -----Original Message-----
>> From: Goffredo Baroncelli <kreijack@tiscali.it>
>> Sent: Monday, 25 October 2021 2:31 AM
>> To: linux-btrfs@vger.kernel.org
>> Cc: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>; Josef Bacik
>> <josef@toxicpanda.com>; David Sterba <dsterba@suse.cz>; Sinnamohideen
>> Shafeeq <shafeeqs@panasas.com>; Goffredo Baroncelli
>> <kreijack@inwind.it>
>> Subject: [RFC][V8][PATCH 0/5] btrfs: allocation_hint mode
>>
>> From: Goffredo Baroncelli <kreijack@inwind.it>
>>
>> Hi all,
>>
>> This patch set was born after some discussion between me, Zygo and
>> Josef.
>> Some details can be found in https://github.com/btrfs/btrfs-todo/issues/19.
>>
>> Some further information about a real use case can be found in
>> https://lore.kernel.org/linux-btrfs/20210116002533.GE31381@hungrycats.org/
>>
>> Recently Shafeeq told me that he is interested too, because of the
>> performance gain.
>>
>> In this revision I switched away from an ioctl API in favor of a sysfs API (see
>> patches #2 and #3).
>>
>> The idea behind this patch set is to dedicate some disks (the fastest ones)
>> to the metadata chunks. My initial idea was a "soft" hint. However, Zygo asked
>> for an option for a "strong" hint (== mandatory). The result is that each disk
>> can be "tagged" with one of the following flags:
>> - BTRFS_DEV_ALLOCATION_METADATA_ONLY
>> - BTRFS_DEV_ALLOCATION_PREFERRED_METADATA
>> - BTRFS_DEV_ALLOCATION_PREFERRED_DATA
>> - BTRFS_DEV_ALLOCATION_DATA_ONLY
>>
>> When the chunk allocator searches for disks to allocate a chunk, it scans the
>> disks in an order decided by these tags. For metadata, the order is:
>> *_METADATA_ONLY
>> *_PREFERRED_METADATA
>> *_PREFERRED_DATA
>>
>> The *_DATA_ONLY disks are not eligible for metadata chunk allocation.
>>
>> For data chunks, the order is reversed, and the *_METADATA_ONLY disks are
>> excluded.
>>
>> The exact sort logic is to sort first by "tag", and then by the space
>> available. If there is no space available, the disks with the next "tag" are
>> selected.
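
The ordering described above can be illustrated with a rough Python sketch.
This is not the kernel code from the patches: the names Hint, Disk and
candidate_disks are made up for the example, and two details are assumptions
(within one tag the disk with more unallocated space comes first, and a disk
counts as "full" when it has less than min_free MiB unallocated). The sample
values are the tags and unallocated sizes shown later in this mail, after the
tags are changed and the filesystem is balanced.

```python
from dataclasses import dataclass
from enum import Enum


class Hint(Enum):
    # Illustrative stand-ins for the BTRFS_DEV_ALLOCATION_* flags.
    METADATA_ONLY = "METADATA_ONLY"
    PREFERRED_METADATA = "PREFERRED_METADATA"
    PREFERRED_DATA = "PREFERRED_DATA"
    DATA_ONLY = "DATA_ONLY"


# Scan order per chunk type; a tag missing from a list is not eligible.
METADATA_ORDER = [Hint.METADATA_ONLY, Hint.PREFERRED_METADATA, Hint.PREFERRED_DATA]
DATA_ORDER = [Hint.DATA_ONLY, Hint.PREFERRED_DATA, Hint.PREFERRED_METADATA]


@dataclass
class Disk:
    name: str
    hint: Hint
    free: int  # unallocated space in MiB


def candidate_disks(disks, for_metadata, min_free=1):
    """Return the eligible disks, sorted by tag first and free space second."""
    order = METADATA_ORDER if for_metadata else DATA_ORDER
    rank = {hint: position for position, hint in enumerate(order)}
    eligible = [d for d in disks if d.hint in rank and d.free >= min_free]
    # Lower rank wins; within the same rank, more free space wins (assumption).
    return sorted(eligible, key=lambda d: (rank[d.hint], -d.free))


disks = [
    Disk("loop0", Hint.DATA_ONLY, 736),
    Disk("loop1", Hint.DATA_ONLY, 736),
    Disk("loop2", Hint.PREFERRED_DATA, 128),
    Disk("loop3", Hint.PREFERRED_DATA, 128),
    Disk("loop4", Hint.PREFERRED_DATA, 128),
    Disk("loop5", Hint.METADATA_ONLY, 1),
    Disk("loop6", Hint.METADATA_ONLY, 128),
    Disk("loop7", Hint.METADATA_ONLY, 96),
]

metadata_names = [d.name for d in candidate_disks(disks, for_metadata=True)]
data_names = [d.name for d in candidate_disks(disks, for_metadata=False)]
print(metadata_names)  # ['loop6', 'loop7', 'loop5', 'loop2', 'loop3', 'loop4']
print(data_names)      # ['loop0', 'loop1', 'loop2', 'loop3', 'loop4']
```

A real allocator would then take as many disks from the head of this list as
the profile requires; the sketch only shows the ordering and the exclusion of
the *_ONLY disks of the other kind.
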
>>
>> To set these tags, a new property called "allocation_hint" was created.
>> There is a dedicated btrfs-progs patch set [[PATCH V5] btrfs-progs:
>> allocation_hint disk property].
>>
>> $ sudo mount /dev/loop0 /mnt/test-btrfs/
>> $ for i in /dev/loop[0-9]; do sudo ./btrfs prop get $i allocation_hint; done
>> devid=1, path=/dev/loop0: allocation_hint=PREFERRED_METADATA
>> devid=2, path=/dev/loop1: allocation_hint=PREFERRED_METADATA
>> devid=3, path=/dev/loop2: allocation_hint=PREFERRED_DATA
>> devid=4, path=/dev/loop3: allocation_hint=PREFERRED_DATA
>> devid=5, path=/dev/loop4: allocation_hint=PREFERRED_DATA
>> devid=6, path=/dev/loop5: allocation_hint=DATA_ONLY
>> devid=7, path=/dev/loop6: allocation_hint=METADATA_ONLY
>> devid=8, path=/dev/loop7: allocation_hint=METADATA_ONLY
>>
>> $ sudo ./btrfs fi us /mnt/test-btrfs/
>> Overall:
>> Device size: 2.75GiB
>> Device allocated: 1.34GiB
>> Device unallocated: 1.41GiB
>> Device missing: 0.00B
>> Used: 400.89MiB
>> Free (estimated): 1.04GiB (min: 1.04GiB)
>> Data ratio: 2.00
>> Metadata ratio: 1.00
>> Global reserve: 3.25MiB (used: 0.00B)
>> Multiple profiles: no
>>
>> Data,RAID1: Size:542.00MiB, Used:200.25MiB (36.95%)
>> /dev/loop0 288.00MiB
>> /dev/loop1 288.00MiB
>> /dev/loop2 127.00MiB
>> /dev/loop3 127.00MiB
>> /dev/loop4 127.00MiB
>> /dev/loop5 127.00MiB
>>
>> Metadata,single: Size:256.00MiB, Used:384.00KiB (0.15%)
>> /dev/loop1 256.00MiB
>>
>> System,single: Size:32.00MiB, Used:16.00KiB (0.05%)
>> /dev/loop0 32.00MiB
>>
>> Unallocated:
>> /dev/loop0 704.00MiB
>> /dev/loop1 480.00MiB
>> /dev/loop2 1.00MiB
>> /dev/loop3 1.00MiB
>> /dev/loop4 1.00MiB
>> /dev/loop5 1.00MiB
>> /dev/loop6 128.00MiB
>> /dev/loop7 128.00MiB
>>
>> # change the tag of some disks
>>
>> $ sudo ./btrfs prop set /dev/loop0 allocation_hint DATA_ONLY
>> $ sudo ./btrfs prop set /dev/loop1 allocation_hint DATA_ONLY
>> $ sudo ./btrfs prop set /dev/loop5 allocation_hint METADATA_ONLY
>>
>> $ for i in /dev/loop[0-9]; do sudo ./btrfs prop get $i allocation_hint; done
>> devid=1, path=/dev/loop0: allocation_hint=DATA_ONLY
>> devid=2, path=/dev/loop1: allocation_hint=DATA_ONLY
>> devid=3, path=/dev/loop2: allocation_hint=PREFERRED_DATA
>> devid=4, path=/dev/loop3: allocation_hint=PREFERRED_DATA
>> devid=5, path=/dev/loop4: allocation_hint=PREFERRED_DATA
>> devid=6, path=/dev/loop5: allocation_hint=METADATA_ONLY
>> devid=7, path=/dev/loop6: allocation_hint=METADATA_ONLY
>> devid=8, path=/dev/loop7: allocation_hint=METADATA_ONLY
>>
>> $ sudo btrfs bal start --full-balance /mnt/test-btrfs/
>> $ sudo ./btrfs fi us /mnt/test-btrfs/
>> Overall:
>> Device size: 2.75GiB
>> Device allocated: 735.00MiB
>> Device unallocated: 2.03GiB
>> Device missing: 0.00B
>> Used: 400.72MiB
>> Free (estimated): 1.10GiB (min: 1.10GiB)
>> Data ratio: 2.00
>> Metadata ratio: 1.00
>> Global reserve: 3.25MiB (used: 0.00B)
>> Multiple profiles: no
>>
>> Data,RAID1: Size:288.00MiB, Used:200.19MiB (69.51%)
>> /dev/loop0 288.00MiB
>> /dev/loop1 288.00MiB
>>
>> Metadata,single: Size:127.00MiB, Used:336.00KiB (0.26%)
>> /dev/loop5 127.00MiB
>>
>> System,single: Size:32.00MiB, Used:16.00KiB (0.05%)
>> /dev/loop7 32.00MiB
>>
>> Unallocated:
>> /dev/loop0 736.00MiB
>> /dev/loop1 736.00MiB
>> /dev/loop2 128.00MiB
>> /dev/loop3 128.00MiB
>> /dev/loop4 128.00MiB
>> /dev/loop5 1.00MiB
>> /dev/loop6 128.00MiB
>> /dev/loop7 96.00MiB
>>
>>
>> # As you can see, all the metadata was placed on the disks loop5/loop7, even
>> # though the emptiest ones are loop0 and loop1.
>>
>>
>>
>> TODO:
>> - more tests
>> - the tool which shows the space available should consider the tagging (e.g.
>> the disks tagged with _METADATA_ONLY should be excluded from the data
>> availability)
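
As an illustration of this TODO item, here is a small Python sketch of the
accounting it asks for. It is only an assumption about how a tool could do it,
not what btrfs-progs does today: the function raw_unallocated is hypothetical,
and the numbers are the per-device tags and "Unallocated" values from the
second "fi us" output above.

```python
# Per-device (allocation_hint, unallocated MiB), copied from the output above.
devices = {
    "loop0": ("DATA_ONLY", 736),
    "loop1": ("DATA_ONLY", 736),
    "loop2": ("PREFERRED_DATA", 128),
    "loop3": ("PREFERRED_DATA", 128),
    "loop4": ("PREFERRED_DATA", 128),
    "loop5": ("METADATA_ONLY", 1),
    "loop6": ("METADATA_ONLY", 128),
    "loop7": ("METADATA_ONLY", 96),
}


def raw_unallocated(devs, exclude_hint=None):
    """Sum the unallocated MiB, skipping devices tagged with exclude_hint."""
    return sum(free for hint, free in devs.values() if hint != exclude_hint)


total_mib = raw_unallocated(devices)
data_mib = raw_unallocated(devices, exclude_hint="METADATA_ONLY")
print(total_mib)  # 2081 MiB, the 2.03GiB reported as "Device unallocated"
print(data_mib)   # 1856 MiB of raw space that data chunks could actually use
```

The figures are raw space; the RAID1 data ratio of 2.00 would still have to be
applied on top to estimate the usable data capacity.
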
>>
>>
>> Comments are welcome
>> BR
>> G.Baroncelli
>
>
> I've been running this patch series since about V4, works really well. Would be nice to have it merged eventually.
>
> Tested-by: Paul Jones <paul@pauljones.id.au>
>
--
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
Thread overview: 32+ messages
2021-10-24 15:31 [RFC][V8][PATCH 0/5] btrfs: allocation_hint mode Goffredo Baroncelli
2021-10-24 15:31 ` [PATCH 1/4] btrfs: add flags to give an hint to the chunk allocator Goffredo Baroncelli
2021-10-24 15:31 ` [PATCH 2/4] btrfs: export dev_item.type in /sys/fs/btrfs/<uuid>/devinfo/<devid>/type Goffredo Baroncelli
2021-10-24 15:31 ` [PATCH 3/4] btrfs: change the DEV_ITEM 'type' field via sysfs Goffredo Baroncelli
2021-10-24 15:31 ` [PATCH 4/4] btrfs: add allocator_hint mode Goffredo Baroncelli
2021-12-17 15:58 ` Hans van Kranenburg
2021-12-17 18:28 ` Goffredo Baroncelli
2021-12-17 19:41 ` Zygo Blaxell
2021-12-18 9:07 ` Goffredo Baroncelli
2021-12-18 22:48 ` Zygo Blaxell
2021-12-19 0:03 ` Graham Cobb
2021-12-19 2:30 ` Zygo Blaxell
2021-12-13 9:39 ` [RFC][V8][PATCH 0/5] btrfs: allocation_hint mode Paul Jones
2021-12-13 19:54 ` Goffredo Baroncelli [this message]
2021-12-13 21:15 ` Josef Bacik
2021-12-13 22:49 ` Zygo Blaxell
2021-12-14 14:31 ` Josef Bacik
2021-12-14 19:03 ` Goffredo Baroncelli
2021-12-14 20:04 ` Zygo Blaxell
2021-12-14 20:34 ` Josef Bacik
2021-12-14 20:41 ` Goffredo Baroncelli
2021-12-15 13:58 ` Josef Bacik
2021-12-15 18:53 ` Goffredo Baroncelli
2021-12-16 0:56 ` Josef Bacik
2021-12-17 5:40 ` Zygo Blaxell
2021-12-17 14:48 ` Josef Bacik
2021-12-17 16:31 ` Zygo Blaxell
2021-12-17 18:08 ` Goffredo Baroncelli
2021-12-16 2:30 ` Paul Jones
2021-12-14 1:03 ` Sinnamohideen, Shafeeq
2021-12-14 18:53 ` Goffredo Baroncelli
2021-12-14 20:35 ` Josef Bacik