From: Paul Jones <paul@pauljones.id.au>
To: Goffredo Baroncelli <kreijack@libero.it>,
"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Cc: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>,
Josef Bacik <josef@toxicpanda.com>,
David Sterba <dsterba@suse.cz>,
Sinnamohideen Shafeeq <shafeeqs@panasas.com>,
Goffredo Baroncelli <kreijack@inwind.it>
Subject: RE: [RFC][V8][PATCH 0/5] btrfs: allocation_hint mode
Date: Mon, 13 Dec 2021 09:39:15 +0000
Message-ID: <SYXPR01MB1918689AF49BE6E6E031C8B69E749@SYXPR01MB1918.ausprd01.prod.outlook.com>
In-Reply-To: <cover.1635089352.git.kreijack@inwind.it>
> -----Original Message-----
> From: Goffredo Baroncelli <kreijack@tiscali.it>
> Sent: Monday, 25 October 2021 2:31 AM
> To: linux-btrfs@vger.kernel.org
> Cc: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>; Josef Bacik
> <josef@toxicpanda.com>; David Sterba <dsterba@suse.cz>; Sinnamohideen
> Shafeeq <shafeeqs@panasas.com>; Goffredo Baroncelli
> <kreijack@inwind.it>
> Subject: [RFC][V8][PATCH 0/5] btrfs: allocation_hint mode
>
> From: Goffredo Baroncelli <kreijack@inwind.it>
>
> Hi all,
>
> This patch set was born after some discussion between me, Zygo and
> Josef.
> Some details can be found in https://github.com/btrfs/btrfs-todo/issues/19.
>
> Some further information about a real use case can be found in
> https://lore.kernel.org/linux-btrfs/20210116002533.GE31381@hungrycats.org/
>
> Recently Shafeeq told me that he is interested too, due to the performance
> gain.
>
> In this revision I switched away from an ioctl API in favor of a sysfs API (see
> patches #2 and #3).
>
> The idea behind this patch set is to dedicate some disks (the fastest ones)
> to metadata chunks. My initial idea was a "soft" hint. However, Zygo asked
> for an option for a "strong" hint (== mandatory). The result is that each disk
> can be "tagged" with one of the following flags:
> - BTRFS_DEV_ALLOCATION_METADATA_ONLY
> - BTRFS_DEV_ALLOCATION_PREFERRED_METADATA
> - BTRFS_DEV_ALLOCATION_PREFERRED_DATA
> - BTRFS_DEV_ALLOCATION_DATA_ONLY
>
> When the chunk allocator searches for disks on which to allocate a chunk, it
> scans the disks in an order decided by these tags. For metadata, the order is:
> *_METADATA_ONLY
> *_PREFERRED_METADATA
> *_PREFERRED_DATA
>
> The *_DATA_ONLY disks are not eligible for metadata chunk allocation.
>
> For data chunks, the order is reversed, and the *_METADATA_ONLY disks are
> excluded.
>
> The exact sort logic is to sort first by "tag", and then by the space
> available. If there is no space available, the disks with the next "tag" are
> selected.
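
A minimal Python sketch of the ordering described above, for illustration only. It is not the kernel code from the patches. The Device class and candidate_devices() are hypothetical names, and sorting the larger free space first within a tag is an assumption, since the cover letter only says that space is the second sort key. The sample values are the tags and unallocated sizes this mail shows after the re-tagging and balance.

```python
from dataclasses import dataclass

# Scan order per chunk type, as described in the cover letter.
# A tag missing from a list is not eligible for that chunk type.
METADATA_ORDER = ["METADATA_ONLY", "PREFERRED_METADATA", "PREFERRED_DATA"]
DATA_ORDER = ["DATA_ONLY", "PREFERRED_DATA", "PREFERRED_METADATA"]


@dataclass
class Device:  # hypothetical stand-in, not a kernel structure
    path: str
    hint: str
    unallocated_mib: int


def candidate_devices(devices, chunk_type):
    """Return the eligible devices in scan order for 'metadata' or 'data'."""
    order = METADATA_ORDER if chunk_type == "metadata" else DATA_ORDER
    rank = {hint: position for position, hint in enumerate(order)}
    # Drop excluded tags and devices that have no space left.
    eligible = [d for d in devices if d.hint in rank and d.unallocated_mib > 0]
    # Sort by tag first, then by free space (largest first is an assumption).
    return sorted(eligible, key=lambda d: (rank[d.hint], -d.unallocated_mib))


# Tags and unallocated space as listed after the re-tagging and balance.
devices = [
    Device("/dev/loop0", "DATA_ONLY", 736),
    Device("/dev/loop1", "DATA_ONLY", 736),
    Device("/dev/loop2", "PREFERRED_DATA", 128),
    Device("/dev/loop3", "PREFERRED_DATA", 128),
    Device("/dev/loop4", "PREFERRED_DATA", 128),
    Device("/dev/loop5", "METADATA_ONLY", 1),
    Device("/dev/loop6", "METADATA_ONLY", 128),
    Device("/dev/loop7", "METADATA_ONLY", 96),
]

metadata_scan = [d.path for d in candidate_devices(devices, "metadata")]
data_scan = [d.path for d in candidate_devices(devices, "data")]
print("metadata:", metadata_scan)
print("data:", data_scan)
```

The sketch only produces the scan order. How many devices a profile needs (two for RAID1, for example) and when the allocator falls through to the next tag are left out.
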
>
> To set these tags, a new property called "allocation_hint" was created.
> There is a dedicated btrfs-progs patch set [[PATCH V5] btrfs-progs:
> allocation_hint disk property].
>
> $ sudo mount /dev/loop0 /mnt/test-btrfs/
> $ for i in /dev/loop[0-9]; do sudo ./btrfs prop get $i allocation_hint; done
> devid=1, path=/dev/loop0: allocation_hint=PREFERRED_METADATA
> devid=2, path=/dev/loop1: allocation_hint=PREFERRED_METADATA
> devid=3, path=/dev/loop2: allocation_hint=PREFERRED_DATA
> devid=4, path=/dev/loop3: allocation_hint=PREFERRED_DATA
> devid=5, path=/dev/loop4: allocation_hint=PREFERRED_DATA
> devid=6, path=/dev/loop5: allocation_hint=DATA_ONLY
> devid=7, path=/dev/loop6: allocation_hint=METADATA_ONLY
> devid=8, path=/dev/loop7: allocation_hint=METADATA_ONLY
>
> $ sudo ./btrfs fi us /mnt/test-btrfs/
> Overall:
> Device size: 2.75GiB
> Device allocated: 1.34GiB
> Device unallocated: 1.41GiB
> Device missing: 0.00B
> Used: 400.89MiB
> Free (estimated): 1.04GiB (min: 1.04GiB)
> Data ratio: 2.00
> Metadata ratio: 1.00
> Global reserve: 3.25MiB (used: 0.00B)
> Multiple profiles: no
>
> Data,RAID1: Size:542.00MiB, Used:200.25MiB (36.95%)
> /dev/loop0 288.00MiB
> /dev/loop1 288.00MiB
> /dev/loop2 127.00MiB
> /dev/loop3 127.00MiB
> /dev/loop4 127.00MiB
> /dev/loop5 127.00MiB
>
> Metadata,single: Size:256.00MiB, Used:384.00KiB (0.15%)
> /dev/loop1 256.00MiB
>
> System,single: Size:32.00MiB, Used:16.00KiB (0.05%)
> /dev/loop0 32.00MiB
>
> Unallocated:
> /dev/loop0 704.00MiB
> /dev/loop1 480.00MiB
> /dev/loop2 1.00MiB
> /dev/loop3 1.00MiB
> /dev/loop4 1.00MiB
> /dev/loop5 1.00MiB
> /dev/loop6 128.00MiB
> /dev/loop7 128.00MiB
>
> # change the tag of some disks
>
> $ sudo ./btrfs prop set /dev/loop0 allocation_hint DATA_ONLY
> $ sudo ./btrfs prop set /dev/loop1 allocation_hint DATA_ONLY
> $ sudo ./btrfs prop set /dev/loop5 allocation_hint METADATA_ONLY
>
> $ for i in /dev/loop[0-9]; do sudo ./btrfs prop get $i allocation_hint; done
> devid=1, path=/dev/loop0: allocation_hint=DATA_ONLY
> devid=2, path=/dev/loop1: allocation_hint=DATA_ONLY
> devid=3, path=/dev/loop2: allocation_hint=PREFERRED_DATA
> devid=4, path=/dev/loop3: allocation_hint=PREFERRED_DATA
> devid=5, path=/dev/loop4: allocation_hint=PREFERRED_DATA
> devid=6, path=/dev/loop5: allocation_hint=METADATA_ONLY
> devid=7, path=/dev/loop6: allocation_hint=METADATA_ONLY
> devid=8, path=/dev/loop7: allocation_hint=METADATA_ONLY
>
> $ sudo btrfs bal start --full-balance /mnt/test-btrfs/
> $ sudo ./btrfs fi us /mnt/test-btrfs/
> Overall:
> Device size: 2.75GiB
> Device allocated: 735.00MiB
> Device unallocated: 2.03GiB
> Device missing: 0.00B
> Used: 400.72MiB
> Free (estimated): 1.10GiB (min: 1.10GiB)
> Data ratio: 2.00
> Metadata ratio: 1.00
> Global reserve: 3.25MiB (used: 0.00B)
> Multiple profiles: no
>
> Data,RAID1: Size:288.00MiB, Used:200.19MiB (69.51%)
> /dev/loop0 288.00MiB
> /dev/loop1 288.00MiB
>
> Metadata,single: Size:127.00MiB, Used:336.00KiB (0.26%)
> /dev/loop5 127.00MiB
>
> System,single: Size:32.00MiB, Used:16.00KiB (0.05%)
> /dev/loop7 32.00MiB
>
> Unallocated:
> /dev/loop0 736.00MiB
> /dev/loop1 736.00MiB
> /dev/loop2 128.00MiB
> /dev/loop3 128.00MiB
> /dev/loop4 128.00MiB
> /dev/loop5 1.00MiB
> /dev/loop6 128.00MiB
> /dev/loop7 96.00MiB
>
>
> # As you can see, all the metadata was placed on the disks loop5/loop7 even
> # though the emptiest ones are loop0 and loop1.
>
>
>
> TODO:
> - more tests
> - the tools that show the available space should take the tagging into
> account (e.g. the disks tagged _METADATA_ONLY should be excluded from the
> data availability)
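
To illustrate the second TODO item, a rough Python sketch of the adjustment is below. It is not btrfs-progs code and unallocated_for_data() is a hypothetical helper. The numbers are the tags and "Unallocated" values from the listing after the balance. The sketch only sums raw unallocated space and ignores the RAID1 data ratio and any per-profile constraints.

```python
# (path, allocation_hint, unallocated MiB) from the listing after the balance.
DEVICES = [
    ("/dev/loop0", "DATA_ONLY", 736),
    ("/dev/loop1", "DATA_ONLY", 736),
    ("/dev/loop2", "PREFERRED_DATA", 128),
    ("/dev/loop3", "PREFERRED_DATA", 128),
    ("/dev/loop4", "PREFERRED_DATA", 128),
    ("/dev/loop5", "METADATA_ONLY", 1),
    ("/dev/loop6", "METADATA_ONLY", 128),
    ("/dev/loop7", "METADATA_ONLY", 96),
]


def unallocated_for_data(devices):
    """Sum raw unallocated space, skipping devices that can never hold data."""
    return sum(mib for _, hint, mib in devices if hint != "METADATA_ONLY")


total_mib = sum(mib for _, _, mib in DEVICES)
data_mib = unallocated_for_data(DEVICES)
print(f"unallocated, all devices:     {total_mib} MiB")
print(f"unallocated, usable for data: {data_mib} MiB")
```

With these values the total is 2081 MiB (the 2.03GiB reported as "Device unallocated"), of which 1856 MiB sits on disks that may receive data chunks.
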
>
>
> Comments are welcome
> BR
> G.Baroncelli
I've been running this patch series since about V4, works really well. Would be nice to have it merged eventually.
Tested-by: Paul Jones <paul@pauljones.id.au>