All of lore.kernel.org
 help / color / mirror / Atom feed
From: Goffredo Baroncelli <kreijack@inwind.it>
To: dsterba@suse.cz, linux-btrfs@vger.kernel.org,
	Josef Bacik <josef@toxicpanda.com>
Subject: Re: [PATCH rfc v3] New ioctl BTRFS_IOC_GET_CHUNK_INFO.
Date: Thu, 11 Jun 2020 13:50:09 +0200	[thread overview]
Message-ID: <d75fcfa5-0320-c6c2-13df-329cfdf45eaf@inwind.it> (raw)
In-Reply-To: <20200610203023.GL27795@twin.jikos.cz>

On 6/10/20 10:30 PM, David Sterba wrote:
> On Tue, May 26, 2020 at 10:19:35PM +0200, Goffredo Baroncelli wrote:
>> On 5/25/20 7:14 PM, David Sterba wrote:
>>> I'll start with the data structures
>>>
>>> On Thu, Mar 19, 2020 at 09:39:13PM +0100, Goffredo Baroncelli wrote:
>>>> From: Goffredo Baroncelli <kreijack@inwind.it>
>>>> +struct btrfs_chunk_info_stripe {
>>>> +	__u64 devid;
>>>> +	__u64 offset;
>>>> +	__u8 dev_uuid[BTRFS_UUID_SIZE];
>>>> +};
>>>> +
>>>> +struct btrfs_chunk_info {
>>>> +	/* logical start of this chunk */
>>>> +	__u64 offset;
>>>> +	/* size of this chunk in bytes */
>>>> +	__u64 length;
>>>> +
>>>> +	__u64 stripe_len;
>>>> +	__u64 type;
>>>> +
>>>> +	/* 2^16 stripes is quite a lot, a second limit is the size of a single
>>>> +	 * item in the btree
>>>> +	 */
>>>> +	__u16 num_stripes;
>>>> +
>>>> +	/* sub stripes only matter for raid10 */
>>>> +	__u16 sub_stripes;
>>>> +
>>>> +	struct btrfs_chunk_info_stripe stripes[1];
>>>> +	/* additional stripes go here */
>>>> +};
>>>
>>> This looks like a copy of btrfs_chunk and stripe, only removing items
>>> not needed for the chunk information. Rather than copying the
>>> unnecessary fileds like dev_uuid in stripe, this should be designed for
>>> data exchange with the usecase in mind.
>>
>> There are two clients for this api:
>> - btrfs fi us
>> - btrfs dev us
>>
>> We can get rid of:
>> 	- "offset" fields (2x)
>> 	- "uuid" fields
>>
>> However the "offset" fields can be used to understand where a logical map
>> is on the physical disks. I am thinking about a graphical tool to show this
>> mapping, which doesn't exits yet -).
>> The offset field may be used as key to get further information (like the chunk
>> usage, see below)
>>
>> Regarding the UUID field, I agree it can be removed because it is redundant (there
>> is already the devid)
> 
> Offset is ok. I had something like this:
> 
> struct dump_chunks_entry {
>         u64 devid;
>         u64 start;
>         u64 lstart;
>         u64 length;
>         u64 flags;
>         u64 used;
> };
> 
> This selects the most interesting data from the CHUNK_ITEM, except the
> 'used' member, see below.


The structure above is a structure "device basis". This means that for (e.g.) raidX chunks the fields:
- lstart
- length
- flags
- used
are repeated

In fact only devid and start are device specific.
I see the following possibilities

1)

struct dump_chunks_entry {
          u64 lstart;
          u64 length;
          u64 flags;
          u64 used;
          u64 start;
	 u64 devid;
}

pro: simple api
cons: waste of space (60% of data are repeated

2)

struct dump_chunk_disk_entry {
          u64 devid;
          u64 start;
}

struct dump_chunks_entry {
          u64 lstart;
          u64 length;
          u64 flags;
          u64 used;
	 u16 disks_count;
	 struct dump_chunk_disk_entry disks[]
};

pro: smaller data
cons: variable length data

3)

two different ioctl

BTRFS_IOC_DUMP_BLOCK_GROUP

struct dump_block_group {
	u64	lstart
	u64	used
	u64	length
	u64	flags
}

BTRFS_IOC_DUMP_CHUNK

struct dump_chunks_entry {
          u64 lstart;
          u64 start;
	 u64 devid;
}


Where the filed lstart is the key

pro: smaller data (only lstart is repeated); quite orthogonal api
cons: two IOCTLs

Considering that having as optional the "used" field, means to have two ioctl (or one ioctl with a parameter, but this is not so different).
More I think, more I like option 3), however I am less happy to loose an information like sub_stripes (will something like RAID50 comes to BTRFS ?). Unfortunately to have this information, we need to duplicate it for each dump_chunks_entry...

GB





> 
>>> The format does not need follow the exact layout that kernel uses, ie.
>>> chunk info with one embedded stripe and then followed by variable length
>>> array of further stripes. This is convenient in one way but not in
>>> another one. Alternatively each chunk can be emitted as a single entry,
>>> duplicating part of the common fields and adding the stripe-specific
>>> ones. This is for consideration.
>>>
>>> I've looked at my old code doing the chunk dump based on the search
>>> ioctl and found that it also allows to read the chunk usage, with one
>>> extra search to the block group item where the usage is stored. As this
>>> is can be slow, it should be optional. Ie. the main ioctl structure
>>> needs flags where this can be requested.
>>
>> This info could be very useful. I think to something like a balance of
>> chunks which are near filled (or near empty). The question is if we
>> should have a different ioctl.
> 
> I was not proposing a new ioctl but designing the data exchange format
> to optionally provide a way to pass more information, like the usage.
> The reference to search ioctl was merely to point out that there's one
> more search for BLOCK_GROUP_ITEM where the 'used' is stored. As this is
> potentially expensive, it won't be filled by default.
> 
> The structure above does not capture all the chunk data. We could pack
> more such structures into one ioctl call. I think that num_stripes is
> missing from there as this would make possible the raid56 calculations
> but otherwise it should be it.
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

  reply	other threads:[~2020-06-11 11:50 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-19 20:39 [PATCH RFC V3] new ioctl BTRFS_IOC_GET_CHUNK_INFO Goffredo Baroncelli
2020-03-19 20:39 ` [PATCH rfc v3] New " Goffredo Baroncelli
2020-03-19 20:59   ` Josef Bacik
2020-03-19 21:09     ` Goffredo Baroncelli
2020-05-25 17:14   ` David Sterba
2020-05-26 20:19     ` Goffredo Baroncelli
2020-05-28 21:03       ` Hans van Kranenburg
2020-05-29 16:23         ` Goffredo Baroncelli
2020-05-29 18:54           ` Hans van Kranenburg
2020-05-29 18:59             ` Hans van Kranenburg
2020-05-30  6:41             ` Goffredo Baroncelli
2020-06-10 20:30       ` David Sterba
2020-06-11 11:50         ` Goffredo Baroncelli [this message]
2020-06-14 11:06           ` Goffredo Baroncelli
2020-05-25 16:39 ` [PATCH RFC V3] new " David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d75fcfa5-0320-c6c2-13df-329cfdf45eaf@inwind.it \
    --to=kreijack@inwind.it \
    --cc=dsterba@suse.cz \
    --cc=josef@toxicpanda.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.