linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Anand Jain <anand.jain@oracle.com>
To: dsterba@suse.cz
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH RFC 0/5] readmirror feature
Date: Fri, 12 Apr 2019 17:02:01 +0800	[thread overview]
Message-ID: <8d31c3a2-3fb0-63af-3173-948ed0e84de3@oracle.com> (raw)
In-Reply-To: <20190409154840.GM29086@twin.jikos.cz>


Thanks for the comments.. more below.

On 9/4/19 11:48 PM, David Sterba wrote:
> On Tue, Mar 19, 2019 at 06:00:19PM +0800, Anand Jain wrote:
>> RFC patch as of now, appreciate your comments. This patch set has
>> been tested.
>>
>> Function call chain  __btrfs_map_block()->find_live_mirror() uses
>> thread pid to determine the %mirror_num when the mirror_num=0.
>>      
>> The pid based mirror_num extrapolation has following disadvantages
>>   . A large single-process read IO will read only from one disk.
> 
> Please note that the pid refers to the metadata kernel threads that
> submit the IO, so on average the IO is serviced by different threads
> with random PID. In the end it gets distributed over multiple drives.
> But it's nott ideal, no argument about that.

  ok.

>>   . In a worst scenario all processes read accessing the FS could have
>>         either odd or even pid, the read IO gets skewed.
>>   . There is no deterministic way of knowing/controlling which copy will
>>         be used for reading.
>>   . May see performance variations for a given set of multi process
>>         workload ran at different times.
>>      
>> So we need other types of readmirror policies.
>>      
>> This patch introduces a framework so that we can add more policies, and
>> converts the existing %pid into as a configurable parameter using the
>> property. And also provides a transient readmirror mount option, so that
>> this property can be applied for the read io during mount and for
>> readonly FS.
> 
> I think the mirror selection by pid was a temporary solution (that
> stuck, as it always happens). On average it does not work bad. Once we
> have a better way then it can be dropped. So I'm not convinced it needs
> to be one of the configuration options.

  Ok. Thanks for confirming, we could replace pid with something better,
  as of now we have devid-based and device-qdepth based, IMO qdepth based
  should be default as I commented in the respective patch.

> What exactly do you mean by 'transient'? I see that it could be needed
> for mount or read-only, though I don't see the usefulness of that.

  I mean to say does not persist across reboot. To override the
  original setting lets say device-qdepth based, but momentarily
  to avoid known read errors from a disk.

>>   For example:
>>     btrfs property set <mnt> readmirror pid
>>     btrfs property set <mnt> readmirror ""
>>     btrfs property set <mnt> readmirror devid<n>
>>
>>     mount -o readmirror=pid <mnt>
>>     mount -o readmirror=devid<n> <mnt>
>>
>> Please note:
>>   The property storage is an extented attributes of root inode
>>   (BTRFS_FS_TREE_OBJECTID), the other choice is a new ondisk KEY,
>>   which is open for comments.
> 
> A new key type is not going to be allocated for that, the extended
> attributes for properties are there. Another question is where to attach
> the property as this is a whole-filesystem property, we don't have an
> example to follow so far.

  Ok no key.
  Yep its a whole filesystem property. As of now it goes to the root
  inode as an extended attribute. I find it reasonable.

>>   This patch uses the unused btrfs_device::type, and does not use the
>>   bitwise ops because as type is u64 and bitwise ops need u32, so we
>>   need 32bits of the btrfs_device::type. Which is a kind of messy.
>>   Its open for suggestion for any better way.
> 
> For a prototype and early testing, using the type is probably ok, but
> I don't think it's right for permanent storage. Besides I'm sure that
> the u64 will not suffice in the long run.

Hmm.. I don't know. I thought
   <upper 32 bits> for disk groups
   <lower 32 bits> for disk types
should suffice. Not too sure what I am missing.

Thanks, Anand.


      reply	other threads:[~2019-04-12  9:02 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-19 10:00 [PATCH RFC 0/5] readmirror feature Anand Jain
2019-03-19 10:00 ` [PATCH RFC 1/5] btrfs: add inode pointer to prop_handler::validate() Anand Jain
2019-03-19 10:00 ` [PATCH RFC 2/5] btrfs: add readmirror pid property Anand Jain
2019-03-19 10:00 ` [PATCH RFC 3/5] btrfs: add readmirror devid property Anand Jain
2019-03-19 10:00 ` [PATCH RFC 4/5] btrfs: add readmirror pid mount option Anand Jain
2019-03-19 10:00 ` [PATCH RFC 5/5] btrfs: add readmirror devid " Anand Jain
2019-03-19 10:02 ` [PATCH RFC 1/2] btrfs-progs: add helper to create xattr name Anand Jain
2019-03-19 10:02   ` [PATCH RFC 2/2] btrfs-progs: add readmirror policy Anand Jain
2019-03-21 16:26 ` [PATCH RFC 0/5] readmirror feature Steven Davies
2019-03-21 18:26   ` waxhead
2019-03-22  6:08   ` Anand Jain
2019-04-09 15:48 ` David Sterba
2019-04-12  9:02   ` Anand Jain [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8d31c3a2-3fb0-63af-3173-948ed0e84de3@oracle.com \
    --to=anand.jain@oracle.com \
    --cc=dsterba@suse.cz \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).