linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Josef Bacik <josef@toxicpanda.com>
To: Anand Jain <anand.jain@oracle.com>
Cc: Josef Bacik <josef@toxicpanda.com>, Eli V <eliventer@gmail.com>,
	linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH RFC v2 0/2] readmirror feature
Date: Thu, 12 Sep 2019 05:50:23 -0400	[thread overview]
Message-ID: <20190912095021.htmpvvowdprc2jhv@MacBook-Pro-91.local> (raw)
In-Reply-To: <2f10bebf-bc63-fe9e-d7d3-06b3113bc95c@oracle.com>

On Thu, Sep 12, 2019 at 03:41:42PM +0800, Anand Jain wrote:
> 
> 
> Thanks for the comments. More below.
> 
> On 12/9/19 3:16 AM, Josef Bacik wrote:
> > On Wed, Sep 11, 2019 at 03:13:21PM -0400, Eli V wrote:
> > > On Wed, Sep 11, 2019 at 2:46 PM Josef Bacik <josef@toxicpanda.com> wrote:
> > > > 
> > > > On Mon, Aug 26, 2019 at 05:04:36PM +0800, Anand Jain wrote:
> > > > > Function call chain  __btrfs_map_block()->find_live_mirror() uses
> > > > > thread pid to determine the %mirror_num when the mirror_num=0.
> > > > > 
> > > > > This patch introduces a framework so that we can add policies to determine
> > > > > the %mirror_num. And also adds the devid as the readmirror policy.
> > > > > 
> > > > > The new property is stored as an item in the device tree as show below.
> > > > >      (BTRFS_READMIRROR_OBJECTID, BTRFS_PERSISTENT_ITEM_KEY, devid)
> > > > > 
> > > > > To be able to set and get this new property also introduces new ioctls
> > > > > BTRFS_IOC_GET_READMIRROR and BTRFS_IOC_SET_READMIRROR. The ioctl argument
> > > > > is defined as
> > > > >          struct btrfs_ioctl_readmirror_args {
> > > > >                  __u64 type; /* RW */
> > > > >                  __u64 device_bitmap; /* RW */
> > > > >          }
> > > > > 
> > > > > An usage example as follows:
> > > > >          btrfs property set /btrfs readmirror devid:1,3
> > > > >          btrfs property get /btrfs readmirror
> > > > >            readmirror devid:1 3
> > > > >          btrfs property set /btrfs readmirror ""
> > > > >          btrfs property get /btrfs readmirror
> > > > >            readmirror default
> > > > > 
> > > > > This patchset has been tested completely, however marked as RFC for the
> > > > > following reasons and comments on them (or any other) are appreciated as
> > > > > usual.
> > > > >   . The new objectid is defined as
> > > > >        #define BTRFS_READMIRROR_OBJECTID -1ULL
> > > > >     Need consent we are fine to use this value, and with this value it
> > > > >     shall be placed just before the DEV_STATS_OBJECTID item which is more
> > > > >     frequently used only during the device errors.
> > > > > 
> > > > > .  I am using a u64 bitmap to represent the devices id, so the max device
> > > > >     id that we could represent is 63, its a kind of limitation which should
> > > > >     be addressed before integration, I wonder if there is any suggestion?
> > > > >     Kindly note that, multiple ioctls with each time representing a set of
> > > > >     device(s) is not a choice because we need to make sure the readmirror
> > > > >     changes happens in a commit transaction.
> > > > > 
> > > > > v1->RFC v2:
> > > > >    . Property is stored as a dev-tree item instead of root inode extended
> > > > >      attribute.
> > > > >    . Rename BTRFS_DEV_STATE_READ_OPRIMIZED to BTRFS_DEV_STATE_READ_PREFERRED.
> > > > >    . Changed format specifier from devid1,2,3.. to devid:1,2,3..
> > > > > 
> > > > > RFC->v1:
> > > > >    Drops pid as one of the readmirror policy choices and as usual remains
> > > > >    as default. And when the devid is reset the readmirror policy falls back
> > > > >    to pid.
> > > > >    Drops the mount -o readmirror idea, it can be added at a later point of
> > > > >    time.
> > > > >    Property now accepts more than 1 devid as readmirror device. As shown
> > > > >    in the example above.
> > > > > 
> > > > 
> > > > This is a lot of infrastructure
> 
>   Ok. Any idea on a better implementation?
>   How about extended attribute approach? v1 patches proposed
>   it, but it abused the extended attribute as commented here [1]
>   and v2 got changed to an item-key.
> 
> [1]
> https://lore.kernel.org/linux-btrfs/be68e6ea-00bc-b750-25e1-9c584b99308f@gmx.com/
>

That's a NAK on the prop interface.  This is a fs wide policy, not a
directory/inode policy.
 
> 
> > > > to just change which mirror we read to based on
> > > > some arbitrary user policy.  I assume this is to solve the case where you have
> > > > slow and fast disks, so you can always read from the fast disk?  And then it's
> > > > only used in RAID1, so the very narrow usecase of having a RAID1 setup with a
> > > > SSD and a normal disk?  I'm not seeing a point to this much code for one
> > > > particular obscure setup.  Thanks,
> > > > 
> > > > Josef
> > > 
> > > Not commenting on the code itself, but as a user I see this SSD RAID1
> > > acceleration as a future much have feature. It's only obscure at the
> > > moment because we don't have code to take advantage of it. But on
> > > large btrfs filesystems with hundreds of GB of metadata, like I have
> > > for backups, the usability of the filesystem is dramatically improved
> > > having the metadata on an SSD( though currently only half of the time
> > > due to the even/odd pid distribution.)
> > 
> > But that's different from a mirror.  100% it would be nice to say "put my
> > metadata on the ssd, data elsewhere".  That's not what this patch is about, this
> > patch is specifically about changing which drive we choose in a mirrored setup,
> > which is super unlikely to mirror a SSD with a slow drive, cause it's just going
> > to be slow no matter what.  Sure we could make it so reads always go to the SSD,
> > but we can accomplish that by just adding a check for nonrotational in the code,
> > and then we don't have to encode all this nonsense in the file system.  Thanks,
> 
>  I wrote about the readmirror policy framework here[2],
>  I forgot to link it here, sorry about that, my mistake.
> 
>  [2]
> 
> https://lore.kernel.org/linux-btrfs/1552989624-29577-1-git-send-email-anand.jain@oracle.com/
> 
>  Readmirror policy is for raid1, raid10 and future N way mirror.
>  Yes for now its only for raid1.
> 
>  Here the idea is to create a framework so that readmirror policy
>  can be configured as needed. And nonrotational can be one such policy.
> 
>  The example of hard-coded nonrotational policy does not work in case
>  of ssd and a remote iscsi ssd, OR in case of local ssd and a NVME block
>  device, as all these are still nonrotational devices. So hard-coded
>  policy is not a good idea. If we have to hardcode then there is Q-depth
>  based readmirror routing is better (patch in the ML), but that is
>  not good enough, because some configs wants it based on the disk-LBA
>  so that SAN storage target cache is balanced and not duplicated.
>  So in short it must be a configurable policy.
> 

Again, if you are mixing disk types you likely always want non-rotational, but
still mixing different speed devices in a mirror setup is just asking for weird
latency problems.  I don't think solving this use case is necessary.  If you mix
ssd + network device in a serious production setup then you probably should be
fired cause you don't know what you are doing.  Having the generic
"nonrotational gets priority" is going to cover 99% of the actual use cases that
make sense.

The SAN usecase I can sort of see, but again I don't feel like it's a problem we
need to solve with on-disk format.  Add a priority to sysfs so you can change it
with udev or something on the fly.  Thanks,

Josef

  reply	other threads:[~2019-09-12  9:50 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-26  9:04 [PATCH RFC v2 0/2] readmirror feature Anand Jain
2019-08-26  9:04 ` [PATCH RFC v2] btrfs: add readmirror framework and policy devid Anand Jain
2019-08-26  9:04 ` [PATCH RFC v2] btrfs-progs: add readmirror property and ioctl to set get readmirror Anand Jain
2019-08-29  3:39   ` [PATCH RFC v2.1] btrfs-progs: add readmirror property and ioctl to set get Anand Jain
2019-09-11 16:37 ` [PATCH RFC v2 0/2] readmirror feature David Sterba
2019-09-12  7:48   ` Anand Jain
2019-09-11 18:42 ` Josef Bacik
2019-09-11 19:13   ` Eli V
2019-09-11 19:16     ` Josef Bacik
2019-09-12  7:41       ` Anand Jain
2019-09-12  9:50         ` Josef Bacik [this message]
2019-09-12 10:00           ` Anand Jain
2019-09-12 10:03             ` Josef Bacik
2019-09-12 10:10               ` Anand Jain
2019-09-12 10:13                 ` Josef Bacik
2019-09-16  8:19                   ` Anand Jain
2019-09-24 14:27                     ` Anand Jain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190912095021.htmpvvowdprc2jhv@MacBook-Pro-91.local \
    --to=josef@toxicpanda.com \
    --cc=anand.jain@oracle.com \
    --cc=eliventer@gmail.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).