linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Josef Bacik <josef@toxicpanda.com>
To: Anand Jain <anand.jain@oracle.com>
Cc: Josef Bacik <josef@toxicpanda.com>, Eli V <eliventer@gmail.com>,
	linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH RFC v2 0/2] readmirror feature
Date: Thu, 12 Sep 2019 06:03:15 -0400	[thread overview]
Message-ID: <20190912100313.kjdatocumj6bbe7x@MacBook-Pro-91.local> (raw)
In-Reply-To: <B10B8AC4-5BDB-40B0-B76C-44B22BBF3095@oracle.com>

On Thu, Sep 12, 2019 at 06:00:21PM +0800, Anand Jain wrote:
> 
> 
> > On 12 Sep 2019, at 5:50 PM, Josef Bacik <josef@toxicpanda.com> wrote:
> > 
> > On Thu, Sep 12, 2019 at 03:41:42PM +0800, Anand Jain wrote:
> >> 
> >> 
> >> Thanks for the comments. More below.
> >> 
> >> On 12/9/19 3:16 AM, Josef Bacik wrote:
> >>> On Wed, Sep 11, 2019 at 03:13:21PM -0400, Eli V wrote:
> >>>> On Wed, Sep 11, 2019 at 2:46 PM Josef Bacik <josef@toxicpanda.com> wrote:
> >>>>> 
> >>>>> On Mon, Aug 26, 2019 at 05:04:36PM +0800, Anand Jain wrote:
> >>>>>> Function call chain  __btrfs_map_block()->find_live_mirror() uses
> >>>>>> thread pid to determine the %mirror_num when the mirror_num=0.
> >>>>>> 
> >>>>>> This patch introduces a framework so that we can add policies to determine
> >>>>>> the %mirror_num. And also adds the devid as the readmirror policy.
> >>>>>> 
> >>>>>> The new property is stored as an item in the device tree as show below.
> >>>>>>     (BTRFS_READMIRROR_OBJECTID, BTRFS_PERSISTENT_ITEM_KEY, devid)
> >>>>>> 
> >>>>>> To be able to set and get this new property also introduces new ioctls
> >>>>>> BTRFS_IOC_GET_READMIRROR and BTRFS_IOC_SET_READMIRROR. The ioctl argument
> >>>>>> is defined as
> >>>>>>         struct btrfs_ioctl_readmirror_args {
> >>>>>>                 __u64 type; /* RW */
> >>>>>>                 __u64 device_bitmap; /* RW */
> >>>>>>         }
> >>>>>> 
> >>>>>> An usage example as follows:
> >>>>>>         btrfs property set /btrfs readmirror devid:1,3
> >>>>>>         btrfs property get /btrfs readmirror
> >>>>>>           readmirror devid:1 3
> >>>>>>         btrfs property set /btrfs readmirror ""
> >>>>>>         btrfs property get /btrfs readmirror
> >>>>>>           readmirror default
> >>>>>> 
> >>>>>> This patchset has been tested completely, however marked as RFC for the
> >>>>>> following reasons and comments on them (or any other) are appreciated as
> >>>>>> usual.
> >>>>>>  . The new objectid is defined as
> >>>>>>       #define BTRFS_READMIRROR_OBJECTID -1ULL
> >>>>>>    Need consent we are fine to use this value, and with this value it
> >>>>>>    shall be placed just before the DEV_STATS_OBJECTID item which is more
> >>>>>>    frequently used only during the device errors.
> >>>>>> 
> >>>>>> .  I am using a u64 bitmap to represent the devices id, so the max device
> >>>>>>    id that we could represent is 63, its a kind of limitation which should
> >>>>>>    be addressed before integration, I wonder if there is any suggestion?
> >>>>>>    Kindly note that, multiple ioctls with each time representing a set of
> >>>>>>    device(s) is not a choice because we need to make sure the readmirror
> >>>>>>    changes happens in a commit transaction.
> >>>>>> 
> >>>>>> v1->RFC v2:
> >>>>>>   . Property is stored as a dev-tree item instead of root inode extended
> >>>>>>     attribute.
> >>>>>>   . Rename BTRFS_DEV_STATE_READ_OPRIMIZED to BTRFS_DEV_STATE_READ_PREFERRED.
> >>>>>>   . Changed format specifier from devid1,2,3.. to devid:1,2,3..
> >>>>>> 
> >>>>>> RFC->v1:
> >>>>>>   Drops pid as one of the readmirror policy choices and as usual remains
> >>>>>>   as default. And when the devid is reset the readmirror policy falls back
> >>>>>>   to pid.
> >>>>>>   Drops the mount -o readmirror idea, it can be added at a later point of
> >>>>>>   time.
> >>>>>>   Property now accepts more than 1 devid as readmirror device. As shown
> >>>>>>   in the example above.
> >>>>>> 
> >>>>> 
> >>>>> This is a lot of infrastructure
> >> 
> >>  Ok. Any idea on a better implementation?
> >>  How about extended attribute approach? v1 patches proposed
> >>  it, but it abused the extended attribute as commented here [1]
> >>  and v2 got changed to an item-key.
> >> 
> >> [1]
> >> https://lore.kernel.org/linux-btrfs/be68e6ea-00bc-b750-25e1-9c584b99308f@gmx.com/
> >> 
> > 
> > That's a NAK on the prop interface.  This is a fs wide policy, not a
> > directory/inode policy.
> > 
> >> 
> >>>>> to just change which mirror we read to based on
> >>>>> some arbitrary user policy.  I assume this is to solve the case where you have
> >>>>> slow and fast disks, so you can always read from the fast disk?  And then it's
> >>>>> only used in RAID1, so the very narrow usecase of having a RAID1 setup with a
> >>>>> SSD and a normal disk?  I'm not seeing a point to this much code for one
> >>>>> particular obscure setup.  Thanks,
> >>>>> 
> >>>>> Josef
> >>>> 
> >>>> Not commenting on the code itself, but as a user I see this SSD RAID1
> >>>> acceleration as a future much have feature. It's only obscure at the
> >>>> moment because we don't have code to take advantage of it. But on
> >>>> large btrfs filesystems with hundreds of GB of metadata, like I have
> >>>> for backups, the usability of the filesystem is dramatically improved
> >>>> having the metadata on an SSD( though currently only half of the time
> >>>> due to the even/odd pid distribution.)
> >>> 
> >>> But that's different from a mirror.  100% it would be nice to say "put my
> >>> metadata on the ssd, data elsewhere".  That's not what this patch is about, this
> >>> patch is specifically about changing which drive we choose in a mirrored setup,
> >>> which is super unlikely to mirror a SSD with a slow drive, cause it's just going
> >>> to be slow no matter what.  Sure we could make it so reads always go to the SSD,
> >>> but we can accomplish that by just adding a check for nonrotational in the code,
> >>> and then we don't have to encode all this nonsense in the file system.  Thanks,
> >> 
> >> I wrote about the readmirror policy framework here[2],
> >> I forgot to link it here, sorry about that, my mistake.
> >> 
> >> [2]
> >> 
> >> https://lore.kernel.org/linux-btrfs/1552989624-29577-1-git-send-email-anand.jain@oracle.com/
> >> 
> >> Readmirror policy is for raid1, raid10 and future N way mirror.
> >> Yes for now its only for raid1.
> >> 
> >> Here the idea is to create a framework so that readmirror policy
> >> can be configured as needed. And nonrotational can be one such policy.
> >> 
> >> The example of hard-coded nonrotational policy does not work in case
> >> of ssd and a remote iscsi ssd, OR in case of local ssd and a NVME block
> >> device, as all these are still nonrotational devices. So hard-coded
> >> policy is not a good idea. If we have to hardcode then there is Q-depth
> >> based readmirror routing is better (patch in the ML), but that is
> >> not good enough, because some configs wants it based on the disk-LBA
> >> so that SAN storage target cache is balanced and not duplicated.
> >> So in short it must be a configurable policy.
> >> 
> > 
> > Again, if you are mixing disk types you likely always want non-rotational, but
> > still mixing different speed devices in a mirror setup is just asking for weird
> > latency problems.  I don't think solving this use case is necessary.  If you mix
> > ssd + network device in a serious production setup then you probably should be
> > fired cause you don't know what you are doing.  Having the generic
> > "nonrotational gets priority" is going to cover 99% of the actual use cases that
> > make sense.
> > 
> > The SAN usecase I can sort of see, but again I don't feel like it's a problem we
> > need to solve with on-disk format.  Add a priority to sysfs so you can change it
> > with udev or something on the fly.  Thanks,
> > 
>  
>  Ok.
>  Sysfs is fine however we need configuration to be persistent across reboots.
>  Any idea?
>

Udev rules.  Thanks,

Josef 

  reply	other threads:[~2019-09-12 10:03 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-26  9:04 [PATCH RFC v2 0/2] readmirror feature Anand Jain
2019-08-26  9:04 ` [PATCH RFC v2] btrfs: add readmirror framework and policy devid Anand Jain
2019-08-26  9:04 ` [PATCH RFC v2] btrfs-progs: add readmirror property and ioctl to set get readmirror Anand Jain
2019-08-29  3:39   ` [PATCH RFC v2.1] btrfs-progs: add readmirror property and ioctl to set get Anand Jain
2019-09-11 16:37 ` [PATCH RFC v2 0/2] readmirror feature David Sterba
2019-09-12  7:48   ` Anand Jain
2019-09-11 18:42 ` Josef Bacik
2019-09-11 19:13   ` Eli V
2019-09-11 19:16     ` Josef Bacik
2019-09-12  7:41       ` Anand Jain
2019-09-12  9:50         ` Josef Bacik
2019-09-12 10:00           ` Anand Jain
2019-09-12 10:03             ` Josef Bacik [this message]
2019-09-12 10:10               ` Anand Jain
2019-09-12 10:13                 ` Josef Bacik
2019-09-16  8:19                   ` Anand Jain
2019-09-24 14:27                     ` Anand Jain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190912100313.kjdatocumj6bbe7x@MacBook-Pro-91.local \
    --to=josef@toxicpanda.com \
    --cc=anand.jain@oracle.com \
    --cc=eliventer@gmail.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).