From: Anand Jain <anand.jain@oracle.com>
To: Josef Bacik <josef@toxicpanda.com>
Cc: Anand Jain <anand.jain@oracle.com>, Eli V <eliventer@gmail.com>,
linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH RFC v2 0/2] readmirror feature
Date: Thu, 12 Sep 2019 18:10:08 +0800 [thread overview]
Message-ID: <7ECB777E-BA58-46A0-925F-2B0AB9030288@oracle.com> (raw)
In-Reply-To: <20190912100313.kjdatocumj6bbe7x@MacBook-Pro-91.local>
> On 12 Sep 2019, at 6:03 PM, Josef Bacik <josef@toxicpanda.com> wrote:
>
> On Thu, Sep 12, 2019 at 06:00:21PM +0800, Anand Jain wrote:
>>
>>
>>> On 12 Sep 2019, at 5:50 PM, Josef Bacik <josef@toxicpanda.com> wrote:
>>>
>>> On Thu, Sep 12, 2019 at 03:41:42PM +0800, Anand Jain wrote:
>>>>
>>>>
>>>> Thanks for the comments. More below.
>>>>
>>>> On 12/9/19 3:16 AM, Josef Bacik wrote:
>>>>> On Wed, Sep 11, 2019 at 03:13:21PM -0400, Eli V wrote:
>>>>>> On Wed, Sep 11, 2019 at 2:46 PM Josef Bacik <josef@toxicpanda.com> wrote:
>>>>>>>
>>>>>>> On Mon, Aug 26, 2019 at 05:04:36PM +0800, Anand Jain wrote:
>>>>>>>> Function call chain __btrfs_map_block()->find_live_mirror() uses
>>>>>>>> thread pid to determine the %mirror_num when the mirror_num=0.
>>>>>>>>
>>>>>>>> This patch introduces a framework so that we can add policies to determine
>>>>>>>> the %mirror_num. And also adds the devid as the readmirror policy.
>>>>>>>>
>>>>>>>> The new property is stored as an item in the device tree as show below.
>>>>>>>> (BTRFS_READMIRROR_OBJECTID, BTRFS_PERSISTENT_ITEM_KEY, devid)
>>>>>>>>
>>>>>>>> To be able to set and get this new property also introduces new ioctls
>>>>>>>> BTRFS_IOC_GET_READMIRROR and BTRFS_IOC_SET_READMIRROR. The ioctl argument
>>>>>>>> is defined as
>>>>>>>> struct btrfs_ioctl_readmirror_args {
>>>>>>>> __u64 type; /* RW */
>>>>>>>> __u64 device_bitmap; /* RW */
>>>>>>>> }
>>>>>>>>
>>>>>>>> An usage example as follows:
>>>>>>>> btrfs property set /btrfs readmirror devid:1,3
>>>>>>>> btrfs property get /btrfs readmirror
>>>>>>>> readmirror devid:1 3
>>>>>>>> btrfs property set /btrfs readmirror ""
>>>>>>>> btrfs property get /btrfs readmirror
>>>>>>>> readmirror default
>>>>>>>>
>>>>>>>> This patchset has been tested completely, however marked as RFC for the
>>>>>>>> following reasons and comments on them (or any other) are appreciated as
>>>>>>>> usual.
>>>>>>>> . The new objectid is defined as
>>>>>>>> #define BTRFS_READMIRROR_OBJECTID -1ULL
>>>>>>>> Need consent we are fine to use this value, and with this value it
>>>>>>>> shall be placed just before the DEV_STATS_OBJECTID item which is more
>>>>>>>> frequently used only during the device errors.
>>>>>>>>
>>>>>>>> . I am using a u64 bitmap to represent the devices id, so the max device
>>>>>>>> id that we could represent is 63, its a kind of limitation which should
>>>>>>>> be addressed before integration, I wonder if there is any suggestion?
>>>>>>>> Kindly note that, multiple ioctls with each time representing a set of
>>>>>>>> device(s) is not a choice because we need to make sure the readmirror
>>>>>>>> changes happens in a commit transaction.
>>>>>>>>
>>>>>>>> v1->RFC v2:
>>>>>>>> . Property is stored as a dev-tree item instead of root inode extended
>>>>>>>> attribute.
>>>>>>>> . Rename BTRFS_DEV_STATE_READ_OPRIMIZED to BTRFS_DEV_STATE_READ_PREFERRED.
>>>>>>>> . Changed format specifier from devid1,2,3.. to devid:1,2,3..
>>>>>>>>
>>>>>>>> RFC->v1:
>>>>>>>> Drops pid as one of the readmirror policy choices and as usual remains
>>>>>>>> as default. And when the devid is reset the readmirror policy falls back
>>>>>>>> to pid.
>>>>>>>> Drops the mount -o readmirror idea, it can be added at a later point of
>>>>>>>> time.
>>>>>>>> Property now accepts more than 1 devid as readmirror device. As shown
>>>>>>>> in the example above.
>>>>>>>>
>>>>>>>
>>>>>>> This is a lot of infrastructure
>>>>
>>>> Ok. Any idea on a better implementation?
>>>> How about extended attribute approach? v1 patches proposed
>>>> it, but it abused the extended attribute as commented here [1]
>>>> and v2 got changed to an item-key.
>>>>
>>>> [1]
>>>> https://lore.kernel.org/linux-btrfs/be68e6ea-00bc-b750-25e1-9c584b99308f@gmx.com/
>>>>
>>>
>>> That's a NAK on the prop interface. This is a fs wide policy, not a
>>> directory/inode policy.
>>>
>>>>
>>>>>>> to just change which mirror we read to based on
>>>>>>> some arbitrary user policy. I assume this is to solve the case where you have
>>>>>>> slow and fast disks, so you can always read from the fast disk? And then it's
>>>>>>> only used in RAID1, so the very narrow usecase of having a RAID1 setup with a
>>>>>>> SSD and a normal disk? I'm not seeing a point to this much code for one
>>>>>>> particular obscure setup. Thanks,
>>>>>>>
>>>>>>> Josef
>>>>>>
>>>>>> Not commenting on the code itself, but as a user I see this SSD RAID1
>>>>>> acceleration as a future much have feature. It's only obscure at the
>>>>>> moment because we don't have code to take advantage of it. But on
>>>>>> large btrfs filesystems with hundreds of GB of metadata, like I have
>>>>>> for backups, the usability of the filesystem is dramatically improved
>>>>>> having the metadata on an SSD( though currently only half of the time
>>>>>> due to the even/odd pid distribution.)
>>>>>
>>>>> But that's different from a mirror. 100% it would be nice to say "put my
>>>>> metadata on the ssd, data elsewhere". That's not what this patch is about, this
>>>>> patch is specifically about changing which drive we choose in a mirrored setup,
>>>>> which is super unlikely to mirror a SSD with a slow drive, cause it's just going
>>>>> to be slow no matter what. Sure we could make it so reads always go to the SSD,
>>>>> but we can accomplish that by just adding a check for nonrotational in the code,
>>>>> and then we don't have to encode all this nonsense in the file system. Thanks,
>>>>
>>>> I wrote about the readmirror policy framework here[2],
>>>> I forgot to link it here, sorry about that, my mistake.
>>>>
>>>> [2]
>>>>
>>>> https://lore.kernel.org/linux-btrfs/1552989624-29577-1-git-send-email-anand.jain@oracle.com/
>>>>
>>>> Readmirror policy is for raid1, raid10 and future N way mirror.
>>>> Yes for now its only for raid1.
>>>>
>>>> Here the idea is to create a framework so that readmirror policy
>>>> can be configured as needed. And nonrotational can be one such policy.
>>>>
>>>> The example of hard-coded nonrotational policy does not work in case
>>>> of ssd and a remote iscsi ssd, OR in case of local ssd and a NVME block
>>>> device, as all these are still nonrotational devices. So hard-coded
>>>> policy is not a good idea. If we have to hardcode then there is Q-depth
>>>> based readmirror routing is better (patch in the ML), but that is
>>>> not good enough, because some configs wants it based on the disk-LBA
>>>> so that SAN storage target cache is balanced and not duplicated.
>>>> So in short it must be a configurable policy.
>>>>
>>>
>>> Again, if you are mixing disk types you likely always want non-rotational, but
>>> still mixing different speed devices in a mirror setup is just asking for weird
>>> latency problems. I don't think solving this use case is necessary. If you mix
>>> ssd + network device in a serious production setup then you probably should be
>>> fired cause you don't know what you are doing. Having the generic
>>> "nonrotational gets priority" is going to cover 99% of the actual use cases that
>>> make sense.
>>>
>>> The SAN usecase I can sort of see, but again I don't feel like it's a problem we
>>> need to solve with on-disk format. Add a priority to sysfs so you can change it
>>> with udev or something on the fly. Thanks,
>>>
>>
>> Ok.
>> Sysfs is fine however we need configuration to be persistent across reboots.
>> Any idea?
>>
>
> Udev rules. Thanks,
>
Josef, configs moving along with the luns in san seems to be more
easy to manage, otherwise when the host fails, each potential new
server has to be pre-configured with the udev rules.
Thanks, Anand
> Josef
next prev parent reply other threads:[~2019-09-12 10:10 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-08-26 9:04 [PATCH RFC v2 0/2] readmirror feature Anand Jain
2019-08-26 9:04 ` [PATCH RFC v2] btrfs: add readmirror framework and policy devid Anand Jain
2019-08-26 9:04 ` [PATCH RFC v2] btrfs-progs: add readmirror property and ioctl to set get readmirror Anand Jain
2019-08-29 3:39 ` [PATCH RFC v2.1] btrfs-progs: add readmirror property and ioctl to set get Anand Jain
2019-09-11 16:37 ` [PATCH RFC v2 0/2] readmirror feature David Sterba
2019-09-12 7:48 ` Anand Jain
2019-09-11 18:42 ` Josef Bacik
2019-09-11 19:13 ` Eli V
2019-09-11 19:16 ` Josef Bacik
2019-09-12 7:41 ` Anand Jain
2019-09-12 9:50 ` Josef Bacik
2019-09-12 10:00 ` Anand Jain
2019-09-12 10:03 ` Josef Bacik
2019-09-12 10:10 ` Anand Jain [this message]
2019-09-12 10:13 ` Josef Bacik
2019-09-16 8:19 ` Anand Jain
2019-09-24 14:27 ` Anand Jain
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=7ECB777E-BA58-46A0-925F-2B0AB9030288@oracle.com \
--to=anand.jain@oracle.com \
--cc=eliventer@gmail.com \
--cc=josef@toxicpanda.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).