* [PATCH 1/2] rbd: RBD_DEV_FLAG_THICK rbd_dev_flags bit @ 2018-03-19 12:01 KAMEI Hitoshi [not found] ` <5AB0E4F3.8090108@easystack.cn> 0 siblings, 1 reply; 8+ messages in thread From: KAMEI Hitoshi @ 2018-03-19 12:01 UTC (permalink / raw) To: idryomov, sage, elder, ceph-devel; +Cc: linux-kernel This patch adds a user interface to prevent from issuing discard requests onto the specified rbd device. This is needed for thick-provisioned images. To avoid discarding allocated blocks on thick-provision image, users can specify this flag bit via sysfs (/sys/bus/rbd/devices/<dev-id>/thick). When users write "1" to the file, rbd doesn't issue discard operation; meanwhile, if users write "0" to the file then rbd issues discard operation. Signed-off-by: Hitoshi Kamei <hitoshi.kamei.xm@hitachi.com> Cc: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> --- drivers/block/rbd.c | 38 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 38 insertions(+) diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c index 8e40da093766..dba60ef43a47 100644 --- a/drivers/block/rbd.c +++ b/drivers/block/rbd.c @@ -427,6 +427,7 @@ enum rbd_dev_flags { RBD_DEV_FLAG_EXISTS, /* mapped snapshot has not been deleted */ RBD_DEV_FLAG_REMOVING, /* this mapping is being removed */ RBD_DEV_FLAG_BLACKLISTED, /* our ceph_client is blacklisted */ + RBD_DEV_FLAG_THICK, /* image is thick-provisioned */ }; static DEFINE_MUTEX(client_mutex); /* Serialize client creation */ @@ -4011,6 +4012,15 @@ static void rbd_queue_workfn(struct work_struct *work) goto err; } + /* Ignore/skip discard requests for thick-provision image */ + + if (op_type == OBJ_OP_DISCARD && + test_bit(RBD_DEV_FLAG_THICK, &rbd_dev->flags)) { + dout("%s: Ignored a discard request\n", __func__); + result = 0; + goto err_rq; + } + /* Ignore/skip any zero-length requests */ if (!length) { @@ -4600,6 +4610,32 @@ static ssize_t rbd_image_refresh(struct device *dev, return size; } +static ssize_t rbd_thick_store(struct device *dev, + struct device_attribute *attr, + const char *buf, + size_t size) +{ + struct rbd_device *rbd_dev = dev_to_rbd_dev(dev); + + spin_lock_irq(&rbd_dev->lock); + if (!strncmp(buf, "1", size)) + set_bit(RBD_DEV_FLAG_THICK, &rbd_dev->flags); + else if (!strncmp(buf, "0", size)) + clear_bit(RBD_DEV_FLAG_THICK, &rbd_dev->flags); + spin_unlock_irq(&rbd_dev->lock); + + return size; +} + +static ssize_t rbd_thick_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct rbd_device *rbd_dev = dev_to_rbd_dev(dev); + int is_thick = test_bit(RBD_DEV_FLAG_THICK, &rbd_dev->flags) ? 1 : 0; + + return sprintf(buf, "%d\n", is_thick); +} + static DEVICE_ATTR(size, S_IRUGO, rbd_size_show, NULL); static DEVICE_ATTR(features, S_IRUGO, rbd_features_show, NULL); static DEVICE_ATTR(major, S_IRUGO, rbd_major_show, NULL); @@ -4616,6 +4652,7 @@ static DEVICE_ATTR(refresh, S_IWUSR, NULL, rbd_image_refresh); static DEVICE_ATTR(current_snap, S_IRUGO, rbd_snap_show, NULL); static DEVICE_ATTR(snap_id, S_IRUGO, rbd_snap_id_show, NULL); static DEVICE_ATTR(parent, S_IRUGO, rbd_parent_show, NULL); +static DEVICE_ATTR(thick, 0644, rbd_thick_show, rbd_thick_store); static struct attribute *rbd_attrs[] = { &dev_attr_size.attr, @@ -4634,6 +4671,7 @@ static struct attribute *rbd_attrs[] = { &dev_attr_snap_id.attr, &dev_attr_parent.attr, &dev_attr_refresh.attr, + &dev_attr_thick.attr, NULL }; -- 2.15.1 ^ permalink raw reply related [flat|nested] 8+ messages in thread
[parent not found: <5AB0E4F3.8090108@easystack.cn>]
* RE: Re: [PATCH 1/2] rbd: RBD_DEV_FLAG_THICK rbd_dev_flags bit [not found] ` <5AB0E4F3.8090108@easystack.cn> @ 2018-03-22 11:57 ` 亀井仁志 / KAMEI,HITOSHI 2018-03-23 9:31 ` Ilya Dryomov 0 siblings, 1 reply; 8+ messages in thread From: 亀井仁志 / KAMEI,HITOSHI @ 2018-03-22 11:57 UTC (permalink / raw) To: Dongsheng Yang, idryomov, sage, elder, ceph-devel; +Cc: linux-kernel Hi Yang, > I am not sure is this the best way for this case, what about adding an option in "rbd map -o thick rbd/test"? I will add such option to the rbd map command to manipulate image settings. So, the end-user do not change the settings directly via sysfs file. > @@ -4011,6 +4012,15 @@ static void rbd_queue_workfn(struct work_struct *work) > goto err; > } > > + /* Ignore/skip discard requests for thick-provision image */ > > Just ignore? or return -EOPNOTSUPP? Thanks, I think -EOPNOTSUPP is better because user programs cannot know the result of requested operation when the kernel rbd driver ignores discard request. The result of requested operation when the kernel rbd driver ignores discard requests, which probably misleads the user programs. > In addition, we should not ignore the REQ_OP_WRITE_ZEROES. Relating to the above, the return code of REQ_OP_WRITE_ZEROS request is also -EOPNOTSUPP instead of ignoring. I think the result of -EOPNOTSUPP is also better for this request because the kernel rbd driver can expect that user programs write zero data by itself. > + spin_lock_irq(&rbd_dev->lock); > + if (!strncmp(buf, "1", size)) > + set_bit(RBD_DEV_FLAG_THICK, &rbd_dev->flags); > + else if (!strncmp(buf, "0", size)) > + clear_bit(RBD_DEV_FLAG_THICK, &rbd_dev->flags); > else ? I will add the warning message in else case. > static DEVICE_ATTR(snap_id, S_IRUGO, rbd_snap_id_show, NULL); > static DEVICE_ATTR(parent, S_IRUGO, rbd_parent_show, NULL); > +static DEVICE_ATTR(thick, 0644, rbd_thick_show, rbd_thick_store); > maybe S_IRUGO | S_IWUGO? I will change the code by using the flags. Regards, -- Hitoshi Kamei ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Re: [PATCH 1/2] rbd: RBD_DEV_FLAG_THICK rbd_dev_flags bit 2018-03-22 11:57 ` 亀井仁志 / KAMEI,HITOSHI @ 2018-03-23 9:31 ` Ilya Dryomov 2018-03-23 9:34 ` Ilya Dryomov 0 siblings, 1 reply; 8+ messages in thread From: Ilya Dryomov @ 2018-03-23 9:31 UTC (permalink / raw) To: 亀井仁志 / KAMEI,HITOSHI Cc: Dongsheng Yang, sage, elder, ceph-devel, linux-kernel On Thu, Mar 22, 2018 at 12:57 PM, 亀井仁志 / KAMEI,HITOSHI <hitoshi.kamei.xm@hitachi.com> wrote: > Hi Yang, > >> I am not sure is this the best way for this case, what about adding an option in "rbd map -o thick rbd/test"? > > I will add such option to the rbd map command to manipulate image settings. So, the end-user > do not change the settings directly via sysfs file. > >> @@ -4011,6 +4012,15 @@ static void rbd_queue_workfn(struct work_struct *work) >> goto err; >> } >> >> + /* Ignore/skip discard requests for thick-provision image */ >> >> Just ignore? or return -EOPNOTSUPP? > > Thanks, I think -EOPNOTSUPP is better because user programs cannot know > the result of requested operation when the kernel rbd driver ignores > discard request. The result of requested operation when the kernel rbd driver > ignores discard requests, which probably misleads the user programs. > >> In addition, we should not ignore the REQ_OP_WRITE_ZEROES. > > Relating to the above, the return code of REQ_OP_WRITE_ZEROS request > is also -EOPNOTSUPP instead of ignoring. I think the result of > -EOPNOTSUPP is also better for this request because the kernel > rbd driver can expect that user programs write zero data by itself. REQ_OP_WRITE_ZEROES should continue to work, we just need to make sure it never issues truncates or deletes and instead writes zeroes explicitly. I think we should be explicit about the fact that discard is not supported instead of accepting the discard request and failing it in rbd_queue_workfn(). Attached patch is what I have in mind. Thanks, Ilya ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Re: [PATCH 1/2] rbd: RBD_DEV_FLAG_THICK rbd_dev_flags bit 2018-03-23 9:31 ` Ilya Dryomov @ 2018-03-23 9:34 ` Ilya Dryomov 2018-03-26 12:31 ` 亀井仁志 / KAMEI,HITOSHI 0 siblings, 1 reply; 8+ messages in thread From: Ilya Dryomov @ 2018-03-23 9:34 UTC (permalink / raw) To: 亀井仁志 / KAMEI,HITOSHI Cc: Dongsheng Yang, sage, elder, ceph-devel, linux-kernel [-- Attachment #1: Type: text/plain, Size: 1837 bytes --] On Fri, Mar 23, 2018 at 10:31 AM, Ilya Dryomov <idryomov@gmail.com> wrote: > On Thu, Mar 22, 2018 at 12:57 PM, 亀井仁志 / KAMEI,HITOSHI > <hitoshi.kamei.xm@hitachi.com> wrote: >> Hi Yang, >> >>> I am not sure is this the best way for this case, what about adding an option in "rbd map -o thick rbd/test"? >> >> I will add such option to the rbd map command to manipulate image settings. So, the end-user >> do not change the settings directly via sysfs file. >> >>> @@ -4011,6 +4012,15 @@ static void rbd_queue_workfn(struct work_struct *work) >>> goto err; >>> } >>> >>> + /* Ignore/skip discard requests for thick-provision image */ >>> >>> Just ignore? or return -EOPNOTSUPP? >> >> Thanks, I think -EOPNOTSUPP is better because user programs cannot know >> the result of requested operation when the kernel rbd driver ignores >> discard request. The result of requested operation when the kernel rbd driver >> ignores discard requests, which probably misleads the user programs. >> >>> In addition, we should not ignore the REQ_OP_WRITE_ZEROES. >> >> Relating to the above, the return code of REQ_OP_WRITE_ZEROS request >> is also -EOPNOTSUPP instead of ignoring. I think the result of >> -EOPNOTSUPP is also better for this request because the kernel >> rbd driver can expect that user programs write zero data by itself. > > REQ_OP_WRITE_ZEROES should continue to work, we just need to make > sure it never issues truncates or deletes and instead writes zeroes > explicitly. > > I think we should be explicit about the fact that discard is not > supported instead of accepting the discard request and failing it in > rbd_queue_workfn(). Attached patch is what I have in mind. Now with the patch. Thanks, Ilya [-- Attachment #2: 0001-rbd-notrim-map-option.patch --] [-- Type: text/x-patch, Size: 2883 bytes --] From 1db405366371cb12e5182815c06f6f491af4b63c Mon Sep 17 00:00:00 2001 From: Ilya Dryomov <idryomov@gmail.com> Date: Fri, 23 Mar 2018 09:14:47 +0100 Subject: [PATCH] rbd: notrim map option Add an option to turn off discard and write zeroes offload support to avoid deprovisioning a fully provisioned image. When enabled, discard requests will fail with -EOPNOTSUPP, write zeroes requests will fall back to manually zeroing. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> --- drivers/block/rbd.c | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-) diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c index 1a90e5dd7470..dbe440a6531e 100644 --- a/drivers/block/rbd.c +++ b/drivers/block/rbd.c @@ -733,6 +733,7 @@ enum { Opt_read_write, Opt_lock_on_read, Opt_exclusive, + Opt_notrim, Opt_err }; @@ -746,6 +747,7 @@ static match_table_t rbd_opts_tokens = { {Opt_read_write, "rw"}, /* Alternate spelling */ {Opt_lock_on_read, "lock_on_read"}, {Opt_exclusive, "exclusive"}, + {Opt_notrim, "notrim"}, {Opt_err, NULL} }; @@ -754,12 +756,14 @@ struct rbd_options { bool read_only; bool lock_on_read; bool exclusive; + bool trim; }; #define RBD_QUEUE_DEPTH_DEFAULT BLKDEV_MAX_RQ #define RBD_READ_ONLY_DEFAULT false #define RBD_LOCK_ON_READ_DEFAULT false #define RBD_EXCLUSIVE_DEFAULT false +#define RBD_TRIM_DEFAULT true static int parse_rbd_opts_token(char *c, void *private) { @@ -801,6 +805,9 @@ static int parse_rbd_opts_token(char *c, void *private) case Opt_exclusive: rbd_opts->exclusive = true; break; + case Opt_notrim: + rbd_opts->trim = false; + break; default: /* libceph prints "bad option" msg */ return -EINVAL; @@ -3940,11 +3947,12 @@ static int rbd_init_disk(struct rbd_device *rbd_dev) blk_queue_io_min(q, objset_bytes); blk_queue_io_opt(q, objset_bytes); - /* enable the discard support */ - queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q); - q->limits.discard_granularity = objset_bytes; - blk_queue_max_discard_sectors(q, objset_bytes >> SECTOR_SHIFT); - blk_queue_max_write_zeroes_sectors(q, objset_bytes >> SECTOR_SHIFT); + if (rbd_dev->opts->trim) { + queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q); + q->limits.discard_granularity = objset_bytes; + blk_queue_max_discard_sectors(q, objset_bytes >> SECTOR_SHIFT); + blk_queue_max_write_zeroes_sectors(q, objset_bytes >> SECTOR_SHIFT); + } if (!ceph_test_opt(rbd_dev->rbd_client->client, NOCRC)) q->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES; @@ -5170,6 +5178,7 @@ static int rbd_add_parse_args(const char *buf, rbd_opts->queue_depth = RBD_QUEUE_DEPTH_DEFAULT; rbd_opts->lock_on_read = RBD_LOCK_ON_READ_DEFAULT; rbd_opts->exclusive = RBD_EXCLUSIVE_DEFAULT; + rbd_opts->trim = RBD_TRIM_DEFAULT; copts = ceph_parse_options(options, mon_addrs, mon_addrs + mon_addrs_size - 1, -- 2.4.3 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* RE: Re: Re: [PATCH 1/2] rbd: RBD_DEV_FLAG_THICK rbd_dev_flags bit 2018-03-23 9:34 ` Ilya Dryomov @ 2018-03-26 12:31 ` 亀井仁志 / KAMEI,HITOSHI 2018-03-26 13:24 ` Ilya Dryomov 0 siblings, 1 reply; 8+ messages in thread From: 亀井仁志 / KAMEI,HITOSHI @ 2018-03-26 12:31 UTC (permalink / raw) To: Ilya Dryomov; +Cc: Dongsheng Yang, sage, elder, ceph-devel, linux-kernel Hi Ilya, I think your patch fully completes our purpose and I confirmed that the kernel with the patch could work well by testing in my environment. I added the notrim option to rbd map command in accordance with your kernel rbd driver patch, and I pushed the patch to GitHub (https://github.com/hitoshikamei/ceph/tree/rbdmap-notrim). So, could you please merge your patch to the kernel? If your patch is merged, then my patch can work, so I will send the Pull Request. Regards, -- Hitoshi Kamei ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Re: Re: [PATCH 1/2] rbd: RBD_DEV_FLAG_THICK rbd_dev_flags bit 2018-03-26 12:31 ` 亀井仁志 / KAMEI,HITOSHI @ 2018-03-26 13:24 ` Ilya Dryomov 2018-03-27 9:44 ` KAMEI Hitoshi 0 siblings, 1 reply; 8+ messages in thread From: Ilya Dryomov @ 2018-03-26 13:24 UTC (permalink / raw) To: 亀井仁志 / KAMEI,HITOSHI Cc: Dongsheng Yang, sage, elder, ceph-devel, linux-kernel On Mon, Mar 26, 2018 at 2:31 PM, 亀井仁志 / KAMEI,HITOSHI <hitoshi.kamei.xm@hitachi.com> wrote: > Hi Ilya, > > I think your patch fully completes our purpose and I confirmed > that the kernel with the patch could work well by testing in my environment. > > I added the notrim option to rbd map command in accordance with > your kernel rbd driver patch, and I pushed the patch to GitHub > (https://github.com/hitoshikamei/ceph/tree/rbdmap-notrim). Change the man page text to say * notrim - Turn off discard and write zeroes offload support to avoid deprovisioning a fully provisioned image (since 4.17). When enabled, discard requests will fail with -EOPNOTSUPP, write zeroes requests will fall back to manually zeroing. and merge it into the first patch. > > So, could you please merge your patch to the kernel? If your patch is merged, > then my patch can work, so I will send the Pull Request. https://github.com/ceph/ceph-client/commit/7ddd22f2dd1d82da9a4cc0c54dc10760a53964f0 You can open the PR right now, we just won't merge it until the kernel patch is in. Thanks, Ilya ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Re: Re: Re: [PATCH 1/2] rbd: RBD_DEV_FLAG_THICK rbd_dev_flags bit 2018-03-26 13:24 ` Ilya Dryomov @ 2018-03-27 9:44 ` KAMEI Hitoshi 2018-03-27 16:27 ` Ilya Dryomov 0 siblings, 1 reply; 8+ messages in thread From: KAMEI Hitoshi @ 2018-03-27 9:44 UTC (permalink / raw) To: Ilya Dryomov; +Cc: Dongsheng Yang, sage, elder, ceph-devel, linux-kernel Hi Ilya, Thank you for reviewing. I merged two patches into one patch and pushed the new patch to GitHub. And I opened the PR. Could you check the PR? And I forgot to mention in previous email that I modified your patch to apply the latest kernel because I couldn't apply your patch directly by using git am command due to a little bit problem, and I tested the kernel with the modified patch and rbd map command with notrim option. The code described below is the modified patch by my hand. Regards, --- From here diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c index 8e40da093766..c62e3788858c 100644 --- a/drivers/block/rbd.c +++ b/drivers/block/rbd.c @@ -775,6 +775,7 @@ enum { Opt_read_write, Opt_lock_on_read, Opt_exclusive, + Opt_notrim, Opt_err }; @@ -788,6 +789,7 @@ static match_table_t rbd_opts_tokens = { {Opt_read_write, "rw"}, /* Alternate spelling */ {Opt_lock_on_read, "lock_on_read"}, {Opt_exclusive, "exclusive"}, + {Opt_notrim, "notrim"}, {Opt_err, NULL} }; @@ -796,12 +798,14 @@ struct rbd_options { bool read_only; bool lock_on_read; bool exclusive; + bool trim; }; #define RBD_QUEUE_DEPTH_DEFAULT BLKDEV_MAX_RQ #define RBD_READ_ONLY_DEFAULT false #define RBD_LOCK_ON_READ_DEFAULT false #define RBD_EXCLUSIVE_DEFAULT false +#define RBD_TRIM_DEFAULT true static int parse_rbd_opts_token(char *c, void *private) { @@ -843,6 +847,9 @@ static int parse_rbd_opts_token(char *c, void *private) case Opt_exclusive: rbd_opts->exclusive = true; break; + case Opt_notrim: + rbd_opts->trim = false; + break; default: /* libceph prints "bad option" msg */ return -EINVAL; @@ -4382,11 +4389,12 @@ static int rbd_init_disk(struct rbd_device *rbd_dev) blk_queue_io_min(q, segment_size); blk_queue_io_opt(q, segment_size); - /* enable the discard support */ - queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q); - q->limits.discard_granularity = segment_size; - blk_queue_max_discard_sectors(q, segment_size / SECTOR_SIZE); - blk_queue_max_write_zeroes_sectors(q, segment_size / SECTOR_SIZE); + if (rbd_dev->opts->trim) { + queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q); + q->limits.discard_granularity = segment_size; + blk_queue_max_discard_sectors(q, segment_size / SECTOR_SIZE); + blk_queue_max_write_zeroes_sectors(q, segment_size / SECTOR_SIZE); + } if (!ceph_test_opt(rbd_dev->rbd_client->client, NOCRC)) q->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES; @@ -5637,6 +5645,7 @@ static int rbd_add_parse_args(const char *buf, rbd_opts->queue_depth = RBD_QUEUE_DEPTH_DEFAULT; rbd_opts->lock_on_read = RBD_LOCK_ON_READ_DEFAULT; rbd_opts->exclusive = RBD_EXCLUSIVE_DEFAULT; + rbd_opts->trim = RBD_TRIM_DEFAULT; copts = ceph_parse_options(options, mon_addrs, mon_addrs + mon_addrs_size - 1, --- To here -- Hitoshi Kamei ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: Re: Re: Re: [PATCH 1/2] rbd: RBD_DEV_FLAG_THICK rbd_dev_flags bit 2018-03-27 9:44 ` KAMEI Hitoshi @ 2018-03-27 16:27 ` Ilya Dryomov 0 siblings, 0 replies; 8+ messages in thread From: Ilya Dryomov @ 2018-03-27 16:27 UTC (permalink / raw) To: KAMEI Hitoshi; +Cc: Dongsheng Yang, sage, elder, ceph-devel, linux-kernel On Tue, Mar 27, 2018 at 11:44 AM, KAMEI Hitoshi <hitoshi.kamei.xm@hitachi.com> wrote: > Hi Ilya, > > Thank you for reviewing. > I merged two patches into one patch and pushed the new patch to GitHub. > And I opened the PR. Could you check the PR? > > And I forgot to mention in previous email that I modified your patch to > apply the latest kernel because I couldn't apply your patch directly > by using git am command due to a little bit problem, and I tested > the kernel with the modified patch and rbd map command with notrim option. > > The code described below is the modified patch by my hand. Right, my patch is based on ceph/ceph-client.git:testing which has some unrelated rbd_init_disk() changes. Thanks, Ilya ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2018-03-27 16:27 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-03-19 12:01 [PATCH 1/2] rbd: RBD_DEV_FLAG_THICK rbd_dev_flags bit KAMEI Hitoshi [not found] ` <5AB0E4F3.8090108@easystack.cn> 2018-03-22 11:57 ` 亀井仁志 / KAMEI,HITOSHI 2018-03-23 9:31 ` Ilya Dryomov 2018-03-23 9:34 ` Ilya Dryomov 2018-03-26 12:31 ` 亀井仁志 / KAMEI,HITOSHI 2018-03-26 13:24 ` Ilya Dryomov 2018-03-27 9:44 ` KAMEI Hitoshi 2018-03-27 16:27 ` Ilya Dryomov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).