* [PATCH] rbd: implement REQ_OP_WRITE_ZEROES
@ 2017-05-23 15:08 Ilya Dryomov
2017-05-23 18:28 ` Jason Dillaman
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Ilya Dryomov @ 2017-05-23 15:08 UTC (permalink / raw)
To: ceph-devel; +Cc: Christoph Hellwig, Hannes Reinecke
Commit 93c1defedcae ("rbd: remove the discard_zeroes_data flag")
explicitly didn't implement REQ_OP_WRITE_ZEROES for rbd, while the
following commit 48920ff2a5a9 ("block: remove the discard_zeroes_data
flag") dropped ->discard_zeroes_data in favor of REQ_OP_WRITE_ZEROES.
rbd does support efficient zeroing via CEPH_OSD_OP_ZERO opcode and will
release either some or all blocks depending on whether the zeroing
request is rbd_obj_bytes() aligned. This is how we currently implement
discards, so REQ_OP_WRITE_ZEROES can be identical to REQ_OP_DISCARD for
now. Caveats:
- REQ_NOUNMAP is ignored, but AFAICT that's true of at least two other
current implementations - nvme and loop
- there is no ->write_zeroes_alignment and blk_bio_write_zeroes_split()
is hence less helpful than blk_bio_discard_split(), but this can (and
should) be fixed on the rbd side
In the future we will split these into two code paths to respect
REQ_NOUNMAP on zeroout and save on zeroing blocks that couldn't be
released on discard.
Fixes: 93c1defedcae ("rbd: remove the discard_zeroes_data flag")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
drivers/block/rbd.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 454bf9c34882..c16f74547804 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -4023,6 +4023,7 @@ static void rbd_queue_workfn(struct work_struct *work)
switch (req_op(rq)) {
case REQ_OP_DISCARD:
+ case REQ_OP_WRITE_ZEROES:
op_type = OBJ_OP_DISCARD;
break;
case REQ_OP_WRITE:
@@ -4420,6 +4421,7 @@ static int rbd_init_disk(struct rbd_device *rbd_dev)
q->limits.discard_granularity = segment_size;
q->limits.discard_alignment = segment_size;
blk_queue_max_discard_sectors(q, segment_size / SECTOR_SIZE);
+ blk_queue_max_write_zeroes_sectors(q, segment_size / SECTOR_SIZE);
if (!ceph_test_opt(rbd_dev->rbd_client->client, NOCRC))
q->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES;
--
2.4.3
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] rbd: implement REQ_OP_WRITE_ZEROES
2017-05-23 15:08 [PATCH] rbd: implement REQ_OP_WRITE_ZEROES Ilya Dryomov
@ 2017-05-23 18:28 ` Jason Dillaman
2017-05-24 11:38 ` Alexandre DERUMIER
2017-05-29 8:39 ` Ilya Dryomov
2017-05-29 8:43 ` Christoph Hellwig
2 siblings, 1 reply; 8+ messages in thread
From: Jason Dillaman @ 2017-05-23 18:28 UTC (permalink / raw)
To: Ilya Dryomov; +Cc: ceph-devel, Christoph Hellwig, Hannes Reinecke
lgtm
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
On Tue, May 23, 2017 at 11:08 AM, Ilya Dryomov <idryomov@gmail.com> wrote:
> Commit 93c1defedcae ("rbd: remove the discard_zeroes_data flag")
> explicitly didn't implement REQ_OP_WRITE_ZEROES for rbd, while the
> following commit 48920ff2a5a9 ("block: remove the discard_zeroes_data
> flag") dropped ->discard_zeroes_data in favor of REQ_OP_WRITE_ZEROES.
>
> rbd does support efficient zeroing via CEPH_OSD_OP_ZERO opcode and will
> release either some or all blocks depending on whether the zeroing
> request is rbd_obj_bytes() aligned. This is how we currently implement
> discards, so REQ_OP_WRITE_ZEROES can be identical to REQ_OP_DISCARD for
> now. Caveats:
>
> - REQ_NOUNMAP is ignored, but AFAICT that's true of at least two other
> current implementations - nvme and loop
>
> - there is no ->write_zeroes_alignment and blk_bio_write_zeroes_split()
> is hence less helpful than blk_bio_discard_split(), but this can (and
> should) be fixed on the rbd side
>
> In the future we will split these into two code paths to respect
> REQ_NOUNMAP on zeroout and save on zeroing blocks that couldn't be
> released on discard.
>
> Fixes: 93c1defedcae ("rbd: remove the discard_zeroes_data flag")
> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
> ---
> drivers/block/rbd.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index 454bf9c34882..c16f74547804 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -4023,6 +4023,7 @@ static void rbd_queue_workfn(struct work_struct *work)
>
> switch (req_op(rq)) {
> case REQ_OP_DISCARD:
> + case REQ_OP_WRITE_ZEROES:
> op_type = OBJ_OP_DISCARD;
> break;
> case REQ_OP_WRITE:
> @@ -4420,6 +4421,7 @@ static int rbd_init_disk(struct rbd_device *rbd_dev)
> q->limits.discard_granularity = segment_size;
> q->limits.discard_alignment = segment_size;
> blk_queue_max_discard_sectors(q, segment_size / SECTOR_SIZE);
> + blk_queue_max_write_zeroes_sectors(q, segment_size / SECTOR_SIZE);
>
> if (!ceph_test_opt(rbd_dev->rbd_client->client, NOCRC))
> q->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES;
> --
> 2.4.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Jason
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] rbd: implement REQ_OP_WRITE_ZEROES
2017-05-23 18:28 ` Jason Dillaman
@ 2017-05-24 11:38 ` Alexandre DERUMIER
2017-05-24 11:53 ` Jason Dillaman
0 siblings, 1 reply; 8+ messages in thread
From: Alexandre DERUMIER @ 2017-05-24 11:38 UTC (permalink / raw)
To: Ilya Dryomov; +Cc: ceph-devel, Christoph Hellwig, Hannes Reinecke, dillaman
Hi,
is it planned to implement write zeroes in qemu rbd block driver soon ?
(bdrv_co_write_zeroes)
It's really missing currently, as qemu drive-mirror need it to have sparse images on copy.
Ref from my discussion with Paolo from redhat in 2014 about this:
https://lists.gnu.org/archive/html/qemu-devel/2014-10/msg01274.html
REgards,
Alexandre
----- Mail original -----
De: "Jason Dillaman" <jdillama@redhat.com>
À: "Ilya Dryomov" <idryomov@gmail.com>
Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "Christoph Hellwig" <hch@lst.de>, "Hannes Reinecke" <hare@suse.com>
Envoyé: Mardi 23 Mai 2017 20:28:00
Objet: Re: [PATCH] rbd: implement REQ_OP_WRITE_ZEROES
lgtm
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
On Tue, May 23, 2017 at 11:08 AM, Ilya Dryomov <idryomov@gmail.com> wrote:
> Commit 93c1defedcae ("rbd: remove the discard_zeroes_data flag")
> explicitly didn't implement REQ_OP_WRITE_ZEROES for rbd, while the
> following commit 48920ff2a5a9 ("block: remove the discard_zeroes_data
> flag") dropped ->discard_zeroes_data in favor of REQ_OP_WRITE_ZEROES.
>
> rbd does support efficient zeroing via CEPH_OSD_OP_ZERO opcode and will
> release either some or all blocks depending on whether the zeroing
> request is rbd_obj_bytes() aligned. This is how we currently implement
> discards, so REQ_OP_WRITE_ZEROES can be identical to REQ_OP_DISCARD for
> now. Caveats:
>
> - REQ_NOUNMAP is ignored, but AFAICT that's true of at least two other
> current implementations - nvme and loop
>
> - there is no ->write_zeroes_alignment and blk_bio_write_zeroes_split()
> is hence less helpful than blk_bio_discard_split(), but this can (and
> should) be fixed on the rbd side
>
> In the future we will split these into two code paths to respect
> REQ_NOUNMAP on zeroout and save on zeroing blocks that couldn't be
> released on discard.
>
> Fixes: 93c1defedcae ("rbd: remove the discard_zeroes_data flag")
> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
> ---
> drivers/block/rbd.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index 454bf9c34882..c16f74547804 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -4023,6 +4023,7 @@ static void rbd_queue_workfn(struct work_struct *work)
>
> switch (req_op(rq)) {
> case REQ_OP_DISCARD:
> + case REQ_OP_WRITE_ZEROES:
> op_type = OBJ_OP_DISCARD;
> break;
> case REQ_OP_WRITE:
> @@ -4420,6 +4421,7 @@ static int rbd_init_disk(struct rbd_device *rbd_dev)
> q->limits.discard_granularity = segment_size;
> q->limits.discard_alignment = segment_size;
> blk_queue_max_discard_sectors(q, segment_size / SECTOR_SIZE);
> + blk_queue_max_write_zeroes_sectors(q, segment_size / SECTOR_SIZE);
>
> if (!ceph_test_opt(rbd_dev->rbd_client->client, NOCRC))
> q->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES;
> --
> 2.4.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Jason
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] rbd: implement REQ_OP_WRITE_ZEROES
2017-05-24 11:38 ` Alexandre DERUMIER
@ 2017-05-24 11:53 ` Jason Dillaman
2017-05-24 17:38 ` Alexandre DERUMIER
0 siblings, 1 reply; 8+ messages in thread
From: Jason Dillaman @ 2017-05-24 11:53 UTC (permalink / raw)
To: Alexandre DERUMIER
Cc: Ilya Dryomov, ceph-devel, Christoph Hellwig, Hannes Reinecke
I just opened a tracker ticket for this [1] -- let me know if you have
any other QEMU improvement ideas.
[1] http://tracker.ceph.com/issues/20070
On Wed, May 24, 2017 at 7:38 AM, Alexandre DERUMIER <aderumier@odiso.com> wrote:
> Hi,
>
> is it planned to implement write zeroes in qemu rbd block driver soon ?
> (bdrv_co_write_zeroes)
>
> It's really missing currently, as qemu drive-mirror need it to have sparse images on copy.
>
> Ref from my discussion with Paolo from redhat in 2014 about this:
> https://lists.gnu.org/archive/html/qemu-devel/2014-10/msg01274.html
>
>
> REgards,
>
> Alexandre
>
> ----- Mail original -----
> De: "Jason Dillaman" <jdillama@redhat.com>
> À: "Ilya Dryomov" <idryomov@gmail.com>
> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "Christoph Hellwig" <hch@lst.de>, "Hannes Reinecke" <hare@suse.com>
> Envoyé: Mardi 23 Mai 2017 20:28:00
> Objet: Re: [PATCH] rbd: implement REQ_OP_WRITE_ZEROES
>
> lgtm
>
> Reviewed-by: Jason Dillaman <dillaman@redhat.com>
>
> On Tue, May 23, 2017 at 11:08 AM, Ilya Dryomov <idryomov@gmail.com> wrote:
>> Commit 93c1defedcae ("rbd: remove the discard_zeroes_data flag")
>> explicitly didn't implement REQ_OP_WRITE_ZEROES for rbd, while the
>> following commit 48920ff2a5a9 ("block: remove the discard_zeroes_data
>> flag") dropped ->discard_zeroes_data in favor of REQ_OP_WRITE_ZEROES.
>>
>> rbd does support efficient zeroing via CEPH_OSD_OP_ZERO opcode and will
>> release either some or all blocks depending on whether the zeroing
>> request is rbd_obj_bytes() aligned. This is how we currently implement
>> discards, so REQ_OP_WRITE_ZEROES can be identical to REQ_OP_DISCARD for
>> now. Caveats:
>>
>> - REQ_NOUNMAP is ignored, but AFAICT that's true of at least two other
>> current implementations - nvme and loop
>>
>> - there is no ->write_zeroes_alignment and blk_bio_write_zeroes_split()
>> is hence less helpful than blk_bio_discard_split(), but this can (and
>> should) be fixed on the rbd side
>>
>> In the future we will split these into two code paths to respect
>> REQ_NOUNMAP on zeroout and save on zeroing blocks that couldn't be
>> released on discard.
>>
>> Fixes: 93c1defedcae ("rbd: remove the discard_zeroes_data flag")
>> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
>> ---
>> drivers/block/rbd.c | 2 ++
>> 1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
>> index 454bf9c34882..c16f74547804 100644
>> --- a/drivers/block/rbd.c
>> +++ b/drivers/block/rbd.c
>> @@ -4023,6 +4023,7 @@ static void rbd_queue_workfn(struct work_struct *work)
>>
>> switch (req_op(rq)) {
>> case REQ_OP_DISCARD:
>> + case REQ_OP_WRITE_ZEROES:
>> op_type = OBJ_OP_DISCARD;
>> break;
>> case REQ_OP_WRITE:
>> @@ -4420,6 +4421,7 @@ static int rbd_init_disk(struct rbd_device *rbd_dev)
>> q->limits.discard_granularity = segment_size;
>> q->limits.discard_alignment = segment_size;
>> blk_queue_max_discard_sectors(q, segment_size / SECTOR_SIZE);
>> + blk_queue_max_write_zeroes_sectors(q, segment_size / SECTOR_SIZE);
>>
>> if (!ceph_test_opt(rbd_dev->rbd_client->client, NOCRC))
>> q->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES;
>> --
>> 2.4.3
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
>
> --
> Jason
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Jason
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] rbd: implement REQ_OP_WRITE_ZEROES
2017-05-24 11:53 ` Jason Dillaman
@ 2017-05-24 17:38 ` Alexandre DERUMIER
2017-05-24 19:34 ` Jason Dillaman
0 siblings, 1 reply; 8+ messages in thread
From: Alexandre DERUMIER @ 2017-05-24 17:38 UTC (permalink / raw)
To: dillaman; +Cc: Ilya Dryomov, ceph-devel, Christoph Hellwig, Hannes Reinecke
>>I just opened a tracker ticket for this [1]
>>
>>[1] http://tracker.ceph.com/issues/20070
Thanks Jason!
>>-- let me know if you have
>>any other QEMU improvement ideas.
For the moment, the bigger limitation is cpu usage of librbd,
as qemu can only 1 thread, I can't reach more than around 70000 iops by disk.
(3,1ghz cpu, disabling debug, rbd_cache, using jemalloc).
So any improvement to reduce cpu usage could be great :)
Also, in the future, I think qemu will support multiple iothreads by disk,
I don't known if librbd is already ready for this ?
----- Mail original -----
De: "Jason Dillaman" <jdillama@redhat.com>
À: "aderumier" <aderumier@odiso.com>
Cc: "Ilya Dryomov" <idryomov@gmail.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Christoph Hellwig" <hch@lst.de>, "Hannes Reinecke" <hare@suse.com>
Envoyé: Mercredi 24 Mai 2017 13:53:40
Objet: Re: [PATCH] rbd: implement REQ_OP_WRITE_ZEROES
I just opened a tracker ticket for this [1] -- let me know if you have
any other QEMU improvement ideas.
[1] http://tracker.ceph.com/issues/20070
On Wed, May 24, 2017 at 7:38 AM, Alexandre DERUMIER <aderumier@odiso.com> wrote:
> Hi,
>
> is it planned to implement write zeroes in qemu rbd block driver soon ?
> (bdrv_co_write_zeroes)
>
> It's really missing currently, as qemu drive-mirror need it to have sparse images on copy.
>
> Ref from my discussion with Paolo from redhat in 2014 about this:
> https://lists.gnu.org/archive/html/qemu-devel/2014-10/msg01274.html
>
>
> REgards,
>
> Alexandre
>
> ----- Mail original -----
> De: "Jason Dillaman" <jdillama@redhat.com>
> À: "Ilya Dryomov" <idryomov@gmail.com>
> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "Christoph Hellwig" <hch@lst.de>, "Hannes Reinecke" <hare@suse.com>
> Envoyé: Mardi 23 Mai 2017 20:28:00
> Objet: Re: [PATCH] rbd: implement REQ_OP_WRITE_ZEROES
>
> lgtm
>
> Reviewed-by: Jason Dillaman <dillaman@redhat.com>
>
> On Tue, May 23, 2017 at 11:08 AM, Ilya Dryomov <idryomov@gmail.com> wrote:
>> Commit 93c1defedcae ("rbd: remove the discard_zeroes_data flag")
>> explicitly didn't implement REQ_OP_WRITE_ZEROES for rbd, while the
>> following commit 48920ff2a5a9 ("block: remove the discard_zeroes_data
>> flag") dropped ->discard_zeroes_data in favor of REQ_OP_WRITE_ZEROES.
>>
>> rbd does support efficient zeroing via CEPH_OSD_OP_ZERO opcode and will
>> release either some or all blocks depending on whether the zeroing
>> request is rbd_obj_bytes() aligned. This is how we currently implement
>> discards, so REQ_OP_WRITE_ZEROES can be identical to REQ_OP_DISCARD for
>> now. Caveats:
>>
>> - REQ_NOUNMAP is ignored, but AFAICT that's true of at least two other
>> current implementations - nvme and loop
>>
>> - there is no ->write_zeroes_alignment and blk_bio_write_zeroes_split()
>> is hence less helpful than blk_bio_discard_split(), but this can (and
>> should) be fixed on the rbd side
>>
>> In the future we will split these into two code paths to respect
>> REQ_NOUNMAP on zeroout and save on zeroing blocks that couldn't be
>> released on discard.
>>
>> Fixes: 93c1defedcae ("rbd: remove the discard_zeroes_data flag")
>> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
>> ---
>> drivers/block/rbd.c | 2 ++
>> 1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
>> index 454bf9c34882..c16f74547804 100644
>> --- a/drivers/block/rbd.c
>> +++ b/drivers/block/rbd.c
>> @@ -4023,6 +4023,7 @@ static void rbd_queue_workfn(struct work_struct *work)
>>
>> switch (req_op(rq)) {
>> case REQ_OP_DISCARD:
>> + case REQ_OP_WRITE_ZEROES:
>> op_type = OBJ_OP_DISCARD;
>> break;
>> case REQ_OP_WRITE:
>> @@ -4420,6 +4421,7 @@ static int rbd_init_disk(struct rbd_device *rbd_dev)
>> q->limits.discard_granularity = segment_size;
>> q->limits.discard_alignment = segment_size;
>> blk_queue_max_discard_sectors(q, segment_size / SECTOR_SIZE);
>> + blk_queue_max_write_zeroes_sectors(q, segment_size / SECTOR_SIZE);
>>
>> if (!ceph_test_opt(rbd_dev->rbd_client->client, NOCRC))
>> q->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES;
>> --
>> 2.4.3
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
>
> --
> Jason
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Jason
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] rbd: implement REQ_OP_WRITE_ZEROES
2017-05-24 17:38 ` Alexandre DERUMIER
@ 2017-05-24 19:34 ` Jason Dillaman
0 siblings, 0 replies; 8+ messages in thread
From: Jason Dillaman @ 2017-05-24 19:34 UTC (permalink / raw)
To: Alexandre DERUMIER
Cc: Ilya Dryomov, ceph-devel, Christoph Hellwig, Hannes Reinecke
On Wed, May 24, 2017 at 1:38 PM, Alexandre DERUMIER <aderumier@odiso.com> wrote:
>>>I just opened a tracker ticket for this [1]
>>>
>>>[1] http://tracker.ceph.com/issues/20070
>
> Thanks Jason!
>
>>>-- let me know if you have
>>>any other QEMU improvement ideas.
>
> For the moment, the bigger limitation is cpu usage of librbd,
> as qemu can only 1 thread, I can't reach more than around 70000 iops by disk.
> (3,1ghz cpu, disabling debug, rbd_cache, using jemalloc).
>
> So any improvement to reduce cpu usage could be great :)
We've had some interest expressed in volunteering time and code
contributions to reduce the CPU usage of librbd / librados. I am
hoping some of these improvements will be included in the Mimic
release.
> Also, in the future, I think qemu will support multiple iothreads by disk,
> I don't known if librbd is already ready for this ?
This is hopefully another improvement we can make in the Mimic release
timeframe. librbd already has an architecture to support multiple
internal IO threads, but we have it disabled for now due to lack of
thorough testing -- and for the fact that librados needs to utilize
multiple IO threads as well.
>
>
> ----- Mail original -----
> De: "Jason Dillaman" <jdillama@redhat.com>
> À: "aderumier" <aderumier@odiso.com>
> Cc: "Ilya Dryomov" <idryomov@gmail.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Christoph Hellwig" <hch@lst.de>, "Hannes Reinecke" <hare@suse.com>
> Envoyé: Mercredi 24 Mai 2017 13:53:40
> Objet: Re: [PATCH] rbd: implement REQ_OP_WRITE_ZEROES
>
> I just opened a tracker ticket for this [1] -- let me know if you have
> any other QEMU improvement ideas.
>
> [1] http://tracker.ceph.com/issues/20070
>
> On Wed, May 24, 2017 at 7:38 AM, Alexandre DERUMIER <aderumier@odiso.com> wrote:
>> Hi,
>>
>> is it planned to implement write zeroes in qemu rbd block driver soon ?
>> (bdrv_co_write_zeroes)
>>
>> It's really missing currently, as qemu drive-mirror need it to have sparse images on copy.
>>
>> Ref from my discussion with Paolo from redhat in 2014 about this:
>> https://lists.gnu.org/archive/html/qemu-devel/2014-10/msg01274.html
>>
>>
>> REgards,
>>
>> Alexandre
>>
>> ----- Mail original -----
>> De: "Jason Dillaman" <jdillama@redhat.com>
>> À: "Ilya Dryomov" <idryomov@gmail.com>
>> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "Christoph Hellwig" <hch@lst.de>, "Hannes Reinecke" <hare@suse.com>
>> Envoyé: Mardi 23 Mai 2017 20:28:00
>> Objet: Re: [PATCH] rbd: implement REQ_OP_WRITE_ZEROES
>>
>> lgtm
>>
>> Reviewed-by: Jason Dillaman <dillaman@redhat.com>
>>
>> On Tue, May 23, 2017 at 11:08 AM, Ilya Dryomov <idryomov@gmail.com> wrote:
>>> Commit 93c1defedcae ("rbd: remove the discard_zeroes_data flag")
>>> explicitly didn't implement REQ_OP_WRITE_ZEROES for rbd, while the
>>> following commit 48920ff2a5a9 ("block: remove the discard_zeroes_data
>>> flag") dropped ->discard_zeroes_data in favor of REQ_OP_WRITE_ZEROES.
>>>
>>> rbd does support efficient zeroing via CEPH_OSD_OP_ZERO opcode and will
>>> release either some or all blocks depending on whether the zeroing
>>> request is rbd_obj_bytes() aligned. This is how we currently implement
>>> discards, so REQ_OP_WRITE_ZEROES can be identical to REQ_OP_DISCARD for
>>> now. Caveats:
>>>
>>> - REQ_NOUNMAP is ignored, but AFAICT that's true of at least two other
>>> current implementations - nvme and loop
>>>
>>> - there is no ->write_zeroes_alignment and blk_bio_write_zeroes_split()
>>> is hence less helpful than blk_bio_discard_split(), but this can (and
>>> should) be fixed on the rbd side
>>>
>>> In the future we will split these into two code paths to respect
>>> REQ_NOUNMAP on zeroout and save on zeroing blocks that couldn't be
>>> released on discard.
>>>
>>> Fixes: 93c1defedcae ("rbd: remove the discard_zeroes_data flag")
>>> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
>>> ---
>>> drivers/block/rbd.c | 2 ++
>>> 1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
>>> index 454bf9c34882..c16f74547804 100644
>>> --- a/drivers/block/rbd.c
>>> +++ b/drivers/block/rbd.c
>>> @@ -4023,6 +4023,7 @@ static void rbd_queue_workfn(struct work_struct *work)
>>>
>>> switch (req_op(rq)) {
>>> case REQ_OP_DISCARD:
>>> + case REQ_OP_WRITE_ZEROES:
>>> op_type = OBJ_OP_DISCARD;
>>> break;
>>> case REQ_OP_WRITE:
>>> @@ -4420,6 +4421,7 @@ static int rbd_init_disk(struct rbd_device *rbd_dev)
>>> q->limits.discard_granularity = segment_size;
>>> q->limits.discard_alignment = segment_size;
>>> blk_queue_max_discard_sectors(q, segment_size / SECTOR_SIZE);
>>> + blk_queue_max_write_zeroes_sectors(q, segment_size / SECTOR_SIZE);
>>>
>>> if (!ceph_test_opt(rbd_dev->rbd_client->client, NOCRC))
>>> q->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES;
>>> --
>>> 2.4.3
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>>
>>
>> --
>> Jason
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
>
> --
> Jason
>
--
Jason
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] rbd: implement REQ_OP_WRITE_ZEROES
2017-05-23 15:08 [PATCH] rbd: implement REQ_OP_WRITE_ZEROES Ilya Dryomov
2017-05-23 18:28 ` Jason Dillaman
@ 2017-05-29 8:39 ` Ilya Dryomov
2017-05-29 8:43 ` Christoph Hellwig
2 siblings, 0 replies; 8+ messages in thread
From: Ilya Dryomov @ 2017-05-29 8:39 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Ceph Development, Hannes Reinecke
On Tue, May 23, 2017 at 5:08 PM, Ilya Dryomov <idryomov@gmail.com> wrote:
> Commit 93c1defedcae ("rbd: remove the discard_zeroes_data flag")
> explicitly didn't implement REQ_OP_WRITE_ZEROES for rbd, while the
> following commit 48920ff2a5a9 ("block: remove the discard_zeroes_data
> flag") dropped ->discard_zeroes_data in favor of REQ_OP_WRITE_ZEROES.
>
> rbd does support efficient zeroing via CEPH_OSD_OP_ZERO opcode and will
> release either some or all blocks depending on whether the zeroing
> request is rbd_obj_bytes() aligned. This is how we currently implement
> discards, so REQ_OP_WRITE_ZEROES can be identical to REQ_OP_DISCARD for
> now. Caveats:
>
> - REQ_NOUNMAP is ignored, but AFAICT that's true of at least two other
> current implementations - nvme and loop
>
> - there is no ->write_zeroes_alignment and blk_bio_write_zeroes_split()
> is hence less helpful than blk_bio_discard_split(), but this can (and
> should) be fixed on the rbd side
>
> In the future we will split these into two code paths to respect
> REQ_NOUNMAP on zeroout and save on zeroing blocks that couldn't be
> released on discard.
>
> Fixes: 93c1defedcae ("rbd: remove the discard_zeroes_data flag")
> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
> ---
> drivers/block/rbd.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index 454bf9c34882..c16f74547804 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -4023,6 +4023,7 @@ static void rbd_queue_workfn(struct work_struct *work)
>
> switch (req_op(rq)) {
> case REQ_OP_DISCARD:
> + case REQ_OP_WRITE_ZEROES:
> op_type = OBJ_OP_DISCARD;
> break;
> case REQ_OP_WRITE:
> @@ -4420,6 +4421,7 @@ static int rbd_init_disk(struct rbd_device *rbd_dev)
> q->limits.discard_granularity = segment_size;
> q->limits.discard_alignment = segment_size;
> blk_queue_max_discard_sectors(q, segment_size / SECTOR_SIZE);
> + blk_queue_max_write_zeroes_sectors(q, segment_size / SECTOR_SIZE);
>
> if (!ceph_test_opt(rbd_dev->rbd_client->client, NOCRC))
> q->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES;
Hi Christoph,
I'm planning to merge this into 4.12-rc because it fixes a 93c1defedcae
regression, but I'm not quite sure what you meant by "rbd only supports
discarding on large alignments, so the zeroing code would always fall
back to explicit writings of zeroes". Care to take a look and ack?
Thanks,
Ilya
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] rbd: implement REQ_OP_WRITE_ZEROES
2017-05-23 15:08 [PATCH] rbd: implement REQ_OP_WRITE_ZEROES Ilya Dryomov
2017-05-23 18:28 ` Jason Dillaman
2017-05-29 8:39 ` Ilya Dryomov
@ 2017-05-29 8:43 ` Christoph Hellwig
2 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2017-05-29 8:43 UTC (permalink / raw)
To: Ilya Dryomov; +Cc: ceph-devel, Christoph Hellwig, Hannes Reinecke
On Tue, May 23, 2017 at 05:08:27PM +0200, Ilya Dryomov wrote:
> Commit 93c1defedcae ("rbd: remove the discard_zeroes_data flag")
> explicitly didn't implement REQ_OP_WRITE_ZEROES for rbd, while the
> following commit 48920ff2a5a9 ("block: remove the discard_zeroes_data
> flag") dropped ->discard_zeroes_data in favor of REQ_OP_WRITE_ZEROES.
>
> rbd does support efficient zeroing via CEPH_OSD_OP_ZERO opcode and will
> release either some or all blocks depending on whether the zeroing
> request is rbd_obj_bytes() aligned. This is how we currently implement
> discards, so REQ_OP_WRITE_ZEROES can be identical to REQ_OP_DISCARD for
> now. Caveats:
>
> - REQ_NOUNMAP is ignored, but AFAICT that's true of at least two other
> current implementations - nvme and loop
That's no problem, it's just a hint in case the device differenciates
the cases.
> - there is no ->write_zeroes_alignment and blk_bio_write_zeroes_split()
> is hence less helpful than blk_bio_discard_split(), but this can (and
> should) be fixed on the rbd side
Yes.
> In the future we will split these into two code paths to respect
> REQ_NOUNMAP on zeroout and save on zeroing blocks that couldn't be
> released on discard.
>
> Fixes: 93c1defedcae ("rbd: remove the discard_zeroes_data flag")
> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Looks fine,
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2017-05-29 8:43 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-23 15:08 [PATCH] rbd: implement REQ_OP_WRITE_ZEROES Ilya Dryomov
2017-05-23 18:28 ` Jason Dillaman
2017-05-24 11:38 ` Alexandre DERUMIER
2017-05-24 11:53 ` Jason Dillaman
2017-05-24 17:38 ` Alexandre DERUMIER
2017-05-24 19:34 ` Jason Dillaman
2017-05-29 8:39 ` Ilya Dryomov
2017-05-29 8:43 ` Christoph Hellwig
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.