All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] rbd: implement REQ_OP_WRITE_ZEROES
@ 2017-05-23 15:08 Ilya Dryomov
  2017-05-23 18:28 ` Jason Dillaman
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Ilya Dryomov @ 2017-05-23 15:08 UTC (permalink / raw)
  To: ceph-devel; +Cc: Christoph Hellwig, Hannes Reinecke

Commit 93c1defedcae ("rbd: remove the discard_zeroes_data flag")
explicitly didn't implement REQ_OP_WRITE_ZEROES for rbd, while the
following commit 48920ff2a5a9 ("block: remove the discard_zeroes_data
flag") dropped ->discard_zeroes_data in favor of REQ_OP_WRITE_ZEROES.

rbd does support efficient zeroing via CEPH_OSD_OP_ZERO opcode and will
release either some or all blocks depending on whether the zeroing
request is rbd_obj_bytes() aligned.  This is how we currently implement
discards, so REQ_OP_WRITE_ZEROES can be identical to REQ_OP_DISCARD for
now.  Caveats:

- REQ_NOUNMAP is ignored, but AFAICT that's true of at least two other
  current implementations - nvme and loop

- there is no ->write_zeroes_alignment and blk_bio_write_zeroes_split()
  is hence less helpful than blk_bio_discard_split(), but this can (and
  should) be fixed on the rbd side

In the future we will split these into two code paths to respect
REQ_NOUNMAP on zeroout and save on zeroing blocks that couldn't be
released on discard.

Fixes: 93c1defedcae ("rbd: remove the discard_zeroes_data flag")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
 drivers/block/rbd.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 454bf9c34882..c16f74547804 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -4023,6 +4023,7 @@ static void rbd_queue_workfn(struct work_struct *work)
 
 	switch (req_op(rq)) {
 	case REQ_OP_DISCARD:
+	case REQ_OP_WRITE_ZEROES:
 		op_type = OBJ_OP_DISCARD;
 		break;
 	case REQ_OP_WRITE:
@@ -4420,6 +4421,7 @@ static int rbd_init_disk(struct rbd_device *rbd_dev)
 	q->limits.discard_granularity = segment_size;
 	q->limits.discard_alignment = segment_size;
 	blk_queue_max_discard_sectors(q, segment_size / SECTOR_SIZE);
+	blk_queue_max_write_zeroes_sectors(q, segment_size / SECTOR_SIZE);
 
 	if (!ceph_test_opt(rbd_dev->rbd_client->client, NOCRC))
 		q->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES;
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] rbd: implement REQ_OP_WRITE_ZEROES
  2017-05-23 15:08 [PATCH] rbd: implement REQ_OP_WRITE_ZEROES Ilya Dryomov
@ 2017-05-23 18:28 ` Jason Dillaman
  2017-05-24 11:38   ` Alexandre DERUMIER
  2017-05-29  8:39 ` Ilya Dryomov
  2017-05-29  8:43 ` Christoph Hellwig
  2 siblings, 1 reply; 8+ messages in thread
From: Jason Dillaman @ 2017-05-23 18:28 UTC (permalink / raw)
  To: Ilya Dryomov; +Cc: ceph-devel, Christoph Hellwig, Hannes Reinecke

lgtm

Reviewed-by: Jason Dillaman <dillaman@redhat.com>

On Tue, May 23, 2017 at 11:08 AM, Ilya Dryomov <idryomov@gmail.com> wrote:
> Commit 93c1defedcae ("rbd: remove the discard_zeroes_data flag")
> explicitly didn't implement REQ_OP_WRITE_ZEROES for rbd, while the
> following commit 48920ff2a5a9 ("block: remove the discard_zeroes_data
> flag") dropped ->discard_zeroes_data in favor of REQ_OP_WRITE_ZEROES.
>
> rbd does support efficient zeroing via CEPH_OSD_OP_ZERO opcode and will
> release either some or all blocks depending on whether the zeroing
> request is rbd_obj_bytes() aligned.  This is how we currently implement
> discards, so REQ_OP_WRITE_ZEROES can be identical to REQ_OP_DISCARD for
> now.  Caveats:
>
> - REQ_NOUNMAP is ignored, but AFAICT that's true of at least two other
>   current implementations - nvme and loop
>
> - there is no ->write_zeroes_alignment and blk_bio_write_zeroes_split()
>   is hence less helpful than blk_bio_discard_split(), but this can (and
>   should) be fixed on the rbd side
>
> In the future we will split these into two code paths to respect
> REQ_NOUNMAP on zeroout and save on zeroing blocks that couldn't be
> released on discard.
>
> Fixes: 93c1defedcae ("rbd: remove the discard_zeroes_data flag")
> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
> ---
>  drivers/block/rbd.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index 454bf9c34882..c16f74547804 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -4023,6 +4023,7 @@ static void rbd_queue_workfn(struct work_struct *work)
>
>         switch (req_op(rq)) {
>         case REQ_OP_DISCARD:
> +       case REQ_OP_WRITE_ZEROES:
>                 op_type = OBJ_OP_DISCARD;
>                 break;
>         case REQ_OP_WRITE:
> @@ -4420,6 +4421,7 @@ static int rbd_init_disk(struct rbd_device *rbd_dev)
>         q->limits.discard_granularity = segment_size;
>         q->limits.discard_alignment = segment_size;
>         blk_queue_max_discard_sectors(q, segment_size / SECTOR_SIZE);
> +       blk_queue_max_write_zeroes_sectors(q, segment_size / SECTOR_SIZE);
>
>         if (!ceph_test_opt(rbd_dev->rbd_client->client, NOCRC))
>                 q->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES;
> --
> 2.4.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Jason

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] rbd: implement REQ_OP_WRITE_ZEROES
  2017-05-23 18:28 ` Jason Dillaman
@ 2017-05-24 11:38   ` Alexandre DERUMIER
  2017-05-24 11:53     ` Jason Dillaman
  0 siblings, 1 reply; 8+ messages in thread
From: Alexandre DERUMIER @ 2017-05-24 11:38 UTC (permalink / raw)
  To: Ilya Dryomov; +Cc: ceph-devel, Christoph Hellwig, Hannes Reinecke, dillaman

Hi,

is it planned to implement write zeroes in qemu rbd block driver soon ?
(bdrv_co_write_zeroes)

It's really missing currently, as qemu drive-mirror need it to have sparse images on copy.

Ref from my discussion with Paolo from redhat in 2014 about this:
  https://lists.gnu.org/archive/html/qemu-devel/2014-10/msg01274.html


REgards,

Alexandre

----- Mail original -----
De: "Jason Dillaman" <jdillama@redhat.com>
À: "Ilya Dryomov" <idryomov@gmail.com>
Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "Christoph Hellwig" <hch@lst.de>, "Hannes Reinecke" <hare@suse.com>
Envoyé: Mardi 23 Mai 2017 20:28:00
Objet: Re: [PATCH] rbd: implement REQ_OP_WRITE_ZEROES

lgtm 

Reviewed-by: Jason Dillaman <dillaman@redhat.com> 

On Tue, May 23, 2017 at 11:08 AM, Ilya Dryomov <idryomov@gmail.com> wrote: 
> Commit 93c1defedcae ("rbd: remove the discard_zeroes_data flag") 
> explicitly didn't implement REQ_OP_WRITE_ZEROES for rbd, while the 
> following commit 48920ff2a5a9 ("block: remove the discard_zeroes_data 
> flag") dropped ->discard_zeroes_data in favor of REQ_OP_WRITE_ZEROES. 
> 
> rbd does support efficient zeroing via CEPH_OSD_OP_ZERO opcode and will 
> release either some or all blocks depending on whether the zeroing 
> request is rbd_obj_bytes() aligned. This is how we currently implement 
> discards, so REQ_OP_WRITE_ZEROES can be identical to REQ_OP_DISCARD for 
> now. Caveats: 
> 
> - REQ_NOUNMAP is ignored, but AFAICT that's true of at least two other 
> current implementations - nvme and loop 
> 
> - there is no ->write_zeroes_alignment and blk_bio_write_zeroes_split() 
> is hence less helpful than blk_bio_discard_split(), but this can (and 
> should) be fixed on the rbd side 
> 
> In the future we will split these into two code paths to respect 
> REQ_NOUNMAP on zeroout and save on zeroing blocks that couldn't be 
> released on discard. 
> 
> Fixes: 93c1defedcae ("rbd: remove the discard_zeroes_data flag") 
> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> 
> --- 
> drivers/block/rbd.c | 2 ++ 
> 1 file changed, 2 insertions(+) 
> 
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c 
> index 454bf9c34882..c16f74547804 100644 
> --- a/drivers/block/rbd.c 
> +++ b/drivers/block/rbd.c 
> @@ -4023,6 +4023,7 @@ static void rbd_queue_workfn(struct work_struct *work) 
> 
> switch (req_op(rq)) { 
> case REQ_OP_DISCARD: 
> + case REQ_OP_WRITE_ZEROES: 
> op_type = OBJ_OP_DISCARD; 
> break; 
> case REQ_OP_WRITE: 
> @@ -4420,6 +4421,7 @@ static int rbd_init_disk(struct rbd_device *rbd_dev) 
> q->limits.discard_granularity = segment_size; 
> q->limits.discard_alignment = segment_size; 
> blk_queue_max_discard_sectors(q, segment_size / SECTOR_SIZE); 
> + blk_queue_max_write_zeroes_sectors(q, segment_size / SECTOR_SIZE); 
> 
> if (!ceph_test_opt(rbd_dev->rbd_client->client, NOCRC)) 
> q->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES; 
> -- 
> 2.4.3 
> 
> -- 
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in 
> the body of a message to majordomo@vger.kernel.org 
> More majordomo info at http://vger.kernel.org/majordomo-info.html 



-- 
Jason 
-- 
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in 
the body of a message to majordomo@vger.kernel.org 
More majordomo info at http://vger.kernel.org/majordomo-info.html 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] rbd: implement REQ_OP_WRITE_ZEROES
  2017-05-24 11:38   ` Alexandre DERUMIER
@ 2017-05-24 11:53     ` Jason Dillaman
  2017-05-24 17:38       ` Alexandre DERUMIER
  0 siblings, 1 reply; 8+ messages in thread
From: Jason Dillaman @ 2017-05-24 11:53 UTC (permalink / raw)
  To: Alexandre DERUMIER
  Cc: Ilya Dryomov, ceph-devel, Christoph Hellwig, Hannes Reinecke

I just opened a tracker ticket for this [1] -- let me know if you have
any other QEMU improvement ideas.

[1] http://tracker.ceph.com/issues/20070

On Wed, May 24, 2017 at 7:38 AM, Alexandre DERUMIER <aderumier@odiso.com> wrote:
> Hi,
>
> is it planned to implement write zeroes in qemu rbd block driver soon ?
> (bdrv_co_write_zeroes)
>
> It's really missing currently, as qemu drive-mirror need it to have sparse images on copy.
>
> Ref from my discussion with Paolo from redhat in 2014 about this:
>   https://lists.gnu.org/archive/html/qemu-devel/2014-10/msg01274.html
>
>
> REgards,
>
> Alexandre
>
> ----- Mail original -----
> De: "Jason Dillaman" <jdillama@redhat.com>
> À: "Ilya Dryomov" <idryomov@gmail.com>
> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "Christoph Hellwig" <hch@lst.de>, "Hannes Reinecke" <hare@suse.com>
> Envoyé: Mardi 23 Mai 2017 20:28:00
> Objet: Re: [PATCH] rbd: implement REQ_OP_WRITE_ZEROES
>
> lgtm
>
> Reviewed-by: Jason Dillaman <dillaman@redhat.com>
>
> On Tue, May 23, 2017 at 11:08 AM, Ilya Dryomov <idryomov@gmail.com> wrote:
>> Commit 93c1defedcae ("rbd: remove the discard_zeroes_data flag")
>> explicitly didn't implement REQ_OP_WRITE_ZEROES for rbd, while the
>> following commit 48920ff2a5a9 ("block: remove the discard_zeroes_data
>> flag") dropped ->discard_zeroes_data in favor of REQ_OP_WRITE_ZEROES.
>>
>> rbd does support efficient zeroing via CEPH_OSD_OP_ZERO opcode and will
>> release either some or all blocks depending on whether the zeroing
>> request is rbd_obj_bytes() aligned. This is how we currently implement
>> discards, so REQ_OP_WRITE_ZEROES can be identical to REQ_OP_DISCARD for
>> now. Caveats:
>>
>> - REQ_NOUNMAP is ignored, but AFAICT that's true of at least two other
>> current implementations - nvme and loop
>>
>> - there is no ->write_zeroes_alignment and blk_bio_write_zeroes_split()
>> is hence less helpful than blk_bio_discard_split(), but this can (and
>> should) be fixed on the rbd side
>>
>> In the future we will split these into two code paths to respect
>> REQ_NOUNMAP on zeroout and save on zeroing blocks that couldn't be
>> released on discard.
>>
>> Fixes: 93c1defedcae ("rbd: remove the discard_zeroes_data flag")
>> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
>> ---
>> drivers/block/rbd.c | 2 ++
>> 1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
>> index 454bf9c34882..c16f74547804 100644
>> --- a/drivers/block/rbd.c
>> +++ b/drivers/block/rbd.c
>> @@ -4023,6 +4023,7 @@ static void rbd_queue_workfn(struct work_struct *work)
>>
>> switch (req_op(rq)) {
>> case REQ_OP_DISCARD:
>> + case REQ_OP_WRITE_ZEROES:
>> op_type = OBJ_OP_DISCARD;
>> break;
>> case REQ_OP_WRITE:
>> @@ -4420,6 +4421,7 @@ static int rbd_init_disk(struct rbd_device *rbd_dev)
>> q->limits.discard_granularity = segment_size;
>> q->limits.discard_alignment = segment_size;
>> blk_queue_max_discard_sectors(q, segment_size / SECTOR_SIZE);
>> + blk_queue_max_write_zeroes_sectors(q, segment_size / SECTOR_SIZE);
>>
>> if (!ceph_test_opt(rbd_dev->rbd_client->client, NOCRC))
>> q->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES;
>> --
>> 2.4.3
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
>
> --
> Jason
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Jason

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] rbd: implement REQ_OP_WRITE_ZEROES
  2017-05-24 11:53     ` Jason Dillaman
@ 2017-05-24 17:38       ` Alexandre DERUMIER
  2017-05-24 19:34         ` Jason Dillaman
  0 siblings, 1 reply; 8+ messages in thread
From: Alexandre DERUMIER @ 2017-05-24 17:38 UTC (permalink / raw)
  To: dillaman; +Cc: Ilya Dryomov, ceph-devel, Christoph Hellwig, Hannes Reinecke

>>I just opened a tracker ticket for this [1] 
>>
>>[1] http://tracker.ceph.com/issues/20070

Thanks Jason!

>>-- let me know if you have
>>any other QEMU improvement ideas.

For the moment, the bigger limitation is cpu usage of librbd, 
as qemu can only 1 thread, I can't reach more than around 70000 iops by disk.
(3,1ghz cpu, disabling debug, rbd_cache, using jemalloc).

So any improvement to reduce cpu usage could be great :)


Also, in the future, I think qemu will support multiple iothreads by disk, 
I don't known if librbd is already ready for this ?




----- Mail original -----
De: "Jason Dillaman" <jdillama@redhat.com>
À: "aderumier" <aderumier@odiso.com>
Cc: "Ilya Dryomov" <idryomov@gmail.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Christoph Hellwig" <hch@lst.de>, "Hannes Reinecke" <hare@suse.com>
Envoyé: Mercredi 24 Mai 2017 13:53:40
Objet: Re: [PATCH] rbd: implement REQ_OP_WRITE_ZEROES

I just opened a tracker ticket for this [1] -- let me know if you have 
any other QEMU improvement ideas. 

[1] http://tracker.ceph.com/issues/20070 

On Wed, May 24, 2017 at 7:38 AM, Alexandre DERUMIER <aderumier@odiso.com> wrote: 
> Hi, 
> 
> is it planned to implement write zeroes in qemu rbd block driver soon ? 
> (bdrv_co_write_zeroes) 
> 
> It's really missing currently, as qemu drive-mirror need it to have sparse images on copy. 
> 
> Ref from my discussion with Paolo from redhat in 2014 about this: 
> https://lists.gnu.org/archive/html/qemu-devel/2014-10/msg01274.html 
> 
> 
> REgards, 
> 
> Alexandre 
> 
> ----- Mail original ----- 
> De: "Jason Dillaman" <jdillama@redhat.com> 
> À: "Ilya Dryomov" <idryomov@gmail.com> 
> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "Christoph Hellwig" <hch@lst.de>, "Hannes Reinecke" <hare@suse.com> 
> Envoyé: Mardi 23 Mai 2017 20:28:00 
> Objet: Re: [PATCH] rbd: implement REQ_OP_WRITE_ZEROES 
> 
> lgtm 
> 
> Reviewed-by: Jason Dillaman <dillaman@redhat.com> 
> 
> On Tue, May 23, 2017 at 11:08 AM, Ilya Dryomov <idryomov@gmail.com> wrote: 
>> Commit 93c1defedcae ("rbd: remove the discard_zeroes_data flag") 
>> explicitly didn't implement REQ_OP_WRITE_ZEROES for rbd, while the 
>> following commit 48920ff2a5a9 ("block: remove the discard_zeroes_data 
>> flag") dropped ->discard_zeroes_data in favor of REQ_OP_WRITE_ZEROES. 
>> 
>> rbd does support efficient zeroing via CEPH_OSD_OP_ZERO opcode and will 
>> release either some or all blocks depending on whether the zeroing 
>> request is rbd_obj_bytes() aligned. This is how we currently implement 
>> discards, so REQ_OP_WRITE_ZEROES can be identical to REQ_OP_DISCARD for 
>> now. Caveats: 
>> 
>> - REQ_NOUNMAP is ignored, but AFAICT that's true of at least two other 
>> current implementations - nvme and loop 
>> 
>> - there is no ->write_zeroes_alignment and blk_bio_write_zeroes_split() 
>> is hence less helpful than blk_bio_discard_split(), but this can (and 
>> should) be fixed on the rbd side 
>> 
>> In the future we will split these into two code paths to respect 
>> REQ_NOUNMAP on zeroout and save on zeroing blocks that couldn't be 
>> released on discard. 
>> 
>> Fixes: 93c1defedcae ("rbd: remove the discard_zeroes_data flag") 
>> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> 
>> --- 
>> drivers/block/rbd.c | 2 ++ 
>> 1 file changed, 2 insertions(+) 
>> 
>> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c 
>> index 454bf9c34882..c16f74547804 100644 
>> --- a/drivers/block/rbd.c 
>> +++ b/drivers/block/rbd.c 
>> @@ -4023,6 +4023,7 @@ static void rbd_queue_workfn(struct work_struct *work) 
>> 
>> switch (req_op(rq)) { 
>> case REQ_OP_DISCARD: 
>> + case REQ_OP_WRITE_ZEROES: 
>> op_type = OBJ_OP_DISCARD; 
>> break; 
>> case REQ_OP_WRITE: 
>> @@ -4420,6 +4421,7 @@ static int rbd_init_disk(struct rbd_device *rbd_dev) 
>> q->limits.discard_granularity = segment_size; 
>> q->limits.discard_alignment = segment_size; 
>> blk_queue_max_discard_sectors(q, segment_size / SECTOR_SIZE); 
>> + blk_queue_max_write_zeroes_sectors(q, segment_size / SECTOR_SIZE); 
>> 
>> if (!ceph_test_opt(rbd_dev->rbd_client->client, NOCRC)) 
>> q->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES; 
>> -- 
>> 2.4.3 
>> 
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in 
>> the body of a message to majordomo@vger.kernel.org 
>> More majordomo info at http://vger.kernel.org/majordomo-info.html 
> 
> 
> 
> -- 
> Jason 
> -- 
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in 
> the body of a message to majordomo@vger.kernel.org 
> More majordomo info at http://vger.kernel.org/majordomo-info.html 
> 
> -- 
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in 
> the body of a message to majordomo@vger.kernel.org 
> More majordomo info at http://vger.kernel.org/majordomo-info.html 



-- 
Jason 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] rbd: implement REQ_OP_WRITE_ZEROES
  2017-05-24 17:38       ` Alexandre DERUMIER
@ 2017-05-24 19:34         ` Jason Dillaman
  0 siblings, 0 replies; 8+ messages in thread
From: Jason Dillaman @ 2017-05-24 19:34 UTC (permalink / raw)
  To: Alexandre DERUMIER
  Cc: Ilya Dryomov, ceph-devel, Christoph Hellwig, Hannes Reinecke

On Wed, May 24, 2017 at 1:38 PM, Alexandre DERUMIER <aderumier@odiso.com> wrote:
>>>I just opened a tracker ticket for this [1]
>>>
>>>[1] http://tracker.ceph.com/issues/20070
>
> Thanks Jason!
>
>>>-- let me know if you have
>>>any other QEMU improvement ideas.
>
> For the moment, the bigger limitation is cpu usage of librbd,
> as qemu can only 1 thread, I can't reach more than around 70000 iops by disk.
> (3,1ghz cpu, disabling debug, rbd_cache, using jemalloc).
>
> So any improvement to reduce cpu usage could be great :)

We've had some interest expressed in volunteering time and code
contributions to reduce the CPU usage of librbd / librados. I am
hoping some of these improvements will be included in the Mimic
release.

> Also, in the future, I think qemu will support multiple iothreads by disk,
> I don't known if librbd is already ready for this ?

This is hopefully another improvement we can make in the Mimic release
timeframe. librbd already has an architecture to support multiple
internal IO threads, but we have it disabled for now due to lack of
thorough testing -- and for the fact that librados needs to utilize
multiple IO threads as well.

>
>
> ----- Mail original -----
> De: "Jason Dillaman" <jdillama@redhat.com>
> À: "aderumier" <aderumier@odiso.com>
> Cc: "Ilya Dryomov" <idryomov@gmail.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Christoph Hellwig" <hch@lst.de>, "Hannes Reinecke" <hare@suse.com>
> Envoyé: Mercredi 24 Mai 2017 13:53:40
> Objet: Re: [PATCH] rbd: implement REQ_OP_WRITE_ZEROES
>
> I just opened a tracker ticket for this [1] -- let me know if you have
> any other QEMU improvement ideas.
>
> [1] http://tracker.ceph.com/issues/20070
>
> On Wed, May 24, 2017 at 7:38 AM, Alexandre DERUMIER <aderumier@odiso.com> wrote:
>> Hi,
>>
>> is it planned to implement write zeroes in qemu rbd block driver soon ?
>> (bdrv_co_write_zeroes)
>>
>> It's really missing currently, as qemu drive-mirror need it to have sparse images on copy.
>>
>> Ref from my discussion with Paolo from redhat in 2014 about this:
>> https://lists.gnu.org/archive/html/qemu-devel/2014-10/msg01274.html
>>
>>
>> REgards,
>>
>> Alexandre
>>
>> ----- Mail original -----
>> De: "Jason Dillaman" <jdillama@redhat.com>
>> À: "Ilya Dryomov" <idryomov@gmail.com>
>> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "Christoph Hellwig" <hch@lst.de>, "Hannes Reinecke" <hare@suse.com>
>> Envoyé: Mardi 23 Mai 2017 20:28:00
>> Objet: Re: [PATCH] rbd: implement REQ_OP_WRITE_ZEROES
>>
>> lgtm
>>
>> Reviewed-by: Jason Dillaman <dillaman@redhat.com>
>>
>> On Tue, May 23, 2017 at 11:08 AM, Ilya Dryomov <idryomov@gmail.com> wrote:
>>> Commit 93c1defedcae ("rbd: remove the discard_zeroes_data flag")
>>> explicitly didn't implement REQ_OP_WRITE_ZEROES for rbd, while the
>>> following commit 48920ff2a5a9 ("block: remove the discard_zeroes_data
>>> flag") dropped ->discard_zeroes_data in favor of REQ_OP_WRITE_ZEROES.
>>>
>>> rbd does support efficient zeroing via CEPH_OSD_OP_ZERO opcode and will
>>> release either some or all blocks depending on whether the zeroing
>>> request is rbd_obj_bytes() aligned. This is how we currently implement
>>> discards, so REQ_OP_WRITE_ZEROES can be identical to REQ_OP_DISCARD for
>>> now. Caveats:
>>>
>>> - REQ_NOUNMAP is ignored, but AFAICT that's true of at least two other
>>> current implementations - nvme and loop
>>>
>>> - there is no ->write_zeroes_alignment and blk_bio_write_zeroes_split()
>>> is hence less helpful than blk_bio_discard_split(), but this can (and
>>> should) be fixed on the rbd side
>>>
>>> In the future we will split these into two code paths to respect
>>> REQ_NOUNMAP on zeroout and save on zeroing blocks that couldn't be
>>> released on discard.
>>>
>>> Fixes: 93c1defedcae ("rbd: remove the discard_zeroes_data flag")
>>> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
>>> ---
>>> drivers/block/rbd.c | 2 ++
>>> 1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
>>> index 454bf9c34882..c16f74547804 100644
>>> --- a/drivers/block/rbd.c
>>> +++ b/drivers/block/rbd.c
>>> @@ -4023,6 +4023,7 @@ static void rbd_queue_workfn(struct work_struct *work)
>>>
>>> switch (req_op(rq)) {
>>> case REQ_OP_DISCARD:
>>> + case REQ_OP_WRITE_ZEROES:
>>> op_type = OBJ_OP_DISCARD;
>>> break;
>>> case REQ_OP_WRITE:
>>> @@ -4420,6 +4421,7 @@ static int rbd_init_disk(struct rbd_device *rbd_dev)
>>> q->limits.discard_granularity = segment_size;
>>> q->limits.discard_alignment = segment_size;
>>> blk_queue_max_discard_sectors(q, segment_size / SECTOR_SIZE);
>>> + blk_queue_max_write_zeroes_sectors(q, segment_size / SECTOR_SIZE);
>>>
>>> if (!ceph_test_opt(rbd_dev->rbd_client->client, NOCRC))
>>> q->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES;
>>> --
>>> 2.4.3
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>>
>>
>> --
>> Jason
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
>
> --
> Jason
>



-- 
Jason

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] rbd: implement REQ_OP_WRITE_ZEROES
  2017-05-23 15:08 [PATCH] rbd: implement REQ_OP_WRITE_ZEROES Ilya Dryomov
  2017-05-23 18:28 ` Jason Dillaman
@ 2017-05-29  8:39 ` Ilya Dryomov
  2017-05-29  8:43 ` Christoph Hellwig
  2 siblings, 0 replies; 8+ messages in thread
From: Ilya Dryomov @ 2017-05-29  8:39 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Ceph Development, Hannes Reinecke

On Tue, May 23, 2017 at 5:08 PM, Ilya Dryomov <idryomov@gmail.com> wrote:
> Commit 93c1defedcae ("rbd: remove the discard_zeroes_data flag")
> explicitly didn't implement REQ_OP_WRITE_ZEROES for rbd, while the
> following commit 48920ff2a5a9 ("block: remove the discard_zeroes_data
> flag") dropped ->discard_zeroes_data in favor of REQ_OP_WRITE_ZEROES.
>
> rbd does support efficient zeroing via CEPH_OSD_OP_ZERO opcode and will
> release either some or all blocks depending on whether the zeroing
> request is rbd_obj_bytes() aligned.  This is how we currently implement
> discards, so REQ_OP_WRITE_ZEROES can be identical to REQ_OP_DISCARD for
> now.  Caveats:
>
> - REQ_NOUNMAP is ignored, but AFAICT that's true of at least two other
>   current implementations - nvme and loop
>
> - there is no ->write_zeroes_alignment and blk_bio_write_zeroes_split()
>   is hence less helpful than blk_bio_discard_split(), but this can (and
>   should) be fixed on the rbd side
>
> In the future we will split these into two code paths to respect
> REQ_NOUNMAP on zeroout and save on zeroing blocks that couldn't be
> released on discard.
>
> Fixes: 93c1defedcae ("rbd: remove the discard_zeroes_data flag")
> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
> ---
>  drivers/block/rbd.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index 454bf9c34882..c16f74547804 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -4023,6 +4023,7 @@ static void rbd_queue_workfn(struct work_struct *work)
>
>         switch (req_op(rq)) {
>         case REQ_OP_DISCARD:
> +       case REQ_OP_WRITE_ZEROES:
>                 op_type = OBJ_OP_DISCARD;
>                 break;
>         case REQ_OP_WRITE:
> @@ -4420,6 +4421,7 @@ static int rbd_init_disk(struct rbd_device *rbd_dev)
>         q->limits.discard_granularity = segment_size;
>         q->limits.discard_alignment = segment_size;
>         blk_queue_max_discard_sectors(q, segment_size / SECTOR_SIZE);
> +       blk_queue_max_write_zeroes_sectors(q, segment_size / SECTOR_SIZE);
>
>         if (!ceph_test_opt(rbd_dev->rbd_client->client, NOCRC))
>                 q->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES;

Hi Christoph,

I'm planning to merge this into 4.12-rc because it fixes a 93c1defedcae
regression, but I'm not quite sure what you meant by "rbd only supports
discarding on large alignments, so the zeroing code would always fall
back to explicit writings of zeroes".  Care to take a look and ack?

Thanks,

                Ilya

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] rbd: implement REQ_OP_WRITE_ZEROES
  2017-05-23 15:08 [PATCH] rbd: implement REQ_OP_WRITE_ZEROES Ilya Dryomov
  2017-05-23 18:28 ` Jason Dillaman
  2017-05-29  8:39 ` Ilya Dryomov
@ 2017-05-29  8:43 ` Christoph Hellwig
  2 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2017-05-29  8:43 UTC (permalink / raw)
  To: Ilya Dryomov; +Cc: ceph-devel, Christoph Hellwig, Hannes Reinecke

On Tue, May 23, 2017 at 05:08:27PM +0200, Ilya Dryomov wrote:
> Commit 93c1defedcae ("rbd: remove the discard_zeroes_data flag")
> explicitly didn't implement REQ_OP_WRITE_ZEROES for rbd, while the
> following commit 48920ff2a5a9 ("block: remove the discard_zeroes_data
> flag") dropped ->discard_zeroes_data in favor of REQ_OP_WRITE_ZEROES.
> 
> rbd does support efficient zeroing via CEPH_OSD_OP_ZERO opcode and will
> release either some or all blocks depending on whether the zeroing
> request is rbd_obj_bytes() aligned.  This is how we currently implement
> discards, so REQ_OP_WRITE_ZEROES can be identical to REQ_OP_DISCARD for
> now.  Caveats:
> 
> - REQ_NOUNMAP is ignored, but AFAICT that's true of at least two other
>   current implementations - nvme and loop

That's no problem, it's just a hint in case the device differenciates
the cases.

> - there is no ->write_zeroes_alignment and blk_bio_write_zeroes_split()
>   is hence less helpful than blk_bio_discard_split(), but this can (and
>   should) be fixed on the rbd side

Yes.

> In the future we will split these into two code paths to respect
> REQ_NOUNMAP on zeroout and save on zeroing blocks that couldn't be
> released on discard.
> 
> Fixes: 93c1defedcae ("rbd: remove the discard_zeroes_data flag")
> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-05-29  8:43 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-23 15:08 [PATCH] rbd: implement REQ_OP_WRITE_ZEROES Ilya Dryomov
2017-05-23 18:28 ` Jason Dillaman
2017-05-24 11:38   ` Alexandre DERUMIER
2017-05-24 11:53     ` Jason Dillaman
2017-05-24 17:38       ` Alexandre DERUMIER
2017-05-24 19:34         ` Jason Dillaman
2017-05-29  8:39 ` Ilya Dryomov
2017-05-29  8:43 ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.