* Severe performance degradation with jewel rbd image
@ 2016-05-25 20:48 Somnath Roy
  2016-05-25 22:47 ` Jason Dillaman
  0 siblings, 1 reply; 18+ messages in thread
From: Somnath Roy @ 2016-05-25 20:48 UTC (permalink / raw)
  To: ceph-devel

Hi Mark/Josh,
As I mentioned in the performance meeting today, if we create an rbd image with the default 'rbd create' command in jewel, individual image performance for 4K RW does not scale up well, although the aggregated throughput of multiple rbd images running in parallel does scale.
For the same QD and numjob combination, an image created with image format 1 (and with hammer-like rbd_default_features = 3) produces *16X* more performance. I did some digging and here are my findings.

Setup:
--------

32 OSDs (all SSD) across 4 nodes. Pool size = 2, min_size = 1.
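
For reference, a pool with this shape can be set up roughly as follows (a sketch; the pool name and PG count are taken from the output below, and the exact commands used here are an assumption):

    ceph osd pool create recovery_test 2500 2500
    ceph osd pool set recovery_test size 2
    ceph osd pool set recovery_test min_size 1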

root@stormeap-1:~# ceph -s
    cluster db0febf1-d2b0-4f8d-8f20-43731c134763
     health HEALTH_WARN
            noscrub,nodeep-scrub,sortbitwise flag(s) set
     monmap e1: 1 mons at {a=10.60.194.10:6789/0}
            election epoch 5, quorum 0 a
     osdmap e139: 32 osds: 32 up, 32 in
            flags noscrub,nodeep-scrub,sortbitwise
      pgmap v20532: 2500 pgs, 1 pools, 7421 GB data, 1855 kobjects
            14850 GB used, 208 TB / 223 TB avail
                2500 active+clean

IO profile : fio with the rbd engine, QD = 128 and numjobs = 10.
rbd cache is disabled.
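
For reference, a fio job along these lines matches that profile (a sketch; the client name is an assumption, the image name is one of the images below, and rbd cache = false is set under [client] in ceph.conf):

    cat > rbd-4k-randwrite.fio <<'EOF'
    [rbd-4k-randwrite]
    ioengine=rbd
    clientname=admin
    pool=recovery_test
    rbdname=rbd_degradation
    rw=randwrite
    bs=4k
    iodepth=128
    numjobs=10
    direct=1
    group_reporting
    EOF
    fio rbd-4k-randwrite.fio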

Result:
--------
root@stormeap-1:~# rbd info recovery_test/rbd_degradation
rbd image 'rbd_degradation':
        size 1953 GB in 500000 objects
        order 22 (4096 kB objects)
        block_name_prefix: rb.0.5f5f.6b8b4567
        format: 1

The above image with format 1 gives *~102K iops*.

root@stormeap-1:~# rbd info recovery_test/rbd_degradation_with_hammer_features
rbd image 'rbd_degradation_with_hammer_features':
        size 195 GB in 50000 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.5f8d6b8b4567
        format: 2
        features: layering
        flags:

The above image with the hammer rbd features gives *~105K iops*.

root@stormeap-1:~# rbd info recovery_test/rbd_degradation_with_7
rbd image 'rbd_degradation_with_7':
        size 195 GB in 50000 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.5fd86b8b4567
        format: 2
        features: layering, exclusive-lock
        flags:

The above image with feature value 7 (exclusive-lock enabled) gives *~8K iops*, i.e. a >12X degradation.

Tried with a single numjob and QD = 128; performance goes up to ~40K, and increasing QD further does not help.


root@stormeap-1:~# rbd info recovery_test/rbd_degradation_with_15
rbd image 'rbd_degradation_with_15':
        size 195 GB in 50000 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.5fab6b8b4567
        format: 2
        features: layering, exclusive-lock, object-map
        flags:

The above image with feature value 15 (exclusive-lock and object-map enabled) gives *~8K iops*, i.e. a >12X degradation.

Tried with a single numjob and QD = 128; performance goes up to ~40K, and increasing QD further does not help.


root@stormeap-1:~# rbd info recovery_test/ceph_recovery_img_1
rbd image 'ceph_recovery_img_1':
        size 4882 GB in 1250000 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.371b6b8b4567
        format: 2
        features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
        flags:

The above image with feature value 61 (the Jewel default) gives *~6K iops*, i.e. a *>16X* degradation.

Tried with a single numjob and QD = 128; performance goes up to ~35K, and increasing QD further does not help.
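
For completeness, images with these feature sets can be created along the following lines (a sketch only; the sizes and the exact commands used for the images above are assumptions, the feature names match the rbd info output):

    # format 1 image (no feature bits)
    rbd create recovery_test/rbd_degradation --size 2000000 --image-format 1

    # format 2 image with the hammer-like feature set (layering only)
    rbd create recovery_test/rbd_degradation_with_hammer_features --size 200000 \
        --image-format 2 --image-feature layering

    # format 2 image with layering + exclusive-lock; add --image-feature object-map
    # (and so on) for the other combinations, or rely on rbd_default_features
    rbd create recovery_test/rbd_degradation_with_7 --size 200000 \
        --image-format 2 --image-feature layering --image-feature exclusive-lock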

Summary :
------------

1. The exclusive-lock feature seems to be degrading performance.

2. Performance degrades a bit further when fast-diff and deep-flatten are also enabled.


Let me know if you need more information on this.

Thanks & Regards
Somnath




* Re: Severe performance degradation with jewel rbd image
  2016-05-25 20:48 Severe performance degradation with jewel rbd image Somnath Roy
@ 2016-05-25 22:47 ` Jason Dillaman
  2016-05-25 23:08   ` Somnath Roy
  0 siblings, 1 reply; 18+ messages in thread
From: Jason Dillaman @ 2016-05-25 22:47 UTC (permalink / raw)
  To: Somnath Roy; +Cc: ceph-devel

Just to eliminate the most straightforward explanation, are you
running multiple fio jobs against the same image concurrently?  If the
exclusive lock had to ping-pong back-and-forth between clients, that
would certainly explain the severe performance penalty.

Otherwise, the exclusive lock is not in the IO path once the client
has acquired the exclusive lock.  If you are seeing a performance
penalty for a single-client scenario with exclusive lock enabled, this
is something we haven't seen and will have to investigate ASAP.
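
If it helps to confirm which client currently holds the lock while the test is running, something like the following should work (the pool/image come from your output above and the header object name is inferred from the block_name_prefix, so treat this as a sketch):

    # the exclusive lock should show up in the image's lock list
    rbd lock ls recovery_test/rbd_degradation_with_7

    # each librbd client also registers a watch on the header object
    rados -p recovery_test listwatchers rbd_header.5fd86b8b4567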

Thanks,

On Wed, May 25, 2016 at 4:48 PM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
> [...]



-- 
Jason


* RE: Severe performance degradation with jewel rbd image
  2016-05-25 22:47 ` Jason Dillaman
@ 2016-05-25 23:08   ` Somnath Roy
  2016-05-26  0:50     ` Jason Dillaman
  2016-05-26  2:01     ` Haomai Wang
  0 siblings, 2 replies; 18+ messages in thread
From: Somnath Roy @ 2016-05-25 23:08 UTC (permalink / raw)
  To: dillaman; +Cc: ceph-devel

Hi Jason,
Yes, I am running against a single image, but with multiple fio jobs (numjobs = 10 in the fio job file). I suspected the exclusive lock was being contended between the jobs, which is why I also posted the single-job result; the single-job result is *not* degraded.
But we need a numjobs and QD combination to extract the most performance from an image with fio-rbd. A single librbd instance seems to be stuck at ~40K IOPS, and we can't scale a single image further without running multiple librbd instances against it in parallel.
A 16X performance degradation because of this lock seems very severe.

Thanks & Regards
Somnath

-----Original Message-----
From: Jason Dillaman [mailto:jdillama@redhat.com] 
Sent: Wednesday, May 25, 2016 3:47 PM
To: Somnath Roy
Cc: ceph-devel@vger.kernel.org
Subject: Re: Severe performance degradation with jewel rbd image

[...]


* Re: Severe performance degradation with jewel rbd image
  2016-05-25 23:08   ` Somnath Roy
@ 2016-05-26  0:50     ` Jason Dillaman
  2016-05-26  3:39       ` Somnath Roy
  2016-05-26  2:01     ` Haomai Wang
  1 sibling, 1 reply; 18+ messages in thread
From: Jason Dillaman @ 2016-05-26  0:50 UTC (permalink / raw)
  To: Somnath Roy; +Cc: ceph-devel

Are you attempting to test a particular use-case where you would have
multiple clients connected to a single RBD image?  The rbd CLI has a
"--image-shared" option when creating/cloning images as a shortcut to
easily disable the exclusive lock, object-map, fast-diff, and
journaling features for such situations. You could also specify a
different `rbdname` per job to simulate multiple clients accessing
multiple images (instead of multiple clients sharing the same image).
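
For example (a sketch; the image name here is just a placeholder):

    # image intended to be written by multiple clients concurrently
    rbd create recovery_test/shared_img --size 200000 --image-shared

    # or keep one image per fio job, e.g. rbdname=img_0, rbdname=img_1, ...
    # in separate job sections, to model multiple clients on multiple images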

I have to be honest: I am actually pretty impressed by your 8K IOPS
when you have multiple clients fighting over the exclusive lock.
Acquiring the lock requires inter-client cooperative coordination to
request/release/acquire it from the current owner, and any client
without the lock has all of its writes blocked.


On Wed, May 25, 2016 at 7:08 PM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
> [...]


-- 
Jason


* Re: Severe performance degradation with jewel rbd image
  2016-05-25 23:08   ` Somnath Roy
  2016-05-26  0:50     ` Jason Dillaman
@ 2016-05-26  2:01     ` Haomai Wang
  1 sibling, 0 replies; 18+ messages in thread
From: Haomai Wang @ 2016-05-26  2:01 UTC (permalink / raw)
  To: Somnath Roy; +Cc: dillaman, ceph-devel

With the current design, I think we should aim to improve single
librbd instance performance. From my past experience, the bottlenecks
will be the radosclient finisher and the objecter session lock.

On Thu, May 26, 2016 at 7:08 AM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
> [...]


* RE: Severe performance degradation with jewel rbd image
  2016-05-26  0:50     ` Jason Dillaman
@ 2016-05-26  3:39       ` Somnath Roy
  2016-05-26  3:52         ` Jason Dillaman
  0 siblings, 1 reply; 18+ messages in thread
From: Somnath Roy @ 2016-05-26  3:39 UTC (permalink / raw)
  To: dillaman; +Cc: ceph-devel

Jason,
My use case is to find out how much write performance I can extract from a single rbd image.
I don't want to use --image-shared, as writes would then be inconsistent (?).
It seems running a single fio job with a high QD is the only option?
Also, I believe the goal should be for multi-client/single-image to deliver at least the same aggregated throughput as single-client/single-image (each individual client getting less is fine).
Let me know if I am missing anything.

Thanks & Regards
Somnath


-----Original Message-----
From: Jason Dillaman [mailto:jdillama@redhat.com] 
Sent: Wednesday, May 25, 2016 5:51 PM
To: Somnath Roy
Cc: ceph-devel@vger.kernel.org
Subject: Re: Severe performance degradation with jewel rbd image

[...]


* Re: Severe performance degradation with jewel rbd image
  2016-05-26  3:39       ` Somnath Roy
@ 2016-05-26  3:52         ` Jason Dillaman
  2016-05-26  4:19           ` Somnath Roy
  0 siblings, 1 reply; 18+ messages in thread
From: Jason Dillaman @ 2016-05-26  3:52 UTC (permalink / raw)
  To: Somnath Roy; +Cc: ceph-devel

For multi-client, single-image, you should be using the
"--image-shared" option when creating the image (or just disable
exclusive-lock after the fact) since the expected use case of
exclusive-lock is single client, single image (e.g. single QEMU
process that can live-migrate to a new host but both hosts won't be
writing concurrently).
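
For example, on an already-created Jewel-default image the features can be dropped roughly like this (a sketch; fast-diff and object-map have to go before exclusive-lock can be disabled):

    rbd feature disable recovery_test/ceph_recovery_img_1 fast-diff
    rbd feature disable recovery_test/ceph_recovery_img_1 object-map
    rbd feature disable recovery_test/ceph_recovery_img_1 exclusive-lock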

Regardless of whether or not you use exclusive-lock, when you have
multiple clients concurrently writing to the same image, the
coordination necessary to provide consistency needs to be provided at
the application layer (i.e. use a clustered filesystem on top of a
single RBD image when it is being manipulated by multiple clients
concurrently).

On Wed, May 25, 2016 at 11:39 PM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
> [...]



-- 
Jason


* RE: Severe performance degradation with jewel rbd image
  2016-05-26  3:52         ` Jason Dillaman
@ 2016-05-26  4:19           ` Somnath Roy
  2016-05-26 13:28             ` Jason Dillaman
  0 siblings, 1 reply; 18+ messages in thread
From: Somnath Roy @ 2016-05-26  4:19 UTC (permalink / raw)
  To: dillaman; +Cc: ceph-devel

Thanks Jason!
My bad, I thought the exclusive lock was there to maintain consistency.
I think features like object-map, fast-diff, deep-flatten, and journaling cannot be enabled if I disable exclusive-lock?
Could you please give a one-liner or point me to the docs describing what features like object-map, fast-diff, and deep-flatten do?

Thanks & Regards
Somnath

-----Original Message-----
From: Jason Dillaman [mailto:jdillama@redhat.com] 
Sent: Wednesday, May 25, 2016 8:53 PM
To: Somnath Roy
Cc: ceph-devel@vger.kernel.org
Subject: Re: Severe performance degradation with jewel rbd image

[...]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Severe performance degradation with jewel rbd image
  2016-05-26  4:19           ` Somnath Roy
@ 2016-05-26 13:28             ` Jason Dillaman
  2016-05-26 17:47               ` Somnath Roy
  0 siblings, 1 reply; 18+ messages in thread
From: Jason Dillaman @ 2016-05-26 13:28 UTC (permalink / raw)
  To: Somnath Roy; +Cc: ceph-devel

I created a ticket [1] a while ago to improve the documentation of RBD
image features.  The ticket is still open, but in the meantime I have
added some explanatory notes as a comment on the ticket.

[1] http://tracker.ceph.com/issues/15000
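
While that is pending, here is a rough cheat sheet for the feature
bits being quoted in this thread (most of the values follow from the
numbers quoted below; the journaling bit is from memory, so please
double-check against the librbd headers before relying on it):

  layering = 1, striping = 2, exclusive-lock = 4, object-map = 8,
  fast-diff = 16, deep-flatten = 32, journaling = 64

  e.g.  3 = layering + striping                      (hammer default)
       61 = layering + exclusive-lock + object-map
            + fast-diff + deep-flatten               (Jewel default)

  # the enabled feature names can always be confirmed with:
  rbd info recovery_test/ceph_recovery_img_1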

On Thu, May 26, 2016 at 12:19 AM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
> Thanks Jason !
> My bad, I thought exclusive lock is to maintain the consistency.
> I think the features like objectmap , fast-diff , deep-flatten, journaling could not be enabled if I disable exclusive lock ?
> Could you please give a one liner or point me to the doc where I can find what the features like objectmap , fast-diff , deep-flatten does ?
>
> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: Jason Dillaman [mailto:jdillama@redhat.com]
> Sent: Wednesday, May 25, 2016 8:53 PM
> To: Somnath Roy
> Cc: ceph-devel@vger.kernel.org
> Subject: Re: Severe performance degradation with jewel rbd image
>
> For multi-client, single-image, you should be using the "--image-shared" option when creating the image (or just disable exclusive-lock after the fact) since the expected use case of exclusive-lock is single client, single image (e.g. single QEMU process that can live-migrate to a new host but both hosts won't be writing concurrently).
>
> Regardless of whether or not you use exclusive-lock or not, when you have multiple clients concurrently writing to the same image, the necessary coordination to provide consistency needs to be provided at the application layer (i.e. use a clustered filesystem on top of a single RBD image when being manipulated by multiple clients concurrently).
>
> On Wed, May 25, 2016 at 11:39 PM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
>> Jason,
>> My use case is to find out how much write performance I can extract out of a single rbd image.
>> I don't want to use  --image-shared as writes will be inconsistent
>> then (?) It seems running a single fio job with high QD is the only option ?
>> Also, I believe the goal should be at least getting the similar aggregated throughput like single client/single image in case of multi client/single image (individual client will get less and that's fine).
>> Let me know if I am missing anything.
>>
>> Thanks & Regards
>> Somnath
>>
>>
>> -----Original Message-----
>> From: Jason Dillaman [mailto:jdillama@redhat.com]
>> Sent: Wednesday, May 25, 2016 5:51 PM
>> To: Somnath Roy
>> Cc: ceph-devel@vger.kernel.org
>> Subject: Re: Severe performance degradation with jewel rbd image
>>
>> Are you attempting to test a particular use-case where you would have multiple clients connected to a single RBD image?  The rbd CLI has a "--image-shared" option when creating/cloning images as a shortcut to easily disable the exclusive lock, object-map, fast-diff, and journaling features for such situations. You could also specify a different `rbdname` per job to simulate multiple clients accessing multiple images (instead of multiple clients sharing the same image).
>>
>> I have to be honest, I am actually pretty impressed by your 8K IOPS when you have multiple clients fighting over the exclusive lock since acquiring the lock requires inter-client cooperative coordination to request/release/acquire the lock from the current owner and any client without the lock has all writes blocked.
>>
>>
>> On Wed, May 25, 2016 at 7:08 PM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
>>> Hi Jason,
>>> Yes, I am running on single image but with multiple fio jobs (with numjobs = 10 in the fio parameter) . I expected that this exclusive lock is locking between the jobs and that's why I have posted the single job result. Single job result is *not degrading*.
>>> But, we need numjob and QD combination to extract most performance from an image with fio-rbd. A single librbd instance performance seems to be stuck at 40K and we can't scale this image up without running multiple librbd instances on this in parallel.
>>> 16X performance degradation because of this lock seems very destructive.
>>>
>>> Thanks & Regards
>>> Somnath
>>>
>>> -----Original Message-----
>>> From: Jason Dillaman [mailto:jdillama@redhat.com]
>>> Sent: Wednesday, May 25, 2016 3:47 PM
>>> To: Somnath Roy
>>> Cc: ceph-devel@vger.kernel.org
>>> Subject: Re: Severe performance degradation with jewel rbd image
>>>
>>> Just to eliminate the most straightforward explanation, are you running multiple fio jobs against the same image concurrently?  If the exclusive lock had to ping-pong back-and-forth between clients, that would certainly explain the severe performance penalty.
>>>
>>> Otherwise, the exclusive lock is not in the IO path once the client has acquired the exclusive lock.  If you are seeing a performance penalty for a single-client scenario with exclusive lock enabled, this is something we haven't seen and will have to investigate ASAP.
>>>
>>> Thanks,
>>>
>>> On Wed, May 25, 2016 at 4:48 PM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
>>>> Hi Mark/Josh,
>>>> As I mentioned in the performance meeting today , if we create rbd image with default 'rbd create' command in jewel , the individual image performance for 4k RW is not scaling up well. But, multiple of rbd images running in parallel aggregated throughput is scaling.
>>>> For the same QD and numjob combination, image created with image format 1 (and with hammer like rbd_default_features = 3) is producing *16X* more performance. I did some digging and here is my findings.
>>>>
>>>> Setup:
>>>> --------
>>>>
>>>> 32 osds (all SSD) over 4 nodes. Pool size = 2 , min_size = 1.
>>>>
>>>> root@stormeap-1:~# ceph -s
>>>>     cluster db0febf1-d2b0-4f8d-8f20-43731c134763
>>>>      health HEALTH_WARN
>>>>             noscrub,nodeep-scrub,sortbitwise flag(s) set
>>>>      monmap e1: 1 mons at {a=10.60.194.10:6789/0}
>>>>             election epoch 5, quorum 0 a
>>>>      osdmap e139: 32 osds: 32 up, 32 in
>>>>             flags noscrub,nodeep-scrub,sortbitwise
>>>>       pgmap v20532: 2500 pgs, 1 pools, 7421 GB data, 1855 kobjects
>>>>             14850 GB used, 208 TB / 223 TB avail
>>>>                 2500 active+clean
>>>>
>>>> IO profile : Fio rbd with QD 128 and numjob = 10 rbd cache is
>>>> disabled.
>>>>
>>>> Result:
>>>> --------
>>>> root@stormeap-1:~# rbd info recovery_test/rbd_degradation rbd image
>>>> 'rbd_degradation':
>>>>         size 1953 GB in 500000 objects
>>>>         order 22 (4096 kB objects)
>>>>         block_name_prefix: rb.0.5f5f.6b8b4567
>>>>         format: 1
>>>>
>>>> On the above image with format 1 it is giving *~102K iops*
>>>>
>>>> root@stormeap-1:~# rbd info
>>>> recovery_test/rbd_degradation_with_hammer_features
>>>> rbd image 'rbd_degradation_with_hammer_features':
>>>>         size 195 GB in 50000 objects
>>>>         order 22 (4096 kB objects)
>>>>         block_name_prefix: rbd_data.5f8d6b8b4567
>>>>         format: 2
>>>>         features: layering
>>>>         flags:
>>>>
>>>> On the above image with hammer rbd features on , it is giving *~105K
>>>> iops*
>>>>
>>>> root@stormeap-1:~# rbd info recovery_test/rbd_degradation_with_7
>>>> rbd image 'rbd_degradation_with_7':
>>>>         size 195 GB in 50000 objects
>>>>         order 22 (4096 kB objects)
>>>>         block_name_prefix: rbd_data.5fd86b8b4567
>>>>         format: 2
>>>>         features: layering, exclusive-lock
>>>>         flags:
>>>>
>>>> On the above image with feature 7 (exclusive lock feature on) , it
>>>> is giving *~8K iops*...So, >12X degradation
>>>>
>>>> Tried with single numjob and QD = 128 , performance bumped up till ~40K..Further increasing QD , performance is not going up.
>>>>
>>>>
>>>> root@stormeap-1:~# rbd info recovery_test/rbd_degradation_with_15
>>>> rbd image 'rbd_degradation_with_15':
>>>>         size 195 GB in 50000 objects
>>>>         order 22 (4096 kB objects)
>>>>         block_name_prefix: rbd_data.5fab6b8b4567
>>>>         format: 2
>>>>         features: layering, exclusive-lock, object-map
>>>>         flags:
>>>>
>>>> On the above image with feature 15 (exclusive lock, object map
>>>> feature
>>>> on) , it is giving *~8K iops*...So, >12X degradation
>>>>
>>>> Tried with single numjob and QD = 128 , performance bumped up till ~40K..Further increasing QD , performance is not going up.
>>>>
>>>>
>>>> root@stormeap-1:~# rbd info recovery_test/ceph_recovery_img_1 rbd
>>>> image 'ceph_recovery_img_1':
>>>>         size 4882 GB in 1250000 objects
>>>>         order 22 (4096 kB objects)
>>>>         block_name_prefix: rbd_data.371b6b8b4567
>>>>         format: 2
>>>>         features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
>>>>         flags:
>>>>
>>>> On the above image with feature 61 (Jewel default) , it is giving
>>>> *~6K iops*...So, *>16X* degradation
>>>>
>>>> Tried with single numjob and QD = 128 , performance bumped up till ~35K..Further increasing QD , performance is not going up.
>>>>
>>>> Summary :
>>>> ------------
>>>>
>>>> 1. It seems exclusive lock feature is degrading performance.
>>>>
>>>> 2. It is degrading a bit further on enabling fast-diff, deep-flatten
>>>>
>>>>
>>>> Let me know if you need more information on this.
>>>>
>>>> Thanks & Regards
>>>> Somnath
>>>>
>>>>
>>>> PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>>> in the body of a message to majordomo@vger.kernel.org More majordomo
>>>> info at  http://vger.kernel.org/majordomo-info.html
>>>
>>>
>>>
>>> --
>>> Jason
>>
>>
>> --
>> Jason
>
>
>
> --
> Jason



-- 
Jason

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: Severe performance degradation with jewel rbd image
  2016-05-26 13:28             ` Jason Dillaman
@ 2016-05-26 17:47               ` Somnath Roy
  2016-05-26 18:02                 ` Samuel Just
  0 siblings, 1 reply; 18+ messages in thread
From: Somnath Roy @ 2016-05-26 17:47 UTC (permalink / raw)
  To: dillaman; +Cc: ceph-devel

Thanks Jason, that helps!
One question: if I disable exclusive lock (and thus the other features) at creation time, can I later enable it on the same image, if needed, without any disruption?

Regards
Somnath

-----Original Message-----
From: Jason Dillaman [mailto:jdillama@redhat.com] 
Sent: Thursday, May 26, 2016 6:29 AM
To: Somnath Roy
Cc: ceph-devel@vger.kernel.org
Subject: Re: Severe performance degradation with jewel rbd image

I create a ticket [1] a while ago to improve the documentation of RBD image features.  The ticket is still open but I just added some verbiage as a comment to the ticket in the meantime.

[1] http://tracker.ceph.com/issues/15000

On Thu, May 26, 2016 at 12:19 AM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
> Thanks Jason !
> My bad, I thought exclusive lock is to maintain the consistency.
> I think the features like objectmap , fast-diff , deep-flatten, journaling could not be enabled if I disable exclusive lock ?
> Could you please give a one liner or point me to the doc where I can find what the features like objectmap , fast-diff , deep-flatten does ?
>
> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: Jason Dillaman [mailto:jdillama@redhat.com]
> Sent: Wednesday, May 25, 2016 8:53 PM
> To: Somnath Roy
> Cc: ceph-devel@vger.kernel.org
> Subject: Re: Severe performance degradation with jewel rbd image
>
> For multi-client, single-image, you should be using the "--image-shared" option when creating the image (or just disable exclusive-lock after the fact) since the expected use case of exclusive-lock is single client, single image (e.g. single QEMU process that can live-migrate to a new host but both hosts won't be writing concurrently).
>
> Regardless of whether or not you use exclusive-lock or not, when you have multiple clients concurrently writing to the same image, the necessary coordination to provide consistency needs to be provided at the application layer (i.e. use a clustered filesystem on top of a single RBD image when being manipulated by multiple clients concurrently).
>
> On Wed, May 25, 2016 at 11:39 PM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
>> Jason,
>> My use case is to find out how much write performance I can extract out of a single rbd image.
>> I don't want to use  --image-shared as writes will be inconsistent 
>> then (?) It seems running a single fio job with high QD is the only option ?
>> Also, I believe the goal should be at least getting the similar aggregated throughput like single client/single image in case of multi client/single image (individual client will get less and that's fine).
>> Let me know if I am missing anything.
>>
>> Thanks & Regards
>> Somnath
>>
>>
>> -----Original Message-----
>> From: Jason Dillaman [mailto:jdillama@redhat.com]
>> Sent: Wednesday, May 25, 2016 5:51 PM
>> To: Somnath Roy
>> Cc: ceph-devel@vger.kernel.org
>> Subject: Re: Severe performance degradation with jewel rbd image
>>
>> Are you attempting to test a particular use-case where you would have multiple clients connected to a single RBD image?  The rbd CLI has a "--image-shared" option when creating/cloning images as a shortcut to easily disable the exclusive lock, object-map, fast-diff, and journaling features for such situations. You could also specify a different `rbdname` per job to simulate multiple clients accessing multiple images (instead of multiple clients sharing the same image).
>>
>> I have to be honest, I am actually pretty impressed by your 8K IOPS when you have multiple clients fighting over the exclusive lock since acquiring the lock requires inter-client cooperative coordination to request/release/acquire the lock from the current owner and any client without the lock has all writes blocked.
>>
>>
>> On Wed, May 25, 2016 at 7:08 PM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
>>> Hi Jason,
>>> Yes, I am running on single image but with multiple fio jobs (with numjobs = 10 in the fio parameter) . I expected that this exclusive lock is locking between the jobs and that's why I have posted the single job result. Single job result is *not degrading*.
>>> But, we need numjob and QD combination to extract most performance from an image with fio-rbd. A single librbd instance performance seems to be stuck at 40K and we can't scale this image up without running multiple librbd instances on this in parallel.
>>> 16X performance degradation because of this lock seems very destructive.
>>>
>>> Thanks & Regards
>>> Somnath
>>>
>>> -----Original Message-----
>>> From: Jason Dillaman [mailto:jdillama@redhat.com]
>>> Sent: Wednesday, May 25, 2016 3:47 PM
>>> To: Somnath Roy
>>> Cc: ceph-devel@vger.kernel.org
>>> Subject: Re: Severe performance degradation with jewel rbd image
>>>
>>> Just to eliminate the most straightforward explanation, are you running multiple fio jobs against the same image concurrently?  If the exclusive lock had to ping-pong back-and-forth between clients, that would certainly explain the severe performance penalty.
>>>
>>> Otherwise, the exclusive lock is not in the IO path once the client has acquired the exclusive lock.  If you are seeing a performance penalty for a single-client scenario with exclusive lock enabled, this is something we haven't seen and will have to investigate ASAP.
>>>
>>> Thanks,
>>>
>>> On Wed, May 25, 2016 at 4:48 PM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
>>>> Hi Mark/Josh,
>>>> As I mentioned in the performance meeting today , if we create rbd image with default 'rbd create' command in jewel , the individual image performance for 4k RW is not scaling up well. But, multiple of rbd images running in parallel aggregated throughput is scaling.
>>>> For the same QD and numjob combination, image created with image format 1 (and with hammer like rbd_default_features = 3) is producing *16X* more performance. I did some digging and here is my findings.
>>>>
>>>> Setup:
>>>> --------
>>>>
>>>> 32 osds (all SSD) over 4 nodes. Pool size = 2 , min_size = 1.
>>>>
>>>> root@stormeap-1:~# ceph -s
>>>>     cluster db0febf1-d2b0-4f8d-8f20-43731c134763
>>>>      health HEALTH_WARN
>>>>             noscrub,nodeep-scrub,sortbitwise flag(s) set
>>>>      monmap e1: 1 mons at {a=10.60.194.10:6789/0}
>>>>             election epoch 5, quorum 0 a
>>>>      osdmap e139: 32 osds: 32 up, 32 in
>>>>             flags noscrub,nodeep-scrub,sortbitwise
>>>>       pgmap v20532: 2500 pgs, 1 pools, 7421 GB data, 1855 kobjects
>>>>             14850 GB used, 208 TB / 223 TB avail
>>>>                 2500 active+clean
>>>>
>>>> IO profile : Fio rbd with QD 128 and numjob = 10 rbd cache is 
>>>> disabled.
>>>>
>>>> Result:
>>>> --------
>>>> root@stormeap-1:~# rbd info recovery_test/rbd_degradation rbd image
>>>> 'rbd_degradation':
>>>>         size 1953 GB in 500000 objects
>>>>         order 22 (4096 kB objects)
>>>>         block_name_prefix: rb.0.5f5f.6b8b4567
>>>>         format: 1
>>>>
>>>> On the above image with format 1 it is giving *~102K iops*
>>>>
>>>> root@stormeap-1:~# rbd info
>>>> recovery_test/rbd_degradation_with_hammer_features
>>>> rbd image 'rbd_degradation_with_hammer_features':
>>>>         size 195 GB in 50000 objects
>>>>         order 22 (4096 kB objects)
>>>>         block_name_prefix: rbd_data.5f8d6b8b4567
>>>>         format: 2
>>>>         features: layering
>>>>         flags:
>>>>
>>>> On the above image with hammer rbd features on , it is giving 
>>>> *~105K
>>>> iops*
>>>>
>>>> root@stormeap-1:~# rbd info recovery_test/rbd_degradation_with_7
>>>> rbd image 'rbd_degradation_with_7':
>>>>         size 195 GB in 50000 objects
>>>>         order 22 (4096 kB objects)
>>>>         block_name_prefix: rbd_data.5fd86b8b4567
>>>>         format: 2
>>>>         features: layering, exclusive-lock
>>>>         flags:
>>>>
>>>> On the above image with feature 7 (exclusive lock feature on) , it 
>>>> is giving *~8K iops*...So, >12X degradation
>>>>
>>>> Tried with single numjob and QD = 128 , performance bumped up till ~40K..Further increasing QD , performance is not going up.
>>>>
>>>>
>>>> root@stormeap-1:~# rbd info recovery_test/rbd_degradation_with_15
>>>> rbd image 'rbd_degradation_with_15':
>>>>         size 195 GB in 50000 objects
>>>>         order 22 (4096 kB objects)
>>>>         block_name_prefix: rbd_data.5fab6b8b4567
>>>>         format: 2
>>>>         features: layering, exclusive-lock, object-map
>>>>         flags:
>>>>
>>>> On the above image with feature 15 (exclusive lock, object map 
>>>> feature
>>>> on) , it is giving *~8K iops*...So, >12X degradation
>>>>
>>>> Tried with single numjob and QD = 128 , performance bumped up till ~40K..Further increasing QD , performance is not going up.
>>>>
>>>>
>>>> root@stormeap-1:~# rbd info recovery_test/ceph_recovery_img_1 rbd 
>>>> image 'ceph_recovery_img_1':
>>>>         size 4882 GB in 1250000 objects
>>>>         order 22 (4096 kB objects)
>>>>         block_name_prefix: rbd_data.371b6b8b4567
>>>>         format: 2
>>>>         features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
>>>>         flags:
>>>>
>>>> On the above image with feature 61 (Jewel default) , it is giving 
>>>> *~6K iops*...So, *>16X* degradation
>>>>
>>>> Tried with single numjob and QD = 128 , performance bumped up till ~35K..Further increasing QD , performance is not going up.
>>>>
>>>> Summary :
>>>> ------------
>>>>
>>>> 1. It seems exclusive lock feature is degrading performance.
>>>>
>>>> 2. It is degrading a bit further on enabling fast-diff, 
>>>> deep-flatten
>>>>
>>>>
>>>> Let me know if you need more information on this.
>>>>
>>>> Thanks & Regards
>>>> Somnath
>>>>
>>>>
>>>> PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>>> in the body of a message to majordomo@vger.kernel.org More 
>>>> majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>>
>>>
>>> --
>>> Jason
>>
>>
>> --
>> Jason
>
>
>
> --
> Jason



--
Jason

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Severe performance degradation with jewel rbd image
  2016-05-26 17:47               ` Somnath Roy
@ 2016-05-26 18:02                 ` Samuel Just
  2016-05-26 18:17                   ` Jason Dillaman
  0 siblings, 1 reply; 18+ messages in thread
From: Samuel Just @ 2016-05-26 18:02 UTC (permalink / raw)
  To: Somnath Roy; +Cc: dillaman, ceph-devel

It should be noted that many users *will* have the lock enabled since
(iirc, Jason, correct me if I'm wrong) it's there to support some
important features, and typical cloud users run one client per image.
If you benchmark without it, it may skew your results.
-Sam
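
For the benchmarks in this thread that roughly means leaving
exclusive-lock enabled and spreading the load across images rather
than fio jobs.  A minimal fio-rbd sketch along those lines (pool and
image names are placeholders; verify the option names against your
fio build):

  [global]
  ioengine=rbd
  clientname=admin
  pool=recovery_test
  rw=randwrite
  bs=4k
  iodepth=128

  [img1]
  rbdname=bench_img_1

  [img2]
  rbdname=bench_img_2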

On Thu, May 26, 2016 at 10:47 AM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
> Thanks Jason , it helps !
> One question , if I disable exclusive lock (and thus other features) during creation can I enable it on the same image in future if needed without any disruption ?
>
> Regards
> Somnath
>
> -----Original Message-----
> From: Jason Dillaman [mailto:jdillama@redhat.com]
> Sent: Thursday, May 26, 2016 6:29 AM
> To: Somnath Roy
> Cc: ceph-devel@vger.kernel.org
> Subject: Re: Severe performance degradation with jewel rbd image
>
> I create a ticket [1] a while ago to improve the documentation of RBD image features.  The ticket is still open but I just added some verbiage as a comment to the ticket in the meantime.
>
> [1] http://tracker.ceph.com/issues/15000
>
> On Thu, May 26, 2016 at 12:19 AM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
>> Thanks Jason !
>> My bad, I thought exclusive lock is to maintain the consistency.
>> I think the features like objectmap , fast-diff , deep-flatten, journaling could not be enabled if I disable exclusive lock ?
>> Could you please give a one liner or point me to the doc where I can find what the features like objectmap , fast-diff , deep-flatten does ?
>>
>> Thanks & Regards
>> Somnath
>>
>> -----Original Message-----
>> From: Jason Dillaman [mailto:jdillama@redhat.com]
>> Sent: Wednesday, May 25, 2016 8:53 PM
>> To: Somnath Roy
>> Cc: ceph-devel@vger.kernel.org
>> Subject: Re: Severe performance degradation with jewel rbd image
>>
>> For multi-client, single-image, you should be using the "--image-shared" option when creating the image (or just disable exclusive-lock after the fact) since the expected use case of exclusive-lock is single client, single image (e.g. single QEMU process that can live-migrate to a new host but both hosts won't be writing concurrently).
>>
>> Regardless of whether or not you use exclusive-lock or not, when you have multiple clients concurrently writing to the same image, the necessary coordination to provide consistency needs to be provided at the application layer (i.e. use a clustered filesystem on top of a single RBD image when being manipulated by multiple clients concurrently).
>>
>> On Wed, May 25, 2016 at 11:39 PM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
>>> Jason,
>>> My use case is to find out how much write performance I can extract out of a single rbd image.
>>> I don't want to use  --image-shared as writes will be inconsistent
>>> then (?) It seems running a single fio job with high QD is the only option ?
>>> Also, I believe the goal should be at least getting the similar aggregated throughput like single client/single image in case of multi client/single image (individual client will get less and that's fine).
>>> Let me know if I am missing anything.
>>>
>>> Thanks & Regards
>>> Somnath
>>>
>>>
>>> -----Original Message-----
>>> From: Jason Dillaman [mailto:jdillama@redhat.com]
>>> Sent: Wednesday, May 25, 2016 5:51 PM
>>> To: Somnath Roy
>>> Cc: ceph-devel@vger.kernel.org
>>> Subject: Re: Severe performance degradation with jewel rbd image
>>>
>>> Are you attempting to test a particular use-case where you would have multiple clients connected to a single RBD image?  The rbd CLI has a "--image-shared" option when creating/cloning images as a shortcut to easily disable the exclusive lock, object-map, fast-diff, and journaling features for such situations. You could also specify a different `rbdname` per job to simulate multiple clients accessing multiple images (instead of multiple clients sharing the same image).
>>>
>>> I have to be honest, I am actually pretty impressed by your 8K IOPS when you have multiple clients fighting over the exclusive lock since acquiring the lock requires inter-client cooperative coordination to request/release/acquire the lock from the current owner and any client without the lock has all writes blocked.
>>>
>>>
>>> On Wed, May 25, 2016 at 7:08 PM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
>>>> Hi Jason,
>>>> Yes, I am running on single image but with multiple fio jobs (with numjobs = 10 in the fio parameter) . I expected that this exclusive lock is locking between the jobs and that's why I have posted the single job result. Single job result is *not degrading*.
>>>> But, we need numjob and QD combination to extract most performance from an image with fio-rbd. A single librbd instance performance seems to be stuck at 40K and we can't scale this image up without running multiple librbd instances on this in parallel.
>>>> 16X performance degradation because of this lock seems very destructive.
>>>>
>>>> Thanks & Regards
>>>> Somnath
>>>>
>>>> -----Original Message-----
>>>> From: Jason Dillaman [mailto:jdillama@redhat.com]
>>>> Sent: Wednesday, May 25, 2016 3:47 PM
>>>> To: Somnath Roy
>>>> Cc: ceph-devel@vger.kernel.org
>>>> Subject: Re: Severe performance degradation with jewel rbd image
>>>>
>>>> Just to eliminate the most straightforward explanation, are you running multiple fio jobs against the same image concurrently?  If the exclusive lock had to ping-pong back-and-forth between clients, that would certainly explain the severe performance penalty.
>>>>
>>>> Otherwise, the exclusive lock is not in the IO path once the client has acquired the exclusive lock.  If you are seeing a performance penalty for a single-client scenario with exclusive lock enabled, this is something we haven't seen and will have to investigate ASAP.
>>>>
>>>> Thanks,
>>>>
>>>> On Wed, May 25, 2016 at 4:48 PM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
>>>>> Hi Mark/Josh,
>>>>> As I mentioned in the performance meeting today , if we create rbd image with default 'rbd create' command in jewel , the individual image performance for 4k RW is not scaling up well. But, multiple of rbd images running in parallel aggregated throughput is scaling.
>>>>> For the same QD and numjob combination, image created with image format 1 (and with hammer like rbd_default_features = 3) is producing *16X* more performance. I did some digging and here is my findings.
>>>>>
>>>>> Setup:
>>>>> --------
>>>>>
>>>>> 32 osds (all SSD) over 4 nodes. Pool size = 2 , min_size = 1.
>>>>>
>>>>> root@stormeap-1:~# ceph -s
>>>>>     cluster db0febf1-d2b0-4f8d-8f20-43731c134763
>>>>>      health HEALTH_WARN
>>>>>             noscrub,nodeep-scrub,sortbitwise flag(s) set
>>>>>      monmap e1: 1 mons at {a=10.60.194.10:6789/0}
>>>>>             election epoch 5, quorum 0 a
>>>>>      osdmap e139: 32 osds: 32 up, 32 in
>>>>>             flags noscrub,nodeep-scrub,sortbitwise
>>>>>       pgmap v20532: 2500 pgs, 1 pools, 7421 GB data, 1855 kobjects
>>>>>             14850 GB used, 208 TB / 223 TB avail
>>>>>                 2500 active+clean
>>>>>
>>>>> IO profile : Fio rbd with QD 128 and numjob = 10 rbd cache is
>>>>> disabled.
>>>>>
>>>>> Result:
>>>>> --------
>>>>> root@stormeap-1:~# rbd info recovery_test/rbd_degradation rbd image
>>>>> 'rbd_degradation':
>>>>>         size 1953 GB in 500000 objects
>>>>>         order 22 (4096 kB objects)
>>>>>         block_name_prefix: rb.0.5f5f.6b8b4567
>>>>>         format: 1
>>>>>
>>>>> On the above image with format 1 it is giving *~102K iops*
>>>>>
>>>>> root@stormeap-1:~# rbd info
>>>>> recovery_test/rbd_degradation_with_hammer_features
>>>>> rbd image 'rbd_degradation_with_hammer_features':
>>>>>         size 195 GB in 50000 objects
>>>>>         order 22 (4096 kB objects)
>>>>>         block_name_prefix: rbd_data.5f8d6b8b4567
>>>>>         format: 2
>>>>>         features: layering
>>>>>         flags:
>>>>>
>>>>> On the above image with hammer rbd features on , it is giving
>>>>> *~105K
>>>>> iops*
>>>>>
>>>>> root@stormeap-1:~# rbd info recovery_test/rbd_degradation_with_7
>>>>> rbd image 'rbd_degradation_with_7':
>>>>>         size 195 GB in 50000 objects
>>>>>         order 22 (4096 kB objects)
>>>>>         block_name_prefix: rbd_data.5fd86b8b4567
>>>>>         format: 2
>>>>>         features: layering, exclusive-lock
>>>>>         flags:
>>>>>
>>>>> On the above image with feature 7 (exclusive lock feature on) , it
>>>>> is giving *~8K iops*...So, >12X degradation
>>>>>
>>>>> Tried with single numjob and QD = 128 , performance bumped up till ~40K..Further increasing QD , performance is not going up.
>>>>>
>>>>>
>>>>> root@stormeap-1:~# rbd info recovery_test/rbd_degradation_with_15
>>>>> rbd image 'rbd_degradation_with_15':
>>>>>         size 195 GB in 50000 objects
>>>>>         order 22 (4096 kB objects)
>>>>>         block_name_prefix: rbd_data.5fab6b8b4567
>>>>>         format: 2
>>>>>         features: layering, exclusive-lock, object-map
>>>>>         flags:
>>>>>
>>>>> On the above image with feature 15 (exclusive lock, object map
>>>>> feature
>>>>> on) , it is giving *~8K iops*...So, >12X degradation
>>>>>
>>>>> Tried with single numjob and QD = 128 , performance bumped up till ~40K..Further increasing QD , performance is not going up.
>>>>>
>>>>>
>>>>> root@stormeap-1:~# rbd info recovery_test/ceph_recovery_img_1 rbd
>>>>> image 'ceph_recovery_img_1':
>>>>>         size 4882 GB in 1250000 objects
>>>>>         order 22 (4096 kB objects)
>>>>>         block_name_prefix: rbd_data.371b6b8b4567
>>>>>         format: 2
>>>>>         features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
>>>>>         flags:
>>>>>
>>>>> On the above image with feature 61 (Jewel default) , it is giving
>>>>> *~6K iops*...So, *>16X* degradation
>>>>>
>>>>> Tried with single numjob and QD = 128 , performance bumped up till ~35K..Further increasing QD , performance is not going up.
>>>>>
>>>>> Summary :
>>>>> ------------
>>>>>
>>>>> 1. It seems exclusive lock feature is degrading performance.
>>>>>
>>>>> 2. It is degrading a bit further on enabling fast-diff,
>>>>> deep-flatten
>>>>>
>>>>>
>>>>> Let me know if you need more information on this.
>>>>>
>>>>> Thanks & Regards
>>>>> Somnath
>>>>>
>>>>>
>>>>> PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>>>> in the body of a message to majordomo@vger.kernel.org More
>>>>> majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>>
>>>>
>>>> --
>>>> Jason
>>>
>>>
>>> --
>>> Jason
>>
>>
>>
>> --
>> Jason
>
>
>
> --
> Jason

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Severe performance degradation with jewel rbd image
  2016-05-26 18:02                 ` Samuel Just
@ 2016-05-26 18:17                   ` Jason Dillaman
  2016-05-26 18:49                     ` Somnath Roy
  0 siblings, 1 reply; 18+ messages in thread
From: Jason Dillaman @ 2016-05-26 18:17 UTC (permalink / raw)
  To: Samuel Just; +Cc: Somnath Roy, ceph-devel

Correct -- by default exclusive lock, object map, fast-diff, and
deep-flatten will be enabled starting with Jewel for all new images.
The exclusive lock, object map, fast-diff, and deep-flatten features
are to be used for single-client, single-image use-cases.  Only object
map and fast-diff are in the IO path (fast-diff is an extension to the
object map).

I agree with Haomai that we need to address the bottlenecks that are
capping single-client performance at ~40K IOPS.  It would be great to
eventually see an incoming message processed from the messenger all
the way to the librbd AIO callback without unnecessary context
switches, extra queuing, etc.
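
In the meantime, anyone who wants to benchmark with a hammer-like
feature set could use something like the following minimal sketch
(the image name is a placeholder and the flag spellings are from
memory -- verify them against the rbd help output on your build):

  # create a single image with only layering enabled (size is in MB)
  rbd create recovery_test/bench_img --size 200000 --image-feature layering

  # or make that the cluster-wide default in ceph.conf under [client]:
  #   rbd default features = 3

  # features can also be dropped from an existing image; object-map and
  # fast-diff depend on exclusive-lock, so disable them together
  rbd feature disable recovery_test/bench_img deep-flatten fast-diff object-map exclusive-lock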

On Thu, May 26, 2016 at 2:02 PM, Samuel Just <sjust@redhat.com> wrote:
> It should be noted that many users *will* have the lock enabled since
> (iirc, Jason, correct me if I'm wrong), it's there to support some
> important features and typical cloud users are 1 client/image users.
> If you benchmark without it, it may skew your results.
> -Sam
>
> On Thu, May 26, 2016 at 10:47 AM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
>> Thanks Jason , it helps !
>> One question , if I disable exclusive lock (and thus other features) during creation can I enable it on the same image in future if needed without any disruption ?
>>
>> Regards
>> Somnath
>>
>> -----Original Message-----
>> From: Jason Dillaman [mailto:jdillama@redhat.com]
>> Sent: Thursday, May 26, 2016 6:29 AM
>> To: Somnath Roy
>> Cc: ceph-devel@vger.kernel.org
>> Subject: Re: Severe performance degradation with jewel rbd image
>>
>> I create a ticket [1] a while ago to improve the documentation of RBD image features.  The ticket is still open but I just added some verbiage as a comment to the ticket in the meantime.
>>
>> [1] http://tracker.ceph.com/issues/15000
>>
>> On Thu, May 26, 2016 at 12:19 AM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
>>> Thanks Jason !
>>> My bad, I thought exclusive lock is to maintain the consistency.
>>> I think the features like objectmap , fast-diff , deep-flatten, journaling could not be enabled if I disable exclusive lock ?
>>> Could you please give a one liner or point me to the doc where I can find what the features like objectmap , fast-diff , deep-flatten does ?
>>>
>>> Thanks & Regards
>>> Somnath
>>>
>>> -----Original Message-----
>>> From: Jason Dillaman [mailto:jdillama@redhat.com]
>>> Sent: Wednesday, May 25, 2016 8:53 PM
>>> To: Somnath Roy
>>> Cc: ceph-devel@vger.kernel.org
>>> Subject: Re: Severe performance degradation with jewel rbd image
>>>
>>> For multi-client, single-image, you should be using the "--image-shared" option when creating the image (or just disable exclusive-lock after the fact) since the expected use case of exclusive-lock is single client, single image (e.g. single QEMU process that can live-migrate to a new host but both hosts won't be writing concurrently).
>>>
>>> Regardless of whether or not you use exclusive-lock or not, when you have multiple clients concurrently writing to the same image, the necessary coordination to provide consistency needs to be provided at the application layer (i.e. use a clustered filesystem on top of a single RBD image when being manipulated by multiple clients concurrently).
>>>
>>> On Wed, May 25, 2016 at 11:39 PM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
>>>> Jason,
>>>> My use case is to find out how much write performance I can extract out of a single rbd image.
>>>> I don't want to use  --image-shared as writes will be inconsistent
>>>> then (?) It seems running a single fio job with high QD is the only option ?
>>>> Also, I believe the goal should be at least getting the similar aggregated throughput like single client/single image in case of multi client/single image (individual client will get less and that's fine).
>>>> Let me know if I am missing anything.
>>>>
>>>> Thanks & Regards
>>>> Somnath
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Jason Dillaman [mailto:jdillama@redhat.com]
>>>> Sent: Wednesday, May 25, 2016 5:51 PM
>>>> To: Somnath Roy
>>>> Cc: ceph-devel@vger.kernel.org
>>>> Subject: Re: Severe performance degradation with jewel rbd image
>>>>
>>>> Are you attempting to test a particular use-case where you would have multiple clients connected to a single RBD image?  The rbd CLI has a "--image-shared" option when creating/cloning images as a shortcut to easily disable the exclusive lock, object-map, fast-diff, and journaling features for such situations. You could also specify a different `rbdname` per job to simulate multiple clients accessing multiple images (instead of multiple clients sharing the same image).
>>>>
>>>> I have to be honest, I am actually pretty impressed by your 8K IOPS when you have multiple clients fighting over the exclusive lock since acquiring the lock requires inter-client cooperative coordination to request/release/acquire the lock from the current owner and any client without the lock has all writes blocked.
>>>>
>>>>
>>>> On Wed, May 25, 2016 at 7:08 PM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
>>>>> Hi Jason,
>>>>> Yes, I am running on single image but with multiple fio jobs (with numjobs = 10 in the fio parameter) . I expected that this exclusive lock is locking between the jobs and that's why I have posted the single job result. Single job result is *not degrading*.
>>>>> But, we need numjob and QD combination to extract most performance from an image with fio-rbd. A single librbd instance performance seems to be stuck at 40K and we can't scale this image up without running multiple librbd instances on this in parallel.
>>>>> 16X performance degradation because of this lock seems very destructive.
>>>>>
>>>>> Thanks & Regards
>>>>> Somnath
>>>>>
>>>>> -----Original Message-----
>>>>> From: Jason Dillaman [mailto:jdillama@redhat.com]
>>>>> Sent: Wednesday, May 25, 2016 3:47 PM
>>>>> To: Somnath Roy
>>>>> Cc: ceph-devel@vger.kernel.org
>>>>> Subject: Re: Severe performance degradation with jewel rbd image
>>>>>
>>>>> Just to eliminate the most straightforward explanation, are you running multiple fio jobs against the same image concurrently?  If the exclusive lock had to ping-pong back-and-forth between clients, that would certainly explain the severe performance penalty.
>>>>>
>>>>> Otherwise, the exclusive lock is not in the IO path once the client has acquired the exclusive lock.  If you are seeing a performance penalty for a single-client scenario with exclusive lock enabled, this is something we haven't seen and will have to investigate ASAP.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> On Wed, May 25, 2016 at 4:48 PM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
>>>>>> Hi Mark/Josh,
>>>>>> As I mentioned in the performance meeting today , if we create rbd image with default 'rbd create' command in jewel , the individual image performance for 4k RW is not scaling up well. But, multiple of rbd images running in parallel aggregated throughput is scaling.
>>>>>> For the same QD and numjob combination, image created with image format 1 (and with hammer like rbd_default_features = 3) is producing *16X* more performance. I did some digging and here is my findings.
>>>>>>
>>>>>> Setup:
>>>>>> --------
>>>>>>
>>>>>> 32 osds (all SSD) over 4 nodes. Pool size = 2 , min_size = 1.
>>>>>>
>>>>>> root@stormeap-1:~# ceph -s
>>>>>>     cluster db0febf1-d2b0-4f8d-8f20-43731c134763
>>>>>>      health HEALTH_WARN
>>>>>>             noscrub,nodeep-scrub,sortbitwise flag(s) set
>>>>>>      monmap e1: 1 mons at {a=10.60.194.10:6789/0}
>>>>>>             election epoch 5, quorum 0 a
>>>>>>      osdmap e139: 32 osds: 32 up, 32 in
>>>>>>             flags noscrub,nodeep-scrub,sortbitwise
>>>>>>       pgmap v20532: 2500 pgs, 1 pools, 7421 GB data, 1855 kobjects
>>>>>>             14850 GB used, 208 TB / 223 TB avail
>>>>>>                 2500 active+clean
>>>>>>
>>>>>> IO profile : Fio rbd with QD 128 and numjob = 10 rbd cache is
>>>>>> disabled.
>>>>>>
>>>>>> Result:
>>>>>> --------
>>>>>> root@stormeap-1:~# rbd info recovery_test/rbd_degradation rbd image
>>>>>> 'rbd_degradation':
>>>>>>         size 1953 GB in 500000 objects
>>>>>>         order 22 (4096 kB objects)
>>>>>>         block_name_prefix: rb.0.5f5f.6b8b4567
>>>>>>         format: 1
>>>>>>
>>>>>> On the above image with format 1 it is giving *~102K iops*
>>>>>>
>>>>>> root@stormeap-1:~# rbd info
>>>>>> recovery_test/rbd_degradation_with_hammer_features
>>>>>> rbd image 'rbd_degradation_with_hammer_features':
>>>>>>         size 195 GB in 50000 objects
>>>>>>         order 22 (4096 kB objects)
>>>>>>         block_name_prefix: rbd_data.5f8d6b8b4567
>>>>>>         format: 2
>>>>>>         features: layering
>>>>>>         flags:
>>>>>>
>>>>>> On the above image with hammer rbd features on , it is giving
>>>>>> *~105K
>>>>>> iops*
>>>>>>
>>>>>> root@stormeap-1:~# rbd info recovery_test/rbd_degradation_with_7
>>>>>> rbd image 'rbd_degradation_with_7':
>>>>>>         size 195 GB in 50000 objects
>>>>>>         order 22 (4096 kB objects)
>>>>>>         block_name_prefix: rbd_data.5fd86b8b4567
>>>>>>         format: 2
>>>>>>         features: layering, exclusive-lock
>>>>>>         flags:
>>>>>>
>>>>>> On the above image with feature 7 (exclusive lock feature on) , it
>>>>>> is giving *~8K iops*...So, >12X degradation
>>>>>>
>>>>>> Tried with single numjob and QD = 128 , performance bumped up till ~40K..Further increasing QD , performance is not going up.
>>>>>>
>>>>>>
>>>>>> root@stormeap-1:~# rbd info recovery_test/rbd_degradation_with_15
>>>>>> rbd image 'rbd_degradation_with_15':
>>>>>>         size 195 GB in 50000 objects
>>>>>>         order 22 (4096 kB objects)
>>>>>>         block_name_prefix: rbd_data.5fab6b8b4567
>>>>>>         format: 2
>>>>>>         features: layering, exclusive-lock, object-map
>>>>>>         flags:
>>>>>>
>>>>>> On the above image with feature 15 (exclusive lock, object map
>>>>>> feature
>>>>>> on) , it is giving *~8K iops*...So, >12X degradation
>>>>>>
>>>>>> Tried with single numjob and QD = 128 , performance bumped up till ~40K..Further increasing QD , performance is not going up.
>>>>>>
>>>>>>
>>>>>> root@stormeap-1:~# rbd info recovery_test/ceph_recovery_img_1 rbd
>>>>>> image 'ceph_recovery_img_1':
>>>>>>         size 4882 GB in 1250000 objects
>>>>>>         order 22 (4096 kB objects)
>>>>>>         block_name_prefix: rbd_data.371b6b8b4567
>>>>>>         format: 2
>>>>>>         features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
>>>>>>         flags:
>>>>>>
>>>>>> On the above image with feature 61 (Jewel default) , it is giving
>>>>>> *~6K iops*...So, *>16X* degradation
>>>>>>
>>>>>> Tried with single numjob and QD = 128 , performance bumped up till ~35K..Further increasing QD , performance is not going up.
>>>>>>
>>>>>> Summary :
>>>>>> ------------
>>>>>>
>>>>>> 1. It seems exclusive lock feature is degrading performance.
>>>>>>
>>>>>> 2. It is degrading a bit further on enabling fast-diff,
>>>>>> deep-flatten
>>>>>>
>>>>>>
>>>>>> Let me know if you need more information on this.
>>>>>>
>>>>>> Thanks & Regards
>>>>>> Somnath
>>>>>>
>>>>>>
>>>>>> PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>>>>> in the body of a message to majordomo@vger.kernel.org More
>>>>>> majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Jason
>>>>
>>>>
>>>> --
>>>> Jason
>>>
>>>
>>>
>>> --
>>> Jason
>>
>>
>>
>> --
>> Jason



-- 
Jason

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: Severe performance degradation with jewel rbd image
  2016-05-26 18:17                   ` Jason Dillaman
@ 2016-05-26 18:49                     ` Somnath Roy
  2016-05-26 18:52                       ` Jason Dillaman
  0 siblings, 1 reply; 18+ messages in thread
From: Somnath Roy @ 2016-05-26 18:49 UTC (permalink / raw)
  To: dillaman, Samuel Just; +Cc: ceph-devel

Yes, the single-client bottleneck is hurting us. We had some discussion about that in the past, and it seems major restructuring is required to solve it.
Back to my original question: these features can only be enabled during image creation and not afterwards -- am I right?

Thanks & Regards
Somnath

-----Original Message-----
From: Jason Dillaman [mailto:jdillama@redhat.com] 
Sent: Thursday, May 26, 2016 11:17 AM
To: Samuel Just
Cc: Somnath Roy; ceph-devel@vger.kernel.org
Subject: Re: Severe performance degradation with jewel rbd image

Correct -- by default exclusive lock, object map, fast-diff, and deep-flatten will be enabled starting with Jewel for all new images.
The exclusive lock, object map, fast-diff, and deep-flatten features are to be used for single-client, single-image use-cases.  Only object map and fast-diff are in the IO path (fast-diff is an extension to the object map).

I agree with Haomai that we need to address the bottlenecks that are capping single-client performance to ~40K IOPS.  It would be great to eventually see an incoming message be able to be processed from the messenger all the way to the librbd AIO callback without unnecessary contexts switches, extra queuing, etc.

On Thu, May 26, 2016 at 2:02 PM, Samuel Just <sjust@redhat.com> wrote:
> It should be noted that many users *will* have the lock enabled since 
> (iirc, Jason, correct me if I'm wrong), it's there to support some 
> important features and typical cloud users are 1 client/image users.
> If you benchmark without it, it may skew your results.
> -Sam
>
> On Thu, May 26, 2016 at 10:47 AM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
>> Thanks Jason , it helps !
>> One question , if I disable exclusive lock (and thus other features) during creation can I enable it on the same image in future if needed without any disruption ?
>>
>> Regards
>> Somnath
>>
>> -----Original Message-----
>> From: Jason Dillaman [mailto:jdillama@redhat.com]
>> Sent: Thursday, May 26, 2016 6:29 AM
>> To: Somnath Roy
>> Cc: ceph-devel@vger.kernel.org
>> Subject: Re: Severe performance degradation with jewel rbd image
>>
>> I create a ticket [1] a while ago to improve the documentation of RBD image features.  The ticket is still open but I just added some verbiage as a comment to the ticket in the meantime.
>>
>> [1] http://tracker.ceph.com/issues/15000
>>
>> On Thu, May 26, 2016 at 12:19 AM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
>>> Thanks Jason !
>>> My bad, I thought exclusive lock is to maintain the consistency.
>>> I think the features like objectmap , fast-diff , deep-flatten, journaling could not be enabled if I disable exclusive lock ?
>>> Could you please give a one liner or point me to the doc where I can find what the features like objectmap , fast-diff , deep-flatten does ?
>>>
>>> Thanks & Regards
>>> Somnath
>>>
>>> -----Original Message-----
>>> From: Jason Dillaman [mailto:jdillama@redhat.com]
>>> Sent: Wednesday, May 25, 2016 8:53 PM
>>> To: Somnath Roy
>>> Cc: ceph-devel@vger.kernel.org
>>> Subject: Re: Severe performance degradation with jewel rbd image
>>>
>>> For multi-client, single-image, you should be using the "--image-shared" option when creating the image (or just disable exclusive-lock after the fact) since the expected use case of exclusive-lock is single client, single image (e.g. single QEMU process that can live-migrate to a new host but both hosts won't be writing concurrently).
>>>
>>> Regardless of whether or not you use exclusive-lock or not, when you have multiple clients concurrently writing to the same image, the necessary coordination to provide consistency needs to be provided at the application layer (i.e. use a clustered filesystem on top of a single RBD image when being manipulated by multiple clients concurrently).
>>>
>>> On Wed, May 25, 2016 at 11:39 PM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
>>>> Jason,
>>>> My use case is to find out how much write performance I can extract out of a single rbd image.
>>>> I don't want to use  --image-shared as writes will be inconsistent 
>>>> then (?) It seems running a single fio job with high QD is the only option ?
>>>> Also, I believe the goal should be at least getting the similar aggregated throughput like single client/single image in case of multi client/single image (individual client will get less and that's fine).
>>>> Let me know if I am missing anything.
>>>>
>>>> Thanks & Regards
>>>> Somnath
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Jason Dillaman [mailto:jdillama@redhat.com]
>>>> Sent: Wednesday, May 25, 2016 5:51 PM
>>>> To: Somnath Roy
>>>> Cc: ceph-devel@vger.kernel.org
>>>> Subject: Re: Severe performance degradation with jewel rbd image
>>>>
>>>> Are you attempting to test a particular use-case where you would have multiple clients connected to a single RBD image?  The rbd CLI has a "--image-shared" option when creating/cloning images as a shortcut to easily disable the exclusive lock, object-map, fast-diff, and journaling features for such situations. You could also specify a different `rbdname` per job to simulate multiple clients accessing multiple images (instead of multiple clients sharing the same image).
>>>>
>>>> I have to be honest, I am actually pretty impressed by your 8K IOPS when you have multiple clients fighting over the exclusive lock since acquiring the lock requires inter-client cooperative coordination to request/release/acquire the lock from the current owner and any client without the lock has all writes blocked.
>>>>
>>>>
>>>> On Wed, May 25, 2016 at 7:08 PM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
>>>>> Hi Jason,
>>>>> Yes, I am running on single image but with multiple fio jobs (with numjobs = 10 in the fio parameter) . I expected that this exclusive lock is locking between the jobs and that's why I have posted the single job result. Single job result is *not degrading*.
>>>>> But, we need numjob and QD combination to extract most performance from an image with fio-rbd. A single librbd instance performance seems to be stuck at 40K and we can't scale this image up without running multiple librbd instances on this in parallel.
>>>>> 16X performance degradation because of this lock seems very destructive.
>>>>>
>>>>> Thanks & Regards
>>>>> Somnath
>>>>>
>>>>> -----Original Message-----
>>>>> From: Jason Dillaman [mailto:jdillama@redhat.com]
>>>>> Sent: Wednesday, May 25, 2016 3:47 PM
>>>>> To: Somnath Roy
>>>>> Cc: ceph-devel@vger.kernel.org
>>>>> Subject: Re: Severe performance degradation with jewel rbd image
>>>>>
>>>>> Just to eliminate the most straightforward explanation, are you running multiple fio jobs against the same image concurrently?  If the exclusive lock had to ping-pong back-and-forth between clients, that would certainly explain the severe performance penalty.
>>>>>
>>>>> Otherwise, the exclusive lock is not in the IO path once the client has acquired the exclusive lock.  If you are seeing a performance penalty for a single-client scenario with exclusive lock enabled, this is something we haven't seen and will have to investigate ASAP.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> On Wed, May 25, 2016 at 4:48 PM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
>>>>>> Hi Mark/Josh,
>>>>>> As I mentioned in the performance meeting today , if we create rbd image with default 'rbd create' command in jewel , the individual image performance for 4k RW is not scaling up well. But, multiple of rbd images running in parallel aggregated throughput is scaling.
>>>>>> For the same QD and numjob combination, image created with image format 1 (and with hammer like rbd_default_features = 3) is producing *16X* more performance. I did some digging and here is my findings.
>>>>>>
>>>>>> Setup:
>>>>>> --------
>>>>>>
>>>>>> 32 osds (all SSD) over 4 nodes. Pool size = 2 , min_size = 1.
>>>>>>
>>>>>> root@stormeap-1:~# ceph -s
>>>>>>     cluster db0febf1-d2b0-4f8d-8f20-43731c134763
>>>>>>      health HEALTH_WARN
>>>>>>             noscrub,nodeep-scrub,sortbitwise flag(s) set
>>>>>>      monmap e1: 1 mons at {a=10.60.194.10:6789/0}
>>>>>>             election epoch 5, quorum 0 a
>>>>>>      osdmap e139: 32 osds: 32 up, 32 in
>>>>>>             flags noscrub,nodeep-scrub,sortbitwise
>>>>>>       pgmap v20532: 2500 pgs, 1 pools, 7421 GB data, 1855 kobjects
>>>>>>             14850 GB used, 208 TB / 223 TB avail
>>>>>>                 2500 active+clean
>>>>>>
>>>>>> IO profile : Fio rbd with QD 128 and numjob = 10 rbd cache is 
>>>>>> disabled.
>>>>>>
>>>>>> Result:
>>>>>> --------
>>>>>> root@stormeap-1:~# rbd info recovery_test/rbd_degradation rbd 
>>>>>> image
>>>>>> 'rbd_degradation':
>>>>>>         size 1953 GB in 500000 objects
>>>>>>         order 22 (4096 kB objects)
>>>>>>         block_name_prefix: rb.0.5f5f.6b8b4567
>>>>>>         format: 1
>>>>>>
>>>>>> On the above image with format 1 it is giving *~102K iops*
>>>>>>
>>>>>> root@stormeap-1:~# rbd info
>>>>>> recovery_test/rbd_degradation_with_hammer_features
>>>>>> rbd image 'rbd_degradation_with_hammer_features':
>>>>>>         size 195 GB in 50000 objects
>>>>>>         order 22 (4096 kB objects)
>>>>>>         block_name_prefix: rbd_data.5f8d6b8b4567
>>>>>>         format: 2
>>>>>>         features: layering
>>>>>>         flags:
>>>>>>
>>>>>> On the above image with hammer rbd features on , it is giving 
>>>>>> *~105K
>>>>>> iops*
>>>>>>
>>>>>> root@stormeap-1:~# rbd info recovery_test/rbd_degradation_with_7
>>>>>> rbd image 'rbd_degradation_with_7':
>>>>>>         size 195 GB in 50000 objects
>>>>>>         order 22 (4096 kB objects)
>>>>>>         block_name_prefix: rbd_data.5fd86b8b4567
>>>>>>         format: 2
>>>>>>         features: layering, exclusive-lock
>>>>>>         flags:
>>>>>>
>>>>>> On the above image with feature 7 (exclusive lock feature on) , 
>>>>>> it is giving *~8K iops*...So, >12X degradation
>>>>>>
>>>>>> Tried with single numjob and QD = 128 , performance bumped up till ~40K..Further increasing QD , performance is not going up.
>>>>>>
>>>>>>
>>>>>> root@stormeap-1:~# rbd info recovery_test/rbd_degradation_with_15
>>>>>> rbd image 'rbd_degradation_with_15':
>>>>>>         size 195 GB in 50000 objects
>>>>>>         order 22 (4096 kB objects)
>>>>>>         block_name_prefix: rbd_data.5fab6b8b4567
>>>>>>         format: 2
>>>>>>         features: layering, exclusive-lock, object-map
>>>>>>         flags:
>>>>>>
>>>>>> On the above image with feature 15 (exclusive lock, object map 
>>>>>> feature
>>>>>> on) , it is giving *~8K iops*...So, >12X degradation
>>>>>>
>>>>>> Tried with single numjob and QD = 128 , performance bumped up till ~40K..Further increasing QD , performance is not going up.
>>>>>>
>>>>>>
>>>>>> root@stormeap-1:~# rbd info recovery_test/ceph_recovery_img_1 rbd 
>>>>>> image 'ceph_recovery_img_1':
>>>>>>         size 4882 GB in 1250000 objects
>>>>>>         order 22 (4096 kB objects)
>>>>>>         block_name_prefix: rbd_data.371b6b8b4567
>>>>>>         format: 2
>>>>>>         features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
>>>>>>         flags:
>>>>>>
>>>>>> On the above image with feature 61 (Jewel default) , it is giving 
>>>>>> *~6K iops*...So, *>16X* degradation
>>>>>>
>>>>>> Tried with single numjob and QD = 128 , performance bumped up till ~35K..Further increasing QD , performance is not going up.
>>>>>>
>>>>>> Summary :
>>>>>> ------------
>>>>>>
>>>>>> 1. It seems exclusive lock feature is degrading performance.
>>>>>>
>>>>>> 2. It is degrading a bit further on enabling fast-diff, 
>>>>>> deep-flatten
>>>>>>
>>>>>>
>>>>>> Let me know if you need more information on this.
>>>>>>
>>>>>> Thanks & Regards
>>>>>> Somnath
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Jason
>>>>
>>>>
>>>> --
>>>> Jason
>>>
>>>
>>>
>>> --
>>> Jason
>>
>>
>>
>> --
>> Jason



--
Jason

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Severe performance degradation with jewel rbd image
  2016-05-26 18:49                     ` Somnath Roy
@ 2016-05-26 18:52                       ` Jason Dillaman
  2016-05-28  7:29                         ` Alexandre DERUMIER
  0 siblings, 1 reply; 18+ messages in thread
From: Jason Dillaman @ 2016-05-26 18:52 UTC (permalink / raw)
  To: Somnath Roy; +Cc: Samuel Just, ceph-devel

You can dynamically enable and disable the exclusive-lock, object-map,
fast-diff, and journaling features after the image is created via
"rbd feature disable <image-spec> <feature>" (and the corresponding
"rbd feature enable").  The deep-flatten feature can only be disabled,
not re-enabled, via the same CLI.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Severe performance degradation with jewel rbd image
  2016-05-26 18:52                       ` Jason Dillaman
@ 2016-05-28  7:29                         ` Alexandre DERUMIER
  2016-05-28 16:23                           ` Somnath Roy
  2016-05-31 19:57                           ` Jason Dillaman
  0 siblings, 2 replies; 18+ messages in thread
From: Alexandre DERUMIER @ 2016-05-28  7:29 UTC (permalink / raw)
  To: dillaman; +Cc: Somnath Roy, Samuel Just, ceph-devel

Hi,

qemu should soon have multithreaded disk access support (multiple queues + multiple iothreads):

https://lists.gnu.org/archive/html/qemu-devel/2016-05/msg05023.html

Do you think it will work with exclusive-lock?
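
For context, what QEMU already supports is a single dedicated iothread per virtio-blk device; a rough sketch of wiring that up for an RBD-backed disk (image name hypothetical, option syntax as of QEMU 2.x, so verify against your local build) looks roughly like:

qemu-system-x86_64 \
    -object iothread,id=rbd-io0 \
    -drive file=rbd:recovery_test/vm_disk_1:id=admin,format=raw,if=none,id=drive0,cache=none \
    -device virtio-blk-pci,drive=drive0,iothread=rbd-io0 \
    ...

The series linked above would extend this so that one device can spread its queues across several iothreads, which is where the exclusive-lock question becomes interesting.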

----- Mail original -----
De: "Jason Dillaman" <jdillama@redhat.com>
À: "Somnath Roy" <Somnath.Roy@sandisk.com>
Cc: "Samuel Just" <sjust@redhat.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
Envoyé: Jeudi 26 Mai 2016 20:52:26
Objet: Re: Severe performance degradation with jewel rbd image

You can dynamically enable/disable exclusive-lock, object-map, 
fast-diff, and journaling after the image is created via "rbd feature 
disable <image-spec> <feature>". The deep-flatten feature can be 
disabled-only via the same CLI. 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: Severe performance degradation with jewel rbd image
  2016-05-28  7:29                         ` Alexandre DERUMIER
@ 2016-05-28 16:23                           ` Somnath Roy
  2016-05-31 19:57                           ` Jason Dillaman
  1 sibling, 0 replies; 18+ messages in thread
From: Somnath Roy @ 2016-05-28 16:23 UTC (permalink / raw)
  To: Alexandre DERUMIER, dillaman; +Cc: Samuel Just, ceph-devel

I think you now need to disable exclusive-lock explicitly (since it is on by default in Jewel); otherwise you have to live with the severe performance degradation I mentioned.
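
One way to do that for all newly created images is to set the hammer-like default feature bits on the client side, or to pass the features explicitly per image; a sketch, with a hypothetical image name:

# client-side ceph.conf: new images get layering (+striping) only,
# i.e. no exclusive-lock/object-map/fast-diff/deep-flatten
[client]
rbd_default_features = 3

# or per image at creation time (size in MB):
rbd create recovery_test/rbd_no_lock_test --size 204800 --image-feature layering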

Thanks & Regards
Somnath

-----Original Message-----
From: Alexandre DERUMIER [mailto:aderumier@odiso.com]
Sent: Saturday, May 28, 2016 12:29 AM
To: dillaman@redhat.com
Cc: Somnath Roy; Samuel Just; ceph-devel
Subject: Re: Severe performance degradation with jewel rbd image

Hi,

qemu should have soon multithreaded disk access support (multi queues + multi iothreads)

https://lists.gnu.org/archive/html/qemu-devel/2016-05/msg05023.html

Do you think it'll work with exclusive-lock ?

----- Mail original -----
De: "Jason Dillaman" <jdillama@redhat.com>
À: "Somnath Roy" <Somnath.Roy@sandisk.com>
Cc: "Samuel Just" <sjust@redhat.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
Envoyé: Jeudi 26 Mai 2016 20:52:26
Objet: Re: Severe performance degradation with jewel rbd image

You can dynamically enable/disable exclusive-lock, object-map, fast-diff, and journaling after the image is created via "rbd feature disable <image-spec> <feature>". The deep-flatten feature can be disabled-only via the same CLI.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Severe performance degradation with jewel rbd image
  2016-05-28  7:29                         ` Alexandre DERUMIER
  2016-05-28 16:23                           ` Somnath Roy
@ 2016-05-31 19:57                           ` Jason Dillaman
  2016-06-01  6:58                             ` Alexandre DERUMIER
  1 sibling, 1 reply; 18+ messages in thread
From: Jason Dillaman @ 2016-05-31 19:57 UTC (permalink / raw)
  To: Alexandre DERUMIER; +Cc: Somnath Roy, Samuel Just, ceph-devel

That QEMU patch looks like it only supports raw devices and
pre-allocated files.  Once librbd/librados can eliminate all the
threading/queuing slowdowns, a similar approach could be extended to
RBD images without the need to open multiple copies of the same image.

On Sat, May 28, 2016 at 3:29 AM, Alexandre DERUMIER <aderumier@odiso.com> wrote:
> Hi,
>
> qemu should have soon multithreaded disk access support (multi queues + multi iothreads)
>
> https://lists.gnu.org/archive/html/qemu-devel/2016-05/msg05023.html
>
> Do you think it'll work with exclusive-lock ?
>
> ----- Mail original -----
> De: "Jason Dillaman" <jdillama@redhat.com>
> À: "Somnath Roy" <Somnath.Roy@sandisk.com>
> Cc: "Samuel Just" <sjust@redhat.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
> Envoyé: Jeudi 26 Mai 2016 20:52:26
> Objet: Re: Severe performance degradation with jewel rbd image
>
> You can dynamically enable/disable exclusive-lock, object-map,
> fast-diff, and journaling after the image is created via "rbd feature
> disable <image-spec> <feature>". The deep-flatten feature can be
> disabled-only via the same CLI.



-- 
Jason

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Severe performance degradation with jewel rbd image
  2016-05-31 19:57                           ` Jason Dillaman
@ 2016-06-01  6:58                             ` Alexandre DERUMIER
  0 siblings, 0 replies; 18+ messages in thread
From: Alexandre DERUMIER @ 2016-06-01  6:58 UTC (permalink / raw)
  To: dillaman; +Cc: Somnath Roy, Samuel Just, ceph-devel

>>Once librbd/librados can eliminate all the 
>>threading/queuing slowdowns, a similar approach could be extended to 
>>RBD images without the need to open multiple copies of the same image. 

Thanks Jason, that's really good news!

----- Mail original -----
De: "Jason Dillaman" <jdillama@redhat.com>
À: "aderumier" <aderumier@odiso.com>
Cc: "Somnath Roy" <Somnath.Roy@sandisk.com>, "Samuel Just" <sjust@redhat.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
Envoyé: Mardi 31 Mai 2016 21:57:29
Objet: Re: Severe performance degradation with jewel rbd image

That QEMU patch looks like it only supports raw devices and 
pre-allocated files. Once librbd/librados can eliminate all the 
threading/queuing slowdowns, a similar approach could be extended to 
RBD images without the need to open multiple copies of the same image. 

On Sat, May 28, 2016 at 3:29 AM, Alexandre DERUMIER <aderumier@odiso.com> wrote: 
> Hi, 
> 
> qemu should have soon multithreaded disk access support (multi queues + multi iothreads) 
> 
> https://lists.gnu.org/archive/html/qemu-devel/2016-05/msg05023.html 
> 
> Do you think it'll work with exclusive-lock ? 
> 
> ----- Mail original ----- 
> De: "Jason Dillaman" <jdillama@redhat.com> 
> À: "Somnath Roy" <Somnath.Roy@sandisk.com> 
> Cc: "Samuel Just" <sjust@redhat.com>, "ceph-devel" <ceph-devel@vger.kernel.org> 
> Envoyé: Jeudi 26 Mai 2016 20:52:26 
> Objet: Re: Severe performance degradation with jewel rbd image 
> 
> You can dynamically enable/disable exclusive-lock, object-map, 
> fast-diff, and journaling after the image is created via "rbd feature 
> disable <image-spec> <feature>". The deep-flatten feature can be 
> disabled-only via the same CLI. 



-- 
Jason 


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2016-06-01  6:58 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-25 20:48 Severe performance degradation with jewel rbd image Somnath Roy
2016-05-25 22:47 ` Jason Dillaman
2016-05-25 23:08   ` Somnath Roy
2016-05-26  0:50     ` Jason Dillaman
2016-05-26  3:39       ` Somnath Roy
2016-05-26  3:52         ` Jason Dillaman
2016-05-26  4:19           ` Somnath Roy
2016-05-26 13:28             ` Jason Dillaman
2016-05-26 17:47               ` Somnath Roy
2016-05-26 18:02                 ` Samuel Just
2016-05-26 18:17                   ` Jason Dillaman
2016-05-26 18:49                     ` Somnath Roy
2016-05-26 18:52                       ` Jason Dillaman
2016-05-28  7:29                         ` Alexandre DERUMIER
2016-05-28 16:23                           ` Somnath Roy
2016-05-31 19:57                           ` Jason Dillaman
2016-06-01  6:58                             ` Alexandre DERUMIER
2016-05-26  2:01     ` Haomai Wang
