* Performance CPU and IOPS
@ 2015-05-24 11:21 Casier David
  2015-05-24 12:09 ` Mark Nelson
  2015-05-26  8:34 ` Somnath Roy
  0 siblings, 2 replies; 4+ messages in thread
From: Casier David @ 2015-05-24 11:21 UTC (permalink / raw)
  To: ceph-devel

Hello everybody,
I have some suggestions to improve Ceph performance when using the
RADOS Block Device (RBD).

On FileStore:
  - Move all metadata off the HDD and use omap on an SSD. This reduces
IOPS and increases throughput.
  - Remove the journal and the "sync_entry" thread, and write directly
in queue_transaction.

To compensate for the missing journal, you could use a Ceph cache tier.

ceph-osd should use fewer threads and locks.
With 1 OSD per HDD, I think locking is only necessary for scrub,
recovery, or other background jobs.
And a single thread could handle the I/O using libaio.
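
(As a rough illustration of the single-thread-with-libaio idea, here is a
minimal standalone sketch. It is not Ceph code; the file path, block size
and queue depth are arbitrary, and it builds with: g++ -laio.)

#include <libaio.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>
#include <cstdlib>
#include <cstring>

int main() {
  io_context_t ctx = 0;
  if (io_setup(128, &ctx) < 0) {              // queue depth 128, arbitrary
    perror("io_setup");
    return 1;
  }

  // Placeholder path; O_DIRECT needs a filesystem that supports it.
  int fd = open("/var/tmp/aio-test", O_WRONLY | O_CREAT | O_DIRECT, 0644);
  if (fd < 0) {
    perror("open");
    return 1;
  }

  void* buf = nullptr;
  if (posix_memalign(&buf, 4096, 4096) != 0)  // O_DIRECT needs aligned buffers
    return 1;
  memset(buf, 0xab, 4096);

  struct iocb cb;
  struct iocb* cbs[1] = { &cb };
  io_prep_pwrite(&cb, fd, buf, 4096, 0);      // 4 KB write at offset 0

  if (io_submit(ctx, 1, cbs) != 1) {          // queue the write, no extra thread
    perror("io_submit");
    return 1;
  }

  struct io_event ev;
  io_getevents(ctx, 1, 1, &ev, nullptr);      // reap the completion
  printf("write completed, res=%ld\n", (long)ev.res);

  close(fd);
  free(buf);
  io_destroy(ctx);
  return 0;
}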

I think ceph-osd should be very lightweight, potentially writing
directly after having forwarded the transmitted data to the other OSDs
from the map.
In that case, many ceph-osd processes could run on the same server.

Currently, I am working in the repository https://www.github.com/dcasier/ceph.
You can see the initial work on FileStore.* there,
but it is potentially not safe.

David.


* Re: Performance CPU and IOPS
  2015-05-24 11:21 Performance CPU and IOPS Casier David
@ 2015-05-24 12:09 ` Mark Nelson
  2015-05-24 15:55   ` Casier David
  2015-05-26  8:34 ` Somnath Roy
  1 sibling, 1 reply; 4+ messages in thread
From: Mark Nelson @ 2015-05-24 12:09 UTC (permalink / raw)
  To: Casier David, ceph-devel



On 05/24/2015 06:21 AM, Casier David wrote:
> Hello everybody,
> I have some suggestions to improve Ceph performance when using the
> RADOS Block Device (RBD).
>
> On FileStore:
>   - Move all metadata off the HDD and use omap on an SSD. This reduces
> IOPS and increases throughput.

This may matter more now that we've been removing other bottlenecks.  In 
the past, when we've tested this with filestore on spinning disks, it 
hasn't made a huge difference.  Having said that, Sam recently committed 
a change that should make it safer to use leveldb/rocksdb on alternate 
disks:

https://github.com/ceph/ceph/pull/4718

>   - Remove the journal and the "sync_entry" thread, and write directly
> in queue_transaction.
>
> To compensate for the missing journal, you could use a Ceph cache tier.

Cache tiering isn't really a good solution for general Ceph workloads 
right now.  The overhead of object promotions (4 MB by default) into the 
cache tier is just too heavy when the hot/cold distribution is not 
highly skewed.
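
As a rough illustration (my arithmetic, not a measurement): a 4 KB random 
read that misses the cache tier promotes a full 4 MB object, roughly 
1000x the traffic of the read itself, which only pays off if that object 
is then hit many more times from the cache tier.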

Instead I'd suggest reading up on Sage's newstore project:

http://www.spinics.net/lists/ceph-devel/msg22712.html
https://wiki.ceph.com/Planning/Blueprints/Infernalis/NewStore_%28new_osd_backend%29

Specifically, note that the WAL is used during overwrites, but for full 
object writes we can simply write out the new object using libaio and 
remove the old one.
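
As a rough sketch of that decision (not NewStore's actual code; the type
and helper names below are invented for illustration):

#include <cstdint>

struct WriteOp {
  uint64_t offset;
  uint64_t length;
  uint64_t object_size;   // current size of the target object
};

enum class WritePath { NewFileViaAio, WalOverwrite };

// Full-object writes can go straight to a new file written with libaio,
// after which the old file is removed; partial overwrites go through the
// key/value store's write-ahead log instead.
WritePath choose_path(const WriteOp& op) {
  bool covers_whole_object = (op.offset == 0 && op.length >= op.object_size);
  return covers_whole_object ? WritePath::NewFileViaAio
                             : WritePath::WalOverwrite;
}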

Here are our latest performance results vs. filestore.  The biggest area 
we need to improve is partial overwrites of large objects on fast 
devices (i.e. SSDs):

http://nhm.ceph.com/newstore/8c8c5903_rbd_rados_tests.pdf

We may be able to split objects up into ~512k fragments to help deal 
with large partial object overwrites.  It may also be that we could help 
the rocksdb folks change the way the WAL works (a dedicated portion of 
the disk, like Ceph journals, rather than log files that get 
created/deleted on the disk).
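
To get a feel for what ~512k fragments would mean for a partial overwrite,
here is a small standalone sketch (my illustration, not a proposed
implementation) that works out which fragments an overwrite touches and
which of them are only partially covered:

#include <cstdint>
#include <cstdio>

// Illustration only: with objects split into fixed-size fragments, a
// partial overwrite only has to touch the fragments it intersects, and
// fully covered fragments could simply be replaced outright.
constexpr uint64_t FRAG_SIZE = 512 * 1024;   // ~512k, as discussed above

void describe_overwrite(uint64_t offset, uint64_t length) {
  uint64_t first = offset / FRAG_SIZE;
  uint64_t last  = (offset + length - 1) / FRAG_SIZE;
  for (uint64_t f = first; f <= last; ++f) {
    uint64_t frag_start = f * FRAG_SIZE;
    uint64_t frag_end   = frag_start + FRAG_SIZE;
    bool fully_covered  = offset <= frag_start && offset + length >= frag_end;
    std::printf("fragment %llu: %s\n", (unsigned long long)f,
                fully_covered ? "fully overwritten"
                              : "partial (WAL or read-modify-write)");
  }
}

int main() {
  // e.g. a 1 MB overwrite starting 100k into a 4 MB object
  describe_overwrite(100 * 1024, 1024 * 1024);
  return 0;
}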

>
> ceph-osd should use fewer threads and locks.
> With 1 OSD per HDD, I think locking is only necessary for scrub,
> recovery, or other background jobs.
> And a single thread could handle the I/O using libaio.

Which locks do you think could be safely removed?

>
> I think ceph-osd should be very lightweight, potentially writing
> directly after having forwarded the transmitted data to the other OSDs
> from the map.
> In that case, many ceph-osd processes could run on the same server.
>
> Currently, I am working in the repository
> https://www.github.com/dcasier/ceph.
> You can see the initial work on FileStore.* there, but it is
> potentially not safe.

I'd highly suggest getting feedback from Sam/Sage before going too far 
down this rabbit hole. :)

>
> David.


* Re: Performance CPU and IOPS
  2015-05-24 12:09 ` Mark Nelson
@ 2015-05-24 15:55   ` Casier David
  0 siblings, 0 replies; 4+ messages in thread
From: Casier David @ 2015-05-24 15:55 UTC (permalink / raw)
  To: ceph-devel


> Which locks do you think could be safely removed?
I'm still trying to learn the code.

Why are there so many threads in SimpleMessenger?

Another approach: why not one ceph-osd per server and multiple drives 
per ceph-osd (without software RAID 0)?
For example:
  - /var/lib/ceph/osd/ceph-server/drive1/current
  - /var/lib/ceph/osd/ceph-server/drive2/current

ceph osd drive set drive1 down
ceph osd drive migrate drive1 drive2
ceph osd drive remove drive1
ceph osd drive set drive2 up


* RE: Performance CPU and IOPS
  2015-05-24 11:21 Performance CPU and IOPS Casier David
  2015-05-24 12:09 ` Mark Nelson
@ 2015-05-26  8:34 ` Somnath Roy
  1 sibling, 0 replies; 4+ messages in thread
From: Somnath Roy @ 2015-05-26  8:34 UTC (permalink / raw)
  To: Casier David, ceph-devel

The Ceph journal, in the case of FileStore, provides transactional writes, so you can't remove the journal.

Thanks & Regards
Somnath




