* crimson-osd vs legacy-osd: should the perf difference be already noticeable?
@ 2020-01-09 13:50 Roman Penyaev
2020-01-09 13:58 ` vitalif
2020-01-10 16:18 ` kefu chai
0 siblings, 2 replies; 8+ messages in thread
From: Roman Penyaev @ 2020-01-09 13:50 UTC (permalink / raw)
To: ceph-devel
Subject: crimson-osd vs legacy-osd: should the perf difference be
already noticeable?
Hi folks,
I was curious to read some early performance benchmarks comparing
crimson-osd with legacy-osd, but could not find any. So eventually I
decided to do my own micro benchmarks in order to test the transport
together with the PG layer, avoiding any storage costs completely
(there is no reason to test the memcpy of memstore, which is the only
available objectstore for crimson). At least, recalling all those ad
brochures for seastar, which should bring performance to another level
by doing preemption in userspace, the difference should already be there
and visible in the numbers.
And yes, I'm aware that crimson is still in development, but if basic
functionality is already supported (like the write path), then I can
squeeze out some numbers.
For all test loads I run the original rbd.fio, taken from fio/examples/,
changing only the block size. Since this is a micro benchmark, I
run a cluster with only 1 osd.
[global]
ioengine=rbd
clientname=admin
pool=rbd
rbdname=fio_test
rw=randwrite
#bs=4k
[rbd_iodepth32]
iodepth=32
-- Part 1, turn MemStore and cyan_store into null block
Testing memcpy is not interesting, so I run both memstore variants with
the 'memstore_debug_omit_block_device_write=true' option set, which skips
all writes. To still be able to start the osd I had to make a small tweak:
small writes are still passed through and only the big ones, which are
sent by the client, are omitted, something like the following:
-  if (len > 0 && !local_conf()->memstore_debug_omit_block_device_write) {
+
+  if (len > 0 &&
+      (!local_conf()->memstore_debug_omit_block_device_write ||
+       // We still want cluster meta-data to be saved, so pass only small
+       // writes, expecting user writes will be >= 4k.
+       len < 4096)) {
*** BTW at the bottom you can find the whole patch with all debug
modifications made to deliver these numbers.
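The condition in the tweak above can be isolated into a single predicate. This is only a minimal standalone sketch: the names `should_store_write` and `kSmallWriteLimit` are made up for illustration and are not part of MemStore or CyanStore, and `omit_writes` stands in for the memstore_debug_omit_block_device_write option.

```cpp
#include <cstddef>

// Hypothetical name for the 4096-byte threshold hard-coded in the tweak.
constexpr std::size_t kSmallWriteLimit = 4096;

// Returns true when a write of `len` bytes should still reach the in-memory
// object. `omit_writes` mirrors memstore_debug_omit_block_device_write:
// when it is set, only writes shorter than the threshold are stored.
constexpr bool should_store_write(std::size_t len, bool omit_writes) {
  return len > 0 && (!omit_writes || len < kSmallWriteLimit);
}
```

With the option set, a 512-byte metadata write is stored while a 4k client write is dropped; with the option unset, every non-empty write is stored as before.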
# legacy-osd, MemStore
MON=1 MDS=0 OSD=1 MGR=1 ../src/vstart.sh --memstore -n \
-o 'memstore_debug_omit_block_device_write=true'
4k IOPS=42.7k, BW=167MiB/s, Lat=749.18usec
8k IOPS=40.2k, BW=314MiB/s, Lat=795.03usec
16k IOPS=37.6k, BW=588MiB/s, Lat=849.12usec
32k IOPS=32.0k, BW=1000MiB/s, Lat=998.56usec
64k IOPS=25.5k, BW=1594MiB/s, Lat=1253.99usec
128k IOPS=17.5k, BW=2188MiB/s, Lat=1826.54usec
256k IOPS=10.1k, BW=2531MiB/s, Lat=3157.33usec
512k IOPS=5252, BW=2626MiB/s, Lat=6071.37usec
1m IOPS=2656, BW=2656MiB/s, Lat=12029.65usec
# crimson-osd, cyan_store
MON=1 MDS=0 OSD=1 MGR=1 ../src/vstart.sh --crimson --memstore -n \
-o 'memstore_debug_omit_block_device_write=true'
4k IOPS=40.2k, BW=157MiB/s, Lat=796.07usec
8k IOPS=37.1k, BW=290MiB/s, Lat=861.51usec
16k IOPS=32.9k, BW=514MiB/s, Lat=970.99usec
32k IOPS=26.1k, BW=815MiB/s, Lat=1225.78usec
64k IOPS=21.3k, BW=1333MiB/s, Lat=1498.92usec
128k IOPS=14.4k, BW=1795MiB/s, Lat=2227.07usec
256k IOPS=6143, BW=1536MiB/s, Lat=5203.70usec
512k IOPS=3776, BW=1888MiB/s, Lat=8464.79usec
1m IOPS=1866, BW=1867MiB/s, Lat=17126.36usec
The first thing that catches my eye is that for small blocks there is no big
difference at all, but as the block size increases, crimson's IOPS start to
decline. Can it be a transport issue? That can be tested as well.
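As a sanity check on the tables above: with a fixed iodepth the three columns are not independent. Bandwidth is IOPS times block size, and by Little's law the mean latency is roughly iodepth divided by IOPS. The sketch below is only an illustration of that arithmetic (the function names are made up, C++11 or later is assumed); for the 4k legacy-osd row it gives about 749 usec and about 167 MiB/s, in line with the reported values.

```cpp
// Mean latency in microseconds implied by Little's law for a closed loop
// that keeps `iodepth` requests in flight at `iops` completions per second.
constexpr double implied_latency_usec(double iodepth, double iops) {
  return iodepth / iops * 1e6;
}

// Bandwidth in MiB/s for a given IOPS rate and block size in KiB.
constexpr double bandwidth_mib_s(double iops, double block_kib) {
  return iops * block_kib / 1024.0;
}
```

So a drop in IOPS at a given block size and the matching rise in latency describe the same effect, not two separate ones.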
-- Part 2, complete writes immediately, without even leaving the transport
It would be great to avoid the PG logic costs as well, exactly as was done
for the objectstore, i.e. the following question can be asked: "how fast
can we handle writes if we complete them immediately from the transport
callback, so that only the socket read/write costs are measured?".
I introduced a new option 'osd_immediate_completions' and handle it
directly in the 'OSD::ms_[fast_]dispatch' function, replying with
success immediately (for details see the patch at the bottom).
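The decision taken in the dispatch path can be sketched in isolation. The types below (`OpCode`, `Op`) are simplified, hypothetical stand-ins for the real Ceph message structures, modelling only the two fields the shortcut looks at; the sketch assumes C++14 or later.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified stand-ins for the ops carried by an OSD request.
enum class OpCode { Read, Write, Other };
struct Op {
  OpCode code;
  std::uint64_t length;
};

// True when the request should be acknowledged straight from the dispatch
// callback: the option is on and at least one op is a write of >= 4096 bytes.
// Everything else falls through to the normal PG path.
template <std::size_t N>
constexpr bool complete_immediately(const Op (&ops)[N], bool immediate_completions) {
  if (!immediate_completions) return false;
  for (std::size_t i = 0; i < N; ++i) {
    if (ops[i].code == OpCode::Write && ops[i].length >= 4096) return true;
  }
  return false;
}
```

Small writes are deliberately not short-circuited, for the same reason as in Part 1: cluster metadata still has to go through the regular path.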
# legacy-osd
MON=1 MDS=0 OSD=1 MGR=1 ../src/vstart.sh --memstore -n \
-o 'osd_immediate_completions=true'
4k IOPS=59.2k, BW=231MiB/s, Lat=539.68usec
8k IOPS=55.1k, BW=430MiB/s, Lat=580.44usec
16k IOPS=50.5k, BW=789MiB/s, Lat=633.03usec
32k IOPS=44.6k, BW=1394MiB/s, Lat=716.74usec
64k IOPS=33.5k, BW=2093MiB/s, Lat=954.60usec
128k IOPS=20.8k, BW=2604MiB/s, Lat=1535.01usec
256k IOPS=10.6k, BW=2642MiB/s, Lat=3026.19usec
512k IOPS=5400, BW=2700MiB/s, Lat=5920.86usec
1m IOPS=2549, BW=2550MiB/s, Lat=12539.40usec
# crimson-osd
MON=1 MDS=0 OSD=1 MGR=1 ../src/vstart.sh --crimson --memstore -n \
-o 'osd_immediate_completions=true'
4k IOPS=60.2k, BW=235MiB/s, Lat=530.95usec
8k IOPS=52.0k, BW=407MiB/s, Lat=614.21usec
16k IOPS=47.1k, BW=736MiB/s, Lat=678.41usec
32k IOPS=37.8k, BW=1180MiB/s, Lat=846.75usec
64k IOPS=26.6k, BW=1660MiB/s, Lat=1203.51usec
128k IOPS=15.5k, BW=1936MiB/s, Lat=2064.12usec
256k IOPS=7506, BW=1877MiB/s, Lat=4259.19usec
512k IOPS=3941, BW=1971MiB/s, Lat=8112.67usec
1m IOPS=1785, BW=1786MiB/s, Lat=17896.44usec
As a summary I can say that it is quite surprising for me not to notice any
IOPS improvements on the crimson side (not to mention the problem with
reading of big blocks). Since I run only 1 osd with one particular load,
I admit the artificial nature of such tests (hence the name micro
benchmark), but then at what cluster scale and with what benchmark can I
see some improvements from the new architecture?
Roman
---
src/common/options.cc | 4 ++++
src/crimson/os/cyan_store.cc | 6 +++++-
src/crimson/osd/ops_executer.cc | 4 ++--
src/crimson/osd/osd.cc | 20 ++++++++++++++++++++
src/os/memstore/MemStore.cc | 6 +++++-
src/osd/OSD.cc | 24 ++++++++++++++++++++++++
6 files changed, 60 insertions(+), 4 deletions(-)
diff --git a/src/common/options.cc b/src/common/options.cc
index d91827c1a803..769666d2955c 100644
--- a/src/common/options.cc
+++ b/src/common/options.cc
@@ -4234,6 +4234,10 @@ std::vector<Option> get_global_options() {
     .set_default(false)
     .set_description(""),

+    Option("osd_immediate_completions", Option::TYPE_BOOL, Option::LEVEL_ADVANCED)
+    .set_default(false)
+    .set_description(""),
+
     // --------------------------
     // bluestore

diff --git a/src/crimson/os/cyan_store.cc b/src/crimson/os/cyan_store.cc
index f0749cb921f9..c05e0e40b721 100644
--- a/src/crimson/os/cyan_store.cc
+++ b/src/crimson/os/cyan_store.cc
@@ -463,7 +463,11 @@ int CyanStore::_write(const coll_t& cid, const ghobject_t& oid,
     return -ENOENT;

   ObjectRef o = c->get_or_create_object(oid);
-  if (len > 0 && !local_conf()->memstore_debug_omit_block_device_write) {
+  if (len > 0 &&
+      (!local_conf()->memstore_debug_omit_block_device_write ||
+       // We still want cluster meta-data to be saved, so pass only small
+       // writes, expecting user writes will be >= 4k.
+       len < 4096)) {
     const ssize_t old_size = o->get_size();
     o->write(offset, bl);
     used_bytes += (o->get_size() - old_size);
diff --git a/src/crimson/osd/ops_executer.cc b/src/crimson/osd/ops_executer.cc
index 13f6f086c4ea..a76fc6e206d8 100644
--- a/src/crimson/osd/ops_executer.cc
+++ b/src/crimson/osd/ops_executer.cc
@@ -431,8 +431,8 @@ OpsExecuter::execute_osd_op(OSDOp& osd_op)

   default:
     logger().warn("unknown op {}", ceph_osd_op_name(op.op));
-    throw std::runtime_error(
-      fmt::format("op '{}' not supported", ceph_osd_op_name(op.op)));
+    // Without that `fio examples/rbd.fio` hangs on exit
+    throw ceph::osd::operation_not_supported{};
   }
 }

diff --git a/src/crimson/osd/osd.cc b/src/crimson/osd/osd.cc
index ddd8742d1a74..737cc266766e 100644
--- a/src/crimson/osd/osd.cc
+++ b/src/crimson/osd/osd.cc
@@ -17,6 +17,7 @@
 #include "messages/MOSDOp.h"
 #include "messages/MOSDPGLog.h"
 #include "messages/MOSDRepOpReply.h"
+#include "messages/MOSDOpReply.h"
 #include "messages/MPGStats.h"

 #include "os/Transaction.h"
@@ -881,6 +882,25 @@ seastar::future<> OSD::committed_osd_maps(version_t first,
 seastar::future<> OSD::handle_osd_op(ceph::net::Connection* conn,
                                      Ref<MOSDOp> m)
 {
+
+  //
+  // Immediately complete requests even without leaving the transport
+  //
+  if (local_conf().get_val<bool>("osd_immediate_completions")) {
+    m->finish_decode();
+
+    for (auto op : m->ops) {
+      if (op.op.op == CEPH_OSD_OP_WRITE &&
+          // Complete big writes only
+          op.op.extent.length >= 4096) {
+
+        auto reply = make_message<MOSDOpReply>(m.get(), 0, osdmap->get_epoch(),
+                                               CEPH_OSD_FLAG_ACK | CEPH_OSD_FLAG_ONDISK, true);
+        return conn->send(reply);
+      }
+    }
+  }
+
   shard_services.start_operation<ClientRequest>(
     *this,
     conn->get_shared(),
diff --git a/src/os/memstore/MemStore.cc b/src/os/memstore/MemStore.cc
index 05d16edb6cc0..265dc64c808d 100644
--- a/src/os/memstore/MemStore.cc
+++ b/src/os/memstore/MemStore.cc
@@ -1047,7 +1047,11 @@ int MemStore::_write(const coll_t& cid, const ghobject_t& oid,
     return -ENOENT;

   ObjectRef o = c->get_or_create_object(oid);
-  if (len > 0 && !cct->_conf->memstore_debug_omit_block_device_write) {
+  if (len > 0 &&
+      (!cct->_conf->memstore_debug_omit_block_device_write ||
+       // We still want cluster meta-data to be saved, so pass only small
+       // writes, expecting user writes will be bigger than 4k.
+       len < 4096)) {
     const ssize_t old_size = o->get_size();
     o->write(offset, bl);
     used_bytes += (o->get_size() - old_size);
diff --git a/src/osd/OSD.cc b/src/osd/OSD.cc
index 96aed0b706e3..796bf927126f 100644
--- a/src/osd/OSD.cc
+++ b/src/osd/OSD.cc
@@ -7223,6 +7223,30 @@ void OSD::ms_fast_dispatch(Message *m)
     return;
   }

+  //
+  // Immediately complete requests even without leaving the transport
+  //
+  if (g_conf().get_val<bool>("osd_immediate_completions") &&
+      m->get_type() == CEPH_MSG_OSD_OP) {
+    MOSDOp *osdop = static_cast<MOSDOp*>(m);
+
+    osdop->finish_decode();
+
+    for (auto op : osdop->ops) {
+      if (op.op.op == CEPH_OSD_OP_WRITE &&
+          // Complete big writes only
+          op.op.extent.length >= 4096) {
+        MOSDOpReply *reply;
+
+        reply = new MOSDOpReply(osdop, 0, osdmap->get_epoch(),
+                                CEPH_OSD_FLAG_ACK | CEPH_OSD_FLAG_ONDISK, true);
+        osdop->get_connection()->send_message(reply);
+        m->put();
+        return;
+      }
+    }
+  }
+
   // peering event?
   switch (m->get_type()) {
   case CEPH_MSG_PING:
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: crimson-osd vs legacy-osd: should the perf difference be already noticeable?
2020-01-09 13:50 crimson-osd vs legacy-osd: should the perf difference be already noticeable? Roman Penyaev
@ 2020-01-09 13:58 ` vitalif
2020-01-09 14:52 ` Roman Penyaev
2020-01-10 16:18 ` kefu chai
1 sibling, 1 reply; 8+ messages in thread
From: vitalif @ 2020-01-09 13:58 UTC (permalink / raw)
To: Roman Penyaev; +Cc: ceph-devel, ceph-devel-owner
I watched some of their crimson-osd meetings on YouTube and they
discussed something similar... However, I thought they also said that
crimson-osd eats fewer CPU cores during that test. Did it eat less CPU in
your test?
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: crimson-osd vs legacy-osd: should the perf difference be already noticeable?
2020-01-09 13:58 ` vitalif
@ 2020-01-09 14:52 ` Roman Penyaev
2020-01-09 15:04 ` vitalif
0 siblings, 1 reply; 8+ messages in thread
From: Roman Penyaev @ 2020-01-09 14:52 UTC (permalink / raw)
To: vitalif; +Cc: ceph-devel, ceph-devel-owner
On 2020-01-09 14:58, vitalif@yourcmc.ru wrote:
> I watched some of their crimson-osd meetings on YouTube and they
> discussed something similar...
Could you please share the link?
> However, I thought they also said that
> crimson-osd eats fewer CPU cores during that test. Did it eat less CPU
> in your test?
Hm, I can't prove even that. So here is the output of pidstat while
rbd.fio is running, 4k block only:
legacy-osd
[roman@dell ~]$ pidstat 1 -p 109930
Linux 5.3.13-arch1-1 (dell) 01/09/2020 _x86_64_ (8 CPU)
03:51:49 PM   UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
03:51:51 PM  1000    109930   14.00    8.00    0.00    0.00   22.00     1  ceph-osd
03:51:52 PM  1000    109930   40.00   19.00    0.00    0.00   59.00     1  ceph-osd
03:51:53 PM  1000    109930   44.00   17.00    0.00    0.00   61.00     1  ceph-osd
03:51:54 PM  1000    109930   40.00   20.00    0.00    0.00   60.00     1  ceph-osd
03:51:55 PM  1000    109930   39.00   18.00    0.00    0.00   57.00     1  ceph-osd
03:51:56 PM  1000    109930   41.00   20.00    0.00    0.00   61.00     1  ceph-osd
03:51:57 PM  1000    109930   41.00   15.00    0.00    0.00   56.00     1  ceph-osd
03:51:58 PM  1000    109930   42.00   16.00    0.00    0.00   58.00     1  ceph-osd
03:51:59 PM  1000    109930   42.00   15.00    0.00    0.00   57.00     1  ceph-osd
03:52:00 PM  1000    109930   43.00   15.00    0.00    0.00   58.00     1  ceph-osd
03:52:01 PM  1000    109930   24.00   12.00    0.00    0.00   36.00     1  ceph-osd
crimson-osd
[roman@dell ~]$ pidstat 1 -p 108141
Linux 5.3.13-arch1-1 (dell) 01/09/2020 _x86_64_ (8 CPU)
03:47:50 PM   UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
03:47:55 PM  1000    108141   67.00   11.00    0.00    0.00   78.00     0  crimson-osd
03:47:56 PM  1000    108141   79.00   12.00    0.00    0.00   91.00     0  crimson-osd
03:47:57 PM  1000    108141   81.00    9.00    0.00    0.00   90.00     0  crimson-osd
03:47:58 PM  1000    108141   78.00   12.00    0.00    0.00   90.00     0  crimson-osd
03:47:59 PM  1000    108141   78.00   12.00    0.00    1.00   90.00     0  crimson-osd
03:48:00 PM  1000    108141   78.00   13.00    0.00    0.00   91.00     0  crimson-osd
03:48:01 PM  1000    108141   79.00   13.00    0.00    0.00   92.00     0  crimson-osd
03:48:02 PM  1000    108141   78.00   12.00    0.00    0.00   90.00     0  crimson-osd
03:48:03 PM  1000    108141   77.00   11.00    0.00    0.00   88.00     0  crimson-osd
03:48:04 PM  1000    108141   79.00   12.00    0.00    1.00   91.00     0  crimson-osd
--
Roman
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: crimson-osd vs legacy-osd: should the perf difference be already noticeable?
2020-01-09 14:52 ` Roman Penyaev
@ 2020-01-09 15:04 ` vitalif
0 siblings, 0 replies; 8+ messages in thread
From: vitalif @ 2020-01-09 15:04 UTC (permalink / raw)
To: Roman Penyaev; +Cc: ceph-devel, ceph-devel-owner
> Could you please share the link?
It was here
https://www.youtube.com/channel/UCno-Fry25FJ7B4RycCxOtfw/videos but I'm
not sure about what video it was.
> Hm, I can't prove even that. So here is the output of pidstat while
> rbd.fio is running, 4k block only:
Yeah... funny :)
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: crimson-osd vs legacy-osd: should the perf difference be already noticeable?
2020-01-09 13:50 crimson-osd vs legacy-osd: should the perf difference be already noticeable? Roman Penyaev
2020-01-09 13:58 ` vitalif
@ 2020-01-10 16:18 ` kefu chai
2020-01-10 18:53 ` Roman Penyaev
1 sibling, 1 reply; 8+ messages in thread
From: kefu chai @ 2020-01-10 16:18 UTC (permalink / raw)
To: Roman Penyaev, Radoslaw Zarzynski, Samuel Just
Cc: The Esoteric Order of the Squid Cybernetic
+ Radek, Sam, in case they are interested.
On Fri, Jan 10, 2020 at 12:18 AM Roman Penyaev <rpenyaev@suse.de> wrote:
>
> Subject: crimson-osd vs legacy-osd: should the perf difference be
> already noticeable?
>
> Hi folks,
>
> I was curious to read some early performance benchmarks comparing
> crimson-osd with legacy-osd, but could not find any. So eventually I
> decided to do my own micro benchmarks in order to test the transport
> together with the PG layer, avoiding any storage costs completely
> (there is no reason to test the memcpy of memstore, which is the only
> available objectstore for crimson). At least, recalling all those ad
> brochures for seastar, which should bring performance to another level
> by doing preemption in userspace, the difference should already be there
> and visible in the numbers.
>
> And yes, I'm aware that crimson is still in development, but if basic
> functionality is already supported (like the write path), then I can
> squeeze out some numbers.
>
> For all test loads I run the original rbd.fio, taken from fio/examples/,
> changing only the block size. Since this is a micro benchmark, I
> run a cluster with only 1 osd.
>
> [global]
> ioengine=rbd
> clientname=admin
> pool=rbd
> rbdname=fio_test
> rw=randwrite
> #bs=4k
>
> [rbd_iodepth32]
> iodepth=32
>
>
> -- Part 1, turn MemStore and cyan_store into null block
>
> Testing memcpy is not interesting, so I run both memstore variants with
> the 'memstore_debug_omit_block_device_write=true' option set, which skips
> all writes. To still be able to start the osd I had to make a small tweak:
> small writes are still passed through and only the big ones, which are
> sent by the client, are omitted, something like the following:
>
> -  if (len > 0 && !local_conf()->memstore_debug_omit_block_device_write) {
> +
> +  if (len > 0 &&
> +      (!local_conf()->memstore_debug_omit_block_device_write ||
> +       // We still want cluster meta-data to be saved, so pass only small
> +       // writes, expecting user writes will be >= 4k.
> +       len < 4096)) {
>
>
> *** BTW at the bottom you can find the whole patch with all debug
> modifications made to deliver these numbers.
>
>
> # legacy-osd, MemStore
>
> MON=1 MDS=0 OSD=1 MGR=1 ../src/vstart.sh --memstore -n \
> -o 'memstore_debug_omit_block_device_write=true'
>
> 4k IOPS=42.7k, BW=167MiB/s, Lat=749.18usec
> 8k IOPS=40.2k, BW=314MiB/s, Lat=795.03usec
> 16k IOPS=37.6k, BW=588MiB/s, Lat=849.12usec
> 32k IOPS=32.0k, BW=1000MiB/s, Lat=998.56usec
> 64k IOPS=25.5k, BW=1594MiB/s, Lat=1253.99usec
> 128k IOPS=17.5k, BW=2188MiB/s, Lat=1826.54usec
> 256k IOPS=10.1k, BW=2531MiB/s, Lat=3157.33usec
> 512k IOPS=5252, BW=2626MiB/s, Lat=6071.37usec
> 1m IOPS=2656, BW=2656MiB/s, Lat=12029.65usec
>
>
> # crimson-osd, cyan_store
>
> MON=1 MDS=0 OSD=1 MGR=1 ../src/vstart.sh --crimson --memstore -n \
> -o 'memstore_debug_omit_block_device_write=true'
>
> 4k IOPS=40.2k, BW=157MiB/s, Lat=796.07usec
> 8k IOPS=37.1k, BW=290MiB/s, Lat=861.51usec
> 16k IOPS=32.9k, BW=514MiB/s, Lat=970.99usec
> 32k IOPS=26.1k, BW=815MiB/s, Lat=1225.78usec
> 64k IOPS=21.3k, BW=1333MiB/s, Lat=1498.92usec
> 128k IOPS=14.4k, BW=1795MiB/s, Lat=2227.07usec
> 256k IOPS=6143, BW=1536MiB/s, Lat=5203.70usec
> 512k IOPS=3776, BW=1888MiB/s, Lat=8464.79usec
> 1m IOPS=1866, BW=1867MiB/s, Lat=17126.36usec
>
>
> The first thing that catches my eye is that for small blocks there is no big
> difference at all, but as the block size increases, crimson's IOPS start to
That's also our finding, and it's expected, as the async messenger uses
the same reactor model as seastar does. Actually, its original
implementation was adapted from seastar's socket stream
implementation.
> decline. Can it be a transport issue? That can be tested as well.
Because seastar's socket facility reads from the wire with a 4K chunk
size, while the classic OSD's async messenger reads the payload with the
size suggested by the header. So when it comes to larger block sizes,
it takes crimson-osd multiple syscalls and memcpy calls to read the
request from the wire; that's why the classic OSD wins in this case.
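To put a rough number on that: if every read returns at most one 4K chunk, the number of reads grows linearly with the payload, while a read sized from the header stays at one. The sketch below is an idealised count only (the function name is made up, message headers are ignored, and a real socket read may return fewer bytes than requested).

```cpp
#include <cstddef>

// Number of socket reads needed to pull `payload` bytes off the wire when
// every read returns at most `chunk` bytes (ceiling division).
constexpr std::size_t reads_needed(std::size_t payload, std::size_t chunk) {
  return (payload + chunk - 1) / chunk;
}
```

Under these assumptions a 64k write takes 16 reads and a 1m write takes 256 reads with a 4096-byte chunk, versus a single payload read when the buffer is sized from the header.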
>
>
> -- Part 2, complete writes immediately, without even leaving the transport
>
> It would be great to avoid the PG logic costs as well, exactly as was done
> for the objectstore, i.e. the following question can be asked: "how fast
> can we handle writes if we complete them immediately from the transport
> callback, so that only the socket read/write costs are measured?".
> I introduced a new option 'osd_immediate_completions' and handle it
> directly in the 'OSD::ms_[fast_]dispatch' function, replying with
> success immediately (for details see the patch at the bottom).
>
>
> # legacy-osd
>
> MON=1 MDS=0 OSD=1 MGR=1 ../src/vstart.sh --memstore -n \
> -o 'osd_immediate_completions=true'
>
> 4k IOPS=59.2k, BW=231MiB/s, Lat=539.68usec
> 8k IOPS=55.1k, BW=430MiB/s, Lat=580.44usec
> 16k IOPS=50.5k, BW=789MiB/s, Lat=633.03usec
> 32k IOPS=44.6k, BW=1394MiB/s, Lat=716.74usec
> 64k IOPS=33.5k, BW=2093MiB/s, Lat=954.60usec
> 128k IOPS=20.8k, BW=2604MiB/s, Lat=1535.01usec
> 256k IOPS=10.6k, BW=2642MiB/s, Lat=3026.19usec
> 512k IOPS=5400, BW=2700MiB/s, Lat=5920.86usec
> 1m IOPS=2549, BW=2550MiB/s, Lat=12539.40usec
>
>
> # crimson-osd
>
> MON=1 MDS=0 OSD=1 MGR=1 ../src/vstart.sh --crimson --memstore -n \
> -o 'osd_immediate_completions=true'
>
> 4k IOPS=60.2k, BW=235MiB/s, Lat=530.95usec
> 8k IOPS=52.0k, BW=407MiB/s, Lat=614.21usec
> 16k IOPS=47.1k, BW=736MiB/s, Lat=678.41usec
> 32k IOPS=37.8k, BW=1180MiB/s, Lat=846.75usec
> 64k IOPS=26.6k, BW=1660MiB/s, Lat=1203.51usec
> 128k IOPS=15.5k, BW=1936MiB/s, Lat=2064.12usec
> 256k IOPS=7506, BW=1877MiB/s, Lat=4259.19usec
> 512k IOPS=3941, BW=1971MiB/s, Lat=8112.67usec
> 1m IOPS=1785, BW=1786MiB/s, Lat=17896.44usec
>
>
> As a summary I can say that it is quite surprising for me not to notice any
> IOPS improvements on the crimson side (not to mention the problem with
> reading of big blocks). Since I run only 1 osd with one particular load,
> I admit the artificial nature of such tests (hence the name micro
> benchmark), but then at what cluster scale and with what benchmark can I
> see some improvements from the new architecture?
Have you tried using multiple fio clients to saturate the CPU capacity of
the OSD nodes?
>
> Roman
>
--
Regards
Kefu Chai
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: crimson-osd vs legacy-osd: should the perf difference be already noticeable?
2020-01-10 16:18 ` kefu chai
@ 2020-01-10 18:53 ` Roman Penyaev
2020-01-10 23:28 ` Liu, Chunmei
0 siblings, 1 reply; 8+ messages in thread
From: Roman Penyaev @ 2020-01-10 18:53 UTC (permalink / raw)
To: kefu chai
Cc: Radoslaw Zarzynski, Samuel Just,
The Esoteric Order of the Squid Cybernetic
On 2020-01-10 17:18, kefu chai wrote:
[skip]
>>
>> The first thing that catches my eye is that for small blocks there is
>> no big difference at all, but as the block size increases, crimson's
>> IOPS start to
>
> that's also our findings. and it's expected. as async messenger uses
> the same reactor model as seastar does. actually its original
> implementation was adapted from seastar's socket stream
> implementation.
Hm, regardless of the model, the messenger should not be a bottleneck.
Take a look at the results of the fio_ceph_messenger load (which runs the
pure messenger): I can squeeze IOPS=89.8k, BW=351MiB/s at 4k block size,
iodepth=32 (another good example is
https://github.com/ceph/ceph/pull/26932 , almost ~200k).
With the PG layer (memstore_debug_omit_block_device_write=true option)
I can reach 40k iops max. Without the PG layer (immediate completion
from the transport callback, osd_immediate_completions=true)
I get almost 60k.
It seems that costs on the client side come into play here, and these
costs prevail.
>
>> decline. Can it be the transport issue? Can be tested as well.
>
> because seastar's socket facility reads from the wire with 4K chunk
> size, while classic OSD's async messenger reads the payload with the
> size suggested by the header. so when it comes to larger block size,
> it takes crimson-osd multiple syscalls and memcpy calls to read the
> request from wire, that's why classic OSD wins in this case.
Do you plan to fix that?
> have you tried to use multiple fio clients to saturate CPU capacity of
> OSD nodes?
Not yet. But regarding CPU I have these numbers:
output of pidstat while rbd.fio is running, 4k block only:
legacy-osd
[roman@dell ~]$ pidstat 1 -p 109930
Linux 5.3.13-arch1-1 (dell) 01/09/2020 _x86_64_ (8 CPU)
03:51:49 PM   UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
03:51:51 PM  1000    109930   14.00    8.00    0.00    0.00   22.00     1  ceph-osd
03:51:52 PM  1000    109930   40.00   19.00    0.00    0.00   59.00     1  ceph-osd
03:51:53 PM  1000    109930   44.00   17.00    0.00    0.00   61.00     1  ceph-osd
03:51:54 PM  1000    109930   40.00   20.00    0.00    0.00   60.00     1  ceph-osd
03:51:55 PM  1000    109930   39.00   18.00    0.00    0.00   57.00     1  ceph-osd
03:51:56 PM  1000    109930   41.00   20.00    0.00    0.00   61.00     1  ceph-osd
03:51:57 PM  1000    109930   41.00   15.00    0.00    0.00   56.00     1  ceph-osd
03:51:58 PM  1000    109930   42.00   16.00    0.00    0.00   58.00     1  ceph-osd
03:51:59 PM  1000    109930   42.00   15.00    0.00    0.00   57.00     1  ceph-osd
03:52:00 PM  1000    109930   43.00   15.00    0.00    0.00   58.00     1  ceph-osd
03:52:01 PM  1000    109930   24.00   12.00    0.00    0.00   36.00     1  ceph-osd
crimson-osd
[roman@dell ~]$ pidstat 1 -p 108141
Linux 5.3.13-arch1-1 (dell) 01/09/2020 _x86_64_ (8 CPU)
03:47:50 PM   UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
03:47:55 PM  1000    108141   67.00   11.00    0.00    0.00   78.00     0  crimson-osd
03:47:56 PM  1000    108141   79.00   12.00    0.00    0.00   91.00     0  crimson-osd
03:47:57 PM  1000    108141   81.00    9.00    0.00    0.00   90.00     0  crimson-osd
03:47:58 PM  1000    108141   78.00   12.00    0.00    0.00   90.00     0  crimson-osd
03:47:59 PM  1000    108141   78.00   12.00    0.00    1.00   90.00     0  crimson-osd
03:48:00 PM  1000    108141   78.00   13.00    0.00    0.00   91.00     0  crimson-osd
03:48:01 PM  1000    108141   79.00   13.00    0.00    0.00   92.00     0  crimson-osd
03:48:02 PM  1000    108141   78.00   12.00    0.00    0.00   90.00     0  crimson-osd
03:48:03 PM  1000    108141   77.00   11.00    0.00    0.00   88.00     0  crimson-osd
03:48:04 PM  1000    108141   79.00   12.00    0.00    1.00   91.00     0  crimson-osd
crimson-osd seems quite saturated, using almost twice as much CPU as
legacy-osd. Did you see something similar?
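For reference, the steady-state averages of the pidstat samples above can be worked out as follows. This is only a sketch (C++14 or later assumed, names made up), and it leaves out the ramp-up and ramp-down samples: the first and last legacy-osd rows and the first crimson-osd row.

```cpp
#include <cstddef>

// Arithmetic mean of a fixed-size array of samples.
template <std::size_t N>
constexpr double mean(const double (&samples)[N]) {
  double sum = 0.0;
  for (std::size_t i = 0; i < N; ++i) sum += samples[i];
  return sum / N;
}

// %CPU and %usr columns copied from the pidstat output, steady state only.
constexpr double legacy_cpu[]  = {59, 61, 60, 57, 61, 56, 58, 57, 58};
constexpr double crimson_cpu[] = {91, 90, 90, 90, 91, 92, 90, 88, 91};
constexpr double legacy_usr[]  = {40, 44, 40, 39, 41, 41, 42, 42, 43};
constexpr double crimson_usr[] = {79, 81, 78, 78, 78, 79, 78, 77, 79};

// Total CPU: about 90.3% vs about 58.6%, a ratio of about 1.54.
constexpr double cpu_ratio = mean(crimson_cpu) / mean(legacy_cpu);
// User time only: about 78.6% vs about 41.3%, a ratio of about 1.90.
constexpr double usr_ratio = mean(crimson_usr) / mean(legacy_usr);
```

So the "almost twice" holds for user time, while total CPU is about one and a half times higher, because legacy-osd spends more time in the kernel (%system).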
--
Roman
^ permalink raw reply [flat|nested] 8+ messages in thread
* RE: crimson-osd vs legacy-osd: should the perf difference be already noticeable?
2020-01-10 18:53 ` Roman Penyaev
@ 2020-01-10 23:28 ` Liu, Chunmei
2020-01-11 13:39 ` Mark Nelson
0 siblings, 1 reply; 8+ messages in thread
From: Liu, Chunmei @ 2020-01-10 23:28 UTC (permalink / raw)
To: Roman Penyaev, kefu chai, Ma, Jianpeng
Cc: Radoslaw Zarzynski, Samuel Just,
The Esoteric Order of the Squid Cybernetic
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org <ceph-devel-owner@vger.kernel.org>
> On Behalf Of Roman Penyaev
> Sent: Friday, January 10, 2020 10:54 AM
> To: kefu chai <tchaikov@gmail.com>
> Cc: Radoslaw Zarzynski <rzarzyns@redhat.com>; Samuel Just
> <sjust@redhat.com>; The Esoteric Order of the Squid Cybernetic <ceph-
> devel@vger.kernel.org>
> Subject: Re: crimson-osd vs legacy-osd: should the perf difference be already
> noticeable?
>
> On 2020-01-10 17:18, kefu chai wrote:
>
> [skip]
>
> >>
> >> First thing that catches my eye is that for small blocks there is no
> >> big difference at all, but as the block size increases, crimson's
> >> IOPS start to
> >
> > that's also our findings. and it's expected. as async messenger uses
> > the same reactor model as seastar does. actually its original
> > implementation was adapted from seastar's socket stream
> > implementation.
>
> Hm, regardless of the model, the messenger should not be a bottleneck.
> Take a look at the results of the fio_ceph_messenger load (which runs the
> pure messenger): I can squeeze IOPS=89.8k, BW=351MiB/s at 4k block size,
> iodepth=32 (another good example is
> https://github.com/ceph/ceph/pull/26932 , almost ~200k).
>
> With the PG layer (memstore_debug_omit_block_device_write=true option) I can
> reach 40k iops max. Without the PG layer (immediate completion from the
> transport callback, osd_immediate_completions=true) I get almost 60k.
>
> It seems that costs on the client side come into play here, and these
> costs prevail.
>
> >
> >> decline. Can it be the transport issue? Can be tested as well.
> >
> > because seastar's socket facility reads from the wire with 4K chunk
> > size, while classic OSD's async messenger reads the payload with the
> > size suggested by the header. so when it comes to larger block size,
> > it takes crimson-osd multiple syscalls and memcpy calls to read the
> > request from wire, that's why classic OSD wins in this case.
>
> Do you plan to fix that?
>
> > have you tried to use multiple fio clients to saturate CPU capacity of
> > OSD nodes?
>
> Not yet. But regarding CPU I have these numbers:
>
> output of pidstat while rbd.fio is running, 4k block only:
>
> legacy-osd
>
> [roman@dell ~]$ pidstat 1 -p 109930
> Linux 5.3.13-arch1-1 (dell) 01/09/2020 _x86_64_ (8 CPU)
>
> 03:51:49 PM   UID     PID   %usr %system %guest  %wait   %CPU  CPU  Command
> 03:51:51 PM  1000  109930  14.00    8.00   0.00   0.00  22.00    1  ceph-osd
> 03:51:52 PM  1000  109930  40.00   19.00   0.00   0.00  59.00    1  ceph-osd
> 03:51:53 PM  1000  109930  44.00   17.00   0.00   0.00  61.00    1  ceph-osd
> 03:51:54 PM  1000  109930  40.00   20.00   0.00   0.00  60.00    1  ceph-osd
> 03:51:55 PM  1000  109930  39.00   18.00   0.00   0.00  57.00    1  ceph-osd
> 03:51:56 PM  1000  109930  41.00   20.00   0.00   0.00  61.00    1  ceph-osd
> 03:51:57 PM  1000  109930  41.00   15.00   0.00   0.00  56.00    1  ceph-osd
> 03:51:58 PM  1000  109930  42.00   16.00   0.00   0.00  58.00    1  ceph-osd
> 03:51:59 PM  1000  109930  42.00   15.00   0.00   0.00  57.00    1  ceph-osd
> 03:52:00 PM  1000  109930  43.00   15.00   0.00   0.00  58.00    1  ceph-osd
> 03:52:01 PM  1000  109930  24.00   12.00   0.00   0.00  36.00    1  ceph-osd
>
>
> crimson-osd
>
> [roman@dell ~]$ pidstat 1 -p 108141
> Linux 5.3.13-arch1-1 (dell) 01/09/2020 _x86_64_ (8 CPU)
>
> 03:47:50 PM   UID     PID   %usr %system %guest  %wait   %CPU  CPU  Command
> 03:47:55 PM  1000  108141  67.00   11.00   0.00   0.00  78.00    0  crimson-osd
> 03:47:56 PM  1000  108141  79.00   12.00   0.00   0.00  91.00    0  crimson-osd
> 03:47:57 PM  1000  108141  81.00    9.00   0.00   0.00  90.00    0  crimson-osd
> 03:47:58 PM  1000  108141  78.00   12.00   0.00   0.00  90.00    0  crimson-osd
> 03:47:59 PM  1000  108141  78.00   12.00   0.00   1.00  90.00    0  crimson-osd
> 03:48:00 PM  1000  108141  78.00   13.00   0.00   0.00  91.00    0  crimson-osd
> 03:48:01 PM  1000  108141  79.00   13.00   0.00   0.00  92.00    0  crimson-osd
> 03:48:02 PM  1000  108141  78.00   12.00   0.00   0.00  90.00    0  crimson-osd
> 03:48:03 PM  1000  108141  77.00   11.00   0.00   0.00  88.00    0  crimson-osd
> 03:48:04 PM  1000  108141  79.00   12.00   0.00   1.00  91.00    0  crimson-osd
>
>
> Seems quite saturated, almost twice more than legacy-osd. Did you see
> something similar?
Crimson-osd (seastar) uses epoll by default, so it will use more CPU capacity (you can change the epoll mode setting to reduce it). Adding Ma, Jianpeng to the thread since he has studied this in more detail.
BTW, by default crimson-osd runs one thread, while legacy ceph-osd runs 3 threads for the async messenger, 2x8 threads for the osd (SSD), a finisher thread, etc. So with default settings it is 1 thread compared to more than 10 threads, and it is expected that crimson-osd does not show an obvious difference. You can change the default thread counts for legacy ceph-osd (such as thread=1 for each layer) to see more of a difference.
BTW, please use a release build for testing.
Crimson-osd uses an async model; if the workload is very light, it can't take much advantage of it.
>
> --
> Roman
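A rough way to picture Kefu's explanation quoted above (4K chunk reads versus one read sized from the header) is to count the reads each approach needs per request. This is only a simplified sketch: the function names are made up for illustration, and it ignores the header read itself, short reads and socket buffering, so it is not how either messenger is actually implemented.

```python
CHUNK = 4096  # chunk size attributed to seastar's socket reads in this thread


def reads_chunked(payload_len, chunk=CHUNK):
    """Fixed-size reads needed to pull payload_len bytes off the wire."""
    # Ceiling division without going through floats.
    return -(-payload_len // chunk)


def reads_header_sized(payload_len):
    """A single read sized from the message header, whatever the payload length."""
    return 1 if payload_len > 0 else 0


for bs in (4 * 1024, 64 * 1024, 1024 * 1024):
    print(f"bs={bs // 1024}k chunked={reads_chunked(bs)} "
          f"header_sized={reads_header_sized(bs)}")
```

Under this model the two approaches tie at 4k and diverge as the block size grows, which matches the direction of the reported results, though not necessarily their magnitude.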
* Re: crimson-osd vs legacy-osd: should the perf difference be already noticeable?
2020-01-10 23:28 ` Liu, Chunmei
@ 2020-01-11 13:39 ` Mark Nelson
0 siblings, 0 replies; 8+ messages in thread
From: Mark Nelson @ 2020-01-11 13:39 UTC (permalink / raw)
To: Liu, Chunmei, Roman Penyaev, kefu chai, Ma, Jianpeng
Cc: Radoslaw Zarzynski, Samuel Just,
The Esoteric Order of the Squid Cybernetic
On 1/10/20 5:28 PM, Liu, Chunmei wrote:
>
>
>> -----Original Message-----
>> From: ceph-devel-owner@vger.kernel.org <ceph-devel-owner@vger.kernel.org>
>> On Behalf Of Roman Penyaev
>> Sent: Friday, January 10, 2020 10:54 AM
>> To: kefu chai <tchaikov@gmail.com>
>> Cc: Radoslaw Zarzynski <rzarzyns@redhat.com>; Samuel Just
>> <sjust@redhat.com>; The Esoteric Order of the Squid Cybernetic <ceph-
>> devel@vger.kernel.org>
>> Subject: Re: crimson-osd vs legacy-osd: should the perf difference be already
>> noticeable?
>>
>> On 2020-01-10 17:18, kefu chai wrote:
>>
>> [skip]
>>
>>>>
>>>> First thing that catches my eye is that for small blocks there is no
>>>> big difference at all, but as the block increases, crimsons iops
>>>> starts to
>>>
>>> that's also our findings. and it's expected. as async messenger uses
>>> the same reactor model as seastar does. actually its original
>>> implementation was adapted from seastar's socket stream
>>> implementation.
>>
>> Hm, regardless of model messenger should not be a bottleneck. Take a look on
>> the results of fio_ceph_messenger load (runs pure messenger), I can squeeze
>> IOPS=89.8k, BW=351MiB/s on 4k block size, iodepth=32.
>> (also good example https://github.com/ceph/ceph/pull/26932 , almost
>> ~200k)
>>
>> With PG layer (memstore_debug_omit_block_device_write=true option) I can
>> reach 40k iops max. Without PG layer (immediate completion from the
>> transport callback, osd_immediate_completions=true) I get almost 60k.
>>
>> Seems that here starts playing costs on client side and these costs prevail.
>>
>>>
>>>> decline. Can it be the transport issue? Can be tested as well.
>>>
>>> because seastar's socket facility reads from the wire with 4K chunk
>>> size, while classic OSD's async messenger reads the payload with the
>>> size suggested by the header. so when it comes to larger block size,
>>> it takes crimson-osd multiple syscalls and memcpy calls to read the
>>> request from wire, that's why classic OSD wins in this case.
>>
>> Do you plan to fix that?
>>
>>> have you tried to use multiple fio clients to saturate CPU capacity of
>>> OSD nodes?
>>
>> Not yet. But regarding CPU I have these numbers:
>>
>> output of pidstat while rbd.fio is running, 4k block only:
>>
>> legacy-osd
>>
>> [roman@dell ~]$ pidstat 1 -p 109930
>> Linux 5.3.13-arch1-1 (dell) 01/09/2020 _x86_64_ (8 CPU)
>>
>> 03:51:49 PM   UID     PID   %usr %system %guest  %wait   %CPU  CPU  Command
>> 03:51:51 PM  1000  109930  14.00    8.00   0.00   0.00  22.00    1  ceph-osd
>> 03:51:52 PM  1000  109930  40.00   19.00   0.00   0.00  59.00    1  ceph-osd
>> 03:51:53 PM  1000  109930  44.00   17.00   0.00   0.00  61.00    1  ceph-osd
>> 03:51:54 PM  1000  109930  40.00   20.00   0.00   0.00  60.00    1  ceph-osd
>> 03:51:55 PM  1000  109930  39.00   18.00   0.00   0.00  57.00    1  ceph-osd
>> 03:51:56 PM  1000  109930  41.00   20.00   0.00   0.00  61.00    1  ceph-osd
>> 03:51:57 PM  1000  109930  41.00   15.00   0.00   0.00  56.00    1  ceph-osd
>> 03:51:58 PM  1000  109930  42.00   16.00   0.00   0.00  58.00    1  ceph-osd
>> 03:51:59 PM  1000  109930  42.00   15.00   0.00   0.00  57.00    1  ceph-osd
>> 03:52:00 PM  1000  109930  43.00   15.00   0.00   0.00  58.00    1  ceph-osd
>> 03:52:01 PM  1000  109930  24.00   12.00   0.00   0.00  36.00    1  ceph-osd
>>
>>
>> crimson-osd
>>
>> [roman@dell ~]$ pidstat 1 -p 108141
>> Linux 5.3.13-arch1-1 (dell) 01/09/2020 _x86_64_ (8 CPU)
>>
>> 03:47:50 PM   UID     PID   %usr %system %guest  %wait   %CPU  CPU  Command
>> 03:47:55 PM  1000  108141  67.00   11.00   0.00   0.00  78.00    0  crimson-osd
>> 03:47:56 PM  1000  108141  79.00   12.00   0.00   0.00  91.00    0  crimson-osd
>> 03:47:57 PM  1000  108141  81.00    9.00   0.00   0.00  90.00    0  crimson-osd
>> 03:47:58 PM  1000  108141  78.00   12.00   0.00   0.00  90.00    0  crimson-osd
>> 03:47:59 PM  1000  108141  78.00   12.00   0.00   1.00  90.00    0  crimson-osd
>> 03:48:00 PM  1000  108141  78.00   13.00   0.00   0.00  91.00    0  crimson-osd
>> 03:48:01 PM  1000  108141  79.00   13.00   0.00   0.00  92.00    0  crimson-osd
>> 03:48:02 PM  1000  108141  78.00   12.00   0.00   0.00  90.00    0  crimson-osd
>> 03:48:03 PM  1000  108141  77.00   11.00   0.00   0.00  88.00    0  crimson-osd
>> 03:48:04 PM  1000  108141  79.00   12.00   0.00   1.00  91.00    0  crimson-osd
>>
>>
>> Seems quite saturated, almost twice more than legacy-osd. Did you see
>> something similar?
> Crimson-osd (seastar) uses epoll by default, so it will use more CPU capacity (you can change the epoll mode setting to reduce it). Adding Ma, Jianpeng to the thread since he has studied this in more detail.
> BTW, by default crimson-osd runs one thread, while legacy ceph-osd runs 3 threads for the async messenger, 2x8 threads for the osd (SSD), a finisher thread, etc. So with default settings it is 1 thread compared to more than 10 threads, and it is expected that crimson-osd does not show an obvious difference. You can change the default thread counts for legacy ceph-osd (such as thread=1 for each layer) to see more of a difference.
> BTW, please use a release build for testing.
> Crimson-osd uses an async model; if the workload is very light, it can't take much advantage of it.
>>
>> --
>> Roman
>
FWIW I can drive the classical OSD pretty hard and get around 70-80K
IOPS out of a single OSD, but as Kefu says above it will consume a
larger number of cores. I do think per-OSD throughput is still
important to look at, but the per-OSD efficiency numbers as Radek has
been testing (I gathered some for classical OSD a while back for him)
are probably going to be more important overall.
Mark
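For the per-OSD efficiency angle, one possible way to turn the pidstat samples quoted in this thread into a comparable figure is IOPS per core consumed. The sketch below is only an illustration: the helper names are hypothetical, the %CPU lists are copied from the quoted pidstat output (4k blocks), and the IOPS value is deliberately left as an input because it has to come from the matching fio run.

```python
from statistics import mean

# %CPU column copied from the pidstat output quoted in this thread.
LEGACY_CPU = [22.0, 59.0, 61.0, 60.0, 57.0, 61.0, 56.0, 58.0, 57.0, 58.0, 36.0]
CRIMSON_CPU = [78.0, 91.0, 90.0, 90.0, 90.0, 91.0, 92.0, 90.0, 88.0, 91.0]


def cores_used(cpu_percent_samples):
    """Average cores consumed; pidstat reports 100% per fully busy core."""
    return mean(cpu_percent_samples) / 100.0


def iops_per_core(iops, cpu_percent_samples):
    """Efficiency metric: measured IOPS divided by average cores consumed."""
    return iops / cores_used(cpu_percent_samples)


print(f"legacy  cores used: {cores_used(LEGACY_CPU):.2f}")
print(f"crimson cores used: {cores_used(CRIMSON_CPU):.2f}")
```

The first and last legacy samples look like ramp-up and ramp-down, so trimming them before averaging may give a fairer steady-state figure.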
Thread overview: 8+ messages
2020-01-09 13:50 crimson-osd vs legacy-osd: should the perf difference be already noticeable? Roman Penyaev
2020-01-09 13:58 ` vitalif
2020-01-09 14:52 ` Roman Penyaev
2020-01-09 15:04 ` vitalif
2020-01-10 16:18 ` kefu chai
2020-01-10 18:53 ` Roman Penyaev
2020-01-10 23:28 ` Liu, Chunmei
2020-01-11 13:39 ` Mark Nelson