From: Casey Bodley <cbodley@redhat.com>
To: kefu chai <tchaikov@gmail.com>, Josh Durgin <jdurgin@redhat.com>
Cc: Adam Emerson <aemerson@redhat.com>,
	Gregory Farnum <gfarnum@redhat.com>,
	ceph-devel <ceph-devel@vger.kernel.org>
Subject: Re: seastar and 'tame reactor'
Date: Wed, 7 Feb 2018 12:11:04 -0500	[thread overview]
Message-ID: <9e6bc174-c6b3-a37e-abd5-b96d572d1d1b@redhat.com> (raw)
In-Reply-To: <CAJE9aOOq+HAWZOec0TnSqXFqUx7GD=XgCByh8HFV9Rb8L8qc2A@mail.gmail.com>


On 02/07/2018 11:01 AM, kefu chai wrote:
> On Wed, Jan 31, 2018 at 6:32 AM, Josh Durgin <jdurgin@redhat.com> wrote:
>> [adding ceph-devel]
>>
>> On 01/30/2018 01:56 PM, Casey Bodley wrote:
>>> Hey Josh,
>>>
>>> I heard you mention in the call yesterday that you're looking into this
>>> part of seastar integration. I was just reading through the relevant code
>>> over the weekend, and wanted to compare notes:
>>>
>>>
>>> in seastar, all cross-core communication goes through lockfree spsc
>>> queues, which are encapsulated by 'class smp_message_queue' in
>>> core/reactor.hh. all of these queues (smp::_qs) are allocated on startup in
>>> smp::configure(). early in reactor::run() (which is effectively each seastar
>>> thread's entrypoint), it registers a smp_poller to poll all of the queues
>>> directed at that cpu
>>>
>>> what we need is a way to inject messages into each seastar reactor from
>>> arbitrary/external threads. our requirements are very similar to
> i think we will have a sharded<osd::PublicService> on each core. in
> each instance of PublicService, we will be listening and serving
> requests from external clients of the cluster. the same applies to
> sharded<osd::ClusterService>, which will be responsible for serving
> the requests from its peers in the cluster. the control flow of a
> typical OSD read request from a public RADOS client will look like:
>
> 1. the TCP connection is accepted by one of the listening
> sharded<osd::PublicService>.
> 2. decode the message
> 3. osd encapsulates the request in the message as a future, and submits
> it to another core after hashing the involved pg # to the core #.
> something like (in pseudo code):
>    engine().submit_to(osdmap_shard, [] {
>      return get_newer_osdmap(m->epoch);
>      // need to figure out how to reference a "osdmap service" in seastar.
>    }).then([] (auto osdmap) {
>      submit_to(pg_to_shard(m->ops.op.pg), [] {
>        return pg.do_ops(m->ops);
>      });
>    });
> 4. the core serving the involved pg (i.e. the pg service) will dequeue
> this request, and use a read_dma() call to delegate the aio request to
> the core maintaining the io queue.
> 5. once the aio completes, the PublicService will continue with the
> then() block and send the response back to the client.
>
> so the question is: why do we need an mpsc queue? the nr_core*nr_core
> spsc queues are good enough for us, i think.
>

Hey Kefu,

That sounds entirely reasonable, but assumes that everything will be 
running inside of seastar from the start. We've been looking for an 
incremental approach that would allow us to start with some subset 
running inside of seastar, with a mechanism for communication between 
that and the osd's existing threads. One suggestion was to start with 
just the messenger inside of seastar, and gradually move that 
seastar-to-external-thread boundary further down the io path as code is 
refactored to support it. It sounds unlikely that we'll ever get rocksdb 
running inside of seastar, so the objectstore will need its own threads 
until there's a viable alternative.

So the mpsc queue and smp::external_submit_to() interface were a strategy 
for passing messages into seastar from arbitrary non-seastar threads. 
Communication in the other direction just needs to be non-blocking (my 
example just signaled a condition variable without holding its mutex).

What are your thoughts on the incremental approach?

Casey

ps. I'd love to see more thought put into the design of the finished 
product, and your outline is a good start! Avi Kivity @scylladb shared 
one suggestion that I really liked, which was to give each shard of the 
osd a separate network endpoint, and add enough information to the 
osdmap so that clients could send their messages directly to the shard 
that would process them. That piece can come in later, but could 
eliminate some of the extra latency from your step 3.


Thread overview: 15+ messages
     [not found] <d0f50268-72bb-1196-7ce9-0b9e21808ffb@redhat.com>
2018-01-30 22:32 ` seastar and 'tame reactor' Josh Durgin
2018-02-07 16:01   ` kefu chai
2018-02-07 17:11     ` Casey Bodley [this message]
2018-02-07 19:22       ` Gregory Farnum
2018-02-12 15:45         ` kefu chai
2018-02-12 15:55           ` Matt Benjamin
2018-02-12 15:57             ` Gregory Farnum
2018-02-13 13:35             ` kefu chai
2018-02-13 15:58               ` Casey Bodley
2018-02-12 19:40       ` Allen Samuels
2018-02-13 15:46         ` Casey Bodley
2018-02-13 16:17           ` liuchang0812
2018-02-14  3:16             ` Allen Samuels
2018-02-15 20:04               ` Josh Durgin
2018-02-16 16:23                 ` Allen Samuels
