From: Sugang Li
Subject: Re: replicatedPG assert fails
Date: Fri, 22 Jul 2016 10:00:26 -0400
To: Samuel Just
Cc: ceph-devel

Actually write lock the object only. Is that gonna work?

Sugang

On Thu, Jul 21, 2016 at 5:59 PM, Samuel Just wrote:
> Write lock on the whole pg? How do parallel clients work?
> -Sam
>
> On Thu, Jul 21, 2016 at 12:36 PM, Sugang Li wrote:
>> The error above occurs when I am sending an MOSDOp to the replicas, and I
>> have to fix that first.
>>
>> For consistency, we are still using the primary OSD as a control
>> center. That is, the client always goes to the primary OSD to ask for a
>> write lock, then writes the replicas.
>>
>> Sugang
>>
>> On Thu, Jul 21, 2016 at 3:28 PM, Samuel Just wrote:
>>> Well, they are actually different types with different encodings and
>>> different contents. The client doesn't really have the information
>>> needed to build a MSG_OSD_REPOP. Your best bet will be to send an
>>> MOSDOp to the replicas and hack up a write path that makes that work.
>>>
>>> How do you plan to address the consistency problems?
>>> -Sam
>>>
>>> On Thu, Jul 21, 2016 at 11:11 AM, Sugang Li wrote:
>>>> So, to start with, I think one naive way is to make the replica think
>>>> it receives an op from the primary OSD when it actually comes from the
>>>> client. The branching point looks like it starts in
>>>> OSD::dispatch_op_fast, where handle_op or handle_replica_op is called
>>>> based on the type of the request. So my question is: at the client
>>>> side, is there a way I could set the message type returned by
>>>> "op->get_req()->get_type()" to MSG_OSD_SUBOP or MSG_OSD_REPOP?
>>>>
>>>> Sugang
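The branching in question is a switch on the message type in jewel's
OSD::dispatch_op_fast. A simplified sketch of its shape (paraphrased, not
the verbatim source; the real handle_replica_op is templated per message
type, which the sketch collapses):

    // Simplified sketch of jewel's OSD::dispatch_op_fast: client ops and
    // replication traffic part ways on the message type.
    bool OSD::dispatch_op_fast(OpRequestRef& op, OSDMapRef& osdmap)
    {
      switch (op->get_req()->get_type()) {
      case CEPH_MSG_OSD_OP:
        // client op: takes the full primary read/write path
        handle_op(op, osdmap);
        break;
      case MSG_OSD_SUBOP:
      case MSG_OSD_REPOP:
        // replication traffic: trusted to come from the current primary,
        // so the primary-only bookkeeping is never set up on this path
        handle_replica_op(op, osdmap);
        break;
      default:
        return false;
      }
      return true;
    }

Note that the type is fixed by which Message subclass the sender
constructs, so a client cannot simply overwrite the value get_type()
returns; it would have to build a real MOSDSubOp or MOSDRepOp, which is
exactly the problem Sam points out above.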
>>>>
>>>> On Thu, Jul 21, 2016 at 12:03 PM, Samuel Just wrote:
>>>>> Parallel read will be a *lot* easier since read-from-replica already
>>>>> works. Write to replica, however, is tough. The write path uses a
>>>>> lot of structures which are only populated on the primary. You're
>>>>> going to have to hack up most of the write path to bypass the existing
>>>>> replication machinery. Beyond that, maintaining consistency will
>>>>> obviously be a challenge.
>>>>> -Sam
>>>>>
>>>>> On Thu, Jul 21, 2016 at 8:49 AM, Sugang Li wrote:
>>>>>> My goal is to achieve parallel write/read from the client instead of
>>>>>> the primary OSD.
>>>>>>
>>>>>> Sugang
>>>>>>
>>>>>> On Thu, Jul 21, 2016 at 11:47 AM, Samuel Just wrote:
>>>>>>> I may be misunderstanding your goal. What are you trying to achieve?
>>>>>>> -Sam
>>>>>>>
>>>>>>> On Thu, Jul 21, 2016 at 8:43 AM, Samuel Just wrote:
>>>>>>>> Well, that assert is asserting that the object is in the pool that the
>>>>>>>> pg operating on it belongs to. Something very wrong must have
>>>>>>>> happened for it not to be true. Also, replicas have basically none of
>>>>>>>> the code required to handle a write, so I'm kind of surprised it got
>>>>>>>> that far. I suggest that you read the debug logging and read the OSD
>>>>>>>> op handling path.
>>>>>>>> -Sam
>>>>>>>>
>>>>>>>> On Thu, Jul 21, 2016 at 8:34 AM, Sugang Li wrote:
>>>>>>>>> Yes, I understand that. I was introduced to Ceph only 1 month ago, but
>>>>>>>>> I have the basic idea of the Ceph communication pattern now. I have not
>>>>>>>>> made any changes to the OSD yet. So I was wondering what the purpose of
>>>>>>>>> this "assert(oid.pool == static_cast<int64_t>(info.pgid.pool()))" is,
>>>>>>>>> and, to change the code in the OSD, what are the main aspects I should
>>>>>>>>> pay attention to?
>>>>>>>>> Since this is only a research project, the implementation does not
>>>>>>>>> have to be very sophisticated.
>>>>>>>>>
>>>>>>>>> I know my question is kinda too broad; any hints or suggestions will
>>>>>>>>> be highly appreciated.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Sugang
>>>>>>>>>
>>>>>>>>> On Thu, Jul 21, 2016 at 11:22 AM, Samuel Just wrote:
>>>>>>>>>> Oh, that's a much more complicated change. You are going to need to
>>>>>>>>>> make extensive changes to the OSD to make that work.
>>>>>>>>>> -Sam
>>>>>>>>>>
>>>>>>>>>> On Thu, Jul 21, 2016 at 8:21 AM, Sugang Li wrote:
>>>>>>>>>>> Hi Sam,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for the quick reply. The main modification I made is to call
>>>>>>>>>>> calc_target within librados::IoCtxImpl::aio_operate before op_submit,
>>>>>>>>>>> so that I can get all the replica OSDs' ids and send a write op to
>>>>>>>>>>> each of them. I can also attach the modified code if necessary.
>>>>>>>>>>>
>>>>>>>>>>> I just reproduced this error with the conf you provided; please see below:
>>>>>>>>>>>
>>>>>>>>>>> osd/ReplicatedPG.cc: In function 'int
>>>>>>>>>>> ReplicatedPG::find_object_context(const hobject_t&, ObjectContextRef*,
>>>>>>>>>>> bool, bool, hobject_t*)' thread 7fd6aba59700 time 2016-07-21
>>>>>>>>>>> 15:09:26.431436
>>>>>>>>>>> osd/ReplicatedPG.cc: 9042: FAILED assert(oid.pool ==
>>>>>>>>>>> static_cast<int64_t>(info.pgid.pool()))
>>>>>>>>>>> ceph version 10.2.0-2562-g0793a28 (0793a2844baa38f6bcc5c1724a1ceb9f8f1bbd9c)
>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>> const*)+0x8b) [0x7fd6c5733e8b]
>>>>>>>>>>> 2: (ReplicatedPG::find_object_context(hobject_t const&,
>>>>>>>>>>> std::shared_ptr<ObjectContext>*, bool, bool, hobject_t*)+0x1e54)
>>>>>>>>>>> [0x7fd6c51ef7c4]
>>>>>>>>>>> 3: (ReplicatedPG::do_op(std::shared_ptr<OpRequest>&)+0x186e) [0x7fd6c521fe9e]
>>>>>>>>>>> 4: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&,
>>>>>>>>>>> ThreadPool::TPHandle&)+0x73c) [0x7fd6c51dca3c]
>>>>>>>>>>> 5: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>> std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f5)
>>>>>>>>>>> [0x7fd6c5094d65]
>>>>>>>>>>> 6: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest>
>>>>>>>>>>> const&)+0x5d) [0x7fd6c5094f8d]
>>>>>>>>>>> 7: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x86c) [0x7fd6c50b603c]
>>>>>>>>>>> 8: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x947)
>>>>>>>>>>> [0x7fd6c5724117]
>>>>>>>>>>> 9: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7fd6c5726270]
>>>>>>>>>>> 10: (()+0x8184) [0x7fd6c3b98184]
>>>>>>>>>>> 11: (clone()+0x6d) [0x7fd6c1aa937d]
>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>> 2016-07-21 15:09:26.454854 7fd6aba59700 -1 osd/ReplicatedPG.cc: In
>>>>>>>>>>> function 'int ReplicatedPG::find_object_context(const hobject_t&,
>>>>>>>>>>> ObjectContextRef*, bool, bool, hobject_t*)' thread 7fd6aba59700 time
>>>>>>>>>>> 2016-07-21 15:09:26.431436
>>>>>>>>>>>
>>>>>>>>>>> This error occurred three times, since I wrote to three OSDs.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Sugang
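A rough, hypothetical sketch of the client-side fan-out described in the
message above. calc_target and op_submit are real names inside jewel's
librados/Objecter, but every type and helper below is invented for
illustration; this shows the flow, not the real API:

    #include <cstdio>
    #include <vector>

    struct WriteOp {
      const char *oid;   // object name; payload, snap context, etc. omitted
    };

    // Stand-in for what calc_target() yields: the PG's acting set.
    std::vector<int> acting_set_for(const WriteOp &) {
      return {0, 1, 2};  // e.g. primary osd.0 plus two replicas
    }

    // Stand-in for encoding and sending one MOSDOp to a specific OSD.
    void submit_to_osd(int osd, const WriteOp &op) {
      std::printf("sending write of %s to osd.%d\n", op.oid, osd);
    }

    void aio_operate_parallel(const WriteOp &op) {
      // Instead of submitting once and letting the primary replicate,
      // resolve the acting set up front and send the op to every member.
      for (int osd : acting_set_for(op))
        submit_to_osd(osd, op);
    }

    int main() {
      aio_operate_parallel({"foo"});
    }

Each replica then runs a plain client op through its normal do_op path for
a PG that never expected it, which is one reason Sam warns above that most
of the write path has to be hacked up for this to work.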
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jul 21, 2016 at 10:54 AM, Samuel Just wrote:
>>>>>>>>>>>> Hmm. Can you provide more information about the poison op? If you
>>>>>>>>>>>> can reproduce with
>>>>>>>>>>>>   debug osd = 20
>>>>>>>>>>>>   debug filestore = 20
>>>>>>>>>>>>   debug ms = 1
>>>>>>>>>>>> it should be easier to work out what is going on.
>>>>>>>>>>>> -Sam
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Jul 21, 2016 at 7:13 AM, Sugang Li wrote:
>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am working on a research project which requires multiple write
>>>>>>>>>>>>> operations on the same object at the same time from the client. At
>>>>>>>>>>>>> the OSD side, I got this error:
>>>>>>>>>>>>>
>>>>>>>>>>>>> osd/ReplicatedPG.cc: In function 'int
>>>>>>>>>>>>> ReplicatedPG::find_object_context(const hobject_t&, ObjectContextRef*,
>>>>>>>>>>>>> bool, bool, hobject_t*)' thread 7f0586193700 time 2016-07-21
>>>>>>>>>>>>> 14:02:04.218448
>>>>>>>>>>>>> osd/ReplicatedPG.cc: 9041: FAILED assert(oid.pool ==
>>>>>>>>>>>>> static_cast<int64_t>(info.pgid.pool()))
>>>>>>>>>>>>> ceph version 10.2.0-2562-g0793a28 (0793a2844baa38f6bcc5c1724a1ceb9f8f1bbd9c)
>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>>> const*)+0x8b) [0x7f059fe6dd7b]
>>>>>>>>>>>>> 2: (ReplicatedPG::find_object_context(hobject_t const&,
>>>>>>>>>>>>> std::shared_ptr<ObjectContext>*, bool, bool, hobject_t*)+0x1dbb)
>>>>>>>>>>>>> [0x7f059f9296fb]
>>>>>>>>>>>>> 3: (ReplicatedPG::do_op(std::shared_ptr<OpRequest>&)+0x186e) [0x7f059f959d7e]
>>>>>>>>>>>>> 4: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&,
>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x73c) [0x7f059f916a0c]
>>>>>>>>>>>>> 5: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>>> std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f5)
>>>>>>>>>>>>> [0x7f059f7ced65]
>>>>>>>>>>>>> 6: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest>
>>>>>>>>>>>>> const&)+0x5d) [0x7f059f7cef8d]
>>>>>>>>>>>>> 7: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x86c) [0x7f059f7f003c]
>>>>>>>>>>>>> 8: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x947)
>>>>>>>>>>>>> [0x7f059fe5e007]
>>>>>>>>>>>>> 9: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f059fe60160]
>>>>>>>>>>>>> 10: (()+0x8184) [0x7f059e2d2184]
>>>>>>>>>>>>> 11: (clone()+0x6d) [0x7f059c1e337d]
>>>>>>>>>>>>>
>>>>>>>>>>>>> And at the client side, I got a segmentation fault.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am wondering what possible reason could cause the assert to fail?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sugang
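The assert in question enforces that the pool id stamped into the
hobject_t matches the pool of the PG processing it. A toy illustration of
just that invariant, with stand-in types rather than Ceph's real
pg_info_t/hobject_t:

    #include <cassert>
    #include <cstdint>

    struct PgInfo  { int64_t pool; };   // pool the PG belongs to
    struct Hobject { int64_t pool; };   // pool id carried by the object

    void find_object_context_check(const Hobject &oid, const PgInfo &info)
    {
      // jewel: assert(oid.pool == static_cast<int64_t>(info.pgid.pool()));
      // An op whose hobject_t carries a stale, unset (-1), or mismatched
      // pool id fails here before any object lookup happens.
      assert(oid.pool == info.pool);
    }

    int main()
    {
      find_object_context_check({1}, {1});    // fine
      find_object_context_check({-1}, {1});   // FAILED assert, as in the log
    }

One plausible reading of the traces, then, is that the hand-rolled client
ops reached the OSDs with an hobject_t whose pool field was never filled
in the way the normal Objecter submit path fills it; the debug settings
Sam suggests above should confirm or rule that out.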