From: Samuel Just <sjust@redhat.com>
To: Sugang Li <sugangli@winlab.rutgers.edu>
Cc: ceph-devel <ceph-devel@vger.kernel.org>
Subject: Re: replicatedPG assert fails
Date: Thu, 21 Jul 2016 08:47:15 -0700
Message-ID: <CAN=+7FVxvdPXofUXfB1D1h_nm-0uZOUAgRwX4Tiak3Sp+a5Hkw@mail.gmail.com>
In-Reply-To: <CAN=+7FWis8Gm-X2Zm59fVdBbwuXyJYMWWy5rJ9Wrpdzhw-pH6g@mail.gmail.com>

I may be misunderstanding your goal.  What are you trying to achieve?
-Sam

On Thu, Jul 21, 2016 at 8:43 AM, Samuel Just <sjust@redhat.com> wrote:
> Well, that assert is asserting that the object is in the pool that the
> pg operating on it belongs to.  Something very wrong must have
> happened for it to be not true.  Also, replicas have basically none of
> the code required to handle a write, so I'm kind of surprised it got
> that far.  I suggest that you read the debug logging and read the OSD
> op handling path.
> -Sam
>
> On Thu, Jul 21, 2016 at 8:34 AM, Sugang Li <sugangli@winlab.rutgers.edu> wrote:
>> Yes, I understand that. I was introduced to Ceph only a month ago, but
>> I have a basic idea of Ceph's communication pattern now. I have not
>> made any changes to the OSD yet. So I was wondering what the purpose of
>> this "assert(oid.pool == static_cast<int64_t>(info.pgid.pool()))" is, and
>> which main aspects I should pay attention to when changing the OSD
>> code.
>> Since this is only a research project, the implementation does not
>> have to be very sophisticated.
>>
>> I know my question is rather broad; any hints or suggestions would
>> be highly appreciated.
>>
>> Thanks,
>>
>> Sugang
>>
>> On Thu, Jul 21, 2016 at 11:22 AM, Samuel Just <sjust@redhat.com> wrote:
>>> Oh, that's a much more complicated change.  You are going to need to
>>> make extensive changes to the OSD to make that work.
>>> -Sam
>>>
>>> On Thu, Jul 21, 2016 at 8:21 AM, Sugang Li <sugangli@winlab.rutgers.edu> wrote:
>>>> Hi Sam,
>>>>
>>>> Thanks for the quick reply. The main modification I made is to call
>>>> calc_target within librados::IoCtxImpl::aio_operate before op_submit,
>>>> so that I can get the IDs of all replica OSDs and send a write op to
>>>> each of them. I can also attach the modified code if necessary.
>>>>
>>>> I just reproduced this error with the conf you provided,  please see below:
>>>> osd/ReplicatedPG.cc: In function 'int
>>>> ReplicatedPG::find_object_context(const hobject_t&, ObjectContextRef*,
>>>> bool, bool, hobject_t*)' thread 7fd6aba59700 time 2016-07-21
>>>> 15:09:26.431436
>>>> osd/ReplicatedPG.cc: 9042: FAILED assert(oid.pool ==
>>>> static_cast<int64_t>(info.pgid.pool()))
>>>>  ceph version 10.2.0-2562-g0793a28 (0793a2844baa38f6bcc5c1724a1ceb9f8f1bbd9c)
>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>> const*)+0x8b) [0x7fd6c5733e8b]
>>>>  2: (ReplicatedPG::find_object_context(hobject_t const&,
>>>> std::shared_ptr<ObjectContext>*, bool, bool, hobject_t*)+0x1e54)
>>>> [0x7fd6c51ef7c4]
>>>>  3: (ReplicatedPG::do_op(std::shared_ptr<OpRequest>&)+0x186e) [0x7fd6c521fe9e]
>>>>  4: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&,
>>>> ThreadPool::TPHandle&)+0x73c) [0x7fd6c51dca3c]
>>>>  5: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>> std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f5)
>>>> [0x7fd6c5094d65]
>>>>  6: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest>
>>>> const&)+0x5d) [0x7fd6c5094f8d]
>>>>  7: (OSD::ShardedOpWQ::_process(unsigned int,
>>>> ceph::heartbeat_handle_d*)+0x86c) [0x7fd6c50b603c]
>>>>  8: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x947)
>>>> [0x7fd6c5724117]
>>>>  9: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7fd6c5726270]
>>>>  10: (()+0x8184) [0x7fd6c3b98184]
>>>>  11: (clone()+0x6d) [0x7fd6c1aa937d]
>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>> needed to interpret this.
>>>> 2016-07-21 15:09:26.454854 7fd6aba59700 -1 osd/ReplicatedPG.cc: In
>>>> function 'int ReplicatedPG::find_object_context(const hobject_t&,
>>>> ObjectContextRef*, bool, bool, hobject_t*)' thread 7fd6aba59700 time
>>>> 2016-07-21 15:09:26.431436
>>>>
>>>>
>>>> This error occurs three times since I wrote to three OSDs.
>>>>
>>>> Thanks,
>>>>
>>>> Sugang
>>>>
>>>> On Thu, Jul 21, 2016 at 10:54 AM, Samuel Just <sjust@redhat.com> wrote:
>>>>> Hmm.  Can you provide more information about the poison op?  If you
>>>>> can reproduce with
>>>>> debug osd = 20
>>>>> debug filestore = 20
>>>>> debug ms = 1
>>>>> it should be easier to work out what is going on.
>>>>> -Sam
>>>>>
>>>>> On Thu, Jul 21, 2016 at 7:13 AM, Sugang Li <sugangli@winlab.rutgers.edu> wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> I am working on a research project that requires the client to
>>>>>> issue multiple write operations on the same object at the same
>>>>>> time. On the OSD side, I got this error:
>>>>>> osd/ReplicatedPG.cc: In function 'int
>>>>>> ReplicatedPG::find_object_context(const hobject_t&, ObjectContextRef*,
>>>>>> bool, bool, hobject_t*)' thread 7f0586193700 time 2016-07-21
>>>>>> 14:02:04.218448
>>>>>> osd/ReplicatedPG.cc: 9041: FAILED assert(oid.pool ==
>>>>>> static_cast<int64_t>(info.pgid.pool()))
>>>>>>  ceph version 10.2.0-2562-g0793a28 (0793a2844baa38f6bcc5c1724a1ceb9f8f1bbd9c)
>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>> const*)+0x8b) [0x7f059fe6dd7b]
>>>>>>  2: (ReplicatedPG::find_object_context(hobject_t const&,
>>>>>> std::shared_ptr<ObjectContext>*, bool, bool, hobject_t*)+0x1dbb)
>>>>>> [0x7f059f9296fb]
>>>>>>  3: (ReplicatedPG::do_op(std::shared_ptr<OpRequest>&)+0x186e) [0x7f059f959d7e]
>>>>>>  4: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&,
>>>>>> ThreadPool::TPHandle&)+0x73c) [0x7f059f916a0c]
>>>>>>  5: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>> std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f5)
>>>>>> [0x7f059f7ced65]
>>>>>>  6: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest>
>>>>>> const&)+0x5d) [0x7f059f7cef8d]
>>>>>>  7: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>> ceph::heartbeat_handle_d*)+0x86c) [0x7f059f7f003c]
>>>>>>  8: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x947)
>>>>>> [0x7f059fe5e007]
>>>>>>  9: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f059fe60160]
>>>>>>  10: (()+0x8184) [0x7f059e2d2184]
>>>>>>  11: (clone()+0x6d) [0x7f059c1e337d]
>>>>>>
>>>>>> And on the client side, I got a segmentation fault.
>>>>>>
>>>>>> What could cause this assert to fail?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Sugang
