* Re: Asking bug ceph 15113
@ 2016-10-08 3:52 kefu chai
From: kefu chai @ 2016-10-08 3:52 UTC
To: agung Laksono, ceph-devel
+ ceph-devel
On Thu, Sep 29, 2016 at 2:35 PM, agung Laksono <agung.smarts@gmail.com> wrote:
>
> I would like to ask you about ceph-15113. In the bug description,
> the scenario to reproduce the bug is:
this is not how the bug is reproduced; it's just my analysis of the
root cause. IMO,
it would be tricky to reproduce this race, if it is possible at all.
>
> so the session was not removed, that's why the request was handled after the
> connection is reset. this is a race condition:
>
> ______________________ SafeTimer::timer_thread(), with mon_lock:
> ______________________ elector: in win_election(), it
> resend_routed_requests(), and collects the routed requests
> msgr: in ms_handle_reset(), it reset the session
> msgr: it waits for the lock
> ______________________ elector: in win_election(), it handle_command(), but
> the session is reset, hence it panics.
> msgr: remove session, and erase related requests from
> Monitor::routed_requests.
>
>
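to make the ordering above easier to follow, here is a toy replay of that
interleaving. it is only a sketch and not Ceph code: ToyMonitor, Session,
mark_reset() and remove_session() are hypothetical names, the lock merely
plays the role of mon_lock, and the two threads are replayed step by step
on a single thread so the outcome is deterministic. where the real monitor
panics, the toy just reports "stale".

```python
# Toy model of the interleaving described above; none of this is Ceph code.
import threading


class Session:
    """Hypothetical stand-in for a monitor session."""

    def __init__(self, name):
        self.name = name
        self.closed = False  # set once the connection has been reset


class ToyMonitor:
    def __init__(self):
        self.lock = threading.Lock()  # plays the role of mon_lock
        self.routed_requests = {}     # tid -> Session

    def collect_routed_requests(self):
        # timer thread (caller holds the lock): snapshot requests to resend
        return list(self.routed_requests.items())

    def mark_reset(self, session):
        # msgr thread, lock not held yet: the session is reset first
        session.closed = True

    def remove_session(self, session):
        # msgr thread, once it finally gets the lock: erase related requests
        with self.lock:
            self.routed_requests = {
                tid: s
                for tid, s in self.routed_requests.items()
                if s is not session
            }

    def handle_command(self, tid, session):
        # the toy reports a stale session instead of panicking
        return "stale" if session.closed else "ok"


def replay_race():
    """Replay the two threads' steps in the problematic order."""
    mon = ToyMonitor()
    session = Session("client.0")
    mon.routed_requests[1] = session
    events = []

    mon.lock.acquire()                       # timer thread takes the lock
    pending = mon.collect_routed_requests()  # 1. collect routed requests
    mon.mark_reset(session)                  # 2. msgr resets the session
    # 3. msgr would now wait for the lock, so nothing is erased yet
    for tid, sess in pending:                # 4. timer thread handles request
        events.append(mon.handle_command(tid, sess))
    mon.lock.release()
    mon.remove_session(session)              # 5. msgr removes the session
    events.append(len(mon.routed_requests))
    return events
```

calling replay_race() shows the request being handled against a session
that was already reset, and the routed request being erased only afterwards.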
> I tried to reproduce this bug on a cluster on my local machine, but I am
> having difficulty triggering ms_handle_reset.
>
> Does ms_handle_reset refer to Monitor::ms_handle_reset(Connection *con)?
yes.
> How to trigger this function?
when the peer resets the connection.
> In my study, I added a mon log message for when this method is
> executed.
> However, I saw that ms_handle_reset was called seemingly at random. I mean
> this function was also called several times while the system was running.
>
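a reset callback fires whenever any peer drops its connection, which is
why it can look random in the log. the sketch below uses plain Python
sockets and is not the Ceph messenger: on_reset() is a hypothetical
stand-in for ms_handle_reset(), the peer names are made up, and an orderly
close is treated as "the peer went away" for simplicity.

```python
# Toy illustration with plain sockets; this is not the Ceph messenger.
import socket


def on_reset(log, name):
    """Hypothetical stand-in for a reset handler such as ms_handle_reset()."""
    log.append(name)


def poll_once(conns, log):
    """Read once from each connection; an empty read means the peer is gone."""
    for name, sock in list(conns.items()):
        data = sock.recv(4096)
        if data == b"":
            on_reset(log, name)
            sock.close()
            del conns[name]


def demo():
    mon_a, peer_a = socket.socketpair()
    mon_b, peer_b = socket.socketpair()
    conns = {"osd.0": mon_a, "client.1": mon_b}
    log = []

    peer_b.sendall(b"ping")  # client.1 is still alive and talking
    peer_a.close()           # osd.0 drops its connection
    poll_once(conns, log)    # only osd.0 triggers the handler

    peer_b.close()           # later, client.1 goes away as well
    poll_once(conns, log)
    return log
```

demo() returns the peers in the order their connections went away, so one
handler call is logged per dropped peer, whichever kind of peer it is.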
> Thank you in advance!
>
> --
> Cheers,
>
> Agung Laksono
>
--
Regards
Kefu Chai
* Re: Asking bug ceph 15113
@ 2016-10-24 16:52 ` kefu chai
From: kefu chai @ 2016-10-24 16:52 UTC
To: agung Laksono; +Cc: ceph-devel
resending to the mailing list. the original was rejected because it was not plain-text mail.
On Fri, Oct 21, 2016 at 5:38 PM, kefu chai <tchaikov@gmail.com> wrote:
>
>
> On Monday, October 10, 2016, agung Laksono <agung.smarts@gmail.com> wrote:
>>
>> Thank you for the answer..
>>
>>
>> Could you tell me a bit about the peer?
>>
>> Does the peer mean a client?
>
>
> it's very likely an osd.
>
>> when I run:
>> agung@ceph:~/project/infernalis/src$ ag --cpp ms_handle_reset
>>
>> the result shows many places that call ms_handle_reset,
>> but I am not sure which one triggers Monitor::ms_handle_reset.
>
>
> it must be somewhere in the messenger subsystem. you can set a breakpoint at
> ms_handle_reset() in gdb, connect to the monitor with the ceph cli, then kill
> it.
>
> --
> Regards
> Kefu Chai
--
Regards
Kefu Chai