* Re: Asking bug ceph 15113
@ 2016-10-08  3:52 kefu chai
       [not found] ` <CAJ+86MP_7q-p63X6+6J7b7Xoiq-23RTjafo7Ux5QBwKxe1M2Cg@mail.gmail.com>
  0 siblings, 1 reply; 2+ messages in thread
From: kefu chai @ 2016-10-08  3:52 UTC
  To: agung Laksono, ceph-devel

+ ceph-devel

On Thu, Sep 29, 2016 at 2:35 PM, agung Laksono <agung.smarts@gmail.com> wrote:
>
> I would like to ask you about ceph-15113. In the bug description,
> the scenario to reproduce the bug is:

this is not how the bug is reproduced, it's just my analysis of the
root cause. IMO,
it would be tricky to reproduce this race, if it is possible at all.

>
> so the session was not removed, that's why the request was handled after the
> connection was reset. this is a race condition:
>
> ______________________ SafeTimer::timer_thread(), with mon_lock:
> ______________________ elector: in win_election(), it calls
> resend_routed_requests(), and collects the routed requests
> msgr: in ms_handle_reset(), it resets the session
> msgr: it waits for the lock
> ______________________ elector: in win_election(), it calls handle_command(), but
> the session is reset, hence it panics.
> msgr: removes the session, and erases related requests from
> Monitor::routed_requests.
>
>
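
The interleaving quoted above can be replayed deterministically in a small model. This is only a sketch under stated assumptions, not the monitor's actual code: `MiniMonitor`, `Session`, `RoutedRequest`, `mark_reset()` and `replay_race()` are hypothetical names, and the two threads are flattened into one sequence of steps so the ordering is fixed.

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical, simplified stand-ins; these are NOT the real Ceph classes.
struct Session {
  bool reset = false;  // set by the messenger when the connection is reset
};

struct RoutedRequest {
  std::shared_ptr<Session> session;
};

struct MiniMonitor {
  std::mutex lock;  // plays the role of mon_lock
  std::map<int, RoutedRequest> routed_requests;

  // Timer thread, lock held: collect the routed requests to resend.
  std::vector<RoutedRequest> collect_routed_requests() {
    std::vector<RoutedRequest> out;
    for (auto& kv : routed_requests) out.push_back(kv.second);
    return out;
  }

  // Messenger thread, lock NOT yet held: the session is marked reset.
  static void mark_reset(Session& s) { s.reset = true; }

  // Messenger thread, after it finally gets the lock: drop the session's requests.
  void remove_session(const std::shared_ptr<Session>& s) {
    std::lock_guard<std::mutex> g(lock);
    for (auto it = routed_requests.begin(); it != routed_requests.end();) {
      if (it->second.session == s) it = routed_requests.erase(it);
      else ++it;
    }
  }
};

// Timer thread, lock held: returns false where the real code would panic.
bool handle_command(const RoutedRequest& req) { return !req.session->reset; }

// Replays the steps in the order given in the analysis.
bool replay_race() {
  MiniMonitor mon;
  auto s = std::make_shared<Session>();
  mon.routed_requests[1] = RoutedRequest{s};

  bool ok = false;
  {
    std::lock_guard<std::mutex> g(mon.lock);            // timer thread holds the lock
    std::vector<RoutedRequest> pending = mon.collect_routed_requests();
    MiniMonitor::mark_reset(*s);                        // msgr resets, then waits for the lock
    ok = handle_command(pending.front());               // request handled on a reset session
  }
  mon.remove_session(s);                                // cleanup arrives too late
  return ok;
}
```

In this model `replay_race()` returns false: the request was already collected before the cleanup could run, so it is handled against a session that has been reset.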
> I am trying to reproduce this bug on a cluster on my local machine and
> am having difficulty triggering ms_handle_reset.
>
> Does ms_handle_reset refer to Monitor::ms_handle_reset(Connection *con)?

yes.

> How can I trigger this function? In my study, I added mon logging for when this method

when the peer resets the connection.

> is executed.
> However, I saw that ms_handle_reset was called seemingly at random: the
> function is also called several times while the system runs.
>
>  Thank you in advance!
>
> --
> Cheers,
>
> Agung Laksono
>



-- 
Regards
Kefu Chai


* Re: Asking bug ceph 15113
       [not found]   ` <CAJE9aOOv1iRFaWuEn6K2y6fPEOKC3R+-+8Uq_bcwr+mgdUeL5g@mail.gmail.com>
@ 2016-10-24 16:52     ` kefu chai
  0 siblings, 0 replies; 2+ messages in thread
From: kefu chai @ 2016-10-24 16:52 UTC
  To: agung Laksono; +Cc: ceph-devel

resending to the mailing list. the original was rejected because it was not plain-text mail.

On Fri, Oct 21, 2016 at 5:38 PM, kefu chai <tchaikov@gmail.com> wrote:
>
>
> On Monday, October 10, 2016, agung Laksono <agung.smarts@gmail.com> wrote:
>>
>> Thank you for the answer.
>>
>>
>> Could you tell me a bit about the peer?
>>
>> Does the peer mean a client?
>
>
> it's very likely an osd.
>
>> when I run:
>> agung@ceph:~/project/infernalis/src$ ag --cpp ms_handle_reset
>>
>> the result shows that many places call ms_handle_reset.
>> But I am not sure which one triggers Monitor::ms_handle_reset.
>
>
> it must be somewhere in the messenger subsystem. you can set a breakpoint at
> ms_handle_reset() in gdb, connect to the monitor with the ceph cli, then kill
> the cli.
>
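
What "the peer resets the connection" looks like at the socket level can be shown with a local socket pair. This is a minimal sketch and not Ceph's messenger code: it assumes a POSIX system, `peer_has_reset()` and `demo_peer_reset()` are hypothetical helpers, and treating this event as what ends up in a reset callback such as ms_handle_reset() is an assumption about the messenger rather than something taken from its source.

```cpp
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cerrno>

// Returns true if reading from `fd` shows that the peer has gone away:
// either an orderly close (recv returns 0) or a hard reset (ECONNRESET).
bool peer_has_reset(int fd) {
  char buf[1];
  ssize_t n = recv(fd, buf, sizeof(buf), 0);
  if (n == 0) return true;
  return n < 0 && errno == ECONNRESET;
}

// Creates a connected pair, closes one end to stand in for a killed
// client process, and checks what the surviving end observes.
bool demo_peer_reset() {
  int fds[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return false;
  close(fds[1]);                       // the "peer" disappears
  bool seen = peer_has_reset(fds[0]);  // the survivor notices on its next read
  close(fds[0]);
  return seen;
}
```

Any peer that goes away produces this event, which would be consistent with the callback appearing to fire at random times while the cluster runs.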
>>
>>
>>
>>
>>
>>
>>
>>
>> --
>> Cheers,
>>
>> Agung Laksono
>>
>
>
>
>
>
> --
> Regards
> Kefu Chai



-- 
Regards
Kefu Chai
