Re: Asking bug ceph 15113

From: kefu chai @ 2016-10-08  3:52 UTC
  To: agung Laksono, ceph-devel

+ ceph-devel

On Thu, Sep 29, 2016 at 2:35 PM, agung Laksono <agung.smarts@gmail.com> wrote:
>
> I would like to ask you about ceph-15113. In the bug description,
> the scenario to reproduce the bug is:

this is not how the bug is reproduced; it's just my analysis of the
root cause. IMO, it would be tricky to reproduce this race, if it is
possible at all.

>
> so the session was not removed; that's why the request was handled after
> the connection was reset. This is a race condition:
>
> 1. timer (SafeTimer::timer_thread(), holding mon_lock): the elector, in
>    win_election(), calls resend_routed_requests() and collects the routed
>    requests.
> 2. msgr: in ms_handle_reset(), it resets the session.
> 3. msgr: it waits for mon_lock.
> 4. timer: the elector, in win_election(), calls handle_command(), but the
>    session has already been reset, hence it panics.
> 5. msgr: it removes the session and erases the related requests from
>    Monitor::routed_requests.
>
>
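To make the interleaving above concrete, here is a minimal Python model of it. It is a sketch under stated assumptions, not Ceph code: `Monitor`, `Session`, `handle_command()`, `run_race()` and the two events are hypothetical stand-ins for the C++ monitor, and "panicking" is modeled as setting a flag. The events force the exact ordering from the analysis, which is why the model hits the race every time even though, as noted above, the real race is hard to trigger.

```python
import threading

# Hypothetical, heavily simplified model of the race; these are NOT
# Ceph's real C++ types or functions.

class Session:
    def __init__(self, name):
        self.name = name
        self.closed = False

class Monitor:
    def __init__(self):
        self.mon_lock = threading.Lock()
        self.sessions = {"client.1": Session("client.1")}
        # routed request id -> session that sent it
        self.routed_requests = {1: self.sessions["client.1"]}
        self.panicked = False

    def handle_command(self, session):
        # Stands in for the assertion that fires when a command is
        # dispatched on a session that has already been reset.
        if session.closed:
            self.panicked = True

def run_race():
    mon = Monitor()
    collected = threading.Event()   # timer thread has collected requests
    reset_done = threading.Event()  # msgr thread has reset the session
    order = []

    def timer_thread():
        with mon.mon_lock:
            # win_election(): resend_routed_requests() collects requests
            pending = list(mon.routed_requests.values())
            order.append("timer: collected routed requests")
            collected.set()
            reset_done.wait()  # force the bad interleaving
            for session in pending:
                order.append("timer: handle_command")
                mon.handle_command(session)

    def msgr_thread():
        collected.wait()
        # ms_handle_reset(): the session is reset before mon_lock is taken
        mon.sessions["client.1"].closed = True
        order.append("msgr: reset session")
        reset_done.set()
        with mon.mon_lock:  # blocks until the timer thread releases it
            mon.sessions.pop("client.1")
            mon.routed_requests.clear()
            order.append("msgr: removed session and routed requests")

    threads = [threading.Thread(target=timer_thread),
               threading.Thread(target=msgr_thread)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return mon.panicked, order

panicked, order = run_race()
print(panicked)  # True: the command ran on an already-reset session
for step in order:
    print(step)
```

The point the model illustrates: a session reset done outside mon_lock is visible to a thread that already holds mon_lock and collected its work before the reset, while the cleanup that would have dropped that work only runs once the lock is released.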
> I tried to reproduce this bug on a local cluster and had
> difficulty triggering ms_handle_reset.
>
> Does ms_handle_reset refer to Monitor::ms_handle_reset(Connection *con)?

yes.

> How can I trigger this function? In my testing, I added a mon log
> message for when this method is executed.

It is called when the peer resets the connection.

> However, I saw that ms_handle_reset was called seemingly at random:
> this function was also called several times while the system was
> running.
>
>  Thank you in advance!
>
> --
> Cheers,
>
> Agung Laksono
>

-- 
Regards
Kefu Chai
