* some issue about peering progress
@ 2017-10-27  7:45 Xinze Chi (信泽)
  2017-11-01 16:42 ` Ning Yao
  2017-11-01 20:26 ` Gregory Farnum
  0 siblings, 2 replies; 5+ messages in thread
From: Xinze Chi (信泽) @ 2017-10-27  7:45 UTC (permalink / raw)
  To: ceph-devel

Hi all,

     I am confused about the notify message during peering. For example:

    At epoch 1, the primary OSD is peering (GetInfo and GetMissing) and
calls proc_replica_log. In this function, last_complete and
last_update may be reset.

    Before it goes to Activate, the OSDMap changes (the new OSDMap does
not restart peering), and the non-primary OSD sends a notify to the
primary.

    When the primary receives the notify, it is handled by
Primary::react(const MNotifyRec& notevt), which calls proc_replica_info.

    In that function, we update the pg info, including the last_complete
and last_update that were modified in proc_replica_log.

    When the primary then calls activate, it performs recovery based on
the pg info received in the notify instead of the result of
proc_replica_log.

    So is this a bug?

-- 
Regards,
Xinze Chi

* Re: some issue about peering progress
  2017-10-27  7:45 some issue about peering progress Xinze Chi (信泽)
@ 2017-11-01 16:42 ` Ning Yao
  2017-11-01 20:26 ` Gregory Farnum
  1 sibling, 0 replies; 5+ messages in thread
From: Ning Yao @ 2017-11-01 16:42 UTC (permalink / raw)
  To: Xinze Chi (信泽); +Cc: ceph-devel

Can anyone take a look?

Could anything go wrong when divergence occurs and peer_info is reset
in Primary::react(const MNotifyRec& notevt)?

@Sage, what do you think?
Regards,
Ning Yao


* Re: some issue about peering progress
  2017-10-27  7:45 some issue about peering progress Xinze Chi (信泽)
  2017-11-01 16:42 ` Ning Yao
@ 2017-11-01 20:26 ` Gregory Farnum
       [not found]   ` <CANE=7sXWMXTpTfgG6NmwxYYyjYA2_UZ3oNun4eAw+QNiht2nkg@mail.gmail.com>
  1 sibling, 1 reply; 5+ messages in thread
From: Gregory Farnum @ 2017-11-01 20:26 UTC (permalink / raw)
  To: Xinze Chi (信泽); +Cc: ceph-devel

On Fri, Oct 27, 2017 at 12:46 AM Xinze Chi (信泽) <xmdxcxz@gmail.com> wrote:
>
> Hi all,
>
>      I am confused about the notify message during peering. For example:
>
>     At epoch 1, the primary OSD is peering (GetInfo and GetMissing) and
> calls proc_replica_log. In this function, last_complete and
> last_update may be reset.
>
>     Before it goes to Activate, the OSDMap changes (the new OSDMap does
> not restart peering), and the non-primary OSD sends a notify to the
> primary.


I don't think this can happen. The OSD won't re-send a notify during
the same peering interval, and even if it did the message would be
tagged with a new (higher) epoch so the PG wouldn't process it until
after it had switched states, right?
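
For illustration only, here is a minimal C++ sketch of the kind of epoch gating described above: a notify tagged with a newer epoch is held until the PG has caught up to that epoch. This is a toy model, not Ceph's code. `ToyPG`, `NotifyMsg`, `enqueue`, `advance_map` and the other names are hypothetical, and whether the real message path defers notifies this way is exactly what this thread leaves open.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Toy model only: ToyPG, NotifyMsg and their members are hypothetical names,
// not Ceph's classes.
using epoch_t = std::uint32_t;

struct NotifyMsg {
  int from_osd;        // sending OSD id
  epoch_t epoch_sent;  // map epoch the sender had when it sent the notify
};

class ToyPG {
public:
  explicit ToyPG(epoch_t e) : map_epoch_(e) {}

  // Queue a notify; it is handled only once the PG's map is new enough.
  void enqueue(const NotifyMsg& m) {
    waiting_.push_back(m);
    drain();
  }

  // Advance to a newer map, then retry anything that was waiting.
  void advance_map(epoch_t e) {
    assert(e >= map_epoch_);  // maps only move forward in this model
    map_epoch_ = e;
    drain();
  }

  const std::vector<int>& processed() const { return processed_; }
  std::size_t deferred() const { return waiting_.size(); }

private:
  // Handle every queued notify whose epoch the PG has reached; keep the rest.
  void drain() {
    std::deque<NotifyMsg> still_waiting;
    for (const NotifyMsg& m : waiting_) {
      if (m.epoch_sent <= map_epoch_) {
        processed_.push_back(m.from_osd);
      } else {
        still_waiting.push_back(m);
      }
    }
    waiting_.swap(still_waiting);
  }

  epoch_t map_epoch_;
  std::deque<NotifyMsg> waiting_;
  std::vector<int> processed_;
};
```

In this model a notify sent at epoch 2 to a PG still at epoch 1 stays in the waiting queue and is only handled after `advance_map(2)`.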

>
>
>     When the primary receives the notify, it is handled by
> Primary::react(const MNotifyRec& notevt), which calls proc_replica_info.
>
>     In that function, we update the pg info, including the last_complete
> and last_update that were modified in proc_replica_log.

Note also that "PG::RecoveryState::Active::react(const MNotifyRec&
notevt)" does *not* unconditionally invoke proc_replica_info(). I
think you were trying to say we hadn't reached this state on receipt
of the message? But as I mentioned above, I think we block so that's
not actually possible either.
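
To make the distinction concrete, here is a small C++ sketch contrasting the two handler behaviors discussed in this thread: one that applies an incoming notify unconditionally, and one that skips senders whose info is already recorded. It is a toy model under stated assumptions. `ToyPeerState`, `ToyInfo`, `primary_react_notify` and `active_react_notify` are hypothetical names, and the "already recorded" check is a simplification of what the real handlers may do.

```cpp
#include <cassert>
#include <map>

// Toy model only: these types and functions are hypothetical, not Ceph code.
struct ToyInfo {
  unsigned last_update;
  unsigned last_complete;
};

struct ToyPeerState {
  std::map<int, ToyInfo> peer_info;  // per-OSD info the primary has recorded
  int proc_calls = 0;                // how often info was (re)applied

  // Stand-in for proc_replica_info: overwrite whatever was recorded.
  void proc_replica_info(int osd, const ToyInfo& info) {
    assert(osd >= 0);
    peer_info[osd] = info;
    ++proc_calls;
  }

  // Unconditional handler: always applies the incoming info.
  void primary_react_notify(int osd, const ToyInfo& info) {
    proc_replica_info(osd, info);
  }

  // Conditional handler: ignores senders whose info is already recorded.
  void active_react_notify(int osd, const ToyInfo& info) {
    if (peer_info.count(osd) == 0) {
      proc_replica_info(osd, info);
    }
  }
};
```

With the unconditional handler, a value adjusted after the first notify (for example by a step playing the role of proc_replica_log) is overwritten by a second notify from the same OSD. With the conditional handler it is left alone. That overwrite is the concern raised in the original question.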

>
>     When the primary then calls activate, it performs recovery based on
> the pg info received in the notify instead of the result of
> proc_replica_log.
>
>     So is this a bug?

Have you seen issues in the wild, or just trying to understand this
code/algorithm? I would be surprised if we had undiscovered issues
here just because our tests exercise peering quite vigorously, but I
might be missing what's happening in my own code skims.
-Greg

* Re: some issue about peering progress
       [not found]     ` <CAJ4mKGaFP+5vePBNzm0fk3pqONirfCyGhrGynZ4BTD3AwfNMuw@mail.gmail.com>
@ 2017-11-02  3:35       ` Xinze Chi (信泽)
  2017-11-28 23:35         ` Gregory Farnum
  0 siblings, 1 reply; 5+ messages in thread
From: Xinze Chi (信泽) @ 2017-11-02  3:35 UTC (permalink / raw)
  To: Gregory Farnum, ceph-devel

Isn't send_notify set to false in Stray only when the PG goes to Activate?

2017-11-02 10:27 GMT+08:00 Gregory Farnum <gfarnum@redhat.com>:
> On Wed, Nov 1, 2017 at 5:27 PM Xinze Chi (信泽) <xmdxcxz@gmail.com> wrote:
>>
>>    I just want to understand this algorithm. When the Stray OSD receives
>> an ActMap, it sends a notify even within the same peering interval; see
>> Stray::react(const ActMap&).
>
> Note the
>
> if (pg->should_send_notify()
>
> check preceding that block. It checks a boolean send_notify value that
> is set true only when it enters a new peering interval, and is set
> false as soon as it shares its info. So I don't think the primary's
> behavior matters at all (other than from a security perspective,
> anyway).
>
>
>>   You say the primary OSD wouldn't process the notify message, but I
>> cannot find that in the code. The primary calls handle_pg_notify and
>> processes it.
>
> I didn't actually track the order of the state machine here; I just
> saw that PG::RecoveryState::Active::react(const MNotifyRec& notevt)
> will throw them out if it's already seen the info. You're right
> PG::RecoveryState::Primary::react(const MNotifyRec& notevt) will
> process it unconditionally. I'm not sure if those are the replica and
> primary states, or if you move from Primary to Active (or vice versa).
> -Greg



-- 
Regards,
Xinze Chi

* Re: some issue about peering progress
  2017-11-02  3:35       ` Xinze Chi (信泽)
@ 2017-11-28 23:35         ` Gregory Farnum
  0 siblings, 0 replies; 5+ messages in thread
From: Gregory Farnum @ 2017-11-28 23:35 UTC (permalink / raw)
  To: Xinze Chi (信泽); +Cc: ceph-devel

Hmm, just noticed I never replied to this.

You are correct and I was not reading carefully enough; the
send_notify is only set to false on Activate.
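
As a rough illustration of that flag's lifecycle as described in this thread (set when a new peering interval starts, cleared only on activation), here is a toy C++ model. `ToyStray` and its member functions are hypothetical names and not the actual Ceph state machine. The point is only that several ActMap events within one interval can each trigger a notify.

```cpp
#include <cassert>

// Toy model only: ToyStray and its members are hypothetical, not Ceph code.
struct ToyStray {
  bool send_notify = false;  // stands in for the flag discussed in this thread
  int notifies_sent = 0;

  // Entering a new peering interval arms the flag.
  void start_peering_interval() { send_notify = true; }

  // Each ActMap sends a notify while the flag is still set.
  void on_act_map() {
    if (send_notify) ++notifies_sent;
  }

  // In this model, only activation clears the flag.
  void on_activate() { send_notify = false; }
};

// Usage: two ActMaps in one interval produce two notifies.
inline int notifies_before_activate() {
  ToyStray s;
  s.start_peering_interval();
  s.on_act_map();
  s.on_act_map();
  s.on_activate();
  s.on_act_map();  // no further notify after activation
  assert(s.notifies_sent == 2);
  return s.notifies_sent;
}
```

If the flag were instead cleared as soon as the first notify was sent, the second `on_act_map()` call would send nothing, which is the behavior assumed earlier in the thread.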

I still think something in the stack will block messages if they're
from a too-new epoch, or that this is correct because of something
else (we do a *lot* of OSD thrashing tests), but I didn't track down
exactly how/why.
-Greg

