* some issue about peering progress
@ 2017-10-27 7:45 Xinze Chi (信泽)
  2017-11-01 16:42 ` Ning Yao
  2017-11-01 20:26 ` Gregory Farnum
  0 siblings, 2 replies; 5+ messages in thread
From: Xinze Chi (信泽) @ 2017-10-27 7:45 UTC
To: ceph-devel

hi, all:

I am confused about the notify message during peering. For example:

In epoch 1, the primary OSD goes through Peering (GetInfo and
GetMissing) and calls proc_replica_log. In this function the
last_complete and last_update may be reset.

Before it goes to Activate, the OSDMap changes (the new osdmap does not
restart peering), and the non-primary OSD sends a notify to the
primary.

When the primary receives the notify, Primary::react(const
MNotifyRec& notevt) runs, so it calls proc_replica_info.

In that function, we update the pg info, including the last_complete
and last_update that were modified in proc_replica_log.

When the primary calls activate, the primary OSD runs recovery based
on the pg info obtained from the notify instead of from
proc_replica_log.

So is it a bug?

--
Regards,
Xinze Chi
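The sequence described in this message can be replayed with a toy model. This is only a sketch of the poster's reading, not Ceph code: `ToyInfo`, `ToyPrimary`, `toy_proc_replica_info`, `toy_proc_replica_log`, and `replay_scenario` are hypothetical names, the numbers are illustrative, and the assumption that the notify handler overwrites the stored info unconditionally is exactly the point the thread is questioning.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical, heavily simplified stand-in for the per-peer info a primary keeps.
struct ToyInfo {
  std::uint64_t last_update;
  std::uint64_t last_complete;
};

struct ToyPrimary {
  std::map<int, ToyInfo> peer_info;  // keyed by OSD id

  // Effect attributed to proc_replica_info in the thread:
  // store whatever the replica reported about itself.
  void toy_proc_replica_info(int osd, const ToyInfo& reported) {
    peer_info[osd] = reported;
  }

  // Effect attributed to proc_replica_log in the thread:
  // pull the peer's markers back to the point where its log diverges.
  void toy_proc_replica_log(int osd, std::uint64_t divergence_point) {
    ToyInfo& info = peer_info[osd];
    info.last_update = std::min(info.last_update, divergence_point);
    info.last_complete = std::min(info.last_complete, divergence_point);
  }
};

// Replays the sequence from the message and returns what the toy primary
// believes about osd.1 at the moment it would call activate.
ToyInfo replay_scenario(bool late_notify_arrives) {
  ToyPrimary primary;
  const ToyInfo reported{120, 120};            // replica's own view (illustrative)
  primary.toy_proc_replica_info(1, reported);  // GetInfo
  primary.toy_proc_replica_log(1, 100);        // GetMissing: rewind to divergence
  if (late_notify_arrives) {
    primary.toy_proc_replica_info(1, reported);  // notify after an OSDMap change
  }
  return primary.peer_info[1];
}
```

In this model, `replay_scenario(false)` keeps the rewound markers, while `replay_scenario(true)` ends up with the replica's self-reported markers again, which is the outcome the poster is worried about.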
* Re: some issue about peering progress
  2017-10-27 7:45 some issue about peering progress Xinze Chi (信泽)
@ 2017-11-01 16:42 ` Ning Yao
  2017-11-01 20:26 ` Gregory Farnum
  1 sibling, 0 replies; 5+ messages in thread
From: Ning Yao @ 2017-11-01 16:42 UTC
To: Xinze Chi (信泽); +Cc: ceph-devel

Can anyone take a look? Could anything go wrong when divergence occurs
and peer_info is reset in Primary::react(const MNotifyRec& notevt)?

@Sage, do you think so?

Regards
Ning Yao
* Re: some issue about peering progress
  2017-10-27 7:45 some issue about peering progress Xinze Chi (信泽)
  2017-11-01 16:42 ` Ning Yao
@ 2017-11-01 20:26 ` Gregory Farnum
       [not found]   ` <CANE=7sXWMXTpTfgG6NmwxYYyjYA2_UZ3oNun4eAw+QNiht2nkg@mail.gmail.com>
  1 sibling, 1 reply; 5+ messages in thread
From: Gregory Farnum @ 2017-11-01 20:26 UTC
To: Xinze Chi (信泽); +Cc: ceph-devel

On Fri, Oct 27, 2017 at 12:46 AM Xinze Chi (信泽) <xmdxcxz@gmail.com> wrote:
>
> hi, all:
>
> I am confused about the notify message during peering. For example:
>
> In epoch 1, the primary OSD goes through Peering (GetInfo and
> GetMissing) and calls proc_replica_log. In this function the
> last_complete and last_update may be reset.
>
> Before it goes to Activate, the OSDMap changes (the new osdmap does not
> restart peering), and the non-primary OSD sends a notify to the
> primary.

I don't think this can happen. The OSD won't re-send a notify during
the same peering interval, and even if it did the message would be
tagged with a new (higher) epoch so the PG wouldn't process it until
after it had switched states, right?

>
> When the primary receives the notify, Primary::react(const
> MNotifyRec& notevt) runs, so it calls proc_replica_info.
>
> In that function, we update the pg info, including the last_complete
> and last_update that were modified in proc_replica_log.

Note also that "PG::RecoveryState::Active::react(const MNotifyRec&
notevt)" does *not* unconditionally invoke proc_replica_info(). I
think you were trying to say we hadn't reached this state on receipt
of the message? But as I mentioned above, I think we block so that's
not actually possible either.

>
> When the primary calls activate, the primary OSD runs recovery based
> on the pg info obtained from the notify instead of from
> proc_replica_log.
>
> So is it a bug?

Have you seen issues in the wild, or are you just trying to understand
this code/algorithm? I would be surprised if we had undiscovered issues
here just because our tests exercise peering quite vigorously, but I
might be missing what's happening in my own code skims.
-Greg
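The epoch argument in this reply (a message tagged with a newer epoch is not processed until the PG has caught up) can be sketched as a small gate. This is an illustration of the idea only: the thread never confirms that Ceph does this on the notify path, and `ToyMsg`, `ToyPG`, `deliver`, `advance_map`, and `demo_epoch_gate` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

struct ToyMsg {
  std::uint32_t epoch;  // map epoch the sender tagged the message with
  int id;               // illustrative payload
};

class ToyPG {
 public:
  explicit ToyPG(std::uint32_t epoch) : epoch_(epoch) {}

  // Process now if the PG has already reached the sender's epoch, else park it.
  void deliver(const ToyMsg& m) {
    if (m.epoch > epoch_) {
      waiting_.push_back(m);
    } else {
      processed_.push_back(m.id);
    }
  }

  // Advance to a newer map, then release parked messages that are no longer too new.
  void advance_map(std::uint32_t new_epoch) {
    epoch_ = new_epoch;
    std::deque<ToyMsg> still_waiting;
    for (const ToyMsg& m : waiting_) {
      if (m.epoch > epoch_) {
        still_waiting.push_back(m);
      } else {
        processed_.push_back(m.id);
      }
    }
    waiting_.swap(still_waiting);
  }

  const std::vector<int>& processed() const { return processed_; }
  std::size_t parked() const { return waiting_.size(); }

 private:
  std::uint32_t epoch_;
  std::deque<ToyMsg> waiting_;
  std::vector<int> processed_;
};

// A PG at epoch 1 gets one message at epoch 1 and one at epoch 2,
// then advances to epoch 2. Returns the ids in processing order.
std::vector<int> demo_epoch_gate() {
  ToyPG pg(1);
  pg.deliver({1, 10});  // same epoch: processed immediately
  pg.deliver({2, 20});  // newer epoch: parked
  pg.advance_map(2);    // PG catches up; the parked message is released
  return pg.processed();
}
```

Whether such a gate would also force the PG out of its current peering state before the notify is handled is a separate question that this sketch does not model.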
* Re: some issue about peering progress
       [not found]     ` <CAJ4mKGaFP+5vePBNzm0fk3pqONirfCyGhrGynZ4BTD3AwfNMuw@mail.gmail.com>
@ 2017-11-02 3:35       ` Xinze Chi (信泽)
  2017-11-28 23:35         ` Gregory Farnum
  0 siblings, 1 reply; 5+ messages in thread
From: Xinze Chi (信泽) @ 2017-11-02 3:35 UTC
To: Gregory Farnum, ceph-devel

Does the Stray set send_notify to false only when it goes to activate?

2017-11-02 10:27 GMT+08:00 Gregory Farnum <gfarnum@redhat.com>:
> On Wed, Nov 1, 2017 at 5:27 PM Xinze Chi (信泽) <xmdxcxz@gmail.com> wrote:
>>
>> 2017-11-02 4:26 GMT+08:00 Gregory Farnum <gfarnum@redhat.com>:
>> > On Fri, Oct 27, 2017 at 12:46 AM Xinze Chi (信泽) <xmdxcxz@gmail.com> wrote:
>> >>
>> >> hi, all:
>> >>
>> >> I am confused about the notify message during peering. For example:
>> >>
>> >> In epoch 1, the primary OSD goes through Peering (GetInfo and
>> >> GetMissing) and calls proc_replica_log. In this function the
>> >> last_complete and last_update may be reset.
>> >>
>> >> Before it goes to Activate, the OSDMap changes (the new osdmap does
>> >> not restart peering), and the non-primary OSD sends a notify to the
>> >> primary.
>> >
>> >
>> > I don't think this can happen. The OSD won't re-send a notify during
>> > the same peering interval, and even if it did the message would be
>> > tagged with a new (higher) epoch so the PG wouldn't process it until
>> > after it had switched states, right?
>> >
>>
>> I just want to understand this algorithm. When the Stray OSD receives
>> an ActMap, it sends a notify even during the same peering interval.
>> See Stray::react(const ActMap&).
>
> Note the
>
> if (pg->should_send_notify()
>
> check preceding that block. It checks a boolean send_notify value that
> is set true only when it enters a new peering interval, and is set
> false as soon as it shares its info. So I don't think the primary's
> behavior matters at all (other than from a security perspective,
> anyway).
>
>
>> You say the primary OSD wouldn't process the notify message, but I
>> cannot find that in the code. The primary calls handle_pg_notify and
>> processes it.
>
> I didn't actually track the order of the state machine here; I just
> saw that PG::RecoveryState::Active::react(const MNotifyRec& notevt)
> will throw them out if it's already seen the info. You're right
> PG::RecoveryState::Primary::react(const MNotifyRec& notevt) will
> process it unconditionally. I'm not sure if those are the replica and
> primary states, or if you move from Primary to Active (or vice versa).
> -Greg
>
>>
>> >>
>> >> When the primary receives the notify, Primary::react(const
>> >> MNotifyRec& notevt) runs, so it calls proc_replica_info.
>> >>
>> >> In that function, we update the pg info, including the last_complete
>> >> and last_update that were modified in proc_replica_log.
>> >
>> > Note also that "PG::RecoveryState::Active::react(const MNotifyRec&
>> > notevt)" does *not* unconditionally invoke proc_replica_info(). I
>> > think you were trying to say we hadn't reached this state on receipt
>> > of the message? But as I mentioned above, I think we block so that's
>> > not actually possible either.
>> >
>> >>
>> >> When the primary calls activate, the primary OSD runs recovery based
>> >> on the pg info obtained from the notify instead of from
>> >> proc_replica_log.
>> >>
>> >> So is it a bug?
>> >
>> > Have you seen issues in the wild, or are you just trying to understand
>> > this code/algorithm? I would be surprised if we had undiscovered issues
>> > here just because our tests exercise peering quite vigorously, but I
>> > might be missing what's happening in my own code skims.
>> > -Greg
>>
>>
>>
>> --
>> Regards,
>> Xinze Chi

--
Regards,
Xinze Chi
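The difference between the two handlers, as it is described in the quoted reply, can be sketched like this. It is a toy reading of the discussion and not the Ceph state machine: `ToyState`, `ToyInfo`, and `toy_handle_notify` are hypothetical names, and "already seen" is modelled simply as "an entry for that OSD exists", which is an assumption.

```cpp
#include <cassert>
#include <map>

// Two of the states named in the thread, reduced to a flag.
enum class ToyState { Primary, Active };

// Hypothetical one-field stand-in for a peer's info.
struct ToyInfo {
  unsigned long last_update;
};

// Returns true if the notify was applied to peer_info, false if it was dropped.
bool toy_handle_notify(ToyState state, std::map<int, ToyInfo>& peer_info,
                       int from_osd, const ToyInfo& reported) {
  if (state == ToyState::Active && peer_info.count(from_osd) != 0) {
    return false;  // Active: info from this peer was already seen, discard
  }
  peer_info[from_osd] = reported;  // Primary, or first sighting: take it as-is
  return true;
}
```

With an existing entry for osd.1, a call with `ToyState::Active` leaves the entry alone, while a call with `ToyState::Primary` replaces it. That replacement is the window the original question is about.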
* Re: some issue about peering progress
  2017-11-02 3:35       ` Xinze Chi (信泽)
@ 2017-11-28 23:35         ` Gregory Farnum
  0 siblings, 0 replies; 5+ messages in thread
From: Gregory Farnum @ 2017-11-28 23:35 UTC
To: Xinze Chi (信泽); +Cc: ceph-devel

Hmm, just noticed I never replied to this. You are correct and I was
not reading carefully enough; the send_notify is only set to false on
Activate.

I still think something in the stack will block messages if they're
from a too-new epoch, or that this is correct because of something
else (we do a *lot* of OSD thrashing tests), but I didn't track down
exactly how/why.
-Greg

On Wed, Nov 1, 2017 at 11:35 PM Xinze Chi (信泽) <xmdxcxz@gmail.com> wrote:
>
> Does the Stray set send_notify to false only when it goes to activate?
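The two readings of the send_notify flag that came up in this thread differ only in when the flag is cleared. A minimal sketch, assuming a notify goes out on every ActMap while the flag is set: `ToyStray`, its `on_*` methods, and `count_notifies` are hypothetical names, not Ceph's API.

```cpp
#include <cassert>

class ToyStray {
 public:
  // clear_on_send = true models the first reading in the thread (flag dropped
  // once info is shared); false models the corrected one (dropped on Activate only).
  explicit ToyStray(bool clear_on_send) : clear_on_send_(clear_on_send) {}

  void on_new_interval() { send_notify_ = true; }

  // Stands in for handling an ActMap event while in the Stray state.
  void on_act_map() {
    if (!send_notify_) return;
    ++notifies_sent_;
    if (clear_on_send_) send_notify_ = false;
  }

  void on_activate() { send_notify_ = false; }

  int notifies_sent() const { return notifies_sent_; }

 private:
  bool clear_on_send_;
  bool send_notify_ = false;
  int notifies_sent_ = 0;
};

// Counts the notifies sent in one peering interval that sees
// act_maps_before_activate ActMap events before Activate and one after.
int count_notifies(bool clear_on_send, int act_maps_before_activate) {
  ToyStray stray(clear_on_send);
  stray.on_new_interval();
  for (int i = 0; i < act_maps_before_activate; ++i) {
    stray.on_act_map();
  }
  stray.on_activate();
  stray.on_act_map();  // after Activate nothing more is sent in either reading
  return stray.notifies_sent();
}
```

Under the corrected reading, `count_notifies(false, 3)` yields three notifies within a single interval, so a primary that has not yet activated can receive a repeat notify. Under the first reading, `count_notifies(true, 3)` yields one.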
end of thread, other threads:[~2017-11-28 23:35 UTC]

Thread overview: 5+ messages
2017-10-27 7:45 some issue about peering progress Xinze Chi (信泽)
2017-11-01 16:42 ` Ning Yao
2017-11-01 20:26 ` Gregory Farnum
     [not found]   ` <CANE=7sXWMXTpTfgG6NmwxYYyjYA2_UZ3oNun4eAw+QNiht2nkg@mail.gmail.com>
     [not found]     ` <CAJ4mKGaFP+5vePBNzm0fk3pqONirfCyGhrGynZ4BTD3AwfNMuw@mail.gmail.com>
2017-11-02 3:35       ` Xinze Chi (信泽)
2017-11-28 23:35         ` Gregory Farnum