* crc error when decode_message?
@ 2015-03-16 12:02 Xinze Chi
  2015-03-16 13:19 ` Xinze Chi
  0 siblings, 1 reply; 9+ messages in thread

From: Xinze Chi @ 2015-03-16 12:02 UTC (permalink / raw)
To: ceph-devel

hi, all:

    I want to know what the behavior of the primary is when
decode_message hits a crc error, for example when reading an ack
response message from a remote peer?

Thanks.

^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: crc error when decode_message?
  2015-03-16 12:02 crc error when decode_message? Xinze Chi
@ 2015-03-16 13:19 ` Xinze Chi
  2015-03-16 14:01   ` Haomai Wang
  0 siblings, 1 reply; 9+ messages in thread

From: Xinze Chi @ 2015-03-16 13:19 UTC (permalink / raw)
To: ceph-devel

For example: a client sends a write request to osd.0 (the primary), and
osd.0 sends MOSDSubOp to osd.1 and osd.2.

osd.1 sends its reply to osd.0 (the primary), but an accident happens:

1. decode_message hits a crc error when decoding the reply msg, or
2. the reply msg is lost on the way to osd.0, so osd.0 never receives it.

Could anyone tell me what the behavior of osd.0 (the primary) is?

Thanks

2015-03-16 20:02 GMT+08:00 Xinze Chi <xmdxcxz@gmail.com>:
> hi, all:
>
>     I want to know what the behavior of the primary is when
> decode_message hits a crc error, for example when reading an ack
> response message from a remote peer?
>
> Thanks.
* Re: crc error when decode_message?
  2015-03-16 13:19 ` Xinze Chi
@ 2015-03-16 14:01   ` Haomai Wang
  2015-03-16 14:04     ` Xinze Chi
  0 siblings, 1 reply; 9+ messages in thread

From: Haomai Wang @ 2015-03-16 14:01 UTC (permalink / raw)
To: Xinze Chi; +Cc: ceph-devel

AFAIR, Pipe and AsyncConnection will both mark themselves faulted and
shut down the socket, and the peer will detect the reset. So each side
has a chance to rebuild the session.

On Mon, Mar 16, 2015 at 9:19 PM, Xinze Chi <xmdxcxz@gmail.com> wrote:
> For example: a client sends a write request to osd.0 (the primary), and
> osd.0 sends MOSDSubOp to osd.1 and osd.2.
>
> osd.1 sends its reply to osd.0 (the primary), but an accident happens:
>
> 1. decode_message hits a crc error when decoding the reply msg, or
> 2. the reply msg is lost on the way to osd.0, so osd.0 never receives it.
>
> Could anyone tell me what the behavior of osd.0 (the primary) is?
>
> [...]

--
Best Regards,

Wheat
* Re: crc error when decode_message?
  2015-03-16 14:01 ` Haomai Wang
@ 2015-03-16 14:04   ` Xinze Chi
  2015-03-16 14:06     ` Haomai Wang
  0 siblings, 1 reply; 9+ messages in thread

From: Xinze Chi @ 2015-03-16 14:04 UTC (permalink / raw)
To: Haomai Wang, ceph-devel

How does the primary process the write request in that case?

Thanks.

2015-03-16 22:01 GMT+08:00 Haomai Wang <haomaiwang@gmail.com>:
> AFAIR, Pipe and AsyncConnection will both mark themselves faulted and
> shut down the socket, and the peer will detect the reset. So each side
> has a chance to rebuild the session.
>
> [...]
* Re: crc error when decode_message?
  2015-03-16 14:04 ` Xinze Chi
@ 2015-03-16 14:06   ` Haomai Wang
  2015-03-17  7:23     ` Ning Yao
  0 siblings, 1 reply; 9+ messages in thread

From: Haomai Wang @ 2015-03-16 14:06 UTC (permalink / raw)
To: Xinze Chi; +Cc: ceph-devel

On Mon, Mar 16, 2015 at 10:04 PM, Xinze Chi <xmdxcxz@gmail.com> wrote:
> How does the primary process the write request in that case?
>
> [...]
>>> 1. decode_message hits a crc error when decoding the reply msg, or
>>> 2. the reply msg is lost on the way to osd.0, so osd.0 never receives it.
>>>
>>> Could anyone tell me what the behavior of osd.0 (the primary) is?

osd.0 and osd.1 will both try to reconnect to the peer side, and the
lost message will be resent to osd.0 from osd.1.

--
Best Regards,

Wheat
* Re: crc error when decode_message?
  2015-03-16 14:06 ` Haomai Wang
@ 2015-03-17  7:23   ` Ning Yao
  2015-03-17 13:46     ` Sage Weil
  0 siblings, 1 reply; 9+ messages in thread

From: Ning Yao @ 2015-03-17 7:23 UTC (permalink / raw)
To: Haomai Wang; +Cc: Xinze Chi, ceph-devel

2015-03-16 22:06 GMT+08:00 Haomai Wang <haomaiwang@gmail.com>:
> [...]
> osd.0 and osd.1 will both try to reconnect to the peer side, and the
> lost message will be resent to osd.0 from osd.1.

So I wonder: if a different routing path delays the arrival of one
message, in_seq would already be set ahead, and by that logic the
delayed message, when it finally arrives, will be dropped and
discarded. If that message is a sub_op reply as Xinze describes, how
does ceph work after that? It seems the repop for the write op would
wait forever until the osd restarts?
* Re: crc error when decode_message?
  2015-03-17  7:23 ` Ning Yao
@ 2015-03-17 13:46   ` Sage Weil
  2015-03-17 13:58     ` Gregory Farnum
  0 siblings, 1 reply; 9+ messages in thread

From: Sage Weil @ 2015-03-17 13:46 UTC (permalink / raw)
To: Ning Yao; +Cc: Haomai Wang, Xinze Chi, ceph-devel

On Tue, 17 Mar 2015, Ning Yao wrote:
> So I wonder: if a different routing path delays the arrival of one
> message, in_seq would already be set ahead, and by that logic the
> delayed message, when it finally arrives, will be dropped and
> discarded. If that message is a sub_op reply as Xinze describes, how
> does ceph work after that? It seems the repop for the write op would
> wait forever until the osd restarts?

These sorts of scenarios are why src/msg/simple/Pipe.cc (and in
particular, accept()) is not so simple. The case you describe is

https://github.com/ceph/ceph/blob/master/src/msg/simple/Pipe.cc#L492

In other words, this is all masked by the Messenger layer so that the
higher layers (OSD.cc etc.) see a single, ordered, reliable stream of
messages, and all of the failure/retry/reconnect logic is hidden.

sage
* Re: crc error when decode_message?
  2015-03-17 13:46 ` Sage Weil
@ 2015-03-17 13:58   ` Gregory Farnum
  2015-03-18  2:52     ` Ning Yao
  0 siblings, 1 reply; 9+ messages in thread

From: Gregory Farnum @ 2015-03-17 13:58 UTC (permalink / raw)
To: Ning Yao; +Cc: Haomai Wang, Xinze Chi, ceph-devel

On Tue, Mar 17, 2015 at 6:46 AM, Sage Weil <sage@newdream.net> wrote:
> [...]
> These sorts of scenarios are why src/msg/simple/Pipe.cc (and in
> particular, accept()) is not so simple. The case you describe is
>
> https://github.com/ceph/ceph/blob/master/src/msg/simple/Pipe.cc#L492
>
> In other words, this is all masked by the Messenger layer so that the
> higher layers (OSD.cc etc.) see a single, ordered, reliable stream of
> messages, and all of the failure/retry/reconnect logic is hidden.

Just to be clear, that's the originally described reconnect case. The
different-routing-paths concern is all handled by TCP underneath us,
which is one of the reasons we use it. ;)
-Greg
* Re: crc error when decode_message?
  2015-03-17 13:58 ` Gregory Farnum
@ 2015-03-18  2:52   ` Ning Yao
  0 siblings, 0 replies; 9+ messages in thread

From: Ning Yao @ 2015-03-18 2:52 UTC (permalink / raw)
To: Gregory Farnum; +Cc: Haomai Wang, Xinze Chi, ceph-devel

Thanks, all. I got the idea.

Regards
Ning Yao

2015-03-17 21:58 GMT+08:00 Gregory Farnum <greg@gregs42.com>:
> [...]
> Just to be clear, that's the originally described reconnect case. The
> different-routing-paths concern is all handled by TCP underneath us,
> which is one of the reasons we use it. ;)
> -Greg
end of thread, other threads: [~2015-03-18 2:52 UTC | newest]

Thread overview: 9+ messages; links below jump to the message on this page:
2015-03-16 12:02 crc error when decode_message? Xinze Chi
2015-03-16 13:19 ` Xinze Chi
2015-03-16 14:01   ` Haomai Wang
2015-03-16 14:04     ` Xinze Chi
2015-03-16 14:06       ` Haomai Wang
2015-03-17  7:23         ` Ning Yao
2015-03-17 13:46           ` Sage Weil
2015-03-17 13:58             ` Gregory Farnum
2015-03-18  2:52               ` Ning Yao