* crc error when decode_message?
@ 2015-03-16 12:02 Xinze Chi
  2015-03-16 13:19 ` Xinze Chi
  0 siblings, 1 reply; 9+ messages in thread
From: Xinze Chi @ 2015-03-16 12:02 UTC
  To: ceph-devel

Hi all,

I would like to know how the primary behaves when decode_message hits
a CRC error, for example while reading an ack response message from a
remote peer.

Thanks.


* Re: crc error when decode_message?
  2015-03-16 12:02 crc error when decode_message? Xinze Chi
@ 2015-03-16 13:19 ` Xinze Chi
  2015-03-16 14:01   ` Haomai Wang
  0 siblings, 1 reply; 9+ messages in thread
From: Xinze Chi @ 2015-03-16 13:19 UTC
  To: ceph-devel

For example: a client sends a write request to osd.0 (the primary),
and osd.0 sends MOSDSubOp to osd.1 and osd.2.

osd.1 sends its reply to osd.0 (the primary), but an accident happens:

1. decode_message hits a CRC error while decoding the reply msg, or
2. the reply msg is lost on the way to osd.0, so osd.0 never receives
   the reply msg.

Could anyone tell me what the behavior of osd.0 (the primary) is?

Thanks

2015-03-16 20:02 GMT+08:00 Xinze Chi <xmdxcxz@gmail.com>:
> Hi all,
>
> I would like to know how the primary behaves when decode_message hits
> a CRC error, for example while reading an ack response message from a
> remote peer.
>
> Thanks.


* Re: crc error when decode_message?
  2015-03-16 13:19 ` Xinze Chi
@ 2015-03-16 14:01   ` Haomai Wang
  2015-03-16 14:04     ` Xinze Chi
  0 siblings, 1 reply; 9+ messages in thread
From: Haomai Wang @ 2015-03-16 14:01 UTC
  To: Xinze Chi; +Cc: ceph-devel

As far as I remember, Pipe and AsyncConnection will both mark
themselves faulted and shut down the socket, and the peer will detect
the reset. So each side has a chance to rebuild the session.
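
Roughly, as a minimal sketch (illustrative only -- hypothetical names,
not the actual Pipe/AsyncConnection code):

  #include <cstdint>
  #include <sys/socket.h>
  #include <unistd.h>

  struct Message { uint32_t stored_crc, computed_crc; };

  struct Conn {
    int fd;
    void fault() {
      ::shutdown(fd, SHUT_RDWR);  // peer's next read/write fails,
      ::close(fd);                // so it observes the reset too
      // ...then schedule a reconnect / session rebuild...
    }
    bool deliver(const Message& m) {
      if (m.stored_crc != m.computed_crc) {
        fault();       // bad CRC: drop the message, reset the session
        return false;  // nothing is handed up to the OSD layer
      }
      // ...dispatch m to the OSD...
      return true;
    }
  };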

On Mon, Mar 16, 2015 at 9:19 PM, Xinze Chi <xmdxcxz@gmail.com> wrote:
> For example: a client sends a write request to osd.0 (the primary),
> and osd.0 sends MOSDSubOp to osd.1 and osd.2.
>
> osd.1 sends its reply to osd.0 (the primary), but an accident happens:
>
> 1. decode_message hits a CRC error while decoding the reply msg, or
> 2. the reply msg is lost on the way to osd.0, so osd.0 never receives
>    the reply msg.
>
> Could anyone tell me what the behavior of osd.0 (the primary) is?
>
> Thanks
>
> [...]



-- 
Best Regards,

Wheat


* Re: crc error when decode_message?
  2015-03-16 14:01   ` Haomai Wang
@ 2015-03-16 14:04     ` Xinze Chi
  2015-03-16 14:06       ` Haomai Wang
  0 siblings, 1 reply; 9+ messages in thread
From: Xinze Chi @ 2015-03-16 14:04 UTC
  To: Haomai Wang, ceph-devel

How is the write request then processed on the primary?

Thanks.

2015-03-16 22:01 GMT+08:00 Haomai Wang <haomaiwang@gmail.com>:
> As far as I remember, Pipe and AsyncConnection will both mark
> themselves faulted and shut down the socket, and the peer will detect
> the reset. So each side has a chance to rebuild the session.
>
> [...]


* Re: crc error when decode_message?
  2015-03-16 14:04     ` Xinze Chi
@ 2015-03-16 14:06       ` Haomai Wang
  2015-03-17  7:23         ` Ning Yao
  0 siblings, 1 reply; 9+ messages in thread
From: Haomai Wang @ 2015-03-16 14:06 UTC
  To: Xinze Chi; +Cc: ceph-devel

On Mon, Mar 16, 2015 at 10:04 PM, Xinze Chi <xmdxcxz@gmail.com> wrote:
> How is the write request then processed on the primary?
>
> Thanks.
>
> 2015-03-16 22:01 GMT+08:00 Haomai Wang <haomaiwang@gmail.com>:
>> As far as I remember, Pipe and AsyncConnection will both mark
>> themselves faulted and shut down the socket, and the peer will detect
>> the reset. So each side has a chance to rebuild the session.
>>
>> On Mon, Mar 16, 2015 at 9:19 PM, Xinze Chi <xmdxcxz@gmail.com> wrote:
>>> For example: a client sends a write request to osd.0 (the primary),
>>> and osd.0 sends MOSDSubOp to osd.1 and osd.2.
>>>
>>> osd.1 sends its reply to osd.0 (the primary), but an accident
>>> happens:
>>>
>>> 1. decode_message hits a CRC error while decoding the reply msg, or
>>> 2. the reply msg is lost on the way to osd.0, so osd.0 never
>>>    receives the reply msg.
>>>
>>> Could anyone tell me what the behavior of osd.0 (the primary) is?
>>>

osd.0 and osd.1 will both try to reconnect to the peer side, and the
lost message will be resent from osd.1 to osd.0.
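
Conceptually it is driven by sequence numbers; something like this
sketch (made-up names, not the real Pipe code): the sender keeps every
sent-but-unacked message, and on reconnect the peer reports the last
seq it received, so everything newer is sent again.

  #include <cstdint>
  #include <deque>

  struct Msg { uint64_t seq; /* payload... */ };

  struct Session {
    std::deque<Msg> sent;  // sent but not yet acked

    void on_ack(uint64_t acked) {
      // Peer has safely received everything up to 'acked'.
      while (!sent.empty() && sent.front().seq <= acked)
        sent.pop_front();
    }

    void on_reconnect(uint64_t peer_in_seq) {
      // Peer reports the last seq it actually got; anything newer
      // (e.g. the lost or CRC-failed reply) is resent.
      for (const Msg& m : sent)
        if (m.seq > peer_in_seq)
          resend(m);
    }

    void resend(const Msg&) { /* requeue for the writer thread */ }
  };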

>>> [...]



-- 
Best Regards,

Wheat


* Re: crc error when decode_message?
  2015-03-16 14:06       ` Haomai Wang
@ 2015-03-17  7:23         ` Ning Yao
  2015-03-17 13:46           ` Sage Weil
  0 siblings, 1 reply; 9+ messages in thread
From: Ning Yao @ 2015-03-17  7:23 UTC
  To: Haomai Wang; +Cc: Xinze Chi, ceph-devel

2015-03-16 22:06 GMT+08:00 Haomai Wang <haomaiwang@gmail.com>:
> On Mon, Mar 16, 2015 at 10:04 PM, Xinze Chi <xmdxcxz@gmail.com> wrote:
>> How is the write request then processed on the primary?
>>
>> Thanks.
>>
>> 2015-03-16 22:01 GMT+08:00 Haomai Wang <haomaiwang@gmail.com>:
>>> As far as I remember, Pipe and AsyncConnection will both mark
>>> themselves faulted and shut down the socket, and the peer will detect
>>> the reset. So each side has a chance to rebuild the session.
>>>
>>> On Mon, Mar 16, 2015 at 9:19 PM, Xinze Chi <xmdxcxz@gmail.com> wrote:
>>>> For example: a client sends a write request to osd.0 (the primary),
>>>> and osd.0 sends MOSDSubOp to osd.1 and osd.2.
>>>>
>>>> osd.1 sends its reply to osd.0 (the primary), but an accident
>>>> happens:
>>>>
>>>> 1. decode_message hits a CRC error while decoding the reply msg, or
>>>> 2. the reply msg is lost on the way to osd.0, so osd.0 never
>>>>    receives the reply msg.
>>>>
>>>> Could anyone tell me what the behavior of osd.0 (the primary) is?
>>>>
>
> osd.0 and osd.1 will both try to reconnect to the peer side, and the
> lost message will be resent from osd.1 to osd.0.
So I wonder: if different routing paths delay the arrival of one
message, the in_seq will already have been advanced past it, and by
that logic the delayed message will be dropped and discarded when it
finally arrives. If that message is just a sub_op reply, as Xinze
describes, how does Ceph work after that? It seems the repop for the
write op would wait forever, until the OSD restarts?
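
That is, something like this drop logic (my simplified reading, with
made-up names, not the actual code):

  #include <cstdint>

  struct Reader { uint64_t in_seq = 0; };

  // Returns true if the message should be dispatched upward.
  bool accept_msg(Reader& r, uint64_t msg_seq) {
    if (msg_seq <= r.in_seq)
      return false;      // assumed duplicate/handled -> dropped
    r.in_seq = msg_seq;  // in_seq moves ahead; a delayed smaller
    return true;         // seq arriving later is now discarded
  }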



* Re: crc error when decode_message?
  2015-03-17  7:23         ` Ning Yao
@ 2015-03-17 13:46           ` Sage Weil
  2015-03-17 13:58             ` Gregory Farnum
  0 siblings, 1 reply; 9+ messages in thread
From: Sage Weil @ 2015-03-17 13:46 UTC
  To: Ning Yao; +Cc: Haomai Wang, Xinze Chi, ceph-devel

On Tue, 17 Mar 2015, Ning Yao wrote:
> [...]
> So I wonder: if different routing paths delay the arrival of one
> message, the in_seq will already have been advanced past it, and by
> that logic the delayed message will be dropped and discarded when it
> finally arrives. If that message is just a sub_op reply, as Xinze
> describes, how does Ceph work after that? It seems the repop for the
> write op would wait forever, until the OSD restarts?

These sorts of scenarios are why src/msg/simple/Pipe.cc (and in
particular, accept()) is not so simple.  The case you describe is
handled around

 https://github.com/ceph/ceph/blob/master/src/msg/simple/Pipe.cc#L492

In other words, this is all masked by the Messenger layer so that the
higher layers (OSD.cc etc.) see a single, ordered, reliable stream of
messages and all of the failure/retry/reconnect logic is hidden.
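
As a very rough sketch of what that reconnect handshake does
(simplified, hypothetical types -- the real accept() handles many more
races):

  #include <cstdint>

  struct ConnectMsg { uint64_t connect_seq; };
  struct Existing   { uint64_t in_seq; uint64_t connect_seq; };

  // On accepting a connect from a peer we already have a session
  // with, we do not start over: we report the last seq we received,
  // and the peer replays everything newer. OSD.cc never sees any of
  // this -- it just keeps getting an ordered stream.
  uint64_t accept_reconnect(Existing& ex, const ConnectMsg& cm) {
    ex.connect_seq = cm.connect_seq;  // adopt the new attempt
    return ex.in_seq;                 // peer resends from here on
  }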

sage


* Re: crc error when decode_message?
  2015-03-17 13:46           ` Sage Weil
@ 2015-03-17 13:58             ` Gregory Farnum
  2015-03-18  2:52               ` Ning Yao
  0 siblings, 1 reply; 9+ messages in thread
From: Gregory Farnum @ 2015-03-17 13:58 UTC
  To: Ning Yao; +Cc: Haomai Wang, Xinze Chi, ceph-devel

On Tue, Mar 17, 2015 at 6:46 AM, Sage Weil <sage@newdream.net> wrote:
> On Tue, 17 Mar 2015, Ning Yao wrote:
>> [...]
>> So I wonder: if different routing paths delay the arrival of one
>> message, the in_seq will already have been advanced past it, and by
>> that logic the delayed message will be dropped and discarded when it
>> finally arrives. If that message is just a sub_op reply, as Xinze
>> describes, how does Ceph work after that? It seems the repop for the
>> write op would wait forever, until the OSD restarts?
>
> These sorts of scenarios are why src/msg/simple/Pipe.cc (and in
> particular, accept()) is not so simple.  The case you describe is
> handled around
>
>  https://github.com/ceph/ceph/blob/master/src/msg/simple/Pipe.cc#L492
>
> In other words, this is all masked by the Messenger layer so that the
> higher layers (OSD.cc etc.) see a single, ordered, reliable stream of
> messages and all of the failure/retry/reconnect logic is hidden.

Just to be clear, that covers the originally described reconnect case.
The different-routing-paths concern is handled by TCP underneath us,
which is one of the reasons we use it. ;)
-Greg


* Re: crc error when decode_message?
  2015-03-17 13:58             ` Gregory Farnum
@ 2015-03-18  2:52               ` Ning Yao
  0 siblings, 0 replies; 9+ messages in thread
From: Ning Yao @ 2015-03-18  2:52 UTC
  To: Gregory Farnum; +Cc: Haomai Wang, Xinze Chi, ceph-devel

Thanks, everyone. I get the idea now.

Regards,
Ning Yao


2015-03-17 21:58 GMT+08:00 Gregory Farnum <greg@gregs42.com>:
> [...]
>
> Just to be clear, that covers the originally described reconnect case.
> The different-routing-paths concern is handled by TCP underneath us,
> which is one of the reasons we use it. ;)
> -Greg



Thread overview: 9+ messages
2015-03-16 12:02 crc error when decode_message? Xinze Chi
2015-03-16 13:19 ` Xinze Chi
2015-03-16 14:01   ` Haomai Wang
2015-03-16 14:04     ` Xinze Chi
2015-03-16 14:06       ` Haomai Wang
2015-03-17  7:23         ` Ning Yao
2015-03-17 13:46           ` Sage Weil
2015-03-17 13:58             ` Gregory Farnum
2015-03-18  2:52               ` Ning Yao
