* [PATCH v2 0/2] libceph: fix sparse-read failure bug
@ 2023-12-08 16:05 xiubli
  2023-12-08 16:06 ` [PATCH v2 1/2] libceph: fail the sparse-read if there is still data in the socket xiubli
  2023-12-08 16:06 ` [PATCH v2 2/2] libceph: just wait for more data to be available on the socket xiubli
  0 siblings, 2 replies; 11+ messages in thread
From: xiubli @ 2023-12-08 16:05 UTC (permalink / raw)
  To: idryomov, ceph-devel; +Cc: jlayton, vshankar, mchangir, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

The debug logs:

<7>[120727.292870] libceph:  get_reply tid 5526 000000002bfe53c9
<7>[120727.292875] libceph:  get_osd 000000009e8420b7 4 -> 5
<7>[120727.292882] libceph:  prep_next_sparse_read: [2] starting new sparse read req. srlen=0x7000
<7>[120727.292887] libceph:  prep_next_sparse_read: [2] new sparse read op at idx 0 0x60000~0x7000
<7>[120727.292894] libceph:  [2] got 1 extents
<7>[120727.292900] libceph:  [2] ext 0 off 0x60000 len 0x4000
<7>[120727.292912] libceph:  prep_next_sparse_read: [2] completed extent array len 1 cursor->resid 12288
<7>[120727.292917] libceph:  read_partial left: 21, have: 0, con->v1.in_base_pos: 53
<7>[120727.292923] libceph:  read_partial left: 7, have: 14, con->v1.in_base_pos: 67     ====> 7 bytes had not been received yet
<7>[120727.292928] libceph:  read_partial return 0
<7>[120727.292931] libceph:  try_read done on 00000000ddd953f1 ret 0
<7>[120727.292935] libceph:  try_write start 00000000ddd953f1 state 12
<7>[120727.292939] libceph:  try_write out_kvec_bytes 0
<7>[120727.292943] libceph:  try_write nothing else to write.
<7>[120727.292948] libceph:  try_write done on 00000000ddd953f1 ret 0
<7>[120727.292955] libceph:  put_osd 000000009e8420b7 5 -> 4
<7>[120727.293021] libceph:  ceph_sock_data_ready 00000000ddd953f1 state = 12, queueing work
<7>[120727.293029] libceph:  get_osd 000000009e8420b7 4 -> 5
<7>[120727.293041] libceph:  queue_con_delay 00000000ddd953f1 0
<7>[120727.293134] libceph:  try_read start 00000000ddd953f1 state 12
<7>[120727.293141] libceph:  try_read tag 7 in_base_pos 67
<7>[120727.293145] libceph:  read_partial_message con 00000000ddd953f1 msg 000000002bfe53c9
<7>[120727.293150] libceph:  read_partial return 1
<7>[120727.293154] libceph:  read_partial left: 7, have: 14, con->v1.in_base_pos: 67     ====> the remaining 7 bytes arrived
<7>[120727.293189] libceph:  read_partial return 1
<7>[120727.293193] libceph:  read_partial_message got msg 000000002bfe53c9 164 (216900879) + 0 (0) + 16408 (1227708997)
<7>[120727.293203] libceph:  ===== 000000002bfe53c9 3092 from osd2 43=osd_opreply len 164+0+16408 (216900879 0 1227708997) =====
<7>[120727.293211] libceph:  handle_reply msg 000000002bfe53c9 tid 5526                  ====> the request finished successfully
<7>[120727.293217] libceph:  handle_reply req 00000000b7727657 tid 5526 flags 0x400015 pgid 3.55 epoch 52 attempt 0 v 0'0 uv 2275
<7>[120727.293225] libceph:   req 00000000b7727657 tid 5526 op 0 rval 0 len 16408
<7>[120727.293231] libceph:  handle_reply req 00000000b7727657 tid 5526 result 0 data_len 16408
<7>[120727.293236] libceph:  finish_request req 00000000b7727657 tid 5526
<7>[120727.293241] libceph:  unlink_request osd 000000009e8420b7 osd2 req 00000000b7727657 tid 5526
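
To make the log easier to follow, below is a toy user-space model of the
v1 messenger's read_partial() return convention (an illustration written
for this cover letter, not the kernel code): it returns 0 while a section
is still short and 1 once the section is complete, matching the footer
pattern above where 14 of 21 bytes had arrived and the remaining 7 came
in a later wakeup.

#include <stdio.h>

struct partial {
	int have;	/* bytes already buffered for this section */
	int size;	/* bytes this section needs in total */
};

static int read_partial(struct partial *p, int *avail)
{
	int left = p->size - p->have;

	if (left > *avail)
		left = *avail;
	p->have += left;
	*avail -= left;
	return p->have == p->size;	/* 1: complete, 0: wait for more */
}

int main(void)
{
	struct partial footer = { .have = 14, .size = 21 };
	int avail = 0;	/* socket is dry: 7 bytes still in flight */

	/* first try_read cycle: returns 0, nothing more to consume */
	printf("read_partial return %d\n", read_partial(&footer, &avail));

	/* ceph_sock_data_ready: the last 7 bytes arrive, returns 1 */
	avail = 7;
	printf("read_partial return %d\n", read_partial(&footer, &avail));
	return 0;
}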



V2:
- fix the sparse-read bug in the sparse-read code instead

Xiubo Li (2):
  libceph: fail the sparse-read if there is still data in the socket
  libceph: just wait for more data to be available on the socket

 include/linux/ceph/osd_client.h |  1 +
 net/ceph/osd_client.c           | 10 ++++++----
 2 files changed, 7 insertions(+), 4 deletions(-)

-- 
2.43.0



* [PATCH v2 1/2] libceph: fail the sparse-read if there is still data in the socket
  2023-12-08 16:05 [PATCH v2 0/2] libceph: fix sparse-read failure bug xiubli
@ 2023-12-08 16:06 ` xiubli
  2023-12-08 16:06 ` [PATCH v2 2/2] libceph: just wait for more data to be available on the socket xiubli
  1 sibling, 0 replies; 11+ messages in thread
From: xiubli @ 2023-12-08 16:06 UTC (permalink / raw)
  To: idryomov, ceph-devel; +Cc: jlayton, vshankar, mchangir, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

Fail the sparse-read with -EREMOTEIO if there is still data left in the
socket after all the extents have been processed. Once this happens it
means there is a bug.

URL: https://tracker.ceph.com/issues/63586
Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 net/ceph/osd_client.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 5753036d1957..848ef19055a0 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -5912,10 +5912,12 @@ static int osd_sparse_read(struct ceph_connection *con,
 		fallthrough;
 	case CEPH_SPARSE_READ_DATA:
 		if (sr->sr_index >= count) {
-			if (sr->sr_datalen && count)
+			if (sr->sr_datalen) {
 				pr_warn_ratelimited("sr_datalen %u sr_index %d count %u\n",
 						    sr->sr_datalen, sr->sr_index,
 						    count);
+				return -EREMOTEIO;
+			}
 
 			sr->sr_state = CEPH_SPARSE_READ_HDR;
 			goto next_op;
-- 
2.43.0



* [PATCH v2 2/2] libceph: just wait for more data to be available on the socket
  2023-12-08 16:05 [PATCH v2 0/2] libceph: fix sparse-read failure bug xiubli
  2023-12-08 16:06 ` [PATCH v2 1/2] libceph: fail the sparse-read if there is still data in the socket xiubli
@ 2023-12-08 16:06 ` xiubli
  2023-12-12 16:31   ` Ilya Dryomov
  1 sibling, 1 reply; 11+ messages in thread
From: xiubli @ 2023-12-08 16:06 UTC (permalink / raw)
  To: idryomov, ceph-devel; +Cc: jlayton, vshankar, mchangir, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

The messages from ceph may be split into multiple socket packets
and we just need to wait for all the data to be available on the
socket.

This will add a new _FINISH state for the sparse-read to mark the
current sparse-read as succeeded. Otherwise it will be treated as a
new sparse-read when the socket receives the last piece of the osd
request reply message, and the cancel_request() called later will
help reinitialize the sparse-read context.

URL: https://tracker.ceph.com/issues/63586
Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 include/linux/ceph/osd_client.h | 1 +
 net/ceph/osd_client.c           | 6 +++---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
index 493de3496cd3..00d98e13100f 100644
--- a/include/linux/ceph/osd_client.h
+++ b/include/linux/ceph/osd_client.h
@@ -47,6 +47,7 @@ enum ceph_sparse_read_state {
 	CEPH_SPARSE_READ_DATA_LEN,
 	CEPH_SPARSE_READ_DATA_PRE,
 	CEPH_SPARSE_READ_DATA,
+	CEPH_SPARSE_READ_FINISH,
 };
 
 /*
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 848ef19055a0..f1705b4f19eb 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -5802,8 +5802,6 @@ static int prep_next_sparse_read(struct ceph_connection *con,
 			advance_cursor(cursor, sr->sr_req_len - end, false);
 	}
 
-	ceph_init_sparse_read(sr);
-
 	/* find next op in this request (if any) */
 	while (++o->o_sparse_op_idx < req->r_num_ops) {
 		op = &req->r_ops[o->o_sparse_op_idx];
@@ -5919,7 +5917,7 @@ static int osd_sparse_read(struct ceph_connection *con,
 				return -EREMOTEIO;
 			}
 
-			sr->sr_state = CEPH_SPARSE_READ_HDR;
+			sr->sr_state = CEPH_SPARSE_READ_FINISH;
 			goto next_op;
 		}
 
@@ -5952,6 +5950,8 @@ static int osd_sparse_read(struct ceph_connection *con,
 		/* Bump the array index */
 		++sr->sr_index;
 		break;
+	case CEPH_SPARSE_READ_FINISH:
+		return 0;
 	}
 	return ret;
 }
-- 
2.43.0



* Re: [PATCH v2 2/2] libceph: just wait for more data to be available on the socket
  2023-12-08 16:06 ` [PATCH v2 2/2] libceph: just wait for more data to be available on the socket xiubli
@ 2023-12-12 16:31   ` Ilya Dryomov
  2023-12-13  1:01     ` Xiubo Li
  2023-12-13  1:13     ` Xiubo Li
  0 siblings, 2 replies; 11+ messages in thread
From: Ilya Dryomov @ 2023-12-12 16:31 UTC (permalink / raw)
  To: xiubli; +Cc: ceph-devel, jlayton, vshankar, mchangir

On Fri, Dec 8, 2023 at 5:08 PM <xiubli@redhat.com> wrote:
>
> From: Xiubo Li <xiubli@redhat.com>
>
> The messages from ceph may be split into multiple socket packets
> and we just need to wait for all the data to be available on the
> socket.
>
> This will add a new _FINISH state for the sparse-read to mark the
> current sparse-read as succeeded. Otherwise it will be treated as a
> new sparse-read when the socket receives the last piece of the osd
> request reply message, and the cancel_request() called later will
> help reinitialize the sparse-read context.
>
> URL: https://tracker.ceph.com/issues/63586
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>  include/linux/ceph/osd_client.h | 1 +
>  net/ceph/osd_client.c           | 6 +++---
>  2 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
> index 493de3496cd3..00d98e13100f 100644
> --- a/include/linux/ceph/osd_client.h
> +++ b/include/linux/ceph/osd_client.h
> @@ -47,6 +47,7 @@ enum ceph_sparse_read_state {
>         CEPH_SPARSE_READ_DATA_LEN,
>         CEPH_SPARSE_READ_DATA_PRE,
>         CEPH_SPARSE_READ_DATA,
> +       CEPH_SPARSE_READ_FINISH,
>  };
>
>  /*
> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
> index 848ef19055a0..f1705b4f19eb 100644
> --- a/net/ceph/osd_client.c
> +++ b/net/ceph/osd_client.c
> @@ -5802,8 +5802,6 @@ static int prep_next_sparse_read(struct ceph_connection *con,
>                         advance_cursor(cursor, sr->sr_req_len - end, false);
>         }
>
> -       ceph_init_sparse_read(sr);
> -
>         /* find next op in this request (if any) */
>         while (++o->o_sparse_op_idx < req->r_num_ops) {
>                 op = &req->r_ops[o->o_sparse_op_idx];
> @@ -5919,7 +5917,7 @@ static int osd_sparse_read(struct ceph_connection *con,
>                                 return -EREMOTEIO;
>                         }
>
> -                       sr->sr_state = CEPH_SPARSE_READ_HDR;
> +                       sr->sr_state = CEPH_SPARSE_READ_FINISH;
>                         goto next_op;

Hi Xiubo,

This code appears to be set up to handle multiple (sparse-read) ops in
a single OSD request.  Wouldn't this break that case?

I think the bug is in read_sparse_msg_data().  It shouldn't be calling
con->ops->sparse_read() after the message has progressed to the footer.
osd_sparse_read() is most likely fine as is.

Notice how read_partial_msg_data() and read_partial_msg_data_bounce()
behave: if called at that point (i.e. after the data section has been
processed, meaning that cursor->total_resid == 0), they do nothing.
read_sparse_msg_data() should have a similar condition and basically
no-op itself.
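
For illustration, here is a minimal compilable sketch of that condition,
with stubbed stand-ins for the real kernel structs and for
con->ops->sparse_read(), so it is a sketch of the suggested guard rather
than the actual kernel code or the eventual fix:

#include <stddef.h>

/* Stub stand-ins for the real kernel types -- illustration only. */
struct ceph_msg_data_cursor {
	size_t total_resid;	/* bytes of the data section still unread */
};

struct ceph_connection {
	struct ceph_msg_data_cursor cursor;
};

/* Stand-in for feeding con->ops->sparse_read() from the socket. */
static int do_sparse_read(struct ceph_connection *con)
{
	(void)con;
	return 1;
}

static int read_partial_sparse_msg_data(struct ceph_connection *con)
{
	struct ceph_msg_data_cursor *cursor = &con->cursor;

	/*
	 * Data section fully consumed: do nothing and return 1 so the
	 * messenger moves on to the footer, mirroring what
	 * read_partial_msg_data() and read_partial_msg_data_bounce() do.
	 */
	if (!cursor->total_resid)
		return 1;

	return do_sparse_read(con);
}

int main(void)
{
	struct ceph_connection con = { .cursor = { .total_resid = 0 } };

	/* data section already done: must no-op and report completion */
	return read_partial_sparse_msg_data(&con) == 1 ? 0 : 1;
}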

While at it, let's rename it to read_partial_sparse_msg_data() to
emphasize the "partial"/no-op semantics that are required there.

Thanks,

                Ilya


* Re: [PATCH v2 2/2] libceph: just wait for more data to be available on the socket
  2023-12-12 16:31   ` Ilya Dryomov
@ 2023-12-13  1:01     ` Xiubo Li
  2023-12-13 10:31       ` Ilya Dryomov
  2023-12-13  1:13     ` Xiubo Li
  1 sibling, 1 reply; 11+ messages in thread
From: Xiubo Li @ 2023-12-13  1:01 UTC (permalink / raw)
  To: Ilya Dryomov; +Cc: ceph-devel, jlayton, vshankar, mchangir


On 12/13/23 00:31, Ilya Dryomov wrote:
> On Fri, Dec 8, 2023 at 5:08 PM <xiubli@redhat.com> wrote:
>> From: Xiubo Li <xiubli@redhat.com>
>>
>> The messages from ceph may be split into multiple socket packets
>> and we just need to wait for all the data to be available on the
>> socket.
>>
>> This will add a new _FINISH state for the sparse-read to mark the
>> current sparse-read as succeeded. Otherwise it will be treated as a
>> new sparse-read when the socket receives the last piece of the osd
>> request reply message, and the cancel_request() called later will
>> help reinitialize the sparse-read context.
>>
>> URL: https://tracker.ceph.com/issues/63586
>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>> ---
>>   include/linux/ceph/osd_client.h | 1 +
>>   net/ceph/osd_client.c           | 6 +++---
>>   2 files changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
>> index 493de3496cd3..00d98e13100f 100644
>> --- a/include/linux/ceph/osd_client.h
>> +++ b/include/linux/ceph/osd_client.h
>> @@ -47,6 +47,7 @@ enum ceph_sparse_read_state {
>>          CEPH_SPARSE_READ_DATA_LEN,
>>          CEPH_SPARSE_READ_DATA_PRE,
>>          CEPH_SPARSE_READ_DATA,
>> +       CEPH_SPARSE_READ_FINISH,
>>   };
>>
>>   /*
>> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
>> index 848ef19055a0..f1705b4f19eb 100644
>> --- a/net/ceph/osd_client.c
>> +++ b/net/ceph/osd_client.c
>> @@ -5802,8 +5802,6 @@ static int prep_next_sparse_read(struct ceph_connection *con,
>>                          advance_cursor(cursor, sr->sr_req_len - end, false);
>>          }
>>
>> -       ceph_init_sparse_read(sr);
>> -
>>          /* find next op in this request (if any) */
>>          while (++o->o_sparse_op_idx < req->r_num_ops) {
>>                  op = &req->r_ops[o->o_sparse_op_idx];
>> @@ -5919,7 +5917,7 @@ static int osd_sparse_read(struct ceph_connection *con,
>>                                  return -EREMOTEIO;
>>                          }
>>
>> -                       sr->sr_state = CEPH_SPARSE_READ_HDR;
>> +                       sr->sr_state = CEPH_SPARSE_READ_FINISH;
>>                          goto next_op;
> Hi Xiubo,
>
> This code appears to be set up to handle multiple (sparse-read) ops in
> a single OSD request.  Wouldn't this break that case?

Yeah, it will break it. I just missed it.

> I think the bug is in read_sparse_msg_data().  It shouldn't be calling
> con->ops->sparse_read() after the message has progressed to the footer.
> osd_sparse_read() is most likely fine as is.

Yes it is. IMO we cannot tell exactly whether it has progressed to the
footer, such as when 'con->v1.in_base_pos ==
sizeof(con->v1.in_hdr)': the socket buffer may break just after finishing
the sparse-read and before reading the footer, or somewhere within the
sparse-read. For the latter case we should continue with the sparse reads.


> Notice how read_partial_msg_data() and read_partial_msg_data_bounce()
> behave: if called at that point (i.e. after the data section has been
> processed, meaning that cursor->total_resid == 0), they do nothing.
> read_sparse_msg_data() should have a similar condition and basically
> no-op itself.

Correct, this is what I want to do in the sparse-read code.

>
> While at it, let's rename it to read_partial_sparse_msg_data() to
> emphasize the "partial"/no-op semantics that are required there.

Sure.

Thanks

- Xiubo


> Thanks,
>
>                  Ilya
>



* Re: [PATCH v2 2/2] libceph: just wait for more data to be available on the socket
  2023-12-12 16:31   ` Ilya Dryomov
  2023-12-13  1:01     ` Xiubo Li
@ 2023-12-13  1:13     ` Xiubo Li
  1 sibling, 0 replies; 11+ messages in thread
From: Xiubo Li @ 2023-12-13  1:13 UTC (permalink / raw)
  To: Ilya Dryomov, Jeff Layton; +Cc: ceph-devel, jlayton, vshankar, mchangir

Hi Ilya,

I just fixed it, and it will be something like the diff below. I haven't
tested it yet because I am running other tests locally.

This time it will set the state to 'CEPH_SPARSE_READ_FINISH' when the 
last sparse-read op is successfully read.

diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
index 493de3496cd3..00d98e13100f 100644
--- a/include/linux/ceph/osd_client.h
+++ b/include/linux/ceph/osd_client.h
@@ -47,6 +47,7 @@ enum ceph_sparse_read_state {
        CEPH_SPARSE_READ_DATA_LEN,
        CEPH_SPARSE_READ_DATA_PRE,
        CEPH_SPARSE_READ_DATA,
+       CEPH_SPARSE_READ_FINISH,
 };

 /*
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 848ef19055a0..b3b61f010428 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -5813,6 +5813,7 @@ static int prep_next_sparse_read(struct ceph_connection *con,

        /* reset for next sparse read request */
        spin_unlock(&o->o_requests_lock);
+       sr->sr_state = CEPH_SPARSE_READ_FINISH;
        o->o_sparse_op_idx = -1;
        return 0;
 found:
@@ -5918,8 +5919,6 @@ static int osd_sparse_read(struct ceph_connection *con,
                                                    count);
                                return -EREMOTEIO;
                        }
-
-                       sr->sr_state = CEPH_SPARSE_READ_HDR;
                        goto next_op;
                }

@@ -5952,6 +5951,8 @@ static int osd_sparse_read(struct ceph_connection *con,
                /* Bump the array index */
                ++sr->sr_index;
                break;
+       case CEPH_SPARSE_READ_FINISH:
+               return 0;
        }
        return ret;
 }

I will send out the V3 after my testing.


Jeff,

Could you help review it? I want to make sure that it won't break
anything here for sparse read.


Thanks

- Xiubo


On 12/13/23 00:31, Ilya Dryomov wrote:
> On Fri, Dec 8, 2023 at 5:08 PM <xiubli@redhat.com> wrote:
>> From: Xiubo Li <xiubli@redhat.com>
>>
>> The messages from ceph may be split into multiple socket packets
>> and we just need to wait for all the data to be available on the
>> socket.
>>
>> This will add a new _FINISH state for the sparse-read to mark the
>> current sparse-read as succeeded. Otherwise it will be treated as a
>> new sparse-read when the socket receives the last piece of the osd
>> request reply message, and the cancel_request() called later will
>> help reinitialize the sparse-read context.
>>
>> URL: https://tracker.ceph.com/issues/63586
>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>> ---
>>   include/linux/ceph/osd_client.h | 1 +
>>   net/ceph/osd_client.c           | 6 +++---
>>   2 files changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
>> index 493de3496cd3..00d98e13100f 100644
>> --- a/include/linux/ceph/osd_client.h
>> +++ b/include/linux/ceph/osd_client.h
>> @@ -47,6 +47,7 @@ enum ceph_sparse_read_state {
>>          CEPH_SPARSE_READ_DATA_LEN,
>>          CEPH_SPARSE_READ_DATA_PRE,
>>          CEPH_SPARSE_READ_DATA,
>> +       CEPH_SPARSE_READ_FINISH,
>>   };
>>
>>   /*
>> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
>> index 848ef19055a0..f1705b4f19eb 100644
>> --- a/net/ceph/osd_client.c
>> +++ b/net/ceph/osd_client.c
>> @@ -5802,8 +5802,6 @@ static int prep_next_sparse_read(struct ceph_connection *con,
>>                          advance_cursor(cursor, sr->sr_req_len - end, false);
>>          }
>>
>> -       ceph_init_sparse_read(sr);
>> -
>>          /* find next op in this request (if any) */
>>          while (++o->o_sparse_op_idx < req->r_num_ops) {
>>                  op = &req->r_ops[o->o_sparse_op_idx];
>> @@ -5919,7 +5917,7 @@ static int osd_sparse_read(struct ceph_connection *con,
>>                                  return -EREMOTEIO;
>>                          }
>>
>> -                       sr->sr_state = CEPH_SPARSE_READ_HDR;
>> +                       sr->sr_state = CEPH_SPARSE_READ_FINISH;
>>                          goto next_op;
> Hi Xiubo,
>
> This code appears to be set up to handle multiple (sparse-read) ops in
> a single OSD request.  Wouldn't this break that case?
>
> I think the bug is in read_sparse_msg_data().  It shouldn't be calling
> con->ops->sparse_read() after the message has progressed to the footer.
> osd_sparse_read() is most likely fine as is.
>
> Notice how read_partial_msg_data() and read_partial_msg_data_bounce()
> behave: if called at that point (i.e. after the data section has been
> processed, meaning that cursor->total_resid == 0), they do nothing.
> read_sparse_msg_data() should have a similar condition and basically
> no-op itself.
>
> While at it, let's rename it to read_partial_sparse_msg_data() to
> emphasize the "partial"/no-op semantics that are required there.
>
> Thanks,
>
>                  Ilya
>



* Re: [PATCH v2 2/2] libceph: just wait for more data to be available on the socket
  2023-12-13  1:01     ` Xiubo Li
@ 2023-12-13 10:31       ` Ilya Dryomov
  2023-12-13 11:03         ` Xiubo Li
  0 siblings, 1 reply; 11+ messages in thread
From: Ilya Dryomov @ 2023-12-13 10:31 UTC (permalink / raw)
  To: Xiubo Li; +Cc: ceph-devel, jlayton, vshankar, mchangir

On Wed, Dec 13, 2023 at 2:02 AM Xiubo Li <xiubli@redhat.com> wrote:
>
>
> On 12/13/23 00:31, Ilya Dryomov wrote:
> > On Fri, Dec 8, 2023 at 5:08 PM <xiubli@redhat.com> wrote:
> >> From: Xiubo Li <xiubli@redhat.com>
> >>
> >> The messages from ceph may be split into multiple socket packets
> >> and we just need to wait for all the data to be available on the
> >> socket.
> >>
> >> This will add a new _FINISH state for the sparse-read to mark the
> >> current sparse-read as succeeded. Otherwise it will be treated as a
> >> new sparse-read when the socket receives the last piece of the osd
> >> request reply message, and the cancel_request() called later will
> >> help reinitialize the sparse-read context.
> >>
> >> URL: https://tracker.ceph.com/issues/63586
> >> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> >> ---
> >>   include/linux/ceph/osd_client.h | 1 +
> >>   net/ceph/osd_client.c           | 6 +++---
> >>   2 files changed, 4 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
> >> index 493de3496cd3..00d98e13100f 100644
> >> --- a/include/linux/ceph/osd_client.h
> >> +++ b/include/linux/ceph/osd_client.h
> >> @@ -47,6 +47,7 @@ enum ceph_sparse_read_state {
> >>          CEPH_SPARSE_READ_DATA_LEN,
> >>          CEPH_SPARSE_READ_DATA_PRE,
> >>          CEPH_SPARSE_READ_DATA,
> >> +       CEPH_SPARSE_READ_FINISH,
> >>   };
> >>
> >>   /*
> >> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
> >> index 848ef19055a0..f1705b4f19eb 100644
> >> --- a/net/ceph/osd_client.c
> >> +++ b/net/ceph/osd_client.c
> >> @@ -5802,8 +5802,6 @@ static int prep_next_sparse_read(struct ceph_connection *con,
> >>                          advance_cursor(cursor, sr->sr_req_len - end, false);
> >>          }
> >>
> >> -       ceph_init_sparse_read(sr);
> >> -
> >>          /* find next op in this request (if any) */
> >>          while (++o->o_sparse_op_idx < req->r_num_ops) {
> >>                  op = &req->r_ops[o->o_sparse_op_idx];
> >> @@ -5919,7 +5917,7 @@ static int osd_sparse_read(struct ceph_connection *con,
> >>                                  return -EREMOTEIO;
> >>                          }
> >>
> >> -                       sr->sr_state = CEPH_SPARSE_READ_HDR;
> >> +                       sr->sr_state = CEPH_SPARSE_READ_FINISH;
> >>                          goto next_op;
> > Hi Xiubo,
> >
> > This code appears to be set up to handle multiple (sparse-read) ops in
> > a single OSD request.  Wouldn't this break that case?
>
> Yeah, it will break it. I just missed it.
>
> > I think the bug is in read_sparse_msg_data().  It shouldn't be calling
> > con->ops->sparse_read() after the message has progressed to the footer.
> > osd_sparse_read() is most likely fine as is.
>
> Yes it is. IMO we cannot tell exactly whether it has progressed to the
> footer, such as when 'con->v1.in_base_pos ==

Hi Xiubo,

I don't buy this.  If the messenger is trying to read the footer, it
_has_ progressed to the footer.  This is definitive and irreversible,
not a "maybe".

> sizeof(con->v1.in_hdr)': the socket buffer may break just after finishing
> the sparse-read and before reading the footer, or somewhere within the
> sparse-read. For the latter case we should continue with the sparse reads.

The size of the data section of the message is always known.  The
contract is that read_partial_msg_data*() returns 1 and does nothing
else after the data section is consumed.  This is how the messenger is
told to move on to the footer.

read_partial_sparse_msg_data() doesn't adhere to this contract and
should be fixed.

>
>
> > Notice how read_partial_msg_data() and read_partial_msg_data_bounce()
> > behave: if called at that point (i.e. after the data section has been
> > processed, meaning that cursor->total_resid == 0), they do nothing.
> > read_sparse_msg_data() should have a similar condition and basically
> > no-op itself.
>
> Correct, this is what I want to do in the sparse-read code.

No, this needs to be done on the messenger side.  sparse-read code
should not be invoked after the messenger has moved on to the footer.

Thanks,

                Ilya


* Re: [PATCH v2 2/2] libceph: just wait for more data to be available on the socket
  2023-12-13 10:31       ` Ilya Dryomov
@ 2023-12-13 11:03         ` Xiubo Li
  2023-12-13 11:54           ` Ilya Dryomov
  0 siblings, 1 reply; 11+ messages in thread
From: Xiubo Li @ 2023-12-13 11:03 UTC (permalink / raw)
  To: Ilya Dryomov; +Cc: ceph-devel, jlayton, vshankar, mchangir


On 12/13/23 18:31, Ilya Dryomov wrote:
> On Wed, Dec 13, 2023 at 2:02 AM Xiubo Li <xiubli@redhat.com> wrote:
>>
>> On 12/13/23 00:31, Ilya Dryomov wrote:
>>> On Fri, Dec 8, 2023 at 5:08 PM <xiubli@redhat.com> wrote:
>>>> From: Xiubo Li <xiubli@redhat.com>
>>>>
>>>> The messages from ceph may be split into multiple socket packets
>>>> and we just need to wait for all the data to be available on the
>>>> socket.
>>>>
>>>> This will add a new _FINISH state for the sparse-read to mark the
>>>> current sparse-read as succeeded. Otherwise it will be treated as a
>>>> new sparse-read when the socket receives the last piece of the osd
>>>> request reply message, and the cancel_request() called later will
>>>> help reinitialize the sparse-read context.
>>>>
>>>> URL: https://tracker.ceph.com/issues/63586
>>>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>>>> ---
>>>>    include/linux/ceph/osd_client.h | 1 +
>>>>    net/ceph/osd_client.c           | 6 +++---
>>>>    2 files changed, 4 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
>>>> index 493de3496cd3..00d98e13100f 100644
>>>> --- a/include/linux/ceph/osd_client.h
>>>> +++ b/include/linux/ceph/osd_client.h
>>>> @@ -47,6 +47,7 @@ enum ceph_sparse_read_state {
>>>>           CEPH_SPARSE_READ_DATA_LEN,
>>>>           CEPH_SPARSE_READ_DATA_PRE,
>>>>           CEPH_SPARSE_READ_DATA,
>>>> +       CEPH_SPARSE_READ_FINISH,
>>>>    };
>>>>
>>>>    /*
>>>> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
>>>> index 848ef19055a0..f1705b4f19eb 100644
>>>> --- a/net/ceph/osd_client.c
>>>> +++ b/net/ceph/osd_client.c
>>>> @@ -5802,8 +5802,6 @@ static int prep_next_sparse_read(struct ceph_connection *con,
>>>>                           advance_cursor(cursor, sr->sr_req_len - end, false);
>>>>           }
>>>>
>>>> -       ceph_init_sparse_read(sr);
>>>> -
>>>>           /* find next op in this request (if any) */
>>>>           while (++o->o_sparse_op_idx < req->r_num_ops) {
>>>>                   op = &req->r_ops[o->o_sparse_op_idx];
>>>> @@ -5919,7 +5917,7 @@ static int osd_sparse_read(struct ceph_connection *con,
>>>>                                   return -EREMOTEIO;
>>>>                           }
>>>>
>>>> -                       sr->sr_state = CEPH_SPARSE_READ_HDR;
>>>> +                       sr->sr_state = CEPH_SPARSE_READ_FINISH;
>>>>                           goto next_op;
>>> Hi Xiubo,
>>>
>>> This code appears to be set up to handle multiple (sparse-read) ops in
>>> a single OSD request.  Wouldn't this break that case?
>> Yeah, it will break it. I just missed it.
>>
>>> I think the bug is in read_sparse_msg_data().  It shouldn't be calling
>>> con->ops->sparse_read() after the message has progressed to the footer.
>>> osd_sparse_read() is most likely fine as is.
>> Yes it is. IMO we cannot tell exactly whether it has progressed to the
>> footer, such as when 'con->v1.in_base_pos ==
> Hi Xiubo,
>
> I don't buy this.  If the messenger is trying to read the footer, it
> _has_ progressed to the footer.  This is definitive and irreversible,
> not a "maybe".
>
>> sizeof(con->v1.in_hdr)': the socket buffer may break just after finishing
>> the sparse-read and before reading the footer, or somewhere within the
>> sparse-read. For the latter case we should continue with the sparse reads.
> The size of the data section of the message is always known.  The
> contract is that read_partial_msg_data*() returns 1 and does nothing
> else after the data section is consumed.  This is how the messenger is
> told to move on to the footer.
>
> read_partial_sparse_msg_data() doesn't adhere to this contract and
> should be fixed.
>
>>
>>> Notice how read_partial_msg_data() and read_partial_msg_data_bounce()
>>> behave: if called at that point (i.e. after the data section has been
>>> processed, meaning that cursor->total_resid == 0), they do nothing.
>>> read_sparse_msg_data() should have a similar condition and basically
>>> no-op itself.
>> Correct, this is what I want to do in the sparse-read code.
> No, this needs to be done on the messenger side.  sparse-read code
> should not be invoked after the messenger has moved on to the footer.

From my reading, even after the messenger has moved on to the 'footer', it
will always invoke read_partial() to parse the 'header':

read_partial(con, end, size, &con->v1.in_hdr);

But it will do nothing and just return.

And the same goes for 'front', 'middle' and '(page) data', and finally
the 'footer'.
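
To illustrate that reading, here is a toy user-space model of the
try_read() section walk (not the kernel source; the section sizes are the
ones from the cover-letter log): every wakeup re-runs the sections in
order, and a section that is already complete does nothing and reports
success, so execution falls through to the first incomplete one.

#include <stdio.h>

enum { HDR, FRONT, MIDDLE, DATA, FOOTER, NSEC };

/* section sizes taken from the cover-letter log */
static const int need[NSEC] = { 53, 164, 0, 16408, 21 };
static int have[NSEC];

static int read_partial(int sec, int *avail)
{
	int left = need[sec] - have[sec];

	if (left > *avail)
		left = *avail;	/* socket ran dry: take what is there */
	have[sec] += left;
	*avail -= left;
	return have[sec] == need[sec];	/* completed sections no-op here */
}

static void try_read(int avail)
{
	int sec;

	for (sec = HDR; sec < NSEC; sec++)
		if (!read_partial(sec, &avail))
			return;	/* wait for more data on the socket */
	printf("message complete\n");
}

int main(void)
{
	try_read(53 + 164 + 16408 + 14);	/* footer short by 7 bytes */
	try_read(7);				/* the remaining bytes arrive */
	return 0;
}

The second call walks 'header', 'front', 'middle' and 'data' again, each
no-ops with return 1, and only the footer consumes the new 7 bytes.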

Did I miss something?

Thanks

- Xiubo

> Thanks,
>
>                  Ilya
>



* Re: [PATCH v2 2/2] libceph: just wait for more data to be available on the socket
  2023-12-13 11:03         ` Xiubo Li
@ 2023-12-13 11:54           ` Ilya Dryomov
       [not found]             ` <9115452a-0ca0-4760-9407-bcc3146134ff@redhat.com>
  0 siblings, 1 reply; 11+ messages in thread
From: Ilya Dryomov @ 2023-12-13 11:54 UTC (permalink / raw)
  To: Xiubo Li; +Cc: ceph-devel, jlayton, vshankar, mchangir

On Wed, Dec 13, 2023 at 12:03 PM Xiubo Li <xiubli@redhat.com> wrote:
>
>
> On 12/13/23 18:31, Ilya Dryomov wrote:
> > On Wed, Dec 13, 2023 at 2:02 AM Xiubo Li <xiubli@redhat.com> wrote:
> >>
> >> On 12/13/23 00:31, Ilya Dryomov wrote:
> >>> On Fri, Dec 8, 2023 at 5:08 PM <xiubli@redhat.com> wrote:
> >>>> From: Xiubo Li <xiubli@redhat.com>
> >>>>
> >>>> The messages from ceph may be split into multiple socket packets
> >>>> and we just need to wait for all the data to be available on the
> >>>> socket.
> >>>>
> >>>> This will add a new _FINISH state for the sparse-read to mark the
> >>>> current sparse-read as succeeded. Otherwise it will be treated as a
> >>>> new sparse-read when the socket receives the last piece of the osd
> >>>> request reply message, and the cancel_request() called later will
> >>>> help reinitialize the sparse-read context.
> >>>>
> >>>> URL: https://tracker.ceph.com/issues/63586
> >>>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> >>>> ---
> >>>>    include/linux/ceph/osd_client.h | 1 +
> >>>>    net/ceph/osd_client.c           | 6 +++---
> >>>>    2 files changed, 4 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
> >>>> index 493de3496cd3..00d98e13100f 100644
> >>>> --- a/include/linux/ceph/osd_client.h
> >>>> +++ b/include/linux/ceph/osd_client.h
> >>>> @@ -47,6 +47,7 @@ enum ceph_sparse_read_state {
> >>>>           CEPH_SPARSE_READ_DATA_LEN,
> >>>>           CEPH_SPARSE_READ_DATA_PRE,
> >>>>           CEPH_SPARSE_READ_DATA,
> >>>> +       CEPH_SPARSE_READ_FINISH,
> >>>>    };
> >>>>
> >>>>    /*
> >>>> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
> >>>> index 848ef19055a0..f1705b4f19eb 100644
> >>>> --- a/net/ceph/osd_client.c
> >>>> +++ b/net/ceph/osd_client.c
> >>>> @@ -5802,8 +5802,6 @@ static int prep_next_sparse_read(struct ceph_connection *con,
> >>>>                           advance_cursor(cursor, sr->sr_req_len - end, false);
> >>>>           }
> >>>>
> >>>> -       ceph_init_sparse_read(sr);
> >>>> -
> >>>>           /* find next op in this request (if any) */
> >>>>           while (++o->o_sparse_op_idx < req->r_num_ops) {
> >>>>                   op = &req->r_ops[o->o_sparse_op_idx];
> >>>> @@ -5919,7 +5917,7 @@ static int osd_sparse_read(struct ceph_connection *con,
> >>>>                                   return -EREMOTEIO;
> >>>>                           }
> >>>>
> >>>> -                       sr->sr_state = CEPH_SPARSE_READ_HDR;
> >>>> +                       sr->sr_state = CEPH_SPARSE_READ_FINISH;
> >>>>                           goto next_op;
> >>> Hi Xiubo,
> >>>
> >>> This code appears to be set up to handle multiple (sparse-read) ops in
> >>> a single OSD request.  Wouldn't this break that case?
> >> Yeah, it will break it. I just missed it.
> >>
> >>> I think the bug is in read_sparse_msg_data().  It shouldn't be calling
> >>> con->ops->sparse_read() after the message has progressed to the footer.
> >>> osd_sparse_read() is most likely fine as is.
> >> Yes it is. IMO we cannot tell exactly whether it has progressed to the
> >> footer, such as when 'con->v1.in_base_pos ==
> > Hi Xiubo,
> >
> > I don't buy this.  If the messenger is trying to read the footer, it
> > _has_ progressed to the footer.  This is definitive and irreversible,
> > not a "maybe".
> >
> >> sizeof(con->v1.in_hdr)': the socket buffer may break just after finishing
> >> the sparse-read and before reading the footer, or somewhere within the
> >> sparse-read. For the latter case we should continue with the sparse reads.
> > The size of the data section of the message is always known.  The
> > contract is that read_partial_msg_data*() returns 1 and does nothing
> > else after the data section is consumed.  This is how the messenger is
> > told to move on to the footer.
> >
> > read_partial_sparse_msg_data() doesn't adhere to this contract and
> > should be fixed.
> >
> >>
> >>> Notice how read_partial_msg_data() and read_partial_msg_data_bounce()
> >>> behave: if called at that point (i.e. after the data section has been
> >>> processed, meaning that cursor->total_resid == 0), they do nothing.
> >>> read_sparse_msg_data() should have a similar condition and basically
> >>> no-op itself.
> >> Correct, this is what I want to do in the sparse-read code.
> > No, this needs to be done on the messenger side.  sparse-read code
> > should not be invoked after the messenger has moved on to the footer.
>
> From my reading, even after the messenger has moved on to the 'footer', it
> will always invoke read_partial() to parse the 'header':
>
> read_partial(con, end, size, &con->v1.in_hdr);
>
> But it will do nothing and just return.
>
> And the same goes for 'front', 'middle' and '(page) data', and finally
> the 'footer'.

This is correct.

>
> Did I miss something?

No, it's how the messenger is set up to work.  The problem is that
read_sparse_msg_data() doesn't fit this model, so it should be fixed
and renamed to read_partial_sparse_msg_data().

Thanks,

                Ilya


* Re: [PATCH v2 2/2] libceph: just wait for more data to be available on the socket
       [not found]             ` <9115452a-0ca0-4760-9407-bcc3146134ff@redhat.com>
@ 2023-12-13 13:07               ` Ilya Dryomov
  2023-12-13 14:11                 ` Jeff Layton
  0 siblings, 1 reply; 11+ messages in thread
From: Ilya Dryomov @ 2023-12-13 13:07 UTC (permalink / raw)
  To: Xiubo Li; +Cc: ceph-devel, jlayton, vshankar, mchangir

On Wed, Dec 13, 2023 at 1:05 PM Xiubo Li <xiubli@redhat.com> wrote:
>
>
> On 12/13/23 19:54, Ilya Dryomov wrote:
>
> On Wed, Dec 13, 2023 at 12:03 PM Xiubo Li <xiubli@redhat.com> wrote:
>
> On 12/13/23 18:31, Ilya Dryomov wrote:
>
> On Wed, Dec 13, 2023 at 2:02 AM Xiubo Li <xiubli@redhat.com> wrote:
>
> On 12/13/23 00:31, Ilya Dryomov wrote:
>
> On Fri, Dec 8, 2023 at 5:08 PM <xiubli@redhat.com> wrote:
>
> From: Xiubo Li <xiubli@redhat.com>
>
> The messages from ceph may be split into multiple socket packets
> and we just need to wait for all the data to be available on the
> socket.
>
> This will add a new _FINISH state for the sparse-read to mark the
> current sparse-read as succeeded. Otherwise it will be treated as a
> new sparse-read when the socket receives the last piece of the osd
> request reply message, and the cancel_request() called later will
> help reinitialize the sparse-read context.
>
> URL: https://tracker.ceph.com/issues/63586
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>    include/linux/ceph/osd_client.h | 1 +
>    net/ceph/osd_client.c           | 6 +++---
>    2 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
> index 493de3496cd3..00d98e13100f 100644
> --- a/include/linux/ceph/osd_client.h
> +++ b/include/linux/ceph/osd_client.h
> @@ -47,6 +47,7 @@ enum ceph_sparse_read_state {
>           CEPH_SPARSE_READ_DATA_LEN,
>           CEPH_SPARSE_READ_DATA_PRE,
>           CEPH_SPARSE_READ_DATA,
> +       CEPH_SPARSE_READ_FINISH,
>    };
>
>    /*
> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
> index 848ef19055a0..f1705b4f19eb 100644
> --- a/net/ceph/osd_client.c
> +++ b/net/ceph/osd_client.c
> @@ -5802,8 +5802,6 @@ static int prep_next_sparse_read(struct ceph_connection *con,
>                           advance_cursor(cursor, sr->sr_req_len - end, false);
>           }
>
> -       ceph_init_sparse_read(sr);
> -
>           /* find next op in this request (if any) */
>           while (++o->o_sparse_op_idx < req->r_num_ops) {
>                   op = &req->r_ops[o->o_sparse_op_idx];
> @@ -5919,7 +5917,7 @@ static int osd_sparse_read(struct ceph_connection *con,
>                                   return -EREMOTEIO;
>                           }
>
> -                       sr->sr_state = CEPH_SPARSE_READ_HDR;
> +                       sr->sr_state = CEPH_SPARSE_READ_FINISH;
>                           goto next_op;
>
> Hi Xiubo,
>
> This code appears to be set up to handle multiple (sparse-read) ops in
> a single OSD request.  Wouldn't this break that case?
>
> Yeah, it will break it. I just missed it.
>
> I think the bug is in read_sparse_msg_data().  It shouldn't be calling
> con->ops->sparse_read() after the message has progressed to the footer.
> osd_sparse_read() is most likely fine as is.
>
> Yes it is. IMO we cannot tell exactly whether it has progressed to the
> footer, such as when 'con->v1.in_base_pos ==
>
> Hi Xiubo,
>
> I don't buy this.  If the messenger is trying to read the footer, it
> _has_ progressed to the footer.  This is definitive and irreversible,
> not a "maybe".
>
> sizeof(con->v1.in_hdr)': the socket buffer may break just after finishing
> the sparse-read and before reading the footer, or somewhere within the
> sparse-read. For the latter case we should continue with the sparse reads.
>
> The size of the data section of the message is always known.  The
> contract is that read_partial_msg_data*() returns 1 and does nothing
> else after the data section is consumed.  This is how the messenger is
> told to move on to the footer.
>
> read_partial_sparse_msg_data() doesn't adhere to this contract and
> should be fixed.
>
> Notice how read_partial_msg_data() and read_partial_msg_data_bounce()
> behave: if called at that point (i.e. after the data section has been
> processed, meaning that cursor->total_resid == 0), they do nothing.
> read_sparse_msg_data() should have a similar condition and basically
> no-op itself.
>
> Correct, this is what I want to do in the sparse-read code.
>
> No, this needs to be done on the messenger side.  sparse-read code
> should not be invoked after the messenger has moved on to the footer.
>
> From my reading, even after the messenger has moved on to the 'footer', it
> will always invoke read_partial() to parse the 'header':
>
> read_partial(con, end, size, &con->v1.in_hdr);
>
> But it will do nothing and just return.
>
> And the same goes for 'front', 'middle' and '(page) data', and finally
> the 'footer'.
>
> This is correct.
>
> Did I miss something?
>
> No, it's how the messenger is set up to work.  The problem is that
> read_sparse_msg_data() doesn't fit this model, so it should be fixed
> and renamed to read_partial_sparse_msg_data().
>
> Okay, let me try.
>
> Did you see my new patch in the last mail? Will that work?
>
> If not, I will try to fix it in read_partial_sparse_msg_data().

It might work around the problem, but it's not the right fix.  Think
about it: what business does code in the OSD client have being called
when the messenger is 14 bytes into reading the footer (number taken
from the log in the cover letter)?  All data is read at that point and
the last op in a multi-op OSD request may not even be sparse-read...

Thanks,

                Ilya


* Re: [PATCH v2 2/2] libceph: just wait for more data to be available on the socket
  2023-12-13 13:07               ` Ilya Dryomov
@ 2023-12-13 14:11                 ` Jeff Layton
  0 siblings, 0 replies; 11+ messages in thread
From: Jeff Layton @ 2023-12-13 14:11 UTC (permalink / raw)
  To: Ilya Dryomov, Xiubo Li; +Cc: ceph-devel, vshankar, mchangir

On Wed, 2023-12-13 at 14:07 +0100, Ilya Dryomov wrote:
> On Wed, Dec 13, 2023 at 1:05 PM Xiubo Li <xiubli@redhat.com> wrote:
> > 
> > 
> > On 12/13/23 19:54, Ilya Dryomov wrote:
> > 
> > On Wed, Dec 13, 2023 at 12:03 PM Xiubo Li <xiubli@redhat.com> wrote:
> > 
> > On 12/13/23 18:31, Ilya Dryomov wrote:
> > 
> > On Wed, Dec 13, 2023 at 2:02 AM Xiubo Li <xiubli@redhat.com> wrote:
> > 
> > On 12/13/23 00:31, Ilya Dryomov wrote:
> > 
> > On Fri, Dec 8, 2023 at 5:08 PM <xiubli@redhat.com> wrote:
> > 
> > From: Xiubo Li <xiubli@redhat.com>
> > 
> > The messages from ceph may be split into multiple socket packets
> > and we just need to wait for all the data to be available on the
> > socket.
> >
> > This will add a new _FINISH state for the sparse-read to mark the
> > current sparse-read as succeeded. Otherwise it will be treated as a
> > new sparse-read when the socket receives the last piece of the osd
> > request reply message, and the cancel_request() called later will
> > help reinitialize the sparse-read context.
> > 
> > URL: https://tracker.ceph.com/issues/63586
> > Signed-off-by: Xiubo Li <xiubli@redhat.com>
> > ---
> >    include/linux/ceph/osd_client.h | 1 +
> >    net/ceph/osd_client.c           | 6 +++---
> >    2 files changed, 4 insertions(+), 3 deletions(-)
> > 
> > diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
> > index 493de3496cd3..00d98e13100f 100644
> > --- a/include/linux/ceph/osd_client.h
> > +++ b/include/linux/ceph/osd_client.h
> > @@ -47,6 +47,7 @@ enum ceph_sparse_read_state {
> >           CEPH_SPARSE_READ_DATA_LEN,
> >           CEPH_SPARSE_READ_DATA_PRE,
> >           CEPH_SPARSE_READ_DATA,
> > +       CEPH_SPARSE_READ_FINISH,
> >    };
> > 
> >    /*
> > diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
> > index 848ef19055a0..f1705b4f19eb 100644
> > --- a/net/ceph/osd_client.c
> > +++ b/net/ceph/osd_client.c
> > @@ -5802,8 +5802,6 @@ static int prep_next_sparse_read(struct ceph_connection *con,
> >                           advance_cursor(cursor, sr->sr_req_len - end, false);
> >           }
> > 
> > -       ceph_init_sparse_read(sr);
> > -
> >           /* find next op in this request (if any) */
> >           while (++o->o_sparse_op_idx < req->r_num_ops) {
> >                   op = &req->r_ops[o->o_sparse_op_idx];
> > @@ -5919,7 +5917,7 @@ static int osd_sparse_read(struct ceph_connection *con,
> >                                   return -EREMOTEIO;
> >                           }
> > 
> > -                       sr->sr_state = CEPH_SPARSE_READ_HDR;
> > +                       sr->sr_state = CEPH_SPARSE_READ_FINISH;
> >                           goto next_op;
> > 
> > Hi Xiubo,
> > 
> > This code appears to be set up to handle multiple (sparse-read) ops in
> > a single OSD request.  Wouldn't this break that case?
> > 
> > Yeah, it will break it. I just missed it.
> > 
> > I think the bug is in read_sparse_msg_data().  It shouldn't be calling
> > con->ops->sparse_read() after the message has progressed to the footer.
> > osd_sparse_read() is most likely fine as is.
> > 
> > Yes it is. IMO we cannot tell exactly whether it has progressed to the
> > footer, such as when 'con->v1.in_base_pos ==
> > 
> > Hi Xiubo,
> > 
> > I don't buy this.  If the messenger is trying to read the footer, it
> > _has_ progressed to the footer.  This is definitive and irreversible,
> > not a "maybe".
> > 
> > sizeof(con->v1.in_hdr)': the socket buffer may break just after finishing
> > the sparse-read and before reading the footer, or somewhere within the
> > sparse-read. For the latter case we should continue with the sparse reads.
> > 
> > The size of the data section of the message is always known.  The
> > contract is that read_partial_msg_data*() returns 1 and does nothing
> > else after the data section is consumed.  This is how the messenger is
> > told to move on to the footer.
> > 
> > read_partial_sparse_msg_data() doesn't adhere to this contract and
> > should be fixed.
> > 
> > Notice how read_partial_msg_data() and read_partial_msg_data_bounce()
> > behave: if called at that point (i.e. after the data section has been
> > processed, meaning that cursor->total_resid == 0), they do nothing.
> > read_sparse_msg_data() should have a similar condition and basically
> > no-op itself.
> > 
> > Correct, this is what I want to do in the sparse-read code.
> > 
> > No, this needs to be done on the messenger side.  sparse-read code
> > should not be invoked after the messenger has moved on to the footer.
> > 
> > From my reading, even after the messenger has moved on to the 'footer', it
> > will always invoke read_partial() to parse the 'header':
> >
> > read_partial(con, end, size, &con->v1.in_hdr);
> >
> > But it will do nothing and just return.
> >
> > And the same goes for 'front', 'middle' and '(page) data', and finally
> > the 'footer'.
> > 
> > This is correct.
> > 
> > Did I miss something?
> > 
> > No, it's how the messenger is set up to work.  The problem is that
> > read_sparse_msg_data() doesn't fit this model, so it should be fixed
> > and renamed to read_partial_sparse_msg_data().
> > 
> > Okay, let me try.
> > 
> > Did you see my new patch in the last mail? Will that work?
> >
> > If not, I will try to fix it in read_partial_sparse_msg_data().
> 
> It might work around the problem, but it's not the right fix.  Think
> about it: what business does code in the OSD client have being called
> when the messenger is 14 bytes into reading the footer (number taken
> from the log in the cover letter)?  All data is read at that point and
> the last op in a multi-op OSD request may not even be sparse-read...
> 

The assumption in ceph_osdc_new_request is that if you have a multi-op
request, they are either all reads or all writes. The assumption
about writes has been there a long time. I simply added the ability to
do the same for reads:

    4e8c4c235578 libceph: allow ceph_osdc_new_request to accept a multi-op read

Note that we do this in ceph_sync_write in the rmw case, so that does
need to keep working.

Technically we could have a normal read in the same request as a sparse
read, but I figured that would be a little silly. That's why it tries to
prep a second sparse read once the first is done. If there isn't one
then that should fall through to the return 0 just before the "found:"
label in prep_next_sparse_read.

So, I tend to agree with Ilya that the problem isn't down in the OSD
state machine, but more likely at the receive handling layer. It's
certainly plausible that I didn't get the handling of short receives
right (particularly in the messenger_v1 part).
-- 
Jeff Layton <jlayton@kernel.org>
