* [PATCH] libceph: remove the max extents check for sparse read
@ 2023-11-03  3:39 xiubli
  2023-11-03 10:07 ` Ilya Dryomov
  0 siblings, 1 reply; 11+ messages in thread
From: xiubli @ 2023-11-03  3:39 UTC (permalink / raw)
  To: ceph-devel; +Cc: idryomov, jlayton, vshankar, mchangir, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

There is no limit on the extent array size in Ceph, and reading a
large amount of content can exceed the current MAX_EXTENTS cap. When
that happens the messenger fails by resetting the connection and
keeps resending the in-flight IOs.

URL: https://tracker.ceph.com/issues/62081
Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 net/ceph/osd_client.c | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 7af35106acaf..177a1d92c517 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -5850,8 +5850,6 @@ static inline void convert_extent_map(struct ceph_sparse_read *sr)
 }
 #endif
 
-#define MAX_EXTENTS 4096
-
 static int osd_sparse_read(struct ceph_connection *con,
 			   struct ceph_msg_data_cursor *cursor,
 			   char **pbuf)
@@ -5882,16 +5880,6 @@ static int osd_sparse_read(struct ceph_connection *con,
 
 		if (count > 0) {
 			if (!sr->sr_extent || count > sr->sr_ext_len) {
-				/*
-				 * Apply a hard cap to the number of extents.
-				 * If we have more, assume something is wrong.
-				 */
-				if (count > MAX_EXTENTS) {
-					dout("%s: OSD returned 0x%x extents in a single reply!\n",
-					     __func__, count);
-					return -EREMOTEIO;
-				}
-
 				/* no extent array provided, or too short */
 				kfree(sr->sr_extent);
 				sr->sr_extent = kmalloc_array(count,
-- 
2.39.1



* Re: [PATCH] libceph: remove the max extents check for sparse read
  2023-11-03  3:39 [PATCH] libceph: remove the max extents check for sparse read xiubli
@ 2023-11-03 10:07 ` Ilya Dryomov
  2023-11-03 10:14   ` Jeff Layton
       [not found]   ` <e350e6e7-22a2-69de-258f-70c2050dbd50@redhat.com>
  0 siblings, 2 replies; 11+ messages in thread
From: Ilya Dryomov @ 2023-11-03 10:07 UTC (permalink / raw)
  To: xiubli; +Cc: ceph-devel, jlayton, vshankar, mchangir

On Fri, Nov 3, 2023 at 4:41 AM <xiubli@redhat.com> wrote:
>
> From: Xiubo Li <xiubli@redhat.com>
>
> There is no limit on the extent array size in Ceph, and reading a
> large amount of content can exceed the current MAX_EXTENTS cap. When
> that happens the messenger fails by resetting the connection and
> keeps resending the in-flight IOs.
>
> URL: https://tracker.ceph.com/issues/62081
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>  net/ceph/osd_client.c | 12 ------------
>  1 file changed, 12 deletions(-)
>
> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
> index 7af35106acaf..177a1d92c517 100644
> --- a/net/ceph/osd_client.c
> +++ b/net/ceph/osd_client.c
> @@ -5850,8 +5850,6 @@ static inline void convert_extent_map(struct ceph_sparse_read *sr)
>  }
>  #endif
>
> -#define MAX_EXTENTS 4096
> -
>  static int osd_sparse_read(struct ceph_connection *con,
>                            struct ceph_msg_data_cursor *cursor,
>                            char **pbuf)
> @@ -5882,16 +5880,6 @@ static int osd_sparse_read(struct ceph_connection *con,
>
>                 if (count > 0) {
>                         if (!sr->sr_extent || count > sr->sr_ext_len) {
> -                               /*
> -                                * Apply a hard cap to the number of extents.
> -                                * If we have more, assume something is wrong.
> -                                */
> -                               if (count > MAX_EXTENTS) {
> -                                       dout("%s: OSD returned 0x%x extents in a single reply!\n",
> -                                            __func__, count);
> -                                       return -EREMOTEIO;
> -                               }
> -
>                                 /* no extent array provided, or too short */
>                                 kfree(sr->sr_extent);
>                                 sr->sr_extent = kmalloc_array(count,
> --
> 2.39.1
>

Hi Xiubo,

As noted in the tracker ticket, there are many "sanity" limits like
that in the messenger and other parts of the kernel client.  First,
let's change that dout to pr_warn_ratelimited so that it's immediately
clear what is going on.  Then, if the limit actually gets hit, let's
dig into why and see if it can be increased rather than just removed.
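
For illustration, the swap could look something like this (just
a sketch against the check being removed above, not a tested patch):

    if (count > MAX_EXTENTS) {
            pr_warn_ratelimited("%s: OSD returned 0x%x extents in a single reply!\n",
                                __func__, count);
            return -EREMOTEIO;
    }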

Thanks,

                Ilya


* Re: [PATCH] libceph: remove the max extents check for sparse read
  2023-11-03 10:07 ` Ilya Dryomov
@ 2023-11-03 10:14   ` Jeff Layton
  2023-11-06  0:23     ` Xiubo Li
  2023-11-06  6:43     ` Xiubo Li
       [not found]   ` <e350e6e7-22a2-69de-258f-70c2050dbd50@redhat.com>
  1 sibling, 2 replies; 11+ messages in thread
From: Jeff Layton @ 2023-11-03 10:14 UTC (permalink / raw)
  To: Ilya Dryomov, xiubli; +Cc: ceph-devel, vshankar, mchangir

On Fri, 2023-11-03 at 11:07 +0100, Ilya Dryomov wrote:
> On Fri, Nov 3, 2023 at 4:41 AM <xiubli@redhat.com> wrote:
> > 
> > From: Xiubo Li <xiubli@redhat.com>
> > 
> > There is no limit on the extent array size in Ceph, and reading a
> > large amount of content can exceed the current MAX_EXTENTS cap. When
> > that happens the messenger fails by resetting the connection and
> > keeps resending the in-flight IOs.
> > 
> > URL: https://tracker.ceph.com/issues/62081
> > Signed-off-by: Xiubo Li <xiubli@redhat.com>
> > ---
> >  net/ceph/osd_client.c | 12 ------------
> >  1 file changed, 12 deletions(-)
> > 
> > diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
> > index 7af35106acaf..177a1d92c517 100644
> > --- a/net/ceph/osd_client.c
> > +++ b/net/ceph/osd_client.c
> > @@ -5850,8 +5850,6 @@ static inline void convert_extent_map(struct ceph_sparse_read *sr)
> >  }
> >  #endif
> > 
> > -#define MAX_EXTENTS 4096
> > -
> >  static int osd_sparse_read(struct ceph_connection *con,
> >                            struct ceph_msg_data_cursor *cursor,
> >                            char **pbuf)
> > @@ -5882,16 +5880,6 @@ static int osd_sparse_read(struct ceph_connection *con,
> > 
> >                 if (count > 0) {
> >                         if (!sr->sr_extent || count > sr->sr_ext_len) {
> > -                               /*
> > -                                * Apply a hard cap to the number of extents.
> > -                                * If we have more, assume something is wrong.
> > -                                */
> > -                               if (count > MAX_EXTENTS) {
> > -                                       dout("%s: OSD returned 0x%x extents in a single reply!\n",
> > -                                            __func__, count);
> > -                                       return -EREMOTEIO;
> > -                               }
> > -
> >                                 /* no extent array provided, or too short */
> >                                 kfree(sr->sr_extent);
> >                                 sr->sr_extent = kmalloc_array(count,
> > --
> > 2.39.1
> > 
> 
> Hi Xiubo,
> 
> As noted in the tracker ticket, there are many "sanity" limits like
> that in the messenger and other parts of the kernel client.  First,
> let's change that dout to pr_warn_ratelimited so that it's immediately
> clear what is going on.  Then, if the limit actually gets hit, let's
> dig into why and see if it can be increased rather than just removed.
> 

Yeah, agreed. I think when I wrote this, I couldn't figure out if there
was an actual hard cap on the number of extents, so I figured 4k ought
to be enough for anybody. Clearly that was wrong though.

I'd still favor raising the cap instead of eliminating it altogether. Is
there a hard cap on the number of extents that the OSD will send in a
single reply? That's really what this limit should be.
-- 
Jeff Layton <jlayton@kernel.org>


* Re: [PATCH] libceph: remove the max extents check for sparse read
  2023-11-03 10:14   ` Jeff Layton
@ 2023-11-06  0:23     ` Xiubo Li
  2023-11-06  6:43     ` Xiubo Li
  1 sibling, 0 replies; 11+ messages in thread
From: Xiubo Li @ 2023-11-06  0:23 UTC (permalink / raw)
  To: Jeff Layton, Ilya Dryomov; +Cc: ceph-devel, vshankar, mchangir


On 11/3/23 18:14, Jeff Layton wrote:
> On Fri, 2023-11-03 at 11:07 +0100, Ilya Dryomov wrote:
>> On Fri, Nov 3, 2023 at 4:41 AM <xiubli@redhat.com> wrote:
>>> From: Xiubo Li <xiubli@redhat.com>
>>>
>>> There is no limit on the extent array size in Ceph, and reading a
>>> large amount of content can exceed the current MAX_EXTENTS cap. When
>>> that happens the messenger fails by resetting the connection and
>>> keeps resending the in-flight IOs.
>>>
>>> URL: https://tracker.ceph.com/issues/62081
>>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>>> ---
>>>   net/ceph/osd_client.c | 12 ------------
>>>   1 file changed, 12 deletions(-)
>>>
>>> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
>>> index 7af35106acaf..177a1d92c517 100644
>>> --- a/net/ceph/osd_client.c
>>> +++ b/net/ceph/osd_client.c
>>> @@ -5850,8 +5850,6 @@ static inline void convert_extent_map(struct ceph_sparse_read *sr)
>>>   }
>>>   #endif
>>>
>>> -#define MAX_EXTENTS 4096
>>> -
>>>   static int osd_sparse_read(struct ceph_connection *con,
>>>                             struct ceph_msg_data_cursor *cursor,
>>>                             char **pbuf)
>>> @@ -5882,16 +5880,6 @@ static int osd_sparse_read(struct ceph_connection *con,
>>>
>>>                  if (count > 0) {
>>>                          if (!sr->sr_extent || count > sr->sr_ext_len) {
>>> -                               /*
>>> -                                * Apply a hard cap to the number of extents.
>>> -                                * If we have more, assume something is wrong.
>>> -                                */
>>> -                               if (count > MAX_EXTENTS) {
>>> -                                       dout("%s: OSD returned 0x%x extents in a single reply!\n",
>>> -                                            __func__, count);
>>> -                                       return -EREMOTEIO;
>>> -                               }
>>> -
>>>                                  /* no extent array provided, or too short */
>>>                                  kfree(sr->sr_extent);
>>>                                  sr->sr_extent = kmalloc_array(count,
>>> --
>>> 2.39.1
>>>
>> Hi Xiubo,
>>
>> As noted in the tracker ticket, there are many "sanity" limits like
>> that in the messenger and other parts of the kernel client.  First,
>> let's change that dout to pr_warn_ratelimited so that it's immediately
>> clear what is going on.  Then, if the limit actually gets hit, let's
>> dig into why and see if it can be increased rather than just removed.
>>
> Yeah, agreed. I think when I wrote this, I couldn't figure out if there
> was an actual hard cap on the number of extents, so I figured 4k ought
> to be enough for anybody. Clearly that was wrong though.

Okay.

> I'd still favor raising the cap instead of eliminating it altogether. Is
> there a hard cap on the number of extents that the OSD will send in a
> single reply? That's really what this limit should be.

I didn't find anything about this in Ceph, unless I missed something.

In my testing it could reach a very large number, such as 10000,
after random writes to a file followed by a read with a large size.

I'm wondering whether we could instead make the limit depend on the
amount of memory allocated for 'sr->sr_extent'?
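
Just to sketch the idea (hypothetical numbers, and assuming the
extents are the struct ceph_sparse_extent entries that 'sr->sr_extent'
points to), the cap could be derived from an allocation budget instead
of a fixed count:

    /* hypothetical: cap the extent array by allocation size, not count */
    #define MAX_EXTENT_ALLOC_BYTES (4 * 1024 * 1024)
    #define MAX_EXTENTS (MAX_EXTENT_ALLOC_BYTES / sizeof(struct ceph_sparse_extent))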

Thanks

- Xiubo





* Re: [PATCH] libceph: remove the max extents check for sparse read
  2023-11-03 10:14   ` Jeff Layton
  2023-11-06  0:23     ` Xiubo Li
@ 2023-11-06  6:43     ` Xiubo Li
  2023-11-06 11:42       ` Ilya Dryomov
  1 sibling, 1 reply; 11+ messages in thread
From: Xiubo Li @ 2023-11-06  6:43 UTC (permalink / raw)
  To: Jeff Layton, Ilya Dryomov; +Cc: ceph-devel, vshankar, mchangir


On 11/3/23 18:14, Jeff Layton wrote:
> On Fri, 2023-11-03 at 11:07 +0100, Ilya Dryomov wrote:
>> On Fri, Nov 3, 2023 at 4:41 AM <xiubli@redhat.com> wrote:
>>> From: Xiubo Li <xiubli@redhat.com>
>>>
>>> There is no limit on the extent array size in Ceph, and reading a
>>> large amount of content can exceed the current MAX_EXTENTS cap. When
>>> that happens the messenger fails by resetting the connection and
>>> keeps resending the in-flight IOs.
>>>
>>> URL: https://tracker.ceph.com/issues/62081
>>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>>> ---
>>>   net/ceph/osd_client.c | 12 ------------
>>>   1 file changed, 12 deletions(-)
>>>
>>> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
>>> index 7af35106acaf..177a1d92c517 100644
>>> --- a/net/ceph/osd_client.c
>>> +++ b/net/ceph/osd_client.c
>>> @@ -5850,8 +5850,6 @@ static inline void convert_extent_map(struct ceph_sparse_read *sr)
>>>   }
>>>   #endif
>>>
>>> -#define MAX_EXTENTS 4096
>>> -
>>>   static int osd_sparse_read(struct ceph_connection *con,
>>>                             struct ceph_msg_data_cursor *cursor,
>>>                             char **pbuf)
>>> @@ -5882,16 +5880,6 @@ static int osd_sparse_read(struct ceph_connection *con,
>>>
>>>                  if (count > 0) {
>>>                          if (!sr->sr_extent || count > sr->sr_ext_len) {
>>> -                               /*
>>> -                                * Apply a hard cap to the number of extents.
>>> -                                * If we have more, assume something is wrong.
>>> -                                */
>>> -                               if (count > MAX_EXTENTS) {
>>> -                                       dout("%s: OSD returned 0x%x extents in a single reply!\n",
>>> -                                            __func__, count);
>>> -                                       return -EREMOTEIO;
>>> -                               }
>>> -
>>>                                  /* no extent array provided, or too short */
>>>                                  kfree(sr->sr_extent);
>>>                                  sr->sr_extent = kmalloc_array(count,
>>> --
>>> 2.39.1
>>>
>> Hi Xiubo,
>>
>> As noted in the tracker ticket, there are many "sanity" limits like
>> that in the messenger and other parts of the kernel client.  First,
>> let's change that dout to pr_warn_ratelimited so that it's immediately
>> clear what is going on.  Then, if the limit actually gets hit, let's
>> dig into why and see if it can be increased rather than just removed.
>>
> Yeah, agreed. I think when I wrote this, I couldn't figure out if there
> was an actual hard cap on the number of extents, so I figured 4k ought
> to be enough for anybody. Clearly that was wrong though.
>
> I'd still favor raising the cap instead of eliminating it altogether. Is
> there a hard cap on the number of extents that the OSD will send in a
> single reply? That's really what this limit should be.

I went through the messenger code again carefully and found that even
when the errno for a request is '-ENOMEM', the messenger triggers the
connection fault handling, which reconnects and retries all the osd
requests. This looks incorrect.

IMO we should retry the osd requests only when the errno is '-EBADMSG'
or '-EREMOTEIO', etc. For errors caused by the client side we should
fail the osd requests instead.

Thanks

- Xiubo






* Re: [PATCH] libceph: remove the max extents check for sparse read
       [not found]   ` <e350e6e7-22a2-69de-258f-70c2050dbd50@redhat.com>
@ 2023-11-06 11:38     ` Ilya Dryomov
  2023-11-06 12:14       ` Xiubo Li
  0 siblings, 1 reply; 11+ messages in thread
From: Ilya Dryomov @ 2023-11-06 11:38 UTC (permalink / raw)
  To: Xiubo Li; +Cc: ceph-devel, jlayton, vshankar, mchangir

On Mon, Nov 6, 2023 at 1:14 AM Xiubo Li <xiubli@redhat.com> wrote:
>
>
> On 11/3/23 18:07, Ilya Dryomov wrote:
>
> On Fri, Nov 3, 2023 at 4:41 AM <xiubli@redhat.com> wrote:
>
> From: Xiubo Li <xiubli@redhat.com>
>
> There is no limit on the extent array size in Ceph, and reading a
> large amount of content can exceed the current MAX_EXTENTS cap. When
> that happens the messenger fails by resetting the connection and
> keeps resending the in-flight IOs.
>
> URL: https://tracker.ceph.com/issues/62081
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>  net/ceph/osd_client.c | 12 ------------
>  1 file changed, 12 deletions(-)
>
> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
> index 7af35106acaf..177a1d92c517 100644
> --- a/net/ceph/osd_client.c
> +++ b/net/ceph/osd_client.c
> @@ -5850,8 +5850,6 @@ static inline void convert_extent_map(struct ceph_sparse_read *sr)
>  }
>  #endif
>
> -#define MAX_EXTENTS 4096
> -
>  static int osd_sparse_read(struct ceph_connection *con,
>                            struct ceph_msg_data_cursor *cursor,
>                            char **pbuf)
> @@ -5882,16 +5880,6 @@ static int osd_sparse_read(struct ceph_connection *con,
>
>                 if (count > 0) {
>                         if (!sr->sr_extent || count > sr->sr_ext_len) {
> -                               /*
> -                                * Apply a hard cap to the number of extents.
> -                                * If we have more, assume something is wrong.
> -                                */
> -                               if (count > MAX_EXTENTS) {
> -                                       dout("%s: OSD returned 0x%x extents in a single reply!\n",
> -                                            __func__, count);
> -                                       return -EREMOTEIO;
> -                               }
> -
>                                 /* no extent array provided, or too short */
>                                 kfree(sr->sr_extent);
>                                 sr->sr_extent = kmalloc_array(count,
> --
> 2.39.1
>
> Hi Xiubo,
>
> As noted in the tracker ticket, there are many "sanity" limits like
> that in the messenger and other parts of the kernel client.  First,
> let's change that dout to pr_warn_ratelimited so that it's immediately
> clear what is going on.
>
> Sounds good to me.
>
>  Then, if the limit actually gets hit, let's
> dig into why and see if it can be increased rather than just removed.
>
> The test result in https://tracker.ceph.com/issues/62081#note-16 is from just changing the 'len' to 5000 in ceph PR https://github.com/ceph/ceph/pull/54301 to emulate random writes to a file followed by a read with a large size:
>
> [ RUN      ] LibRadosIoPP.SparseReadExtentArrayOpPP
> extents array size : 4297
>
> For the 'extents array size' it could reach a very large number, so what should the limit be? Any idea?

Hi Xiubo,

I don't think it can be a very large number in practice.

CephFS uses sparse reads only in the fscrypt case, right?  With
fscrypt, sub-4K writes to the object store aren't possible, right?
If the answer to both is yes, then the maximum number of extents
would be

    64M (CEPH_MSG_MAX_DATA_LEN) / 4K = 16384

even if the object store does tracking at byte granularity.

Thanks,

                Ilya


* Re: [PATCH] libceph: remove the max extents check for sparse read
  2023-11-06  6:43     ` Xiubo Li
@ 2023-11-06 11:42       ` Ilya Dryomov
  2023-11-06 12:15         ` Xiubo Li
  0 siblings, 1 reply; 11+ messages in thread
From: Ilya Dryomov @ 2023-11-06 11:42 UTC (permalink / raw)
  To: Xiubo Li; +Cc: Jeff Layton, ceph-devel, vshankar, mchangir

On Mon, Nov 6, 2023 at 7:43 AM Xiubo Li <xiubli@redhat.com> wrote:
>
>
> On 11/3/23 18:14, Jeff Layton wrote:
> > On Fri, 2023-11-03 at 11:07 +0100, Ilya Dryomov wrote:
> >> On Fri, Nov 3, 2023 at 4:41 AM <xiubli@redhat.com> wrote:
> >>> From: Xiubo Li <xiubli@redhat.com>
> >>>
> >>> There is no limit on the extent array size in Ceph, and reading a
> >>> large amount of content can exceed the current MAX_EXTENTS cap. When
> >>> that happens the messenger fails by resetting the connection and
> >>> keeps resending the in-flight IOs.
> >>>
> >>> URL: https://tracker.ceph.com/issues/62081
> >>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> >>> ---
> >>>   net/ceph/osd_client.c | 12 ------------
> >>>   1 file changed, 12 deletions(-)
> >>>
> >>> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
> >>> index 7af35106acaf..177a1d92c517 100644
> >>> --- a/net/ceph/osd_client.c
> >>> +++ b/net/ceph/osd_client.c
> >>> @@ -5850,8 +5850,6 @@ static inline void convert_extent_map(struct ceph_sparse_read *sr)
> >>>   }
> >>>   #endif
> >>>
> >>> -#define MAX_EXTENTS 4096
> >>> -
> >>>   static int osd_sparse_read(struct ceph_connection *con,
> >>>                             struct ceph_msg_data_cursor *cursor,
> >>>                             char **pbuf)
> >>> @@ -5882,16 +5880,6 @@ static int osd_sparse_read(struct ceph_connection *con,
> >>>
> >>>                  if (count > 0) {
> >>>                          if (!sr->sr_extent || count > sr->sr_ext_len) {
> >>> -                               /*
> >>> -                                * Apply a hard cap to the number of extents.
> >>> -                                * If we have more, assume something is wrong.
> >>> -                                */
> >>> -                               if (count > MAX_EXTENTS) {
> >>> -                                       dout("%s: OSD returned 0x%x extents in a single reply!\n",
> >>> -                                            __func__, count);
> >>> -                                       return -EREMOTEIO;
> >>> -                               }
> >>> -
> >>>                                  /* no extent array provided, or too short */
> >>>                                  kfree(sr->sr_extent);
> >>>                                  sr->sr_extent = kmalloc_array(count,
> >>> --
> >>> 2.39.1
> >>>
> >> Hi Xiubo,
> >>
> >> As noted in the tracker ticket, there are many "sanity" limits like
> >> that in the messenger and other parts of the kernel client.  First,
> >> let's change that dout to pr_warn_ratelimited so that it's immediately
> >> clear what is going on.  Then, if the limit actually gets hit, let's
> >> dig into why and see if it can be increased rather than just removed.
> >>
> > Yeah, agreed. I think when I wrote this, I couldn't figure out if there
> > was an actual hard cap on the number of extents, so I figured 4k ought
> > to be enough for anybody. Clearly that was wrong though.
> >
> > I'd still favor raising the cap instead of eliminating it altogether. Is
> > there a hard cap on the number of extents that the OSD will send in a
> > single reply? That's really what this limit should be.
>
> I went through the messenger code again carefully and found that even
> when the errno for a request is '-ENOMEM', the messenger triggers the
> connection fault handling, which reconnects and retries all the osd
> requests. This looks incorrect.

In theory, ENOMEM can be transient.  If memory is too fragmented (e.g.
kmalloc is used and there is no physically contiguous chunk of required
size available), it makes sense to retry the allocation after some time
passes.

>
> IMO we should retry the osd requests only when the errno is '-EBADMSG'
> or '-EREMOTEIO', etc. For errors caused by the client side we should
> fail the osd requests instead.

The messenger never fails higher level requests, no matter what the
error is.  Whether it's a good idea is debatable (personally I'm not
a fan), but this is how it behaves in userspace, so there isn't much
implementation freedom here.

Thanks,

                Ilya


* Re: [PATCH] libceph: remove the max extents check for sparse read
  2023-11-06 11:38     ` Ilya Dryomov
@ 2023-11-06 12:14       ` Xiubo Li
  2023-11-06 12:32         ` Ilya Dryomov
  0 siblings, 1 reply; 11+ messages in thread
From: Xiubo Li @ 2023-11-06 12:14 UTC (permalink / raw)
  To: Ilya Dryomov; +Cc: ceph-devel, jlayton, vshankar, mchangir


On 11/6/23 19:38, Ilya Dryomov wrote:
> On Mon, Nov 6, 2023 at 1:14 AM Xiubo Li <xiubli@redhat.com> wrote:
>>
>> On 11/3/23 18:07, Ilya Dryomov wrote:
>>
>> On Fri, Nov 3, 2023 at 4:41 AM <xiubli@redhat.com> wrote:
>>
>> From: Xiubo Li <xiubli@redhat.com>
>>
>> There is no limit on the extent array size in Ceph, and reading a
>> large amount of content can exceed the current MAX_EXTENTS cap. When
>> that happens the messenger fails by resetting the connection and
>> keeps resending the in-flight IOs.
>>
>> URL: https://tracker.ceph.com/issues/62081
>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>> ---
>>   net/ceph/osd_client.c | 12 ------------
>>   1 file changed, 12 deletions(-)
>>
>> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
>> index 7af35106acaf..177a1d92c517 100644
>> --- a/net/ceph/osd_client.c
>> +++ b/net/ceph/osd_client.c
>> @@ -5850,8 +5850,6 @@ static inline void convert_extent_map(struct ceph_sparse_read *sr)
>>   }
>>   #endif
>>
>> -#define MAX_EXTENTS 4096
>> -
>>   static int osd_sparse_read(struct ceph_connection *con,
>>                             struct ceph_msg_data_cursor *cursor,
>>                             char **pbuf)
>> @@ -5882,16 +5880,6 @@ static int osd_sparse_read(struct ceph_connection *con,
>>
>>                  if (count > 0) {
>>                          if (!sr->sr_extent || count > sr->sr_ext_len) {
>> -                               /*
>> -                                * Apply a hard cap to the number of extents.
>> -                                * If we have more, assume something is wrong.
>> -                                */
>> -                               if (count > MAX_EXTENTS) {
>> -                                       dout("%s: OSD returned 0x%x extents in a single reply!\n",
>> -                                            __func__, count);
>> -                                       return -EREMOTEIO;
>> -                               }
>> -
>>                                  /* no extent array provided, or too short */
>>                                  kfree(sr->sr_extent);
>>                                  sr->sr_extent = kmalloc_array(count,
>> --
>> 2.39.1
>>
>> Hi Xiubo,
>>
>> As noted in the tracker ticket, there are many "sanity" limits like
>> that in the messenger and other parts of the kernel client.  First,
>> let's change that dout to pr_warn_ratelimited so that it's immediately
>> clear what is going on.
>>
>> Sounds good to me.
>>
>>   Then, if the limit actually gets hit, let's
>> dig into why and see if it can be increased rather than just removed.
>>
>> The test result in https://tracker.ceph.com/issues/62081#note-16 is from just changing the 'len' to 5000 in ceph PR https://github.com/ceph/ceph/pull/54301 to emulate random writes to a file followed by a read with a large size:
>>
>> [ RUN      ] LibRadosIoPP.SparseReadExtentArrayOpPP
>> extents array size : 4297
>>
>> For the 'extents array size' it could reach a very large number, so what should the limit be? Any idea?
> Hi Xiubo,
>
> I don't think it can be a very large number in practice.
>
> CephFS uses sparse reads only in the fscrypt case, right?

Yeah, it is.


>    With
> fscrypt, sub-4K writes to the object store aren't possible, right?

Yeah.


> If the answer to both is yes, then the maximum number of extents
> would be
>
>      64M (CEPH_MSG_MAX_DATA_LEN) / 4K = 16384
>
> even if the object store does tracking at byte granularity.
So yeah, for the fscrypt case alone, setting the max extent number to 
16384 should be fine. But sparse reads can also be enabled in the 
non-fscrypt case. From my test in 
https://github.com/ceph/ceph/pull/54301, if we set the segment size to 
8 bytes the extent number could be very large.
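
(Just as an illustration using the same arithmetic as above: at 8-byte 
granularity a full 64M read could in theory carry up to 
64M / 8 = 8388608 extents.)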

Thanks

- Xiubo

> Thanks,
>
>                  Ilya
>



* Re: [PATCH] libceph: remove the max extents check for sparse read
  2023-11-06 11:42       ` Ilya Dryomov
@ 2023-11-06 12:15         ` Xiubo Li
  0 siblings, 0 replies; 11+ messages in thread
From: Xiubo Li @ 2023-11-06 12:15 UTC (permalink / raw)
  To: Ilya Dryomov; +Cc: Jeff Layton, ceph-devel, vshankar, mchangir


On 11/6/23 19:42, Ilya Dryomov wrote:
> On Mon, Nov 6, 2023 at 7:43 AM Xiubo Li <xiubli@redhat.com> wrote:
>>
>> On 11/3/23 18:14, Jeff Layton wrote:
>>> On Fri, 2023-11-03 at 11:07 +0100, Ilya Dryomov wrote:
>>>> On Fri, Nov 3, 2023 at 4:41 AM <xiubli@redhat.com> wrote:
>>>>> From: Xiubo Li <xiubli@redhat.com>
>>>>>
>>>>> There is no limit on the extent array size in Ceph, and reading a
>>>>> large amount of content can exceed the current MAX_EXTENTS cap. When
>>>>> that happens the messenger fails by resetting the connection and
>>>>> keeps resending the in-flight IOs.
>>>>>
>>>>> URL: https://tracker.ceph.com/issues/62081
>>>>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>>>>> ---
>>>>>    net/ceph/osd_client.c | 12 ------------
>>>>>    1 file changed, 12 deletions(-)
>>>>>
>>>>> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
>>>>> index 7af35106acaf..177a1d92c517 100644
>>>>> --- a/net/ceph/osd_client.c
>>>>> +++ b/net/ceph/osd_client.c
>>>>> @@ -5850,8 +5850,6 @@ static inline void convert_extent_map(struct ceph_sparse_read *sr)
>>>>>    }
>>>>>    #endif
>>>>>
>>>>> -#define MAX_EXTENTS 4096
>>>>> -
>>>>>    static int osd_sparse_read(struct ceph_connection *con,
>>>>>                              struct ceph_msg_data_cursor *cursor,
>>>>>                              char **pbuf)
>>>>> @@ -5882,16 +5880,6 @@ static int osd_sparse_read(struct ceph_connection *con,
>>>>>
>>>>>                   if (count > 0) {
>>>>>                           if (!sr->sr_extent || count > sr->sr_ext_len) {
>>>>> -                               /*
>>>>> -                                * Apply a hard cap to the number of extents.
>>>>> -                                * If we have more, assume something is wrong.
>>>>> -                                */
>>>>> -                               if (count > MAX_EXTENTS) {
>>>>> -                                       dout("%s: OSD returned 0x%x extents in a single reply!\n",
>>>>> -                                            __func__, count);
>>>>> -                                       return -EREMOTEIO;
>>>>> -                               }
>>>>> -
>>>>>                                   /* no extent array provided, or too short */
>>>>>                                   kfree(sr->sr_extent);
>>>>>                                   sr->sr_extent = kmalloc_array(count,
>>>>> --
>>>>> 2.39.1
>>>>>
>>>> Hi Xiubo,
>>>>
>>>> As noted in the tracker ticket, there are many "sanity" limits like
>>>> that in the messenger and other parts of the kernel client.  First,
>>>> let's change that dout to pr_warn_ratelimited so that it's immediately
>>>> clear what is going on.  Then, if the limit actually gets hit, let's
>>>> dig into why and see if it can be increased rather than just removed.
>>>>
>>> Yeah, agreed. I think when I wrote this, I couldn't figure out if there
>>> was an actual hard cap on the number of extents, so I figured 4k ought
>>> to be enough for anybody. Clearly that was wrong though.
>>>
>>> I'd still favor raising the cap instead of eliminating it altogether. Is
>>> there a hard cap on the number of extents that the OSD will send in a
>>> single reply? That's really what this limit should be.
>> I went through the messenger code again carefully and found that even
>> when the errno for a request is '-ENOMEM', the messenger triggers the
>> connection fault handling, which reconnects and retries all the osd
>> requests. This looks incorrect.
> In theory, ENOMEM can be transient.  If memory is too fragmented (e.g.
> kmalloc is used and there is no physically contiguous chunk of required
> size available), it makes sense to retry the allocation after some time
> passes.
>
>> IMO we should retry the osd requests only when the errno is '-EBADMSG'
>> or '-EREMOTEIO', etc. For errors caused by the client side we should
>> fail the osd requests instead.
> The messenger never fails higher level requests, no matter what the
> error is.  Whether it's a good idea is debatable (personally I'm not
> a fan), but this is how it behaves in userspace, so there isn't much
> implementation freedom here.

Okay. Got it.

Thanks

- Xiubo


> Thanks,
>
>                  Ilya
>



* Re: [PATCH] libceph: remove the max extents check for sparse read
  2023-11-06 12:14       ` Xiubo Li
@ 2023-11-06 12:32         ` Ilya Dryomov
  2023-11-06 13:02           ` Xiubo Li
  0 siblings, 1 reply; 11+ messages in thread
From: Ilya Dryomov @ 2023-11-06 12:32 UTC (permalink / raw)
  To: Xiubo Li; +Cc: ceph-devel, jlayton, vshankar, mchangir

On Mon, Nov 6, 2023 at 1:15 PM Xiubo Li <xiubli@redhat.com> wrote:
>
>
> On 11/6/23 19:38, Ilya Dryomov wrote:
> > On Mon, Nov 6, 2023 at 1:14 AM Xiubo Li <xiubli@redhat.com> wrote:
> >>
> >> On 11/3/23 18:07, Ilya Dryomov wrote:
> >>
> >> On Fri, Nov 3, 2023 at 4:41 AM <xiubli@redhat.com> wrote:
> >>
> >> From: Xiubo Li <xiubli@redhat.com>
> >>
> >> There is no limit on the extent array size in Ceph, and reading a
> >> large amount of content can exceed the current MAX_EXTENTS cap. When
> >> that happens the messenger fails by resetting the connection and
> >> keeps resending the in-flight IOs.
> >>
> >> URL: https://tracker.ceph.com/issues/62081
> >> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> >> ---
> >>   net/ceph/osd_client.c | 12 ------------
> >>   1 file changed, 12 deletions(-)
> >>
> >> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
> >> index 7af35106acaf..177a1d92c517 100644
> >> --- a/net/ceph/osd_client.c
> >> +++ b/net/ceph/osd_client.c
> >> @@ -5850,8 +5850,6 @@ static inline void convert_extent_map(struct ceph_sparse_read *sr)
> >>   }
> >>   #endif
> >>
> >> -#define MAX_EXTENTS 4096
> >> -
> >>   static int osd_sparse_read(struct ceph_connection *con,
> >>                             struct ceph_msg_data_cursor *cursor,
> >>                             char **pbuf)
> >> @@ -5882,16 +5880,6 @@ static int osd_sparse_read(struct ceph_connection *con,
> >>
> >>                  if (count > 0) {
> >>                          if (!sr->sr_extent || count > sr->sr_ext_len) {
> >> -                               /*
> >> -                                * Apply a hard cap to the number of extents.
> >> -                                * If we have more, assume something is wrong.
> >> -                                */
> >> -                               if (count > MAX_EXTENTS) {
> >> -                                       dout("%s: OSD returned 0x%x extents in a single reply!\n",
> >> -                                            __func__, count);
> >> -                                       return -EREMOTEIO;
> >> -                               }
> >> -
> >>                                  /* no extent array provided, or too short */
> >>                                  kfree(sr->sr_extent);
> >>                                  sr->sr_extent = kmalloc_array(count,
> >> --
> >> 2.39.1
> >>
> >> Hi Xiubo,
> >>
> >> As noted in the tracker ticket, there are many "sanity" limits like
> >> that in the messenger and other parts of the kernel client.  First,
> >> let's change that dout to pr_warn_ratelimited so that it's immediately
> >> clear what is going on.
> >>
> >> Sounds good to me.
> >>
> >>   Then, if the limit actually gets hit, let's
> >> dig into why and see if it can be increased rather than just removed.
> >>
> >> The test result in https://tracker.ceph.com/issues/62081#note-16 is from just changing the 'len' to 5000 in ceph PR https://github.com/ceph/ceph/pull/54301 to emulate random writes to a file followed by a read with a large size:
> >>
> >> [ RUN      ] LibRadosIoPP.SparseReadExtentArrayOpPP
> >> extents array size : 4297
> >>
> >> For the 'extents array size' it could reach a very large number, so what should the limit be? Any idea?
> > Hi Xiubo,
> >
> > I don't think it can be a very large number in practice.
> >
> > CephFS uses sparse reads only in the fscrypt case, right?
>
> Yeah, it is.
>
>
> >    With
> > fscrypt, sub-4K writes to the object store aren't possible, right?
>
> Yeah.
>
>
> > If the answer to both is yes, then the maximum number of extents
> > would be
> >
> >      64M (CEPH_MSG_MAX_DATA_LEN) / 4K = 16384
> >
> > even if the object store does tracking at byte granularity.
> So yeah, for the fscrypt case alone, setting the max extent number to
> 16384 should be fine. But sparse reads can also be enabled in the
> non-fscrypt case.

Ah, I see that it's also exposed as a mount option.  If we expect
people to use it, then dropping MAX_EXTENTS altogether might be the
best approach after all -- it doesn't make sense to warn about
something that we don't really control.

How about printing the number of extents only if the allocation fails?
Something like:

    if (!sr->sr_extent) {
            pr_err("failed to allocate %u sparse read extents\n", count);
            return -ENOMEM;
    }
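
I.e. roughly like this, right after the existing kmalloc_array() call
in osd_sparse_read() (sketch only, assuming the current GFP_NOIO
allocation):

    sr->sr_extent = kmalloc_array(count, sizeof(*sr->sr_extent),
                                  GFP_NOIO);
    if (!sr->sr_extent) {
            pr_err("failed to allocate %u sparse read extents\n", count);
            return -ENOMEM;
    }
    sr->sr_ext_len = count;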

Thanks,

                Ilya


* Re: [PATCH] libceph: remove the max extents check for sparse read
  2023-11-06 12:32         ` Ilya Dryomov
@ 2023-11-06 13:02           ` Xiubo Li
  0 siblings, 0 replies; 11+ messages in thread
From: Xiubo Li @ 2023-11-06 13:02 UTC (permalink / raw)
  To: Ilya Dryomov; +Cc: ceph-devel, jlayton, vshankar, mchangir


On 11/6/23 20:32, Ilya Dryomov wrote:
> On Mon, Nov 6, 2023 at 1:15 PM Xiubo Li <xiubli@redhat.com> wrote:
>>
>> On 11/6/23 19:38, Ilya Dryomov wrote:
>>> On Mon, Nov 6, 2023 at 1:14 AM Xiubo Li <xiubli@redhat.com> wrote:
>>>> On 11/3/23 18:07, Ilya Dryomov wrote:
>>>>
>>>> On Fri, Nov 3, 2023 at 4:41 AM <xiubli@redhat.com> wrote:
>>>>
>>>> From: Xiubo Li <xiubli@redhat.com>
>>>>
>>>> There is no limit on the extent array size in Ceph, and reading a
>>>> large amount of content can exceed the current MAX_EXTENTS cap. When
>>>> that happens the messenger fails by resetting the connection and
>>>> keeps resending the in-flight IOs.
>>>>
>>>> URL: https://tracker.ceph.com/issues/62081
>>>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>>>> ---
>>>>    net/ceph/osd_client.c | 12 ------------
>>>>    1 file changed, 12 deletions(-)
>>>>
>>>> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
>>>> index 7af35106acaf..177a1d92c517 100644
>>>> --- a/net/ceph/osd_client.c
>>>> +++ b/net/ceph/osd_client.c
>>>> @@ -5850,8 +5850,6 @@ static inline void convert_extent_map(struct ceph_sparse_read *sr)
>>>>    }
>>>>    #endif
>>>>
>>>> -#define MAX_EXTENTS 4096
>>>> -
>>>>    static int osd_sparse_read(struct ceph_connection *con,
>>>>                              struct ceph_msg_data_cursor *cursor,
>>>>                              char **pbuf)
>>>> @@ -5882,16 +5880,6 @@ static int osd_sparse_read(struct ceph_connection *con,
>>>>
>>>>                   if (count > 0) {
>>>>                           if (!sr->sr_extent || count > sr->sr_ext_len) {
>>>> -                               /*
>>>> -                                * Apply a hard cap to the number of extents.
>>>> -                                * If we have more, assume something is wrong.
>>>> -                                */
>>>> -                               if (count > MAX_EXTENTS) {
>>>> -                                       dout("%s: OSD returned 0x%x extents in a single reply!\n",
>>>> -                                            __func__, count);
>>>> -                                       return -EREMOTEIO;
>>>> -                               }
>>>> -
>>>>                                   /* no extent array provided, or too short */
>>>>                                   kfree(sr->sr_extent);
>>>>                                   sr->sr_extent = kmalloc_array(count,
>>>> --
>>>> 2.39.1
>>>>
>>>> Hi Xiubo,
>>>>
>>>> As noted in the tracker ticket, there are many "sanity" limits like
>>>> that in the messenger and other parts of the kernel client.  First,
>>>> let's change that dout to pr_warn_ratelimited so that it's immediately
>>>> clear what is going on.
>>>>
>>>> Sounds good to me.
>>>>
>>>>    Then, if the limit actually gets hit, let's
>>>> dig into why and see if it can be increased rather than just removed.
>>>>
>>>> The test result in https://tracker.ceph.com/issues/62081#note-16 is from just changing the 'len' to 5000 in ceph PR https://github.com/ceph/ceph/pull/54301 to emulate random writes to a file followed by a read with a large size:
>>>>
>>>> [ RUN      ] LibRadosIoPP.SparseReadExtentArrayOpPP
>>>> extents array size : 4297
>>>>
>>>> For the 'extents array size' it could reach a very large number, so what should the limit be? Any idea?
>>> Hi Xiubo,
>>>
>>> I don't think it can be a very large number in practice.
>>>
>>> CephFS uses sparse reads only in the fscrypt case, right?
>> Yeah, it is.
>>
>>
>>>     With
>>> fscrypt, sub-4K writes to the object store aren't possible, right?
>> Yeah.
>>
>>
>>> If the answer to both is yes, then the maximum number of extents
>>> would be
>>>
>>>       64M (CEPH_MSG_MAX_DATA_LEN) / 4K = 16384
>>>
>>> even if the object store does tracking at byte granularity.
>> So yeah, just for the fscrypt case if we set the max extent number to
>> 16384 it should be fine. But the sparse read could also be enabled in
>> non-fscrypt case.
> Ah, I see that it's also exposed as a mount option.  If we expect
> people to use it, then dropping MAX_EXTENTS altogether might be the
> best approach after all -- it doesn't make sense to warn about
> something that we don't really control.
>
> How about printing the number of extents only if the allocation fails?
> Something like:
>
>      if (!sr->sr_extent) {
>              pr_err("failed to allocate %u sparse read extents\n", count);
>              return -ENOMEM;
>      }

Yeah, this also looks good to me.

I will fix it.

Thanks

- Xiubo


> Thanks,
>
>                  Ilya
>


